Rebinning data

Often, the data we have is not structured the way we want to use it. One technique for restructuring it is (statistical) data binning, also called bucketing, which replaces all values within an interval (a bin) with one representative value. We lose some information in the process, but the data becomes easier to summarize and cheaper to process.
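As a minimal sketch of the idea, the following replaces each value with the midpoint of the bin it falls into. The sample values and bin edges are made up for illustration and are not taken from the weather dataset; `pd.cut` is only one of several ways to do this.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, not taken from the weather dataset.
values = pd.Series([0.4, 1.2, 2.7, 3.1, 4.9, 5.0])

# Three equal-width bins: [0, 2), [2, 4), [4, 6).
edges = np.array([0.0, 2.0, 4.0, 6.0])

# Use the midpoint of each bin as its representative value.
midpoints = ((edges[:-1] + edges[1:]) / 2).tolist()

# right=False makes each bin closed on the left and open on the right.
binned = pd.cut(values, bins=edges, right=False, labels=midpoints).astype(float)
print(binned.tolist())
```

Six distinct values collapse to three representative ones, which is the loss of information mentioned above.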

In the weather dataset, wind direction is given in degrees and wind speed in m/s; both can be represented differently. In this recipe, I present wind direction as cardinal directions (north, south, and so on) and wind speed on the Beaufort scale (see https://en.wikipedia.org/wiki/Beaufort_scale).
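To make the Beaufort conversion concrete, here is a hedged sketch using `np.digitize`. The function name `to_beaufort` and the thresholds are assumptions: the thresholds are the commonly quoted lower bounds in m/s for forces 1 to 12, the exact boundaries differ slightly between sources, and this is not necessarily how `dautil` implements its own conversion.

```python
import numpy as np

# Assumed lower bounds (m/s) of Beaufort forces 1..12; sources vary slightly.
BEAUFORT_LOWER_BOUNDS = np.array(
    [0.3, 1.6, 3.4, 5.5, 8.0, 10.8, 13.9, 17.2, 20.8, 24.5, 28.5, 32.7]
)


def to_beaufort(speed_ms):
    """Map wind speeds in m/s to Beaufort numbers 0..12 (hypothetical helper)."""
    # np.digitize counts how many bounds are <= each speed,
    # which is exactly the Beaufort number under these thresholds.
    return np.digitize(speed_ms, BEAUFORT_LOWER_BOUNDS)


# Hypothetical sample speeds, not taken from the weather dataset.
speeds = np.array([0.0, 1.0, 5.5, 12.0, 40.0])
forces = to_beaufort(speeds)
print(forces.tolist())
```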

How to do it...

Follow these instructions to rebin the data:

The imports are as follows:

import dautil as dl
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from IPython.display import HTML

Load and rebin the data as follows (wind direction is in degrees from 0 to 360; we rebin it to cardinal directions such as north, southwest, and so on):

df = dl.data.Weather.load()[['WIND_SPEED', 'WIND_DIR']].dropna()
categorized = df.copy()
categorized['WIND_DIR'] = dl.data.Weather.categorize_wind_dir(df)
categorized['WIND_SPEED'] = dl.data.Weather.beaufort_scale(df)
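The implementation of `categorize_wind_dir` is not shown in this recipe. As a rough sketch of what such a rebinning could look like, the following maps degrees to an eight-point compass, with each sector 45 degrees wide and centered on its direction. The function name `to_cardinal`, the eight-sector choice, and the labels are assumptions made for illustration; `dautil` may use a different number of sectors or different labels.

```python
import numpy as np

# Assumed eight-point compass, clockwise from north.
CARDINALS = np.array(['N', 'NE', 'E', 'SE', 'S', 'SW', 'W', 'NW'])


def to_cardinal(degrees):
    """Map wind directions in degrees to compass labels (hypothetical helper)."""
    degrees = np.asarray(degrees, dtype=float)
    # Shift by half a sector (22.5 degrees) so that north covers
    # [337.5, 360] and [0, 22.5); the modulo wraps 360 back to north.
    sector = np.floor((degrees + 22.5) / 45.0).astype(int) % 8
    return CARDINALS[sector]


# Hypothetical sample directions, not taken from the weather dataset.
directions = to_cardinal([0, 22.4, 22.5, 90, 225, 350, 360])
print(directions.tolist())
```

The half-sector shift matters: without it, north would only cover 0 to 45 degrees and a wind from 350 degrees would be labeled northwest.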

Show distributions and countplots with the following code:

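# `context` is not defined in this excerpt; it is presumably created
# earlier in the rebinning_data.ipynb notebook.
sp = dl.plotting.Subplotter(2, 2, context)

# Top row: distributions of the raw wind speed and wind direction.
# Note: sns.distplot is deprecated as of seaborn 0.11.
sns.distplot(df['WIND_SPEED'], ax=sp.ax)
sp.label(xlabel_params=dl.data.Weather.get_header('WIND_SPEED'))

sns.distplot(df['WIND_DIR'], ax=sp.next_ax())
sp.label(xlabel_params=dl.data.Weather.get_header('WIND_DIR'))

# Bottom row: counts per category after rebinning.
sns.countplot(x='WIND_SPEED', data=categorized, ax=sp.next_ax())
sp.label()

sns.countplot(x='WIND_DIR', data=categorized, ax=sp.next_ax())
sp.label()
plt.tight_layout()
HTML(dl.report.HTMLBuilder().watermark())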

Refer to the following screenshot for the end result (the code is in the rebinning_data.ipynb file in this book's code bundle):
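The countplots show how many observations fall into each category. The same numbers can be tabulated without plotting, as in this small sketch with pandas `value_counts`. The `sample` frame is hypothetical stand-in data with the same column names as the recipe, not the real weather dataset.

```python
import pandas as pd

# Hypothetical stand-in for the rebinned data; not the real weather dataset.
sample = pd.DataFrame({
    'WIND_DIR': ['N', 'SW', 'SW', 'W', 'SW', 'N'],
    'WIND_SPEED': [2, 3, 3, 4, 3, 2],  # Beaufort numbers
})

# Observations per cardinal direction, most frequent first.
dir_counts = sample['WIND_DIR'].value_counts()

# Observations per Beaufort number, ordered by the scale itself.
speed_counts = sample['WIND_SPEED'].value_counts().sort_index()

print(dir_counts)
print(speed_counts)
```

Sorting by index keeps the Beaufort numbers in their natural order, which is usually what you want for an ordinal scale.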
