Often, the data we have is not structured the way we want to use it. One structuring technique is (statistical) data binning, also called bucketing. This strategy replaces the values within an interval (a bin) with a single representative value. In the process, we may lose information; in return, we gain efficiency and better control over the data.
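To make the idea concrete, here is a minimal sketch of binning in plain Python. The function name `bin_values`, the sample values, and the choice of the bin midpoint as the representative value are all assumptions made for illustration; they are not part of the recipe.

```python
from bisect import bisect_right


def bin_values(values, edges):
    """Replace each value with the midpoint of the bin it falls into.

    `edges` must be sorted; bin i covers [edges[i], edges[i + 1]).
    Values outside the outer edges are assigned to the nearest bin.
    """
    midpoints = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
    binned = []
    for value in values:
        # Number of edges <= value, minus one, is the bin index.
        index = bisect_right(edges, value) - 1
        # Clamp so out-of-range values land in the first or last bin.
        index = min(max(index, 0), len(midpoints) - 1)
        binned.append(midpoints[index])
    return binned


# Made-up sample values, for illustration only.
samples = [1.2, 3.8, 4.1, 7.9, 9.5]
edges = [0, 5, 10]
binned = bin_values(samples, edges)
print(binned)  # [2.5, 2.5, 2.5, 7.5, 7.5]
```

Five distinct values collapse into two representative values, which shows both the loss of information and the simplification. For larger datasets, `numpy.digitize` or `pandas.cut` do the same kind of interval lookup in vectorized form.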
In the weather dataset, we have wind direction in degrees and wind speed in m/s, which can be represented in a different way. In this recipe, I chose to present wind direction with cardinal directions (north, south, and so on). For the wind speed, I used the Beaufort scale (visit https://en.wikipedia.org/wiki/Beaufort_scale).
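The two mappings can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper names `to_cardinal` and `to_beaufort` are hypothetical, the eight-point compass is an arbitrary choice, and the speed thresholds are the commonly tabulated lower bounds of each Beaufort number in m/s. The `dautil` functions used in this recipe may be implemented differently.

```python
from bisect import bisect_right

# Eight compass points, each covering a 45-degree sector centered on
# its nominal bearing (assumption: the recipe may use a different set).
CARDINALS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

# Commonly tabulated lower bounds (m/s) of Beaufort numbers 1 to 12.
BEAUFORT_LOWER_BOUNDS = [0.3, 1.6, 3.4, 5.5, 8.0, 10.8,
                         13.9, 17.2, 20.8, 24.5, 28.5, 32.7]


def to_cardinal(degrees):
    """Map a bearing in degrees to one of eight compass points."""
    # Shift by half a sector so that, for example, 350 degrees maps to N.
    sector = int(((degrees % 360) + 22.5) // 45) % 8
    return CARDINALS[sector]


def to_beaufort(speed_ms):
    """Map a wind speed in m/s to a Beaufort number from 0 to 12."""
    # The number of lower bounds <= speed is the Beaufort number.
    return bisect_right(BEAUFORT_LOWER_BOUNDS, speed_ms)


# Made-up (speed in m/s, direction in degrees) pairs.
samples = [(0.2, 350), (5.0, 90), (10.8, 200), (33.0, 225)]
rebinned = [(to_beaufort(s), to_cardinal(d)) for s, d in samples]
print(rebinned)  # [(0, 'N'), (3, 'E'), (6, 'S'), (12, 'SW')]
```

Both functions reduce a continuous measurement to a small set of categories, which is what makes count plots of the result meaningful.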
Follow these instructions to rebin the data:
import dautil as dl
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from IPython.display import HTML
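# Load the two wind columns and drop rows with missing values.
df = dl.data.Weather.load()[['WIND_SPEED', 'WIND_DIR']].dropna()
# Work on a copy so the raw values stay available for comparison.
categorized = df.copy()
# Replace degrees with cardinal directions and m/s with Beaufort scale categories.
categorized['WIND_DIR'] = dl.data.Weather.categorize_wind_dir(df)
categorized['WIND_SPEED'] = dl.data.Weather.beaufort_scale(df)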
# NOTE: `context` is not defined in this listing; see rebinning_data.ipynb.
sp = dl.plotting.Subplotter(2, 2, context)
# Top row: distributions of the raw wind speed and wind direction.
sns.distplot(df['WIND_SPEED'], ax=sp.ax)
sp.label(xlabel_params=dl.data.Weather.get_header('WIND_SPEED'))
sns.distplot(df['WIND_DIR'], ax=sp.next_ax())
sp.label(xlabel_params=dl.data.Weather.get_header('WIND_DIR'))
# Bottom row: counts per category after rebinning.
sns.countplot(x='WIND_SPEED', data=categorized, ax=sp.next_ax())
sp.label()
sns.countplot(x='WIND_DIR', data=categorized, ax=sp.next_ax())
sp.label()
plt.tight_layout()
HTML(dl.report.HTMLBuilder().watermark())
Refer to the following screenshot for the end result (see the rebinning_data.ipynb file in this book's code bundle):