Resampling means changing the frequency of the observed time series. For example, in this dataset, a data point is observed every minute. This dataset can be resampled to an hourly frequency, where all the data points within an hour are combined by an aggregation function of your choice (such as the mean or median) into a single data point for that hour. The same can be done at a daily level, where all the data points in a day are aggregated. Resampling to a lower frequency in this way can also be thought of as data smoothing, because the aggregation averages out the short-term bumps in the data.
In pandas, resampling time series data is easy because there is a built-in resample method for it, which works on data indexed by datetime. Let's see how we can use it.
For example, to resample at an hourly level, we write the following code:
ts[["Humidity"]].resample("1h").median().plot(figsize=(15,1))
The following is the output:
Similarly, to resample at a daily level, we write the following code:
ts[["Humidity"]].resample("1D").median().plot(figsize=(15,1))
The following is the output:
Note that the data resampled at an hourly level shows more variation than the daily version, which is smoother because each of its points aggregates a full day of observations.
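To make the effect of resampling concrete without relying on the plots, the following is a minimal sketch that runs on synthetic data. The demo DataFrame and its random Humidity values are hypothetical stand-ins for ts, assumed to have one reading per minute over two days; they are not the chapter's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `ts`: two days of minute-level readings.
index = pd.date_range("2024-01-01", periods=2 * 24 * 60, freq="min")
rng = np.random.default_rng(0)
demo = pd.DataFrame({"Humidity": 50 + rng.normal(0, 2, len(index))}, index=index)

# One aggregated row per hour and per calendar day, respectively.
hourly = demo[["Humidity"]].resample("1h").median()
daily = demo[["Humidity"]].resample("1D").median()
print(len(demo), len(hourly), len(daily))  # 2880 48 2

# The aggregation function is a free choice; several can be computed at once.
summary = demo["Humidity"].resample("1h").agg(["min", "median", "max"])
print(summary.head())
```

The row counts show what resampling does to the data: 2,880 minute-level readings collapse into 48 hourly points or 2 daily points, which is why the daily curve has far less room for variation.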
Rolling is a similar concept for aggregating data points, although it is more flexible. The rolling window, that is, the number of data points aggregated together, can be set to control the level of aggregation or smoothing.
If you look at the datetime column carefully, you can see that a data point has been observed every minute, so 60 consecutive points span one hour. This matters because the rolling method, as used here, takes its window as a number of data points rather than as a length of time.
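Rather than inspecting the datetime column by eye, the spacing between observations can also be checked in code. The following is a small sketch under the assumption that the data has a DatetimeIndex; demo is again a hypothetical minute-level stand-in for ts.

```python
import pandas as pd

# Hypothetical stand-in for `ts`: three hours of minute-level readings.
index = pd.date_range("2024-01-01", periods=180, freq="min")
demo = pd.DataFrame({"Humidity": range(180)}, index=index)

# Gap between consecutive timestamps; the first entry is NaT and is dropped.
gaps = demo.index.to_series().diff().dropna()

# A single distinct gap means the observations are evenly spaced.
print(gaps.value_counts())

# Number of data points that make up one hour at the most common spacing.
points_per_hour = int(pd.Timedelta("1h") / gaps.mode()[0])
print(points_per_hour)  # 60
```

If value_counts reports more than one distinct gap on real data, a fixed count such as 60 will not always correspond to exactly one hour.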
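To aggregate over 60 data points, we pass 60 as the rolling window, as shown in the following code. With the default settings, the window at each data point covers that point and the 59 points before it, so the result has one value per original observation rather than one per hour. The plot should nevertheless look similar to the hourly resampling obtained previously: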
ts[["Humidity"]].rolling(60).median().plot(figsize=(15,1))
The following is the output:
For rolling at a day level, the rolling window should be 60 × 24, that is, 1,440 data points:
ts[["Humidity"]].rolling(60*24).median().plot(figsize=(15,1))
The following is the output:
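Although the rolling and resampled plots look alike, the underlying results differ in shape. The following sketch illustrates this on synthetic data; demo is a hypothetical minute-level stand-in for ts, and the time-based window "60min" is shown as an alternative to a fixed count, not as the approach used above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `ts`: three hours of minute-level readings.
index = pd.date_range("2024-01-01", periods=3 * 60, freq="min")
demo = pd.DataFrame({"Humidity": np.arange(len(index), dtype=float)}, index=index)

by_count = demo[["Humidity"]].rolling(60).median()      # window of 60 rows
by_time = demo[["Humidity"]].rolling("60min").median()  # window of 60 minutes
hourly = demo[["Humidity"]].resample("1h").median()     # one row per hour

# Rolling keeps one row per observation; resampling keeps one row per hour.
print(len(by_count), len(hourly))              # 180 3

# A count-based window yields NaN until 60 rows are available;
# a time-based window starts producing values from the first row.
print(int(by_count["Humidity"].isna().sum()))  # 59
print(int(by_time["Humidity"].isna().sum()))   # 0
```

A time-based window can be the safer choice when the data has gaps, because it always spans 60 minutes regardless of how many observations fall inside it.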