Resampling and rolling of the time series data

Resampling means changing the frequency of an observed time series. For example, in this dataset, a data point is observed every minute. The dataset can be resampled to an hourly frequency, where all the data points within an hour are combined by an aggregation function of your choice into a single data point for that hour. The same can be done at a daily level, where all the data points within a day are aggregated. Resampling to a lower frequency can also be thought of as data smoothing, as it averages out the bumps in the data.

pandas makes it easy to resample time series data through the built-in resample method. Let's see how we can use it.

For example, to resample at an hourly level, we write the following code:

    ts[["Humidity"]].resample("1h").median().plot(figsize=(15,1))

The following is the output:

Resampling the data at an hourly level using the median as the aggregate measure

Similarly, to resample at a daily level, we write the following code: 

    ts[["Humidity"]].resample("1D").median().plot(figsize=(15,1))
  

The following is the output:

Resampling the data at a daily level using the median as the aggregate measure

Note how the data resampled at an hourly level shows more variation than the daily series, which is smoother.
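To make the effect of resampling concrete, the following is a minimal sketch on synthetic data. The `demo` DataFrame and its random values are assumptions made for illustration and stand in for the `ts` DataFrame used in this section; only the `Humidity` column name is taken from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `ts`: three days of minute-level readings.
index = pd.date_range("2024-01-01", periods=3 * 24 * 60, freq="min")
rng = np.random.default_rng(0)
demo = pd.DataFrame({"Humidity": 25 + rng.normal(0, 1, len(index))}, index=index)

# Each hour (or day) of minute-level points collapses into one median value.
hourly = demo[["Humidity"]].resample("1h").median()
daily = demo[["Humidity"]].resample("1D").median()

# Number of rows before and after resampling.
print(len(demo), len(hourly), len(daily))  # 4320 72 3
```

The row counts show why the daily plot looks smoother: each daily value summarizes 1,440 points, whereas each hourly value summarizes only 60.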

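Rolling is a similar concept for aggregating data points, although it is more flexible. The rolling window, that is, the number of consecutive data points aggregated together, can be set to control the level of aggregation or smoothing. Unlike resampling, rolling slides the window forward one data point at a time, so the result has one value for every original data point.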

If you look at the datetime column carefully, you can see that a data point has been observed every minute. Hence, 60 such points constitute an hour. Let's see how we can use the rolling method to aggregate the data.

To roll over 60 data points, we provide 60 as the rolling window, as shown in the following code. Each record is then replaced by the aggregate of 60 consecutive points; by default, pandas uses the point itself and the 59 points before it. This should return a plot similar to the hourly resampling obtained previously:

    ts[["Humidity"]].rolling(60).median().plot(figsize=(15,1))  

The following is the output:

Rolling every consecutive 60 points and aggregating them to give the median as the final value
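The behavior of a count-based rolling window can be checked on a small example. The following sketch uses a synthetic `demo` DataFrame (an assumption for illustration, not the dataset from this section) whose values are simply 0, 1, 2, and so on, so the medians are easy to verify by hand. It also shows a time-based window, `"60min"`, as one possible alternative when the index is a sorted datetime index.

```python
import numpy as np
import pandas as pd

# Hypothetical minute-level data: three hours of values 0, 1, 2, ..., 179.
index = pd.date_range("2024-01-01", periods=180, freq="min")
demo = pd.DataFrame({"Humidity": np.arange(180, dtype=float)}, index=index)

# Count-based window: each value is the median of the current point
# and the 59 points before it.
rolled = demo[["Humidity"]].rolling(60).median()

print(len(rolled) == len(demo))         # True: one output row per input row
print(rolled["Humidity"].isna().sum())  # 59: the first windows are incomplete
print(rolled["Humidity"].iloc[59])      # 29.5: the median of 0..59

# Time-based window: covers the 60 minutes ending at each timestamp.
# With an offset window, partial windows at the start are still aggregated.
rolled_time = demo[["Humidity"]].rolling("60min").median()
print(rolled_time["Humidity"].isna().sum())  # 0
```

A time-based window may be the safer choice when observations are not perfectly evenly spaced, because a fixed count of 60 points then no longer corresponds to exactly one hour.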

To roll at a day level, the rolling window should be 60 * 24, that is, 1,440 data points:

    ts[["Humidity"]].rolling(60*24).median().plot(figsize=(15,1))  

The following is the output:

Rolling every consecutive 60*24 points and aggregating them to give their median as the final value; this amounts to finding daily aggregate values for minute-level data

Note that the median has been used for aggregation. You can also use any other function, such as the mean or sum.
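As a brief illustration of swapping the aggregation function, the following sketch applies the mean and sum to a synthetic `demo` DataFrame. The data is an assumption for illustration: every reading is 1.0, so the results are easy to check.

```python
import numpy as np
import pandas as pd

# Hypothetical minute-level data: two hours of constant readings.
index = pd.date_range("2024-01-01", periods=120, freq="min")
demo = pd.DataFrame({"Humidity": np.ones(120)}, index=index)

# Any aggregation method can replace median() after resample() or rolling().
hourly_mean = demo[["Humidity"]].resample("1h").mean()
hourly_sum = demo[["Humidity"]].resample("1h").sum()
rolling_mean = demo[["Humidity"]].rolling(60).mean()

# agg() computes several aggregates in one pass, one column per function.
hourly_stats = demo["Humidity"].resample("1h").agg(["mean", "median", "sum"])
print(hourly_stats)
```

Which function to use depends on the quantity being measured: the mean or median suits a level such as humidity, whereas the sum makes more sense for counts or totals.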