Block bootstrapping time series data

The usual bootstrapping method resamples individual observations, so it doesn't preserve the ordering of time series data and is therefore unsuitable for trend estimation. In the block bootstrapping approach, we split the data into non-overlapping blocks of equal size and use those blocks to generate new samples. In this recipe, we will apply a very naive, easy-to-implement linear model to annual temperature data. The procedure for this recipe is as follows:

Split the data into blocks and generate new data samples by shuffling the order of the blocks.
Fit a line to the new data and calculate its first differences.
Repeat the previous steps to build lists of slopes and of medians of the first differences.
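The procedure can be sketched with NumPy alone. This is a minimal sketch under stated assumptions, not the book's code: it uses a synthetic trend-plus-noise series instead of the weather data, skips the annual averaging, fits the line with `np.polyfit`, and takes the median of `np.diff`. The names `block_bootstrap`, `slope`, and `diff_median` are hypothetical. Like the recipe, it permutes the blocks (sampling without replacement); the classic non-overlapping block bootstrap usually draws blocks with replacement instead.

```python
import numpy as np


def block_bootstrap(values, block_size, rng):
    """Return one sample made of the non-overlapping blocks in shuffled order."""
    # Consecutive, non-overlapping blocks; the last one is shorter if
    # len(values) is not a multiple of block_size.
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    # Permute the block order; values inside each block keep their order.
    order = rng.permutation(len(blocks))
    return np.concatenate([blocks[j] for j in order])


def slope(values):
    """Slope of a least-squares line fitted against the time step."""
    return np.polyfit(np.arange(len(values)), values, 1)[0]


def diff_median(values):
    """Median of the first differences."""
    return np.median(np.diff(values))


rng = np.random.default_rng(12033)
# Synthetic stand-in for the temperature data: a weak trend plus noise.
series = 0.01 * np.arange(600) + rng.normal(size=600)

slopes, medians = [], []
for _ in range(240):
    sample = block_bootstrap(series, 100, rng)
    slopes.append(slope(sample))
    medians.append(diff_median(sample))

print(np.mean(slopes), np.median(medians))
```

Because each block keeps its internal order, the short-range structure of the series survives inside every sample; only the transitions at the block boundaries change.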

How to do it...

The imports are as follows:

```
import dautil as dl
import random
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
import ch6util
from IPython.display import HTML
```

Define the following function to bootstrap the data by shuffling the blocks and averaging the result per year:

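```
def shuffle(temp, blocks):
    # Shuffle the order of the blocks in place; each block keeps its internal order
    random.shuffle(blocks)
    # Rebuild a series on the original monthly index from the reordered blocks
    df = pd.DataFrame({'TEMP': dl.collect.flatten(blocks)},
                      index=temp.index)
    # Annual means; resample() alone only returns a Resampler object
    # (pandas 2.2+ prefers the alias 'YE' over 'A')
    df = df.resample('A').mean()

    return df
```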
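To see what `shuffle` does step by step without dautil, here is a minimal pandas/NumPy stand-in. Assumptions: `np.concatenate` replaces `dl.collect.flatten`, `shuffle_blocks` is a hypothetical name, and the annual means come from `groupby(df.index.year)` rather than `resample()`, so the sketch does not depend on the annual frequency alias (`'A'` versus `'YE'`), which differs between pandas versions. The data is a synthetic two-year monthly series.

```python
import random

import numpy as np
import pandas as pd


def shuffle_blocks(temp, blocks):
    """Shuffle the blocks in place and return annual means of the reordered data."""
    random.shuffle(blocks)
    # NumPy stand-in for dl.collect.flatten: join the blocks end to end.
    values = np.concatenate(blocks)
    # Re-attach the shuffled values to the original monthly DatetimeIndex.
    df = pd.DataFrame({'TEMP': values}, index=temp.index)
    # One mean per calendar year; the resulting index holds the year numbers.
    return df.groupby(df.index.year).mean()


# Synthetic monthly series covering two years (not the recipe's weather data).
temp = pd.Series(np.arange(24.0),
                 index=pd.date_range('2000-01-01', periods=24, freq='MS'))
blocks = [temp.values[i:i + 6] for i in range(0, len(temp), 6)]
random.seed(12033)
annual = shuffle_blocks(temp, blocks)
print(annual)
```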

Load the data, resample it to monthly means, and split it into blocks of 100 values:

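```
# Monthly mean temperatures; resample() needs an explicit aggregation
temp = dl.data.Weather.load()['TEMP'].resample('M').mean().dropna()
# Non-overlapping blocks of 100 monthly values
blocks = list(dl.collect.chunk(temp.values, 100))
# Fix the seed so that the block shuffles are reproducible
random.seed(12033)
```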

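Generate 240 bootstrap samples, storing the regression slope and the median of the first differences of each, and plot the first five realizations as a sanity check: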

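```
# `context` comes from notebook setup code that is not shown in this recipe
sp = dl.plotting.Subplotter(2, 2, context)
cp = dl.plotting.CyclePlotter(sp.ax)
medians = []
slopes = []

for i in range(240):
    df = shuffle(temp, blocks)
    # Slope of a line fitted to the annual means
    slopes.append(ch6util.fit(df))
    # Median of the first differences of the annual means
    medians.append(ch6util.diff_median(df))

    # Plot only the first five realizations
    if i < 5:
        cp.plot(df.index, df.values)

sp.label(ylabel_params=dl.data.Weather.get_header('TEMP'))
```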

Plot the distribution of the medians of the first differences, using the bootstrapped data:
```
sns.distplot(medians, ax=sp.next_ax(), norm_hist=True)
sp.label()
```
Plot the distribution of the linear regression slopes using the bootstrapped data:
```
sns.distplot(slopes, ax=sp.next_ax(), norm_hist=True)
sp.label()
```
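The bootstrap distribution of the slopes can also be summarized numerically. A common choice is the percentile interval, which reads the bounds straight off the sorted bootstrap values. In this minimal sketch, synthetic normal draws stand in for the recipe's 240 slopes, so the printed numbers are only illustrative.

```python
import numpy as np

# Synthetic stand-in for the 240 bootstrapped slopes (not the recipe's data).
rng = np.random.default_rng(0)
slopes = rng.normal(loc=0.02, scale=0.005, size=240)

# Percentile interval covering the middle 90% of the bootstrap distribution.
low, high = np.percentile(slopes, [5, 95])
print(f'90% percentile interval for the slope: [{low:.4f}, {high:.4f}]')
```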

Plot the confidence interval bounds of the medians against the number of bootstrap samples used:

```
mins = []
tops = []
xrng = range(30, len(medians))

# Recompute the interval bounds from the first i bootstrap medians
for i in xrng:
    low, high = dl.stats.outliers(medians[:i])
    mins.append(low)
    tops.append(high)

cp = dl.plotting.CyclePlotter(sp.next_ax())
cp.plot(xrng, mins, label='5 %')
cp.plot(xrng, tops, label='95 %')
sp.label()
HTML(sp.exit())
```

Refer to the following screenshot for the end result:

The code for this recipe comes from the block_boot.ipynb file in this book's code bundle.
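The last step of the recipe recomputes the interval bounds on the first i medians for every i from 30 upward, which shows how many bootstrap samples are needed before the bounds settle down. Here is a minimal sketch of the same idea, assuming the bounds are the 5th and 95th percentiles, as the plot labels suggest; we have not confirmed that this is how `dl.stats.outliers` computes them, so `np.percentile` stands in for it. Synthetic draws replace the recipe's medians.

```python
import numpy as np

# Synthetic stand-in for the 240 bootstrapped medians.
rng = np.random.default_rng(1)
medians = rng.normal(size=240)

# Assumed stand-in for dl.stats.outliers: the 5th and 95th percentiles.
xrng = range(30, len(medians))
lows, highs = [], []
for i in xrng:
    low, high = np.percentile(medians[:i], [5, 95])
    lows.append(low)
    highs.append(high)

# Interval width after 30 samples versus after the final iteration.
print(highs[0] - lows[0], highs[-1] - lows[-1])
```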
