Kalman filters

Kalman filters are a method of extracting a signal from noisy or incomplete measurements. They were invented by the Hungarian-born American engineer Rudolf Emil Kalman for applications in electrical engineering, and were first used in the Apollo space program in the 1960s.

The basic idea behind the Kalman filter is that there is some hidden state of a system that we cannot observe directly but for which we can obtain noisy measurements. Imagine you want to measure the temperature inside a rocket engine. You cannot put a measurement device directly into the engine, because it's too hot, but you can have a device on the outside of the engine.

Naturally, this measurement is not going to be perfect, as there are a lot of external factors occurring outside of the engine that make the measurement noisy. Therefore, to estimate the temperature inside the rocket, you need a method that can deal with the noise. We can think of the internal state in our page view forecasting task as the actual interest in a certain page, of which the page views represent only a noisy measurement.

The idea here is that the internal state, $x_k$, at time $k$ is a state transition matrix, $A$, multiplied with the previous internal state, $x_{k-1}$, plus some process noise, $w_k$. How interest in the Wikipedia page of 2NE1 develops is to some degree random. The randomness is assumed to follow a Gaussian normal distribution with mean zero and variance $Q$:

$$x_k = A x_{k-1} + w_k, \quad w_k \sim \mathcal{N}(0, Q)$$

The obtained measurement at time $k$, $z_k$, is an observation model, $H$, describing how states translate to measurements, times the state, $x_k$, plus some observation noise, $v_k$. The observation noise is assumed to follow a Gaussian normal distribution with mean zero and variance $R$:

$$z_k = H x_k + v_k, \quad v_k \sim \mathcal{N}(0, R)$$
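To make the two formulas concrete, here is a minimal sketch that simulates one step of such a linear-Gaussian system in NumPy. The scalar values chosen for A, H, Q, and R are made up purely for illustration:

import numpy as np

rng = np.random.default_rng(0)

# Illustrative scalar system: these values of A, H, Q, and R are assumptions
A, H = 1.0, 1.0
Q, R = 0.01, 0.04

x_prev = 20.0                        # previous hidden state x_{k-1}
w = rng.normal(0.0, np.sqrt(Q))      # process noise w_k ~ N(0, Q)
x = A * x_prev + w                   # state transition: x_k = A x_{k-1} + w_k

v = rng.normal(0.0, np.sqrt(R))      # observation noise v_k ~ N(0, R)
z = H * x + v                        # measurement: z_k = H x_k + v_k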

Roughly speaking, Kalman filters fit a function to the data by estimating $A$, $H$, $Q$, and $R$. The process of going over a time series and updating the parameters is called smoothing. The exact mathematics of the estimation process is complicated and not very relevant if all we want to do is forecast. Yet, what is relevant is that we need to provide priors for these values.

We should note that our state does not have to be only one number. In this case, our state is an eight-dimensional vector, with one hidden level as well as seven levels to capture weekly seasonality, as we can see in this code:

import numpy as np

n_seasons = 7

# One hidden level plus seven seasonal states
state_transition = np.zeros((n_seasons+1, n_seasons+1))

# The hidden level carries over from step to step unchanged
state_transition[0,0] = 1

# Each new seasonal state is the negative sum of the previous six,
# and the older seasonal states shift down by one position
state_transition[1,1:-1] = [-1.0] * (n_seasons-1)
state_transition[2:,1:-1] = np.eye(n_seasons-1)

The transition matrix, A, looks like the following array, describing one hidden level, which we might interpret as the real interest, as well as a seasonality model:

array([[ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0., -1., -1., -1., -1., -1., -1.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.]])
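To see the mechanics of this matrix, the following sketch, continuing from the preceding snippet, propagates a made-up state vector one step forward. The level in the first entry is carried over unchanged, the second entry becomes the negative sum of the previous six seasonal states, and the remaining seasonal states shift down by one position:

# Hypothetical state: a level of 100 plus seven seasonal offsets summing to zero
state = np.array([100., 3., -1., 2., -4., 1., 0., -1.])

next_state = state_transition @ state
print(next_state[0])     # 100.0: the level carries over unchanged
print(next_state[1])     # -1.0: -(3 - 1 + 2 - 4 + 1 + 0)
print(next_state[2:])    # [ 3. -1.  2. -4.  1.  0.]: seasonal states shifted by one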

The observation model, H, maps the general interest plus seasonality to a single measurement:

observation_model = [[1,1] + [0]*(n_seasons-1)]

The observation model looks like this:

[[1, 1, 0, 0, 0, 0, 0, 0]]
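Applying the observation model to the made-up state vector from the previous sketch produces a single noiseless measurement, the hidden level plus the current seasonal offset:

measurement = np.array(observation_model) @ state
print(measurement)    # [103.]: the level (100) plus the current seasonal state (3)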

The noise priors are just estimates scaled by a "smoothing factor," which allows us to control the update process:

smoothing_factor = 5.0

level_noise = 0.2 / smoothing_factor
observation_noise = 0.2
season_noise = 1e-3

process_noise_cov = np.diag([level_noise, season_noise] + [0]*(n_seasons-1))**2
observation_noise_cov = observation_noise**2
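As a purely illustrative sanity check, we can print the shapes these priors end up with (note that 0.2 ** 2 will display with a small floating-point error):

print(process_noise_cov.shape)    # (8, 8): a diagonal covariance matrix
print(observation_noise_cov)      # ~0.04: a single scalar variance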

process_noise_cov is an 8×8 diagonal covariance matrix, matching the eight-dimensional state vector. Meanwhile, observation_noise_cov is a single number, as we have only a single measurement. The only real requirement for these priors is that their shapes must allow the matrix multiplications described in the two preceding formulas. Other than that, we are free to specify transition models as we see fit.

Otto Seiskari, a mathematician who finished in eighth place in the original Wikipedia traffic forecasting competition, wrote a very fast Kalman filtering library, which we will be using here. His library allows for the vectorized processing of multiple independent time series, which is very handy if you have 145,000 time series to process.

Note

The library's repository can be found here: https://github.com/oseiskar/simdkalman.

You can install his library using the following command:

pip install simdkalman

To import it, run the following code:

import simdkalman

Although simdkalman is very sophisticated, it is quite simple to use. Firstly, we are going to specify a Kalman filter using the priors we just defined:

kf = simdkalman.KalmanFilter(state_transition=state_transition,
                             process_noise=process_noise_cov,
                             observation_model=observation_model,
                             observation_noise=observation_noise_cov)

From there we can then estimate the parameters and compute a forecast in one step:

result = kf.compute(X_train[0], 50)

Once again, we create a forecast for 2NE1's Chinese page, this time for 50 days ahead. Take a minute to note that we could also pass multiple series, for example, the first 10 with X_train[:10], and compute separate filters for all of them at once.
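For example, the following sketch batch-processes the first 10 series in one call; going by the library's documentation, the predicted observation means should then come back with one row per series:

# Hypothetical batch run over the first 10 series at once
batch_result = kf.compute(X_train[:10], 50)
print(batch_result.predicted.observations.mean.shape)    # expected: (10, 50)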

The result of the compute function contains the state and observation estimates from the smoothing process as well as predicted internal states and observations. States and observations are Gaussian distributions, so to get a plottable value, we need to access their mean.

Our states are eight-dimensional, but we only care about the non-seasonal state value, so we need to index the mean, which we can achieve by running the following:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 7))

# Last 20 days of the training series and the true values of the forecast window
ax.plot(np.arange(480,500), X_train[0,480:], label='X')
ax.plot(np.arange(500,550), y_train[0], label='True')

# Predicted observations and hidden level over the 50-day forecast window
ax.plot(np.arange(500,550),
        result.predicted.observations.mean,
        label='Predicted observations')
ax.plot(np.arange(500,550),
        result.predicted.states.mean[:,0],
        label='Predicted states')

# Smoothed estimates over the last 20 days of the training data
ax.plot(np.arange(480,500),
        result.smoothed.observations.mean[480:],
        label='Expected observations')
ax.plot(np.arange(480,500),
        result.smoothed.states.mean[480:,0],
        label='States')

ax.legend()
ax.set_yscale('log')

The preceding code will then output the following chart:

Predictions and inner states from the Kalman filter

We can clearly see in the preceding graph the effect of our prior modeling on the predictions. The model predicts a strong weekly oscillation, stronger than is actually observed. Likewise, we can also see that the model does not anticipate any trend, since we did not include a trend in our prior model.

Kalman filters are a useful tool and are used in many applications, from electrical engineering to finance. In fact, until relatively recently, they were the go-to tool for time series modeling. Skilled modelers were able to create systems that described a time series very well. However, one weakness of Kalman filters is that they cannot discover patterns by themselves; they need carefully engineered priors in order to work.

In the second half of this chapter, we will be looking at neural network-based approaches that can automatically model time series, and often with higher accuracy.
