The Kalman filter is a mathematical model that provides an accurate, recursive approach to estimating past states and predicting future states of a process for which some variables may be unknown. R.E. Kalman introduced it in the early 1960s to model dynamic systems and predict trajectories in aerospace [3:10]. Today, the Kalman filter is used to discover a relationship between two observed variables that may or may not be associated with other hidden variables. In this respect, the Kalman filter shares some similarities with hidden Markov models, as described in The hidden Markov model section in Chapter 7, Sequential Data Models [3:11].
The Kalman filter is used as:
Smoothing versus filtering
Smoothing is an operation that removes high-frequency fluctuations from a time series or signal. Filtering consists of selecting a range of frequencies to process the data. In this regard, smoothing is somewhat similar to low-pass filtering. The only difference is that a low-pass filter is usually implemented through linear methods.
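The idea of smoothing as a form of low-pass filtering can be sketched with a simple exponential moving average. The lowPass function, the alpha decay parameter, and the sample values below are illustrative assumptions, not part of the book's library:

```scala
// Minimal sketch: smoothing a noisy series with an exponential moving
// average, which acts as a crude low-pass filter. Small alpha values
// suppress high-frequency fluctuations more aggressively.
def lowPass(xs: Vector[Double], alpha: Double): Vector[Double] =
  xs.tail.scanLeft(xs.head)((s, x) => alpha * x + (1.0 - alpha) * s)

val noisy = Vector(1.0, 1.2, 0.8, 1.1, 0.9)
val smoothed = lowPass(noisy, 0.3)
```

The output has the same length as the input; each value blends the newest observation with the accumulated smoothed state.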
Conceptually, the Kalman filter estimates the state of a system from noisy observations. The Kalman filter has two characteristics:
The Kalman filter is one of the stochastic models that are used in adaptive control [3:12].
Kalman and nonlinear systems
The Kalman filter estimates the internal state of a linear dynamic system. However, it can be extended to a nonlinear state space model using linear or quadratic approximation functions. These filters are known as, you guessed it, Extended Kalman Filters (EKF), the theory of which is beyond the scope of this book.
The following section is dedicated to discrete Kalman filters for linear systems, as applied to financial engineering. A continuous signal can be converted to a time series by sampling it at a rate of at least twice its highest frequency component (the Nyquist rate).
The Kalman filter model consists of two core elements of a dynamic system: a process that generates data and a measurement that collects data. These elements are referred to as the state space model. Mathematically speaking, the state space model consists of two equations:
Let's consider a system with a linear state xt of n variables and a control input vector ut. The prediction of the state at time t is computed by a linear stochastic equation (M12):

xt = At·xt-1 + Bt·ut + wt

Here, wt is the process noise.
The control input vector represents the external input (or control) to the state of the system. Most systems, including our financial example later in this chapter, have no external input to the state of the model.
The measurement of m values zt of the state of the system is defined by the following equation (M13):

zt = Ht·xt + vt

Here, vt is the measurement noise.
The time dependency model
In the general discrete Kalman filter, parameters such as the state transition matrix At, the control input matrix Bt, and the observation (or measurement dependency) matrix Ht cannot be assumed to be independent of time. However, these parameters are constant in most practical applications.
The set of equations for the discrete Kalman filter is implemented as a recursive computation with two distinct steps:
The recursion is visualized in the following diagram:
Let's illustrate the prediction and correction phases in the context of filtering financial data, in a manner similar to the moving average and Fourier transform. The objective is to extract the trend and the transitory component of the yield of the 10-year Treasury bond. The Kalman filter is particularly suitable for the analysis of interest rates for two reasons:
The 10-year Treasury bond has a higher trading volume than bonds with longer maturity, making trends in interest rates a bit more reliable [3:13].
Applying the Kalman filter to clean raw data requires you to define a model that encompasses both observed and non-observed states. In the case of the trend analysis, we can safely create our model with a two-variable state: the current yield xt and the previous yield xt-1.
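The two-variable state update can be sketched in plain Scala. The nextState function, the weighting of the current and previous yields by an alpha parameter, and the sample values are illustrative assumptions, not the book's code:

```scala
// Illustrative sketch of the two-variable state [x(t), x(t-1)] advanced by a
// transition of the form
//   A = | alpha  1-alpha |
//       |  1.0     0.0   |
// The next state is [alpha*x(t) + (1-alpha)*x(t-1), x(t)]: a weighted blend
// of the two yields becomes the new current value, and the old current
// value becomes the new previous value.
def nextState(state: (Double, Double), alpha: Double): (Double, Double) = {
  val (current, previous) = state
  (alpha * current + (1.0 - alpha) * previous, current)
}

val s1 = nextState((2.0, 1.0), 0.5)   // blends the two yields equally
```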
This implementation of the Kalman filter uses the Apache Commons Math library. Therefore, we need to specify the implicit conversions from our primitives, introduced in the Primitives and implicits section in Chapter 1, Getting Started, to the RealMatrix, RealVector, Array2DRowRealMatrix, and ArrayRealVector Apache Commons Math types:

implicit def double2RealMatrix(x: DblMatrix): RealMatrix =
  new Array2DRowRealMatrix(x)
implicit def double2RealRow(x: DblVector): RealMatrix =
  new Array2DRowRealMatrix(x)
implicit def double2RealVector(x: DblVector): RealVector =
  new ArrayRealVector(x)
The client code has to import the implicit conversion functions within its scope.
The Kalman model assumes that the process and measurement noise follow a Gaussian distribution, also known as white noise. For the sake of maintainability, the generation of the white noise is encapsulated in the QRNoise class with the following arguments (line 1):

qr: This is the tuple of scale factors for the process noise matrix Q and the measurement noise R
profile: This is the noise profile, with the normal distribution as default

The two noiseQ and noiseR methods generate an array of two independent white noise elements (line 2):

val normal = Stats.normal(_)

class QRNoise(qr: DblPair, profile: Double => Double = normal) { //1
  def q = profile(qr._1)
  def r = profile(qr._2)
  lazy val noiseQ = Array[Double](q, q) //2
  lazy val noiseR = Array[Double](r, r)
}
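A dependency-free sketch of this noise generation follows. The SimpleQRNoise class and the use of scala.util.Random.nextGaussian are illustrative stand-ins for the book's QRNoise and Stats.normal, not the actual implementation:

```scala
import scala.util.Random

// Hypothetical stand-in for the book's noise generator: each call to q or r
// draws a fresh standard Gaussian sample and scales it by the corresponding
// factor of the (q, r) tuple.
class SimpleQRNoise(qr: (Double, Double), rand: Random = new Random(42L)) {
  def q: Double = qr._1 * rand.nextGaussian   // process noise sample
  def r: Double = qr._2 * rand.nextGaussian   // measurement noise sample
  def noiseQ: Array[Double] = Array(q, q)     // two independent samples
  def noiseR: Array[Double] = Array(r, r)
}

val noise = new SimpleQRNoise((0.7, 0.3))
```

Seeding the random generator, as done here, makes experiments reproducible.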
The easiest approach to manage the matrices and vectors used in the recursion is to define them as arguments of a KalmanConfig configuration class. The arguments of the configuration follow the naming convention defined in the mathematical formulas: A is the state transition matrix, B is the control matrix, H is the matrix of observations that defines the dependencies between the measurement and system state, and P is the covariance error matrix:

case class KalmanConfig(A: DblMatrix, B: DblMatrix, H: DblMatrix, P: DblMatrix)
Let's implement the Kalman filter as a DKalman transformation of the ETransform type on a time series with a predefined KalmanConfig configuration:

class DKalman(config: KalmanConfig)(implicit qrNoise: QRNoise)
    extends ETransform[KalmanConfig](config) {

  type U = Vector[DblPair] //3
  type V = Vector[DblPair] //4
  type KRState = (KalmanFilter, RealVector) //5

  override def |> : PartialFunction[U, Try[V]] = ...
}
As with any explicit data transformation, we need to specify the U and V types (lines 3 and 4), which are identical: the Kalman filter does not alter the structure of the data, only its values. We define the internal state of the Kalman computation, KRState, as a tuple of the two Apache Commons Math types KalmanFilter and RealVector (line 5).
The key elements of the filter are now in place and it's time to implement the prediction-correction cycle portion of the Kalman algorithm.
The prediction phase consists of estimating the x state (yield of the Treasury bond) using the transition equation. We assume that the Federal Reserve has no material effect on the interest rates, making the B control input matrix null. The transition equation can be easily resolved using simple operations on matrices:
The purpose of this exercise is to evaluate the impact of the different parameters of the transition matrix A in terms of smoothing.
The control input matrix B
In this example, the control matrix B is null because there is no known, deterministic external action on the yield of the 10-year Treasury bond. However, the yield can be affected by unknown parameters that we represent as hidden variables. For example, the matrix B can be used to model the decision of the Federal Reserve regarding asset purchases and federal fund rates.
The mathematics behind the Kalman filter, presented here as a reference for the implementation in Scala, uses the same notation for matrices and vectors. It is not a prerequisite for understanding the Kalman filter and its implementation in the next section. If you have a natural inclination toward linear algebra, the following two equations describe the prediction step.
The prediction step
M14: The prediction of the state at time t is computed by extrapolating the state estimate:

xt' = A·xt-1 + B·ut
M15: The mean square error matrix, P, which is to be minimized, is updated using the following formula:

Pt' = A·Pt-1·AT + Q
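The two prediction equations can be sketched for a scalar state, an illustrative simplification in which the matrices A, B, P, and Q reduce to single values; the predict function and the sample values are not part of the book's code:

```scala
// Scalar sketch of the prediction step:
// M14: xEst = a * x + b * u      (state extrapolation)
// M15: pEst = a * p * a + q      (error covariance extrapolation)
def predict(x: Double, p: Double, a: Double, b: Double, u: Double, q: Double)
    : (Double, Double) =
  (a * x + b * u, a * p * a + q)

// With no control input (b = u = 0), as in the Treasury bond example
val (xPred, pPred) = predict(x = 2.0, p = 1.0, a = 0.9, b = 0.0, u = 0.0, q = 0.1)
// xPred ≈ 1.8, pPred ≈ 0.91
```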
The state transition matrix is implemented using the matrix and vector classes included in the Apache Commons Math library. The types of matrices and vectors are automatically converted into the RealMatrix
and RealVector
classes.
The implementation of the equation M14 is as follows:
x = A.operate(x).add(qrNoise.noiseQ)
The new state is predicted (or estimated), and then used as an input to the correction step.
The second step of the recursive Kalman algorithm is the correction of the estimated yield of the 10-year Treasury bond with the actual yield. In this example, the white noise of the measurement is negligible. The measurement equation is simple because the state is represented by the current and previous yield and their measurement, z:
The sequence of mathematical equations of the correction phase consists of updating the estimation of the state x using the actual values z and computing the Kalman gain, K.
The correction step
M16: The state of the system x is estimated from the actual measurement z using the following formula:

xt = xt' + Kt·(zt − H·xt')
M17: The Kalman gain is computed as:

Kt = Pt'·HT·(H·Pt'·HT + R)-1
Here, HT is the matrix transpose of H and Pt' is the estimate of the error covariance.
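For a scalar state, the correction step reduces to a few arithmetic operations. The correct function and its sample values below are an illustrative simplification, not the book's implementation:

```scala
// Scalar sketch of the correction step:
// M17: k = pPred * h / (h * pPred * h + r)   (Kalman gain)
// M16: x = xPred + k * (z - h * xPred)       (state update from measurement z)
// The error covariance is then reduced by the gain: p = (1 - k*h) * pPred
def correct(xPred: Double, pPred: Double, z: Double, h: Double, r: Double)
    : (Double, Double, Double) = {
  val k = pPred * h / (h * pPred * h + r)
  val x = xPred + k * (z - h * xPred)
  val p = (1.0 - k * h) * pPred
  (x, p, k)
}
```

With equal prediction and measurement uncertainty (pPred = r), the gain is 0.5 and the corrected state lands halfway between the prediction and the measurement.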
It is time to put our knowledge of the transition and measurement equations to the test. The Apache Commons Math library defines the two classes DefaultProcessModel and DefaultMeasurementModel to encapsulate the components of the matrices and vectors. The historical values for the yield of the 10-year Treasury bond are loaded through the DataSource method and mapped to the smoothed series that is the output of the filter:
override def |> : PartialFunction[U, Try[V]] = {
  case xt: U if !xt.isEmpty => Try(
    xt.map { case (current, prev) =>
      val models = initialize(current, prev) //6
      val nState = newState(models) //7
      (nState(0), nState(1)) //8
    }
  )
}
The data transformation for the Kalman filter initializes the process and measurement models for each data point in the private initialize method (line 6), updates the state iteratively using the transition and correction equations in the newState method (line 7), and returns the filtered series of pairs of values (line 8).
Exception handling
The code to catch and process exceptions thrown by the Apache Commons Math library is omitted, as is the standard practice in this book. As far as the execution of the Kalman filter is concerned, the following exceptions have to be handled:
NonSquareMatrixException
DimensionMismatchException
MatrixDimensionMismatchException
The initialize method encapsulates the initialization of the pModel process model (line 9) and the mModel measurement (observation dependencies) model (line 10), as defined in the Apache Commons Math library:
def initialize(current: Double, prev: Double): KRState = {
  val pModel = new DefaultProcessModel(config.A, config.B, Q, input, config.P) //9
  val mModel = new DefaultMeasurementModel(config.H, R) //10
  val in = Array[Double](current, prev)
  (new KalmanFilter(pModel, mModel), new ArrayRealVector(in))
}
The exceptions thrown by the Apache Commons Math API are caught and processed through the Try monad. The iterative prediction and correction of the smoothed yields of the 10-year Treasury bond is implemented by the newState method. The method iterates through the following steps:

1. Invoke the KalmanFilter.predict method that implements the M14 formula (line 11).
2. Apply the state transition equation to compute the new state x with the process noise (line 12).
3. Compute the measurement z from the new state, using the dependency matrix H and the measurement noise (line 13).
4. Invoke the KalmanFilter.correct method to implement the M16 formula (line 14).
5. Retrieve the new state estimate through the KalmanFilter.getStateEstimation method (line 15).

def newState(state: KRState): DblArray = {
  state._1.predict //11
  val x = config.A.operate(state._2).add(qrNoise.noiseQ) //12
  val z = config.H.operate(x).add(qrNoise.noiseR) //13
  state._1.correct(z) //14
  state._1.getStateEstimation //15
}
The exit condition
In the code snippet for the newState method, the iteration for a specific data point exits when the maximum number of iterations is reached. A more elaborate implementation consists of either evaluating the matrix P at each iteration or checking whether the estimation has converged within a predefined range.
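A convergence-based exit condition could be sketched as follows. The iterateUntil helper, its step function, and the tolerance values are hypothetical, not part of the book's code:

```scala
// Hypothetical exit condition: stop when two successive estimates differ by
// less than eps, or after maxIters iterations, whichever comes first.
def iterateUntil(x0: Double, step: Double => Double,
                 eps: Double, maxIters: Int): Double = {
  var x = x0
  var i = 0
  var done = false
  while (i < maxIters && !done) {
    val next = step(x)
    done = math.abs(next - x) < eps   // convergence test on the estimate
    x = next
    i += 1
  }
  x
}

// Example: the iteration x => 0.5*x + 1.0 converges to its fixed point 2.0
val fixed = iterateUntil(0.0, x => 0.5 * x + 1.0, 1e-9, 100)
```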
So far, we have studied the Kalman filtering algorithm. We need to adapt it to the smoothing of a time series. The fixed lag smoothing technique consists of correcting previous data points backward, taking into account the latest actual value.
An N-lag smoother defines the input as a vector X = {xt-N-1, xt-N-2, …, xt}, for which the value xt-N-j is corrected taking into account the current value xt.
The strategy is quite similar to the hidden Markov model forward and backward passes (refer to the Evaluation – CF-1 section under The hidden Markov model in Chapter 7, Sequential Data Models).
The objective is to smooth the yield of the 10-year Treasury bond using a two-step lag smoothing algorithm.
The state equation updates the values of the state [xt, xt-1] using the previous state [xt-1, xt-2], where xt represents the yield of the 10-year Treasury bond at time t. This is accomplished by shifting the values of the original time series {x0 … xn-1} by 1 using the drop method to get X1 = {x1, …, xn-1}, creating a copy of the original time series without the last element, X2 = {x0, …, xn-2}, and zipping X1 and X2. This process is implemented by the zipWithShift method, introduced in the first section of this chapter.
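A plausible plain-Scala sketch of zipWithShift follows; the book's version resides in the chapter's utility section, and this sketch merely assumes the drop/zip strategy described above:

```scala
// Pair each value with the value n steps earlier: for n = 1 this yields the
// state vectors S(k) = [x(k), x(k-1)] consumed by the Kalman recursion.
def zipWithShift(xv: Vector[Double], n: Int): Vector[(Double, Double)] =
  xv.drop(n).zip(xv.dropRight(n))

val series = Vector(1.0, 2.0, 3.0, 4.0)
val states = zipWithShift(series, 1)
// Vector((2.0, 1.0), (3.0, 2.0), (4.0, 3.0))
```

Note that the output has n fewer elements than the input, since the first n values have no predecessor.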
The resulting sequence of a state vector Sk = [xk, xk-1]T is processed by the Kalman algorithm, as shown in the following code:
import YahooFinancials._

val RESOURCE_DIR = "resources/data/chap3/"
implicit val qrNoise = new QRNoise((0.7, 0.3)) //16

val H: DblMatrix = ((0.9, 0.0), (0.0, 0.1)) //17
val P0: DblMatrix = ((0.4, 0.3), (0.5, 0.4)) //18
val ALPHA1 = 0.5; val ALPHA2 = 0.8

val src = DataSource(s"${RESOURCE_DIR}${symbol}.csv", false)
src.get(adjClose).map(zt => { //19
  twoStepLagSmoother(zt, ALPHA1) //20
  twoStepLagSmoother(zt, ALPHA2)
})
An implicit noise instance
The noise for the process and measurement is defined as an implicit argument to the DKalman Kalman filter because it does not depend on the A, B, and H Kalman configuration parameters and, therefore, cannot be a member of the KalmanConfig class.

The white noise for the process and measurement is initialized implicitly with the qrNoise value (line 16). The code initializes the matrix H of the measurement dependencies on the state (line 17) and the matrix P0 that contains the initial covariance errors (line 18). The input data is extracted from a CSV file that contains the daily Yahoo financial data (line 19). Finally, the method executes the twoStepLagSmoother two-step lag smoothing algorithm with the two different alpha parameter values, ALPHA1 and ALPHA2 (line 20).
Let's take a look at the twoStepLagSmoother method:
def twoStepLagSmoother(zSeries: DblVector, alpha: Double): Int = {
  val A: DblMatrix = ((alpha, 1.0-alpha), (1.0, 0.0)) //21
  val xt = zipWithShift(1) //22
  val pfnKalman = DKalman(A, H, P0) |> //23
  pfnKalman(xt).map(filtered => //24
    display(zSeries, filtered.map(_._1), alpha)
  )
}
The twoStepLagSmoother method takes two arguments:

zSeries: This is the single variable time series
alpha: This is the state transition parameter

It initializes the state transition matrix A using the alpha exponential moving average decay parameter (line 21). It creates the two-step lag time series, xt, using the zipWithShift method (line 22). It extracts the pfnKalman partial function (line 23), processes, and finally displays the two-step lag time series (line 24).
Modeling state transition and noise
The state transition matrix and the noise related to the process have to be selected carefully. The resolution of the state equations relies on the Cholesky (QR) decomposition, which requires a positive definite matrix. The implementation in the Apache Commons Math library throws a NonPositiveDefiniteMatrixException exception if this requirement is violated.
The smoothed yield is plotted alongside the raw data as follows:
The Kalman filter is able to smooth the historical yield of the 10-year Treasury bond while preserving the spikes and lower frequency noise. Let's analyze the data for a shorter period during which the noise is the strongest, between the 190th and the 275th trading days:
The high-frequency noise has been significantly reduced without cancelling the actual spikes. The distribution (0.8, 0.2) takes into consideration the previous state and favors the predicted value. Conversely, a run with a state transition matrix A [0.2, 0.8, 0.0, 1.0] that favors the latest measurement will preserve the noise, as seen in the following graph:
The Kalman filter is a very useful and powerful tool to help you understand the distribution of the noise between the process and the observation. Contrary to low-pass or band-pass filters based on the discrete Fourier transform, the Kalman filter does not require computation of the frequency spectrum or an assumption about the range of frequencies of the noise.
However, the linear discrete Kalman filter has its limitations, which are as follows: