This function allows for fast and flexible Kalman filtering. Both, the
measurement and transition equation may be multivariate and parameters
are allowed to be timevarying. In addition “NA”values in the
observations are supported. fkf
wraps the C
function
FKF
which fully relies on linear algebra subroutines contained
in BLAS and LAPACK.
1 
a0 
A 
P0 
A 
dt 
A 
ct 
A 
Tt 
An 
Zt 
An 
HHt 
An 
GGt 
An 
yt 
A 
check.input 
A 
State space form:
The following notation is closest to the one of Koopman et al. The state space model is represented by the transition equation and the measurement equation. Let m be the dimension of the state variable, d be the dimension of the observations, and n the number of observations. The transition equation and the measurement equation are given by
alpha(t + 1) = d(t) + T(t) alpha(t) + H(t) * eta(t)
y(t) = c(t) + Z(t) alpha(t) + G(t) * epsilon(t),
where eta(t) and epsilon(t) are iid N(0, I(m)) and iid N(0, I(d)), respectively, and alpha(t) denotes the state variable. The parameters admit the following dimensions:
a[t] in R^m  d[t] in R^m  eta[t] in R^m 
d[t] in R^(m * m)  d[t] in R^(m * m)  
y[t] in R^d  c[t] in R^d  epsilon[t] in R^d. 
Z[t] in R^(d * m)  G[t] in R^(d * d) 
Note that fkf
takes as input HHt
and GGt
which
corresponds to H[t] %*% t(H[t]) and G[t] %*% t(G[t]).
Iteration:
Let i
be the loop variable. The filter iterations are
implemented the following way (in case of no NA's):
Initialization:
if(i == 1){
at[, i] = a0
Pt[,, i] = P0
}
Updating equations:
vt[, i] = yt[, i]  ct[, i]  Zt[,,i] %*% at[, i]
Ft[,, i] = Zt[,, i] %*% Pt[,, i] %*% t(Zt[,, i]) + GGt[,, i]
Kt[,, i] = Pt[,, i] %*% t(Zt[,, i]) %*% solve(Ft[,, i])
att[, i] = at[, i] + Kt[,, i] %*% vt[, i]
Ptt[, i] = Pt[,, i]  Pt[,, i] %*% t(Zt[,, i]) %*% t(Kt[,, i])
Prediction equations:
at[, i + 1] = dt[, i] + Tt[,, i] %*% att[, i]
Pt[,, i + 1] = Tt[,, i] %*% Ptt[,, i] %*% t(Tt[,, i]) + HHt[,, i]
Next iteration:
i < i + 1
goto “Updating equations”.
NAvalues:
NAvalues in the observation matrix yt
are supported. If
particular observations yt[,i]
contain NAs, the NAvalues are
removed and the measurement equation is adjusted accordingly. When
the full vector yt[,i]
is missing the Kalman filter reduces to
a prediction step.
Parameters:
The parameters can either be constant or deterministic timevarying. Assume the number of observations is n (i.e. y = y[,1:n]). Then, the parameters admit the following classes and dimensions:
dt  either a m * n (timevarying) or a m * 1 (constant) matrix. 
Tt  either a m * m * n or a m * m * 1 array. 
HHt  either a m * m * n or a m * m * 1 array. 
ct  either a d * n or a d * 1 matrix. 
Zt  either a d * m * n or a d * m * 1 array. 
GGt  either a d * d * n or a d * d * 1 array. 
yt  a d * n matrix. 
If check.input
is TRUE
each argument will be checked for
correctness of the dimensionality, storage mode, and
class. check.input
should always be TRUE
unless the
performance becomes crucial and correctness of the arguments
concerning dimensions, class, and storage.mode is ensured.
Note that the class of the arguments if of importance. For instance,
to check whether a parameter is constant the dim
attribute is
accessed. If, e.g., Zt
is a constant, it could be a d * dmatrix. But the third dimension
(i.e. dim(Zt)[3]
) is needed to check for constancy. This
requires Zt
to be an d * d *
1array.
BLAS and LAPACK routines used:
The R function fkf
basically wraps the C
function
FKF
, which entirely relies on linear algebra subroutines
provided by BLAS and LAPACK. The following functions are used:
BLAS:  dcopy , dgemm , daxpy . 
LAPACK:  dpotri , dpotrf .

FKF
is called through the .Call
interface. Internally,
FKF
extracts the dimensions, allocates memory, and initializes
the Robjects to be returned. FKF
subsequently calls
cfkf
which performs the Kalman filtering.
The only critical part is to compute the inverse of F[,,t]
and the determinant of F[,,t]. If the inverse can not be
computed, the filter stops and returns the corresponding message in
status
(see Value). If the computation of the
determinant fails, the filter will continue, but the loglikelihood
(element logLik
) will be “NA”.
The inverse is computed in
two steps: First, the Cholesky factorization of F[,,t] is
calculated by dpotrf
. Second, dpotri
calculates the
inverse based on the output of dpotrf
.
The determinant of F[,,t] is computed using again the
Cholesky decomposition.
An S3object of class “fkf”, which is a list with the following elements:
att  A m * nmatrix containing the filtered state variables, i.e. att[,t] = E(alpha[t]  y[,t]). 
at  A m * (n + 1)matrix containing the predicted state variables, i.e. at[,t] = E(alpha[t]  y[,t  1]). 
Ptt  A m * m * narray containing the variance of att , i.e. Ptt[,,t] = var(alpha[t]  y[,t]). 
Pt  A m * m * (n + 1)array containing the variances of at , i.e. Pt[,,t] = var(alpha[t]  y[,t  1]). 
vt  A d * nmatrix of the prediction errors given by vt[,t] = yt[,t]  ct[,t]  Zt[,,t] %*% at[,t]. 
Ft  A d * d * narray which contains the variances of vt , i.e. Ft[,,t] = var(v[,t]). 
Kt  A m * d * narray containing the “Kalman gain” (ambiguity, see calculation above). 
logLik  The loglikelihood. 
status  A vector which contains the status of LAPACK's dpotri and dpotrf . (0, 0) means successful exit. 
sys.time  The time elapsed as an object of class “proc_time”. 
The first element of both at
and Pt
is filled with the
function arguments a0
and P0
, and the last, i.e. the (n +
1)th, element of at
and Pt
contains the predictions
at[,n + 1] = E(alpha[n + 1]  y[,n]) and
Pt[,,n + 1] = var(alpha[n + 1]  y[,n]).
David Luethi
Philipp Erb
Zurich Insurance Company
(http://www.zurich.com)
Simon Otziger
Clariden Leu
(http://www.claridenleu.com)
Maintainer: Philipp Erb <erb.philipp@gmail.com>
Harvey, Andrew C. (1990). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.
Hamilton, James D. (1994). Time Series Analysis. Princeton University Press.
Koopman, S. J., Shephard, N., Doornik, J. A. (1999). Statistical algorithms for models in state space using SsfPack 2.2. Econometrics Journal, Royal Economic Society, vol. 2(1), pages 107160.
plot
to visualize and analyze
fkf
objects, KalmanRun
from the stats package,
function dlmFilter from package dlm
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109  ## <>
## Example 1: ARMA(2, 1) model estimation.
## <>
## This example shows how to fit an ARMA(2, 1) model using this Kalman
## filter implementation (see also stats' makeARIMA and KalmanRun).
n < 1000
## Set the AR parameters
ar1 < 0.6
ar2 < 0.2
ma1 < 0.2
sigma < sqrt(0.2)
## Sample from an ARMA(2, 1) process
a < arima.sim(model = list(ar = c(ar1, ar2), ma = ma1), n = n,
innov = rnorm(n) * sigma)
## Create a state space representation out of the four ARMA parameters
arma21ss < function(ar1, ar2, ma1, sigma) {
Tt < matrix(c(ar1, ar2, 1, 0), ncol = 2)
Zt < matrix(c(1, 0), ncol = 2)
ct < matrix(0)
dt < matrix(0, nrow = 2)
GGt < matrix(0)
H < matrix(c(1, ma1), nrow = 2) * sigma
HHt < H %*% t(H)
a0 < c(0, 0)
P0 < matrix(1e6, nrow = 2, ncol = 2)
return(list(a0 = a0, P0 = P0, ct = ct, dt = dt, Zt = Zt, Tt = Tt, GGt = GGt,
HHt = HHt))
}
## The objective function passed to 'optim'
objective < function(theta, yt) {
sp < arma21ss(theta["ar1"], theta["ar2"], theta["ma1"], theta["sigma"])
ans < fkf(a0 = sp$a0, P0 = sp$P0, dt = sp$dt, ct = sp$ct, Tt = sp$Tt,
Zt = sp$Zt, HHt = sp$HHt, GGt = sp$GGt, yt = yt)
return(ans$logLik)
}
theta < c(ar = c(0, 0), ma1 = 0, sigma = 1)
fit < optim(theta, objective, yt = rbind(a), hessian = TRUE)
fit
## Confidence intervals
rbind(fit$par  qnorm(0.975) * sqrt(diag(solve(fit$hessian))),
fit$par + qnorm(0.975) * sqrt(diag(solve(fit$hessian))))
## Filter the series with estimated parameter values
sp < arma21ss(fit$par["ar1"], fit$par["ar2"], fit$par["ma1"], fit$par["sigma"])
ans < fkf(a0 = sp$a0, P0 = sp$P0, dt = sp$dt, ct = sp$ct, Tt = sp$Tt,
Zt = sp$Zt, HHt = sp$HHt, GGt = sp$GGt, yt = rbind(a))
## Compare the prediction with the realization
plot(ans, at.idx = 1, att.idx = NA, CI = NA)
lines(a, lty = "dotted")
## Compare the filtered series with the realization
plot(ans, at.idx = NA, att.idx = 1, CI = NA)
lines(a, lty = "dotted")
## Check whether the residuals are Gaussian
plot(ans, type = "resid.qq")
## Check for linear serial dependence through 'acf'
plot(ans, type = "acf")
## <>
## Example 2: Local level model for the Nile's annual flow.
## <>
## Transition equation:
## alpha[t+1] = alpha[t] + eta[t], eta[t] ~ N(0, HHt)
## Measurement equation:
## y[t] = alpha[t] + eps[t], eps[t] ~ N(0, GGt)
y < Nile
y[c(3, 10)] < NA # NA values can be handled
## Set constant parameters:
dt < ct < matrix(0)
Zt < Tt < matrix(1)
a0 < y[1] # Estimation of the first year flow
P0 < matrix(100) # Variance of 'a0'
## Estimate parameters:
fit.fkf < optim(c(HHt = var(y, na.rm = TRUE) * .5,
GGt = var(y, na.rm = TRUE) * .5),
fn = function(par, ...)
fkf(HHt = matrix(par[1]), GGt = matrix(par[2]), ...)$logLik,
yt = rbind(y), a0 = a0, P0 = P0, dt = dt, ct = ct,
Zt = Zt, Tt = Tt, check.input = FALSE)
## Filter Nile data with estimated parameters:
fkf.obj < fkf(a0, P0, dt, ct, Tt, Zt, HHt = matrix(fit.fkf$par[1]),
GGt = matrix(fit.fkf$par[2]), yt = rbind(y))
## Compare with the stats' structural time series implementation:
fit.stats < StructTS(y, type = "level")
fit.fkf$par
fit.stats$coef
## Plot the flow data together with fitted local levels:
plot(y, main = "Nile flow")
lines(fitted(fit.stats), col = "green")
lines(ts(fkf.obj$att[1, ], start = start(y), frequency = frequency(y)), col = "blue")
legend("top", c("Nile flow data", "Local level (StructTS)", "Local level (fkf)"),
col = c("black", "green", "blue"), lty = 1)

Questions? Problems? Suggestions? Tweet to @rdrrHQ or email at ian@mutexlabs.com.
All documentation is copyright its authors; we didn't write any of that.