Description
Package KFAS contains functions for Kalman filtering, smoothing and simulation of linear state space models with exact diffuse initialization.
Details

The linear Gaussian state space model is given by
y[t] = Z[t]α[t] + ε[t], (observation equation)
α[t+1] = T[t]α[t] + R[t]η[t], (transition equation)
where ε[t] ~ N(0,H[t]), η[t] ~ N(0,Q[t]) and α[1] ~ N(a[1],P[1]) independently of each other.
All system and covariance matrices Z, H, T, R and Q can be time-varying, and partially or totally missing observations y[t] are allowed.
Covariance matrices H and Q have to be positive semidefinite.
Dimensions of the system matrices are

Z: p*m*1, or p*m*n in the time-varying case
H: p*p*1, or p*p*n in the time-varying case
T: m*m*1, or m*m*n in the time-varying case
R: m*k*1, or m*k*n in the time-varying case
Q: k*k*1, or k*k*n in the time-varying case
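As a concrete illustration of these dimensions, the sketch below builds a univariate local linear trend model (p = 1, m = 2, k = 2) with the SSModel constructor that also appears in the GlobalTemp example later on this page; the variance values are arbitrary placeholders, and how scalar or vector arguments are coerced may depend on the KFAS version.

library(KFAS)
# Local linear trend model for the Nile series: states are (level, slope),
# so p = 1, m = 2, k = 2 and all system matrices are time-invariant (third dimension 1)
Zt <- matrix(c(1, 0), nrow = 1)        # p*m: observation picks up the level
Ht <- matrix(15000)                    # p*p: observation error variance (placeholder)
Tt <- matrix(c(1, 0, 1, 1), nrow = 2)  # m*m: level[t+1] = level[t] + slope[t]
Rt <- diag(2)                          # m*k: both disturbances enter the states
Qt <- diag(c(1500, 10))                # k*k: state disturbance variances (placeholders)
model <- SSModel(y = Nile, Z = Zt, H = Ht, T = Tt, R = Rt, Q = Qt,
                 a1 = c(0, 0), P1 = matrix(0, 2, 2), P1inf = diag(2))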
In the case of non-Gaussian observations, the observation equation is of the form

p(y[t]|θ[t]) = p(y[t]|Z[t]α[t]),

with p(y[t]|θ[t]) being one of the following:

If the observations y[t] are Poisson distributed, the parameter of the Poisson distribution is u[t]λ[t] and θ[t] = log(λ[t]), where u[t] is a so-called offset term.

If the observations y[t] are from a binomial distribution, u is a vector specifying the number of trials at times 1,…,n, and θ[t] = log(π[t]/(1-π[t])), where π[t] is the probability of success at time t.

For non-Gaussian models u[t] = 1 by default. For Gaussian models, the parameter u is omitted.

Only univariate observations are supported when the observation equation is non-Gaussian.
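As a brief sketch of the Poisson case (the full Seatbelts workflow appears in the Examples section below), the intensity λ[t] can be recovered from the linear predictor of the approximating model by exponentiation; here u[t] = 1, the default.

library(KFAS)
# Poisson observations: theta[t] = log(lambda[t]); with the default u[t] = 1
# the mean of y[t] is lambda[t]
model <- structSSM(y = Seatbelts[, "VanKilled"], trend = "level", seasonal = "time",
                   X = Seatbelts[, "law"], distribution = "Poisson")
fit <- fitSSM(inits = rep(0.5 * log(0.005), 2), model = model)  # estimate variance parameters
amod <- approxSSM(fit$model)   # Gaussian approximating model
lambda <- exp(amod$theta)      # Poisson intensity implied by the conditional mode of theta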
For the unknown elements of the initial state vector a[1], KFS uses exact diffuse initialization by Koopman and Durbin (2000, 2001, 2003), where the unknown initial states are set to have zero mean and infinite variance, so

P[1] = P[*,1] + κP[inf,1],

with κ going to infinity and P[inf,1] being a diagonal matrix with ones on the diagonal elements corresponding to the unknown initial states.

The diffuse phase is continued until the rank of P[inf,t] becomes zero. The rank of P[inf] decreases by 1 if F[inf] > tolF > 0. Usually the number of diffuse time points equals the number of unknown elements of the initial state vector, but missing observations or a time-varying Z can affect this. See Koopman and Durbin (2000, 2001, 2003) for details of exact diffuse and non-diffuse filtering.

To simplify the notation and save storage space, KFAS uses the letters P, F and K for the non-diffuse parts of the corresponding matrices, omitting the asterisk in the diffuse phase.
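For instance, in the GlobalTemp example below the single initial state is made fully diffuse by setting P1 = 0 and P1inf = 1, whereas a state with a known initial distribution would use a finite P1 and P1inf = 0. The sketch below contrasts the two, assuming SSModel accepts scalar system matrices for a univariate series; the variances are placeholders.

library(KFAS)
# Fully diffuse initial state: zero mean and "infinite" variance, flagged via P1inf
diffuseInit <- SSModel(y = Nile, Z = 1, H = 15000, T = 1, R = 1, Q = 1500,
                       a1 = 0, P1 = 0, P1inf = 1)
# Known (non-diffuse) initial state: finite prior variance and no diffuse part
properInit <- SSModel(y = Nile, Z = 1, H = 15000, T = 1, R = 1, Q = 1500,
                      a1 = 1000, P1 = 1e5, P1inf = 0)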
All functions of KFAS use the univariate approach (also known as sequential processing; see Anderson and Moore (1979)) of Koopman and Durbin (2000, 2001). In the univariate approach the observations are introduced one element at a time, so the prediction error variance matrices F and Finf do not need to be non-singular, as no matrix inversions are required. This provides more stable and possibly faster filtering and smoothing than the standard multivariate Kalman filter algorithm. If the covariance matrix H is not diagonal, the model can be transformed either by applying an LDL decomposition to H or by augmenting the state vector with the ε disturbances. See transformSSM for more details.
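A minimal sketch of this transformation, assuming transformSSM accepts a type argument choosing between the LDL decomposition ("ldl") and disturbance augmentation ("augment"); consult the transformSSM help page for the exact interface. The covariance values are placeholders.

library(KFAS)
data(GlobalTemp)
# Bivariate model whose observation covariance H is not diagonal (placeholder values)
model <- SSModel(y = GlobalTemp, Z = matrix(1, nrow = 2), T = 1, R = 1,
                 H = matrix(c(0.02, 0.01, 0.01, 0.03), 2, 2), Q = 0.01,
                 a1 = 0, P1 = 0, P1inf = 1)
modelLDL <- transformSSM(model, type = "ldl")      # LDL decomposition of H
modelAug <- transformSSM(model, type = "augment")  # augment the state with the eps disturbances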
References

Koopman, S.J. and Durbin, J. (2000). Fast filtering and smoothing for non-stationary time series models, Journal of the American Statistical Association, 92, 1630-38.

Koopman, S.J. and Durbin, J. (2001). Time Series Analysis by State Space Methods. Oxford: Oxford University Press.

Koopman, S.J. and Durbin, J. (2003). Filtering and smoothing of state vector for diffuse state-space models, Journal of Time Series Analysis, Vol. 24, No. 1.

Shumway, Robert H. and Stoffer, David S. (2006). Time Series Analysis and Its Applications: With R Examples. New York: Springer.
Examples

library(KFAS)
# Example of local level model for Nile series
y<-Nile
modelNile<-structSSM(y=y)
fit<-fitSSM(inits=c(0.5*log(var(Nile)),0.5*log(var(Nile))),model=modelNile)
# Filtering and state smoothing
kfsNile<-KFS(fit$model,smoothing="state")
# Simple plot of series and the smoothed signal = Z*alphahat
plot(kfsNile,col=1:2)
# Confidence intervals for the state
lows<-c(kfsNile$alphahat-qnorm(0.95)*sqrt(c(kfsNile$V)))
ups<-c(kfsNile$alphahat+qnorm(0.95)*sqrt(c(kfsNile$V)))
plot.ts(cbind(y,c(kfsNile$alphahat),lows,ups), plot.type="single", col=c(1:2,3,3),
ylab="Predicted Annual flow", main="River Nile")
# Missing observations, using same parameter estimates
y<-Nile
y[c(21:40,61:80)]<-NA
modelNile<-structSSM(y=y,H=fit$model$H,Q.level=fit$model$Q)
kfsNile<-KFS(modelNile,smoothing="state")
# Filtered state
plot.ts(cbind(y,c(NA,kfsNile$a[,-c(1,101)])), plot.type="single", col=c(1:2,3,3),
ylab="Predicted Annual flow", main="River Nile")
# Smoothed state
plot.ts(cbind(y,c(kfsNile$alphahat)), plot.type="single", col=c(1:2,3,3),
ylab="Predicted Annual flow", main="River Nile")
# Prediction of missing observations
predictNile<-predict(kfsNile)
lows<-predictNile$y-qnorm(0.95)*sqrt(c(predictNile$F))
ups<-predictNile$y+qnorm(0.95)*sqrt(c(predictNile$F))
plot.ts(cbind(y,predictNile$y,lows,ups), plot.type="single", col=c(1:2,4,4),
ylab="Predicted Annual flow", main="River Nile")
# Example of multivariate local level model with only one state
# Two series of average global temperature deviations for years 1880-1987
# See Shumway and Stoffer (2006), p. 327 for details
data(GlobalTemp)
modelTemp<-SSModel(y=GlobalTemp, Z = matrix(1,nrow=2), T=1, R=1, H=matrix(NA,2,2),
Q=NA, a1=0, P1=0, P1inf=1)
# Estimating the variance parameters
fit<-fitSSM(inits=c(0.5*log(.1),0.5*log(.1),0.5*log(.1),0),model=modelTemp)
outTemp<-KFS(fit$model,smoothing="both")
ts.plot(cbind(modelTemp$y,t(outTemp$alphahat)),col=1:3)
legend("bottomright",legend=c(colnames(GlobalTemp), "Smoothed signal"), col=1:3, lty=1)
# Example of a multivariate series where the first series follows a stationary ARMA(1,1)
# process and the second a stationary AR(1) process
y1<-arima.sim(model=list(ar=0.8,ma=0.3), n=100, sd=0.5)
model1<-arimaSSM(y=y1,arima=list(ar=0.8,ma=0.3),Q=0.5^2)
y2<-arima.sim(model=list(ar=-0.5), n=100)
model2<-arimaSSM(y=y2,arima=list(ar=-0.5))
model<-model1+model2
# Another way:
modelb<-arimaSSM(y=ts.union(y1,y2),arima=list(list(ar=0.8,ma=0.3),list(ar=-0.5)),Q=diag(c(0.5^2,1)))
f.out<-KFS(model)
# Structural model for the drivers series from the Seatbelts data
model<-structSSM(y=log(Seatbelts[,"drivers"]),trend="level",seasonal="time",
X=cbind(log(Seatbelts[,"kms"]),log(Seatbelts[,"PetrolPrice"]),Seatbelts[,c("law")]))
fit<-fitSSM(inits=rep(-1,3),model=model)
out<-KFS(fit$model,smoothing="state")
plot(out,lty=1:2,col=1:2,main="Observations and smoothed signal with and without seasonal component")
lines(signal(out,states=c(1,13:15))$s,col=4,lty=1)
legend("bottomleft",legend=c("Observations", "Smoothed signal","Smoothed level"), col=c(1,2,4), lty=c(1,2,1))
# Multivariate model with constant seasonal pattern in frequency domain
model<-structSSM(y=log(Seatbelts[,c("front","rear")]),trend="level",seasonal="freq",
X=cbind(log(Seatbelts[,c("kms")]),log(Seatbelts[,"PetrolPrice"]),Seatbelts[,"law"]),
H=NA,Q.level=NA,Q.seasonal=0)
sbFit<-fitSSM(inits=rep(-1,6),model=model)
out<-KFS(sbFit$model,smoothing="state")
ts.plot(signal(out,states=c(1:2,25:30))$s,model$y,col=1:4)
# Poisson model
model<-structSSM(y=Seatbelts[,"VanKilled"],trend="level",seasonal="time",X=Seatbelts[,"law"],
distribution="Poisson")
# Estimate variance parameters
fit<-fitSSM(inits=rep(0.5*log(0.005),2), model=model)
model<-fit$model
# Approximating model, gives also the conditional mode of theta
amod<-approxSSM(model)
out.amod<-KFS(amod,smoothing="state")
# State smoothing via importance sampling
out<-KFS(model,nsim=1000)
# Observations with exp(smoothed signal) computed by
# importance sampling in KFS, and by approximating model
ts.plot(cbind(model$y,out$yhat,exp(amod$theta)),col=1:3) # Almost identical
# It is more interesting to look at the smoothed values of exp(level + intervention)
lev1<-exp(signal(out,states=c(1,13))$s)
lev2<-exp(signal(out.amod,states=c(1,13))$s)
# These are slightly biased as E[exp(x)] > exp(E[x]), better to use importance sampling:
vansample<-importanceSSM(model,save.model=FALSE,nsim=250)
# nsim is the number of independent samples; by default two antithetic variables are used,
# so the total number of samples is 1000.
w<-vansample$weights/sum(vansample$weights)
level<-colSums(t(exp(vansample$states[1,,]+model$Z[1,13,]*vansample$states[13,,]))*w)
ts.plot(cbind(model$y,lev1,lev2,level),col=1:4) # Almost identical results
# Confidence intervals (no seasonal component)
varlevel<-colSums(t(exp(vansample$states[1,,]+model$Z[1,13,]*vansample$states[13,,])^2)*w)-level^2
intv<-level+qnorm(0.975)*sqrt(varlevel)%o%c(-1,1) # use the standard deviation, not the variance
ts.plot(cbind(model$y,level,intv),col=c(1,2,3,3))
# Simulation error
# Mean estimation error of the single draw with 2 antithetics
level2<-t(exp(vansample$states[1,,1:250]+model$Z[1,13,]*vansample$states[13,,1:250])-level)*w[1:250]+
t(exp(vansample$states[1,,251:500]+model$Z[1,13,]*vansample$states[13,,251:500])-level)*w[251:500]+
t(exp(vansample$states[1,,501:750]+model$Z[1,13,]*vansample$states[13,,501:750])-level)*w[501:750]+
t(exp(vansample$states[1,,751:1000]+model$Z[1,13,]*vansample$states[13,,751:1000])-level)*w[751:1000]
varsim<-colSums(level2^2)
ts.plot(sqrt(varsim/varlevel)*100)
# Same without antithetic variables
vansamplenat<-importanceSSM(model,save.model=FALSE,nsim=1000,antithetics=FALSE)
w<-vansamplenat$weights/sum(vansamplenat$weights)
levelnat<-colSums(t(exp(vansamplenat$states[1,,]+
model$Z[1,13,]*vansamplenat$states[13,,]))*w)
varsimnat<-colSums((t(exp(vansamplenat$states[1,,]+
model$Z[1,13,]*vansamplenat$states[13,,])-levelnat)*w)^2)
varlevelnat<-colSums(t(exp(vansamplenat$states[1,,]+
model$Z[1,13,]*vansamplenat$states[13,,])^2)*w)-levelnat^2
ts.plot(sqrt(varsimnat/varlevelnat)*100)
ts.plot((sqrt(varsimnat)-sqrt(varsim))/sqrt(varsimnat)*100)