#' Automatic ARIMA fitting and prediction with Kalman filter
#'
#' The function predicts and returns the next n consecutive values of a
#' univariate time series using the best evaluated ARIMA model automatically
#' fitted with Kalman filter. It also evaluates the fitness of the produced
#' model, using AICc, AIC, BIC and logLik criteria, and its prediction
#' accuracy, using the MSE, NMSE, MAPE, sMAPE and maximal error accuracy
#' measures.
#'
#' A best ARIMA model is automatically fitted by the \code{\link{auto.arima}}
#' function in the \code{forecast} package. The coefficients of this model are
#' then used as initial parameters for optimization of a state space model
#' (\code{\link{SSModel}}) using the Kalman filter and functions of the
#' \code{KFAS} package (see \code{\link{SSMarima}} and
#' \code{\link[KFAS]{artransform}}).
#'
#' If \code{initQ} is \code{NULL}, it is automatically set as either
#' \code{log(var(timeseries))} or \code{0}. For that, a set of candidate ARIMA
#' state space models is generated by different initial parameterization of
#' \code{initQ} during the model optimization process. The value option which
#' generates the best ranked candidate ARIMA model acoording to the criteria in
#' \code{rank.by} is selected.
#'
#' The ranking criteria in \code{rank.by} may be set as a prediction error
#' measure (such as \code{\link{MSE}}, \code{\link{NMSE}}, \code{\link{MAPE}},
#' \code{\link{sMAPE}} or \code{\link{MAXError}}), or as a fitness criteria
#' (such as \code{\link{AIC}}, \code{\link{AICc}}, \code{\link{BIC}} or
#' \code{\link{logLik}}). In the former case, the candidate models are used for
#' time series prediction and the error measures are calculated by means of a
#' cross-validation process. In the latter case, the candidate models are
#' fitted and fitness criteria are calculated based on all observations in
#' \code{timeseries}.
#'
#' If \code{rank.by} is set as \code{"errors"} or \code{"fitness"}, the
#' candidate models are ranked by all the mentioned prediction error measures
#' or fitness criteria, respectively. The wheight of the ranking criteria is
#' equally distributed. In this case, a \code{rank.position.sum} criterion is
#' produced for ranking the candidate models. The \code{rank.position.sum}
#' criterion is calculated as the sum of the rank positions of a model (1 = 1st
#' position = better ranked model, 2 = 2nd position, etc.) on each calculated
#' ranking criteria.
#'
#' @param timeseries A vector or univariate time series which contains the
#' values used for fitting an ARIMA model with Kalman filter.
#' @param timeseries.test A vector or univariate time series containing a
#' continuation for \code{timeseries} with actual values. It is used as a
#' testing set and base for calculation of prediction error measures. Ignored
#' if \code{NULL}.
#' @param h Number of consecutive values of the time series to be predicted. If
#' \code{h} is \code{NULL}, the number of consecutive values to be predicted is
#' assumed to be equal to the length of \code{timeseries.test}. Required when
#' \code{timeseries.test} is \code{NULL}.
#' @param na.action A function for treating missing values in \code{timeseries}
#' and \code{timeseries.test}. The default function is \code{\link[stats]{na.omit}},
#' which omits any missing values found in \code{timeseries} or
#' \code{timeseries.test}.
#' @param level Confidence level for prediction intervals. See the
#' \code{\link{predict.SSModel}} function in the \code{KFAS} package.
#' @param filtered If \code{filtered} is \code{TRUE}, Kalman filtered time
#' series observations are used for prediction, otherwise, Kalman smoothed
#' observations are used for prediction.
#' @param initQ Numeric argument regarding the initial value for the covariance
#' of disturbances parameter to be optimized over. The initial value to be
#' optimized is set to \code{exp(initQ)}. See the \code{Q} argument of the
#' \code{\link{SSMarima}} function in the \code{KFAS} package and the examples
#' in \code{\link{KFAS}}. If \code{NULL}, \code{initQ} is automatically set.
#' See 'Details'.
#' @param rank.by Character string. Criteria used for ranking candidate models
#' generated using different options of values for \code{initQ}. Only used if
#' \code{initQ} is \code{NULL}. Ignored otherwise. See 'Details'.
#' @param ... Additional arguments passed to the \code{\link{auto.arima}}
#' modelling function.
#' @return A list with components: \item{model}{An object of class "SSModel"
#' containing the best evaluated ARIMA model fitted with Kalman Filter.}
#' \item{initQ}{The initQ argument provided (or automatically selected) for
#' optimization of the best evaluated ARIMA model fitted with Kalman Filter.}
#' \item{AICc}{Numeric value of the computed AICc criterion of the best
#' evaluated model.} \item{AIC}{Numeric value of the computed AIC criterion of
#' the best evaluated model.} \item{BIC}{Numeric value of the computed BIC
#' criterion of the best evaluated model.} \item{logLik}{Numeric value of the
#' computed log-likelihood of the best evaluated model.} \item{pred}{A list
#' with the components \code{mean}, \code{lower} and \code{upper}, containing
#' the predictions of the best evaluated model and the lower and upper limits
#' for prediction intervals, respectively. All components are time series. See
#' \code{\link{predict.SSModel}}.} \item{MSE}{Numeric value of the resulting
#' MSE error of prediction. Require \code{timeseries.test}.}
#' \item{NMSE}{Numeric value of the resulting NMSE error of prediction. Require
#' \code{timeseries.test}.} \item{MAPE}{Numeric value of the resulting MAPE
#' error of prediction. Require \code{timeseries.test}.} \item{sMAPE}{Numeric
#' value of the resulting sMAPE error of prediction. Require
#' \code{timeseries.test}.} \item{MaxError}{Numeric value of the maximal error
#' of prediction. Require \code{timeseries.test}.} \item{rank.val}{Data.frame
#' with the fitness or prediction accuracy criteria computed for all candidate
#' ARIMA with Kalman filter models ranked by \code{rank.by}. It has the
#' attribute \code{"ranked.models"}, which is a list of objects of class
#' "SSModel" containing all the candidate ARIMA models fitted with Kalman
#' Filter, also ranked by \code{rank.by}. Only provided if \code{initQ} was
#' automatically selected.} \item{rank.by}{Ranking criteria used for ranking
#' candidate models and producing \code{rank.val}.}
#' @author Rebecca Pontes Salles
#' @seealso \code{\link{fittestArima}}, \code{\link{fittestLM}},
#' \code{\link{marimapred}}
#' @references R.J. Hyndman and G. Athanasopoulos, 2013, Forecasting:
#' principles and practice. OTexts.
#'
#' R.H. Shumway and D.S. Stoffer, 2010, Time Series Analysis and Its
#' Applications: With R Examples. 3rd ed. 2011 edition ed. New York, Springer.
#' @keywords ARIMA automatic fitting Kalman filter adjustment prediction
#' evaluation criterion errors
#' @examples
#'
#' \donttest{
#' data(CATS,CATS.cont)
#' fArimaKF <- fittestArimaKF(CATS[,2],CATS.cont[,2])
#' #predicted values
#' pred <- fArimaKF$pred
#'
#' #extracting Kalman filtered and smoothed time series from the best fitted model
#' fs <- KFAS::KFS(fArimaKF$model,filtering=c("state","mean"),smoothing=c("state","mean"))
#' f <- fitted(fs, filtered = TRUE) #Kalman filtered time series
#' s <- fitted(fs) #Kalman smoothed time series
#' #plotting the time series data
#' plot(c(CATS[,2],CATS.cont[,2]),type='o',lwd=2,xlim=c(960,1000),ylim=c(200,600),
#' xlab="Time",ylab="ARIMAKF")
#' #plotting the Kalman filtered time series
#' lines(f,col='red',lty=2,lwd=2)
#' #plotting the Kalman smoothed time series
#' lines(s,col='green',lty=2,lwd=2)
#' #plotting predicted values
#' lines(ts(pred$mean,start=981),lwd=2,col='blue')
#' #plotting prediction intervals
#' lines(ts(pred$upper,start=981),lwd=2,col='light blue')
#' lines(ts(pred$lower,start=981),lwd=2,col='light blue')
#' }
#'
#' @export fittestArimaKF
fittestArimaKF <-
function(timeseries, timeseries.test=NULL, h=NULL, na.action=stats::na.omit, level=0.9, filtered = TRUE, initQ=NULL,
rank.by=c("MSE","NMSE","MAPE","sMAPE","MaxError","AIC","AICc","BIC","logLik","errors","fitness"), ...){
if(is.null(timeseries)) stop("timeseries is required and must have positive length")
if(is.null(timeseries.test) & is.null(h)) stop("the number of values to be predicted is unknown, provide either timeseries.test or h")
#require(KFAS)
#preparing the training time series
ts <- ts(na.action(timeseries))
nobs <- length(ts)
i.n.ahead <- nobs+1
#preparing the test time series (if present) and setting the prediction horizon
n.ahead <- ts.test <- NULL
if(!is.null(timeseries.test)) {
ts.test <- ts(na.action(timeseries.test),start=i.n.ahead)
n.ahead <- length(ts.test)
if(!is.null(h)){
if(h < n.ahead){
ts.test <- utils::head(ts.test,h)
n.ahead <- h
}
}
}
else n.ahead <- h
# evaluate choices of rank.by
rank.by <- match.arg(rank.by)
if(rank.by == "fitness") rank.by <- c("AIC","AICc","BIC","logLik")
else if(rank.by == "errors") rank.by <- c("MSE","NMSE","MAPE","sMAPE","MaxError")
# function to return the log-likelihood of an ARIMA State Space Model given a set of parameters
# used by the optim function to optimize the choice of model parameters
# may also return the model if estimate is FALSE
likfn <- function(pars, model, p, q, d, estimate=TRUE){
tmp <- try(KFAS::SSMarima(ar=KFAS::artransform(pars[1:p]),
ma=KFAS::artransform(pars[(p+1):(p+q)]),d=d,Q = exp(pars[(p+q+1)])),silent=TRUE)
if(!inherits(tmp,"try-error")){
model["T","arima"] <- tmp$T
model["R","arima"] <- tmp$R
model["P1","arima"] <- tmp$P1
model["Q","arima"] <- tmp$Q
if(estimate){
-stats::logLik(model)
} else model
} else {
if(estimate){
1e100
} else model
}
}
#function for choosing initial arima parameters based on the result of auto.arima
arima.par <- function(timeseries,...){
#Best fit ARIMA
fitARIMA <- forecast::auto.arima(timeseries,...)
#Sets arima parameters from fitARIMA
ar.coef <- fitARIMA$coef[ grep("ar",attr(fitARIMA$coef, "names")) ]
ma.coef <- fitARIMA$coef[ grep("ma",attr(fitARIMA$coef, "names")) ]
p <- length(ar.coef)
q <- length(ma.coef)
npar <- (p+q+1)
d <- fitARIMA$arma[6]
if(d==0) npar <- npar+1
return(list(ar.coef=ar.coef,ma.coef=ma.coef,p=p,q=q,d=d,npar=npar))
}
#optimize ARIMA State Space Model given a set of initial parameters
optim.model <- function(timeseries, initPar, initQ, likfn){
#generates initial ARIMA State Space Model
SSMarima <- KFAS::SSMarima
model <- KFAS::SSModel(timeseries ~ SSMarima(ar=initPar$ar.coef,ma=initPar$ma.coef,d=initPar$d), H=0)
#optimizes the model parameters based on initial values
inits <- c(initPar$ar.coef,initPar$ma.coef,initQ)
fit <- stats::optim(par=inits, fn=likfn, model=model, p=initPar$p, q=initPar$q, d=initPar$d, method='BFGS')
model_arima <- likfn(pars=fit$par,model=model, p=initPar$p, q=initPar$q, d=initPar$d, estimate=FALSE)
return(model_arima)
}
#computes quality measures acoording to rank.by
fitness.criteria <- function(model,npar,nobs){
#computes quality measures acoording to rank.by
ll <- stats::logLik(model, marginal = TRUE)
AIC <- -2*ll+2*npar
BIC <- -2*ll+log(nobs)*npar
AICc <- AIC + 2*npar*(npar+1)/(nobs-npar-1)
return(data.frame(AICc=AICc,AIC=AIC,BIC=BIC,logLik=ll))
}
pred.criteria <- function(model,n.ahead,level,filtered,i.n.ahead,ts.test,ts){
#computes predictions using the candidate model
pred <- stats::predict(model,n.ahead=n.ahead,interval="prediction",level=level, filtered = filtered)
pred <- list(mean=pred[,1],lower=pred[,2],upper=pred[,3])
pred.mean <- ts(pred$mean,start=i.n.ahead)
#computes prediction error measures if ts.test is provided
if(!is.null(ts.test)) {
MSE <- TSPred::MSE(ts.test, pred.mean)
NMSE <- TSPred::NMSE(ts.test, pred.mean, ts)
MAPE <- TSPred::MAPE(ts.test, pred.mean)
sMAPE <- TSPred::sMAPE(ts.test, pred.mean)
MaxError <- TSPred::MAXError(ts.test, pred.mean)
return(list(pred=pred,errors=data.frame(MSE=MSE,NMSE=NMSE,MAPE=MAPE,sMAPE=sMAPE,MaxError=MaxError)))
}
return(list(pred=pred))
}
ranking.models <- function(rank,rank.by,models){
rownames(rank) <- NULL
#create ranking criteria based on all measures referenced by rank.by
criteria <- rank[ , (names(rank) %in% rank.by), drop = FALSE]
if("logLik" %in% names(criteria)) criteria["logLik"] <- -criteria["logLik"]
TSPredC <- 0
for(c in names(criteria)) TSPredC <- TSPredC + rank(criteria[c])
names(TSPredC) <- NULL
#ranking the candidate models based on all measures referenced by rank.by
rank <- cbind(rank,rank.position.sum=TSPredC)
rank <- rank[with(rank,order(rank.position.sum)),]
#candidate models are ranked and included as attribute of rank dataframe
models <- models[rank$ModelId]
attr(rank,"ranked.models") <- models
return(rank)
}
#if initQ parameter is not provided, the function finds the best option among {log(var(ts)),0}
rank <- NULL
if(is.null(initQ)){
# creates the Validation series for parameter optimization
ts.val <- utils::tail(ts,n.ahead)
ts.tmp <- utils::head(ts,nobs-n.ahead)
#if rank.by considers fitness measures, parameter optimization uses only and all the training series
if(any(c("AIC","AICc","BIC","logLik") %in% rank.by)) ts.tmp <- ts
#initial options of Q
initQ.opt <- c(log(stats::var(ts.tmp)),0)
#initial arima parameters
initPar <- arima.par(ts.tmp,...)
#produces candidate models and measures in order to select "best" initQ parameter
models <- list()
for(initQ in initQ.opt){
#generates and optimizes candidate ARIMA State Space Model based on initial parameter values
model_arima <- optim.model(ts.tmp, initPar, initQ, likfn)
#creates candidate model id and saves it in the list models
ModelId <- paste("ARIMAKF",paste("initQ:",round(initQ,digits=1)),sep="_")
models[[ModelId]] <- model_arima
if(any(c("AIC","AICc","BIC","logLik") %in% rank.by)){
#computes fitness measures and returns a dataframe with them
rank.measures <- fitness.criteria(model_arima,initPar$npar,length(ts.tmp))
}
else if(any(c("MSE","NMSE","MAPE","sMAPE","MaxError") %in% rank.by)){
#computes predictions and prediction error measures
rank.measures <- pred.criteria(model_arima,n.ahead,level,filtered,length(ts.tmp)+1,ts.val,ts.tmp)$errors
}
#combine results of the candidate models in the dataframe rank
rank <- rbind(rank, data.frame(ModelId=ModelId,initQ=initQ,rank.measures))
}
#ranking the candidate models based on all measures referenced by rank.by
#also candidate models (objects) are ranked and included as attribute of rank dataframe
rank <- ranking.models(rank,rank.by,models)
#if rank.by is fitness
initQ.optim <- rank[1,]$initQ
}
else{
initQ.optim <- initQ
}
#initial arima parameters
initPar <- arima.par(ts,...)
#gets previously optimized Model based on optim parameter values
#if a ranking based on fitness measures was performed, the whole time series was used for training and the model generated can be reused
if(any(c("AIC","AICc","BIC","logLik") %in% rank.by) & !is.null(rank)){
model_arima <- attr(rank,"ranked.models")[[1]]
}
#generates and optimizes Model based on optim parameter values
else{
model_arima <- optim.model(ts, initPar, initQ.optim, likfn)
}
#computes fitness measures and returns a dataframe with them
fit.measures <- fitness.criteria(model_arima,initPar$npar,nobs)
fit.measures <- lapply(fit.measures,identity) #transforms to list
#computes predictions, and prediction error measures (if timeseries.test is provided)
pred.measures <- pred.criteria(model_arima,n.ahead,level,filtered,i.n.ahead,ts.test,ts)
#predictions
prediction <- pred.measures$pred
#error measures into list
errors.measures <- switch(is.null(pred.measures$errors)+1,lapply(pred.measures$errors,identity),NULL)
#append results in a list
results <- c( list(model=model_arima), initQ=initQ.optim, fit.measures, list(pred=prediction), errors.measures )
if(!is.null(rank) ) results <- c(results, list(rank.val=rank), rank.by=rank.by)
return(results)
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.