ro: Rolling Origin
In greybox: Toolbox for Model Building and Forecasting

View source: R/ro.R

ro	R Documentation

Rolling Origin

Description

The function does rolling origin for any forecasting function

Usage

ro(data, h = 10, origins = 10, call, value = NULL, step = 1,
  ci = FALSE, co = TRUE, silent = TRUE, parallel = FALSE, ...)

Arguments

`data`	Data vector or ts object with the response variable passed to the function.
`h`	The forecasting horizon.
`origins`	The number of rolling origins.
`call`	The call that is passed to the function. The call must be in quotes. Example: `"forecast(ets(data),h)"`. Here `data` shows where the data is and `h` defines where the horizon should be passed in the `call`. Some hidden parameters can also be specified in the call. For example, parameters `counti`, `counto` and `countf` are used in the inner loop and can be used for the regulation of exogenous variables sizes. See examples for the details.
`value`	The variable or set of variables returned by the `call`. For example, `mean` for functions of `forecast` package. This can also be a vector of variables. See examples for the details. If the parameter is `NULL`, then all the values from the call are returned (could be really messy!). Note that if your function returns a list with matrices, then ro will return an array. If your function returns a list, then you will have a list of lists in the end. So it makes sense to understand what you want to get before running the function.
`step`	Number of observations to skip between origins when moving to the next one. For example, could be useful if forecast should be produced every week on Sunday, in which case, for daily data, `step=7`.
`ci`	The parameter defines if the in-sample window size should be constant. If `TRUE`, then with each origin one observation is added at the end of series and another one is removed from the beginning.
`co`	The parameter defines whether the holdout sample window size should be constant. If `TRUE`, the rolling origin will stop when less than `h` observations are left in the holdout.
`silent`	If `TRUE`, nothing is printed out in the console.
`parallel`	If `TRUE`, then the model fitting is done in parallel. WARNING! Packages `foreach` and either `doMC` (Linux and Mac only) or `doParallel` are needed in order to run the function in parallel.
`...`	This is temporary and is needed in order to capture "silent" parameter if it is provided.

Details

This function produces rolling origin forecasts using the data and a call passed as parameters. The function can do all of that either in serial or in parallel, but it needs foreach and either doMC (Linux only) or doParallel packages installed in order to do the latter.

This is a dangerous function, so be careful with the call that you pass to it, and make sure that it is well formulated before the execution. Also, do not forget to provide the value that needs to be returned or you might end up with very messy results.

For more details and more examples of usage, please see vignette for the function. In order to do that, just run the command: vignette("ro","greybox")

Value

Function returns the following variables:

actuals - the data provided to the function.
holdout - the matrix of actual values corresponding to the produced forecasts from each origin.
value - the matrices / array / lists with the produced data from each origin. Name of each object corresponds to the names in the parameter value.

Author(s)

Yves Sagaert

Ivan Svetunkov, ivan@svetunkov.com

References

Tashman, (2000) Out-of-sample tests of forecasting accuracy: an analysis and review International Journal of Forecasting, 16, pp. 437-450. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1016/S0169-2070(00)00065-0")}.

Examples


y <- rnorm(100,0,1)
ourCall <- "predict(arima(x=data,order=c(0,1,1)),n.ahead=h)"
# NOTE that the "data" needs to be used in the call, not "y".
# This way we tell the function, where "y" should be used in the call of the function.

# The default call and values
ourValue <- "pred"
ourRO <- ro(y, h=5, origins=5, ourCall, ourValue)

# We can now plot the results of this evaluation:
plot(ourRO)

# You can also use dollar sign
ourValue <- "$pred"
# And you can have constant in-sample size
ro(y, h=5, origins=5, ourCall, ourValue, ci=TRUE)

# You can ask for several values
ourValue <- c("pred","se")
# And you can have constant holdout size
ro(y, h=5, origins=20, ourCall, ourValue, ci=TRUE, co=TRUE)

#### The following code will give exactly the same result as above,
#### but computed in parallel using all but 1 core of CPU:
## Not run: ro(y, h=5, origins=20, ourCall, ourValue, ci=TRUE, co=TRUE, parallel=TRUE)

#### If you want to use functions from forecast package, please note that you need to
#### set the values that need to be returned explicitly. There are two options for this.
# Example 1:
## Not run: ourCall <- "forecast(ets(data), h=h, level=95)"
ourValue <- c("mean", "lower", "upper")
ro(y,h=5,origins=5,ourCall,ourValue)
## End(Not run)

# Example 2:
## Not run: ourCall <- "forecast(ets(data), h=h, level=c(80,95))"
ourValue <- c("mean", "lower[,1]", "upper[,1]", "lower[,2]", "upper[,2]")
ro(y,h=5,origins=5,ourCall,ourValue)
## End(Not run)

#### A more complicated example using the for loop and
#### several time series
x <- matrix(rnorm(120*3,0,1), 120, 3)

## Form an array for the forecasts we will produce
## We will have 4 origins with 6-steps ahead forecasts
ourForecasts <- array(NA,c(6,4,3))

## Define models that need to be used for each series
ourModels <- list(c(0,1,1), c(0,0,1), c(0,1,0))

## This call uses specific models for each time series
ourCall <- "predict(arima(data, order=ourModels[[i]]), n.ahead=h)"
ourValue <- "pred"

## Start the loop. The important thing here is to use the same variable 'i' as in ourCall.
for(i in 1:3){
    ourData <- x[,i]
    ourForecasts[,,i] <- ro(data=ourData,h=6,origins=4,call=ourCall,
                            value=ourValue,co=TRUE,silent=TRUE)$pred
}

## ourForecasts array now contains rolling origin forecasts from specific
## models.

##### An example with exogenous variables
x <- rnorm(100,0,1)
xreg <- matrix(rnorm(200,0,1),100,2,dimnames=list(NULL,c("x1","x2")))

## 'counti' is used to define in-sample size of xreg,
## 'counto' - the size of the holdout sample of xreg

ourCall <- "predict(arima(x=data, order=c(0,1,1), xreg=xreg[counti,,drop=FALSE]),
            n.ahead=h, newxreg=xreg[counto,,drop=FALSE])"
ourValue <- "pred"
ro(x,h=5,origins=5,ourCall,ourValue)

##### Poisson regression with alm
x <- rpois(100,2)
xreg <- cbind(x,matrix(rnorm(200,0,1),100,2,dimnames=list(NULL,c("x1","x2"))))
ourCall <- "predict(alm(x~., data=xreg[counti,,drop=FALSE], distribution='dpois'),
                    newdata=xreg[counto,,drop=FALSE])"
ourValue <- "mean"
testRO <- ro(xreg[,1],h=5,origins=5,ourCall,ourValue,co=TRUE)
plot(testRO)

## 'countf' is used to take xreg of the size corresponding to the whole
## sample on each iteration
## This is useful when working with functions from smooth package.
## The following call will return the forecasts from es() function of smooth.
## Not run: ourCall <- "es(data=data, h=h, xreg=xreg[countf,,drop=FALSE])"
ourValue <- "forecast"
ro(x,h=5,origins=5,ourCall,ourValue)
## End(Not run)

greybox documentation built on April 12, 2025, 1:25 a.m.