FcRandomForest: Forecasting with Random Forest

View source: R/FcRandomForest.R

Description

Random forest [Breiman, 2001] for forecasting, using multivariate regression.

This function was successfully used in [Thrun et al., 2019].

Usage

FcRandomForest(Time, DF, formula = NULL, Horizon,
               Package = 'randomForest', AutoCorrelation, NoOfTree = 200,
               PlotIt = TRUE, Holidays, SimilarPoints = TRUE, ...)

Arguments

Time

Time [1:n] vector of as.Date objects. If not missing, all possible seasonalities are used as indicators for the predictor. formula should be set to "predictor ~ ." so that Time is used. This idea was applied in [Thrun et al., 2019]; it was not published in [Breiman, 2001].
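A minimal sketch of how seasonal indicators could be derived from Time in base R. Which seasonalities FcRandomForest actually extracts is not stated here, so the helper MakeCalendarIndicators and its four columns (weekday, day of month, month, day of year) are hypothetical choices for illustration only.

```r
# Hypothetical helper: derive calendar indicators from a Date vector.
# The chosen columns are illustrative, not necessarily those used by FcRandomForest.
MakeCalendarIndicators <- function(Time) {
  stopifnot(inherits(Time, "Date"))
  data.frame(
    Weekday    = as.integer(format(Time, "%u")),  # 1 = Monday, ..., 7 = Sunday
    DayOfMonth = as.integer(format(Time, "%d")),
    Month      = as.integer(format(Time, "%m")),
    DayOfYear  = as.integer(format(Time, "%j")),
    row.names  = as.character(Time)               # rows named by date
  )
}

Time <- seq(as.Date("2019-01-01"), by = "day", length.out = 10)
Indicators <- MakeCalendarIndicators(Time)
head(Indicators, 3)
```

Columns like these would presumably be combined with the indicators in DF, which is why a formula of the form "predictor ~ ." is needed for them to enter the model.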

DF

Data frame [1:n,1:d] of d variables: d-1 indicators and one predictor.

formula

Either a formula describing the predictor and indicators, or NULL. Set formula to "predictor ~ ." if all d-1 indicators should be used.

Horizon

Forecast horizon as a number of days. The test set is DF[(n-Horizon+1):n, 1:d] and the training set is DF[1:(n-Horizon), 1:d].
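The split can be sketched in base R as follows, assuming the test set consists of exactly the last Horizon rows so that it does not overlap the training set. SplitByHorizon is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: hold out the last Horizon rows as test data.
SplitByHorizon <- function(DF, Horizon) {
  n <- nrow(DF)
  stopifnot(Horizon >= 1, Horizon < n)
  list(
    TrainData = DF[1:(n - Horizon), , drop = FALSE],   # rows 1 .. n-Horizon
    TestData  = DF[(n - Horizon + 1):n, , drop = FALSE] # last Horizon rows
  )
}

DF <- data.frame(predictor = 1:10, indicator = (1:10)^2)
Split <- SplitByHorizon(DF, Horizon = 3)
nrow(Split$TrainData)  # 7
nrow(Split$TestData)   # 3
```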

Package

Either 'ranger' or 'randomForest', selecting the package used to grow the forest.

AutoCorrelation

If not missing, the name of a variable stored in DF. It should be the predictor and should also be defined in formula. The lag of the autocorrelation is defined by Horizon, and its strength can be checked with AutoCorr.
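A sketch of the idea, assuming the autocorrelation feature is simply the predictor shifted by Horizon time steps. The helper AddHorizonLag and the column naming are hypothetical; base R's cor is used as a quick strength check in place of AutoCorr, which is not reproduced here.

```r
# Hypothetical helper: add the variable lagged by Horizon as an extra indicator.
AddHorizonLag <- function(DF, VariableName, Horizon) {
  x <- DF[[VariableName]]
  n <- length(x)
  stopifnot(Horizon >= 1, Horizon < n)
  # value observed Horizon steps earlier; the first Horizon rows have no lag (NA)
  DF[[paste0(VariableName, "_lag", Horizon)]] <- c(rep(NA, Horizon), x[1:(n - Horizon)])
  DF
}

DF <- data.frame(predictor = c(5, 7, 6, 8, 9, 11))
DFlag <- AddHorizonLag(DF, "predictor", Horizon = 2)
DFlag$predictor_lag2  # NA NA 5 7 6 8

# correlation between the series and its Horizon-lagged copy
cor(DFlag$predictor, DFlag$predictor_lag2, use = "complete.obs")
```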

NoOfTree

Number of trees to grow. [Hastie et al., 2013] suggest that random forests often stabilize at around 200 trees; beyond that, larger numbers bring no further improvement, in contrast to boosting.

PlotIt

If TRUE, plots the MAE results; if formula=NULL, plots the time series of forecast values of the test set instead.
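For reference, the mean absolute error (MAE) between observed and forecast test values can be computed in base R as below. This is a generic sketch of the metric; how FcRandomForest computes and plots it internally is not shown here.

```r
# Mean absolute error of a forecast against the observed test values.
MAE <- function(Observed, Forecast) {
  stopifnot(length(Observed) == length(Forecast))
  mean(abs(Observed - Forecast))
}

MAE(c(10, 12, 14), c(11, 12, 12))  # (1 + 0 + 2) / 3 = 1
```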

Holidays

A data frame or vector of as.Date objects defining the holidays. If missing, German holidays are used. Only used if Time is given.
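A minimal sketch of turning Holidays into a binary indicator aligned with Time. The helper HolidayIndicator is hypothetical, the default German holiday calendar is not reproduced, and taking the first column of a data frame is an assumption about its layout.

```r
# Hypothetical helper: 1 if the date in Time is a holiday, else 0.
HolidayIndicator <- function(Time, Holidays) {
  if (is.data.frame(Holidays)) Holidays <- Holidays[[1]]  # assume dates in first column
  as.integer(Time %in% as.Date(Holidays))
}

Time <- seq(as.Date("2019-12-23"), as.Date("2019-12-27"), by = "day")
Holidays <- as.Date(c("2019-12-25", "2019-12-26"))  # example dates only
HolidayIndicator(Time, Holidays)  # 0 0 1 1 0
```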

SimilarPoints

Highly experimental; set to FALSE if you want to publish your results.

...

Further parameters passed to randomForest or ranger, such as mtry, nodesize, maxnodes, or min.node.size.

Details

mtry: Number of variables randomly sampled as candidates at each split; usually d/3 or higher, but lower than d.
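To make the rule of thumb concrete, the sketch below computes the package defaults when mtry is not passed through "...", with p = d-1 indicators. It assumes the documented defaults of the two packages for regression: max(floor(p/3), 1) in randomForest and the rounded-down square root of p in ranger. DefaultMtry is a hypothetical helper.

```r
# Hypothetical helper: default mtry for a regression forest with p indicators.
DefaultMtry <- function(p, Package = c("randomForest", "ranger")) {
  Package <- match.arg(Package)
  if (Package == "randomForest") {
    max(floor(p / 3), 1)  # randomForest regression default
  } else {
    floor(sqrt(p))        # ranger default
  }
}

DefaultMtry(12)            # 4
DefaultMtry(12, "ranger")  # 3
```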

nodesize ('randomForest') or min.node.size ('ranger'): Minimum size of terminal nodes, default 5. Setting this number larger causes smaller trees to be grown; otherwise trees are grown as deep as this minimum allows. [Hastie et al., 2013] recommend growing trees as large as possible.

maxnodes ('randomForest' only): Maximum number of terminal nodes that trees in the forest can have. If not given, trees are grown to the maximum possible size (subject to the limit set by nodesize), in line with the recommendation of [Hastie et al., 2013] to grow trees as large as possible.

If formula is NULL, the autocorrelation defined by Horizon is used as the predictor.

Value

List with

Forecast

Vector [1:Horizon] of forecast values for the test data, named by Time if given.

TestDataPredictor

Predictor values of the test data; see Horizon for the definition. Row names are defined by Time if given.

FeatureImportance

Importance of the features for the forecast; see importance in randomForest or the input parameter importance in ranger for details.

Accuracy

Output of accuracy

Model

Output of either randomForest or ranger

TestDataIndicators

data.frame[1:Horizon,1:(d-1)]; in the multivariate case all variables except the predictor, otherwise NULL.

See Horizon for the definition of its length; row names are defined by Time if given.

TrainData

data.frame[1:k,1:d] with k = n-Horizon; see Horizon for the definition. Row names are defined by Time if given.

Note

In a single forecasting example (n=1) from [Thrun et al., 2019], randomForest strongly outperformed ranger, even with the same parameters and data. The reason is unknown, and this observation remains unpublished.

Author(s)

Michael Thrun

References

[Breiman, 2001] Breiman, L.: Random Forests, Machine Learning, 45(1), pp. 5-32, 2001.

[Hastie et al., 2013] Hastie, T., Tibshirani, R., and Friedman, J. H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, pp. 587ff, 2013.

[Thrun et al., 2019] Thrun, M., Maerte, J., Boehme, P., and Gehlert, T.: Applying Two Theorems of Machine Learning to the Forecasting of Biweekly Arrivals at a Call Center, Proceedings of ECDA, accepted, Bayreuth, 2019.

See Also

randomForest, ranger

Examples

##ToDo

Mthrun/TSAT documentation built on Feb. 5, 2024, 11:15 p.m.