tbma: Tree-based Moving Average


View source: R/tbma.R

Description

The tree-based moving average (TBMA) is a tree-based ensemble model for time series forecasting problems. It provides point and probabilistic forecasts by combining a tree-based ensemble with a moving average (MA) approach, so that both predictors and time series components are taken into account. This combination lets the TBMA model handle a large number of numerical and categorical predictors without sacrificing accuracy, while also capturing autocorrelation between time series observations.

Usage

tbma(
  formula,
  train,
  test,
  prediction_type = "point",
  percentile = c(0.25, 0.5, 0.75),
  group_id = NULL,
  horizon = nrow(train),
  splitrule = "extratrees",
  always_split_variables = NULL,
  min_node_size = 5,
  max_depth = NULL,
  num_trees = 100,
  ma_order = 2,
  mtry = round(sqrt(ncol(train)))
)

Arguments

formula

Object of class formula

train

A data.table object

test

A data.table object

prediction_type

Prediction type, either "point" or "probabilistic". If "probabilistic" is chosen, the percentile parameter is required.

percentile

Percentiles of the probabilistic forecasts, used when the prediction type is "probabilistic". The percentile parameter can take multiple values between 0 and 1, given as a vector.

group_id

The group identity parameter filters the data used to predict a test observation. It is optional, and it is usually a categorical variable that has a significant effect on the response variable.

horizon

The horizon parameter filters the train data used to forecast a test observation. If horizon is n, the last n train observations are used for forecasting. The default is the number of observations in the train set, which means no filtering.

splitrule

Splitrule determines how nodes are split. The parameter, which is based on the ranger() function in the ranger package, can be "extratrees", "variance", or "maxstat". See the documentation of the ranger function in the ranger package for details.

always_split_variables

Vector of column names indicating the columns that should be selected as candidate variables for splitting. See the documentation of the ranger function in the ranger package for details.

min_node_size

Minimum node size allowed in the terminal nodes of the decision trees.

max_depth

Maximum depth of the decision trees. See the documentation of the ranger function in the ranger package for details.

num_trees

Number of trees

ma_order

Order of the moving average part of the TBMA model. Default is 2. A high order can lead to NA forecasts.

mtry

Number of variables selected as candidate variables for splitting. See the documentation of the ranger function in the ranger package for details.
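
As a rough illustration of how the horizon and mtry defaults described above behave, the following sketch reproduces them in base R on the airquality data. It is not taken from the tbma source code: the names filtered_train, default_horizon, default_mtry, and percentiles_valid are hypothetical, and the use of tail() is only an assumption about what "the last n train observations" means when no group_id is given.

```r
# Illustrative sketch only; not the package's internal implementation.
library(datasets)

# Keep only complete rows, as in the package example.
train <- airquality[complete.cases(airquality), ]

# Default horizon: the number of training rows, so nothing is filtered out.
default_horizon <- nrow(train)

# With horizon = n, only the last n training observations would be used
# (assumed here to correspond to tail(); hypothetical).
n <- 100
filtered_train <- tail(train, n)

# Default mtry: rounded square root of the number of columns in train.
default_mtry <- round(sqrt(ncol(train)))

# Percentiles for probabilistic forecasts must lie between 0 and 1.
percentile <- c(0.25, 0.5, 0.75)
percentiles_valid <- all(percentile >= 0 & percentile <= 1)
```

Setting horizon below nrow(train) restricts the model to more recent observations, which may help when older data are less representative of the test period.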

Value

A data.table object. For point forecasting, a column called "prediction" is added to the data table that contains the columns mentioned in the formula. For probabilistic forecasting, columns named after the percentile values are added to the data table that contains the columns mentioned in the formula.

Examples

## Not run: 
library(data.table)
library(tbma)
library(datasets)
data(airquality)
summary(airquality)
airquality <- as.data.table(airquality)
# Keep only the rows without missing values
airquality <- airquality[complete.cases(airquality)]
# First 102 complete rows for training, the rest for testing
train <- airquality[1:102, ]
test <- airquality[103:nrow(airquality), ]
test_data_with_predictions <- tbma(Temp ~ ., train = train, test = test,
  prediction_type = "point", horizon = 100, ma_order = 2)

## End(Not run)
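
The example above covers point forecasts only. A probabilistic call might look like the sketch below, which uses only the arguments documented on this page. It assumes the tbma and data.table packages are installed and is skipped otherwise; the percentile values 0.1, 0.5, and 0.9 and the names aq and prob_forecasts are illustrative choices, not taken from the package.

```r
# Sketch only: runs the tbma call only if tbma and data.table are installed.
percentiles <- c(0.1, 0.5, 0.9)  # illustrative values between 0 and 1

if (requireNamespace("tbma", quietly = TRUE) &&
    requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  library(tbma)

  # Same data preparation as in the point-forecast example.
  aq <- as.data.table(datasets::airquality)
  aq <- aq[complete.cases(aq)]
  train <- aq[1:102, ]
  test <- aq[103:nrow(aq), ]

  # Request one forecast column per percentile instead of a point forecast.
  prob_forecasts <- tbma(Temp ~ ., train = train, test = test,
                         prediction_type = "probabilistic",
                         percentile = percentiles,
                         horizon = 100, ma_order = 2)
  print(head(prob_forecasts))
}
```

According to the Value section, the returned data.table should then contain one added column per requested percentile rather than a single "prediction" column.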

BurakhanS/tbma documentation built on March 24, 2020, 5:27 p.m.