randomforest_fit: Determine model fit of random forest


View source: R/INTRA_FORECAST_randomforest_fit.R

Description

randomforest_fit is a function that gauges the fit of a single random forest model run for a given set of parameters. It is used to fine-tune the random forest when forecasting.

Usage

randomforest_fit(ML_data, mtry, nodesize, ntree)

Arguments

ML_data

Dataset that has been prepared to run through randomForest. If it was originally a time series object, it must already have gone through the decompose_ts_object_for_ML function, and the first difference of the column of interest must already have been taken.
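As a small illustration of the first-difference step mentioned above, the sketch below uses base R's diff() on a toy vector. The values are made up, and the vector name is only a stand-in for the column of interest. The Examples section does the same thing with dplyr::lag followed by a filter on missing values.

```r
# Toy series standing in for the column of interest (values are made up)
col_of_interest <- c(2.10, 2.15, 2.12, 2.20, 2.26)

# First difference: each value minus the previous one.
# The result is one element shorter than the input, which is why the
# Examples section drops the leading NA after subtracting the lag.
first_diff <- diff(col_of_interest)
first_diff
```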

mtry

randomForest parameter. The number of variables randomly sampled as candidates at each split. The randomForest defaults differ for classification (sqrt(p)) and regression (p/3), where p is the number of predictor variables.
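To make the default values concrete, here is a short sketch of how they work out for a hypothetical predictor count. The value of p is an assumption chosen for illustration. randomForest rounds both defaults down, and the regression default is never less than 1.

```r
# Hypothetical number of predictor variables
p <- 12

# randomForest default for classification: floor(sqrt(p))
mtry_classification <- floor(sqrt(p))

# randomForest default for regression: floor(p / 3), but at least 1
mtry_regression <- max(floor(p / 3), 1)

c(classification = mtry_classification, regression = mtry_regression)
```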

nodesize

randomForest parameter. The minimum size of terminal nodes. Setting this number larger causes smaller trees to be grown, which takes less time. The randomForest defaults differ for classification (1) and regression (5).

ntree

randomForest parameter. The number of trees to grow. It should not be too small, so that the risk of over-fitting is reduced and each observation is used at least a couple of times (randomForest default = 500).

Value

The mean absolute prediction error (MAPE), in percentage terms, of the model run
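For readers unfamiliar with the metric, the sketch below shows one common way to compute a MAPE in percentage terms. It assumes the usual definition (mean of absolute errors relative to the actual values, times 100). How randomforest_fit computes its return value internally is not shown on this page, so treat the helper name mape and the toy numbers as illustrative only.

```r
# Hypothetical helper: mean absolute error relative to the actuals, in percent.
# Undefined when any actual value is zero.
mape <- function(actual, predicted) {
  mean(abs((actual - predicted) / actual)) * 100
}

# Toy values, made up for illustration
actual    <- c(100, 200, 400)
predicted <- c(110, 190, 400)

result <- mape(actual, predicted)
result
```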

Examples

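# The pipe operator %>% comes from magrittr
library(magrittr)

# Prepare the data: select one group, convert it to a ts object, decompose it
# for ML, then take the first difference of the column of interest
ML_data <- tstools::initialize_ts_forecast_data(
      data = dummy_gasprice,
      date_col = "year_month",
      col_of_interest = "gasprice",
      group_cols = c("state", "oil_company"),
      xreg_cols = c("spotprice", "gemprice")
   ) %>%
   dplyr::filter(grouping == "state = New York   &   oil_company = CompanyA") %>%
   tstools::transform_data_to_ts_object() %>%
   decompose_ts_object_for_ML() %>%
   dplyr::mutate(col_of_interest = col_of_interest - dplyr::lag(col_of_interest)) %>%
   # The first row has no previous value, so its difference is NA
   dplyr::filter(!is.na(col_of_interest))

# Score one combination of random forest parameters
randomforest_fit(
   ML_data = ML_data,
   mtry = 8,
   nodesize = 5,
   ntree = 1000
)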
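Since the function is meant for fine-tuning, one possible way to use it is to score a grid of parameter combinations and keep the one with the lowest error. The sketch below is only an assumption about how such a search could look. score_fun is a hypothetical stand-in that returns a made-up score so the code runs without the package or the data; in practice it would wrap a call to randomforest_fit with the prepared ML_data.

```r
# Hypothetical stand-in for randomforest_fit(); the formula is made up and
# only exists so that the sketch runs on its own
score_fun <- function(mtry, nodesize, ntree) {
  abs(mtry - 6) + abs(nodesize - 5) / 5 + 100 / ntree
}

# All combinations of the candidate parameter values
grid <- expand.grid(
  mtry = c(4, 6, 8),
  nodesize = c(1, 5, 10),
  ntree = c(500, 1000)
)

# Score every row of the grid
grid$mape <- mapply(score_fun, grid$mtry, grid$nodesize, grid$ntree)

# Keep the combination with the lowest score
best <- grid[which.min(grid$mape), ]
best
```

An exhaustive grid is the simplest option; with many candidate values, the number of model runs grows quickly, so a coarser grid or random sampling of rows may be preferable.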

ing-bank/tsforecast documentation built on Sept. 18, 2020, 9:40 a.m.