randomforest_fit: Determine model fit of random forest
In ing-bank/tsforecast: Time Series Forecasting Pipeline


View source: R/INTRA_FORECAST_randomforest_fit.R

randomforest_fit is a function that gauges the fit of a single random forest model run for a given set of parameters. It is used to fine-tune the random forest when forecasting.

randomforest_fit(ML_data, mtry, nodesize, ntree)

`ML_data`	Dataset that has been prepared to run through randomForest. If originally a time series object, then it has gone through the `decompose_ts_object_for_ML` function and the first difference of the column of interest has been taken
`mtry`	randomForest parameter. The number of variables that are randomly sampled as candidates at each split. The default differs between classification (sqrt(p)) and regression (p/3), where p is the number of variables
`nodesize`	randomForest parameter. The minimum size of terminal nodes. Setting this number larger causes smaller trees to be grown, which takes less time. The default differs between classification (1) and regression (5)
`ntree`	randomForest parameter. The number of trees to grow. It should not be too small, so that the risk of over-fitting is reduced and each observation is used at least a couple of times (randomForest default = 500)
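
To make the default values for `mtry` concrete, here is a minimal sketch in base R that reproduces the defaults documented for the randomForest package. The helper name `default_mtry` is hypothetical; it is not part of tsforecast or randomForest.

```r
# Hypothetical helper mirroring the documented randomForest defaults for mtry,
# where p is the number of predictor variables.
default_mtry <- function(p, regression = TRUE) {
  if (regression) {
    max(floor(p / 3), 1)   # regression: p/3, but never below 1
  } else {
    floor(sqrt(p))         # classification: sqrt(p)
  }
}

default_mtry(24)                      # regression with 24 predictors
default_mtry(24, regression = FALSE)  # classification with 24 predictors
```

When tuning, these defaults are a natural starting point around which to vary `mtry`.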

Returns the mean absolute prediction error (MAPE) of the model run, expressed as a percentage
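
The source of `randomforest_fit` is not shown here, so the exact error formula is not confirmed. As a rough illustration, the sketch below assumes the common definition of MAPE as the mean absolute percentage error; the function name `mape` and the sample values are hypothetical.

```r
# Assumed definition: mean of |actual - predicted| / |actual|, times 100.
# This is an illustration, not necessarily the formula used inside randomforest_fit.
mape <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted), all(actual != 0))
  mean(abs((actual - predicted) / actual)) * 100
}

# Made-up values for illustration only
actual    <- c(100, 200, 400)
predicted <- c(110, 190, 400)
mape(actual, predicted)
```

A lower value indicates a closer fit, which is what makes the returned number usable as a tuning criterion.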

library(magrittr)  # provides the %>% pipe used below

# Prepare the data for a single group
ML_data <- tstools::initialize_ts_forecast_data(
      data = dummy_gasprice,
      date_col = "year_month",
      col_of_interest = "gasprice",
      group_cols = c("state", "oil_company"),
      xreg_cols = c("spotprice", "gemprice")
   ) %>%
   dplyr::filter(grouping == "state = New York   &   oil_company = CompanyA") %>%
   tstools::transform_data_to_ts_object() %>%
   decompose_ts_object_for_ML() %>%
   # Take the first difference of the column of interest and drop the leading NA
   dplyr::mutate(col_of_interest = col_of_interest - dplyr::lag(col_of_interest)) %>%
   dplyr::filter(!is.na(col_of_interest))

# Gauge the fit of one random forest run with the given parameters
randomforest_fit(
   ML_data = ML_data,
   mtry = 8,
   nodesize = 5,
   ntree = 1000
)
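
Since the function is meant for fine-tuning, one possible way to use it is a simple grid search that keeps the parameter combination with the lowest error. The sketch below is an assumption about how such a loop could look, not the tuning procedure used by tsforecast. `tune_random_forest` and `toy_fit` are hypothetical names, and `toy_fit` is a stand-in scoring function so the example runs without data.

```r
# Hypothetical tuning loop. fit_fun stands in for a call such as
# function(mtry, nodesize, ntree) randomforest_fit(ML_data, mtry, nodesize, ntree)
tune_random_forest <- function(fit_fun, mtry, nodesize, ntree) {
  # Build every combination of the candidate parameter values
  grid <- expand.grid(mtry = mtry, nodesize = nodesize, ntree = ntree)
  # Score each combination (lower error is better)
  grid$error <- mapply(fit_fun, grid$mtry, grid$nodesize, grid$ntree)
  # Return the row with the lowest error
  grid[which.min(grid$error), , drop = FALSE]
}

# Toy stand-in for the real fit function, lowest near mtry = 8 and nodesize = 5
toy_fit <- function(mtry, nodesize, ntree) {
  abs(mtry - 8) + abs(nodesize - 5) + 100 / ntree
}

best <- tune_random_forest(
  toy_fit,
  mtry = c(4, 8, 12),
  nodesize = c(3, 5),
  ntree = c(500, 1000)
)
best
```

With the real function, each grid cell trains a full forest, so the grid should be kept small or the candidate values chosen around the randomForest defaults.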