View source: R/INTRA_FORECAST_randomforest_fit.R
randomforest_fit
Description

Gauges how well a random forest fits the data for a given set of parameters (mtry, nodesize and ntree). It is used to fine-tune the random forest's parameters when forecasting.
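The following is a minimal sketch of the kind of fine-tuning loop this function is meant for. It uses a hypothetical base R helper, `tune_rf_grid`, which is not part of tstools. The sketch assumes a scoring function that takes the same arguments as `randomforest_fit()` and returns a single error value, where lower is better. It tries every combination in an `expand.grid()` grid and returns the grid sorted by that error. The candidate parameter values are illustrative only.

```r
# Hypothetical helper (not part of tstools): score every combination of
# mtry, nodesize and ntree with `fit_fn` and return the grid sorted by score,
# lowest (best) first. `fit_fn` is assumed to accept the same arguments as
# randomforest_fit() and to return one numeric error value.
tune_rf_grid <- function(ML_data, fit_fn,
                         mtry = c(2, 4, 8),
                         nodesize = c(1, 5),
                         ntree = c(500, 1000)) {
  grid <- expand.grid(mtry = mtry, nodesize = nodesize, ntree = ntree)
  # vapply() guarantees exactly one numeric score per grid row
  grid$score <- vapply(
    seq_len(nrow(grid)),
    function(i) {
      fit_fn(ML_data = ML_data,
             mtry = grid$mtry[i],
             nodesize = grid$nodesize[i],
             ntree = grid$ntree[i])
    },
    numeric(1)
  )
  grid[order(grid$score), , drop = FALSE]
}
```

With data prepared as in the example, a call such as `tune_rf_grid(ML_data, randomforest_fit)` would place the lowest-error combination in the first row. Random forests are stochastic, so calling `set.seed()` before tuning makes the comparison reproducible.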
Usage

randomforest_fit(ML_data, mtry, nodesize, ntree)
Arguments

ML_data: Dataset that has been prepared to run through randomForest. If originally a time series object, then it has gone through the
mtry: randomForest parameter. The number of variables randomly sampled as candidates at each split. Default values differ for classification (sqrt(p)) and regression (p/3), where p is the number of variables.
nodesize: randomForest parameter. The minimum size of terminal nodes. Setting this number larger causes smaller trees to be grown, which takes less time. Default values differ for classification (1) and regression (5).
ntree: randomForest parameter. The number of trees to grow. It should not be set too small, so that every observation gets predicted at least a few times (randomForest default = 500).
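To make the default rules above concrete, here is a small sketch for a data set with p predictor variables. It uses a hypothetical helper, `rf_default_params`, which is not part of randomForest or tstools. The rounding shown (`floor(sqrt(p))` for classification, `max(floor(p/3), 1)` for regression) follows the randomForest usage line as I understand it. Treat the helper as an illustration of those rules rather than a call into the package.

```r
# Hypothetical helper: the randomForest defaults described above,
# for p predictor variables and the given type of task.
rf_default_params <- function(p, task = c("regression", "classification")) {
  task <- match.arg(task)
  if (task == "regression") {
    # regression: mtry = p/3 (at least 1), terminal nodes of size 5
    list(mtry = max(floor(p / 3), 1), nodesize = 5, ntree = 500)
  } else {
    # classification: mtry = sqrt(p), terminal nodes of size 1
    list(mtry = floor(sqrt(p)), nodesize = 1, ntree = 500)
  }
}

str(rf_default_params(12))
```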
Value

The mean absolute prediction error (MAPE) of the model run, expressed as a percentage.
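As a reference point for reading the returned value, here is a sketch of a conventional MAPE: the mean of |actual - predicted| / |actual|, multiplied by 100. This page does not state exactly how `randomforest_fit()` computes it, for example which observations it scores or how it treats zeros. The helper `mape` is therefore hypothetical. The choice to drop observations whose actual value is exactly zero (where the ratio is undefined) is an assumption of this sketch.

```r
# Hypothetical helper: conventional mean absolute percentage error.
# Observations with actual == 0 are dropped because the ratio is undefined.
mape <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  keep <- actual != 0
  mean(abs((actual[keep] - predicted[keep]) / actual[keep])) * 100
}

mape(actual = c(100, 200), predicted = c(110, 180))
```

Both predictions above are off by 10%, so the result is 10.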
Examples

library(dplyr)    # attaches %>% used in the pipeline below
library(tstools)  # package these examples come from

# Prepare the data for a single state / oil company combination
ML_data <- tstools::initialize_ts_forecast_data(
  data = dummy_gasprice,
  date_col = "year_month",
  col_of_interest = "gasprice",
  group_cols = c("state", "oil_company"),
  xreg_cols = c("spotprice", "gemprice")
) %>%
  dplyr::filter(grouping == "state = New York & oil_company = CompanyA") %>%
  tstools::transform_data_to_ts_object() %>%
  decompose_ts_object_for_ML() %>%
  # Difference the target: each value minus the previous one
  dplyr::mutate(col_of_interest = col_of_interest - dplyr::lag(col_of_interest)) %>%
  # Drop the first row, which has no previous value to difference against
  dplyr::filter(!is.na(col_of_interest))

# Evaluate one parameter combination
randomforest_fit(
  ML_data = ML_data,
  mtry = 8,
  nodesize = 5,
  ntree = 1000
)
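The `mutate()` and `filter()` steps in the example replace the target with its first difference and then drop the leading NA. Here is a base R sketch of the same transformation for a plain, already time-ordered numeric vector. The helper name `first_difference` and the price values are made up for illustration. For such a vector, `c(NA, diff(x))` matches `x - dplyr::lag(x)`.

```r
# Hypothetical helper: first difference x_t - x_{t-1}, with NA for the
# first observation because it has no predecessor (like dplyr::lag()).
first_difference <- function(x) c(NA, diff(x))

prices <- c(2.10, 2.25, 2.20, 2.40)  # made-up illustrative values
d <- first_difference(prices)
d[!is.na(d)]  # equivalent of the filter(!is.na(...)) step
```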