tune_parameters: Tuning and cross-validation of MERF parameters

View source: R/tune_parameters.R

tune_parameters R Documentation

Tuning and cross-validation of MERF parameters

Description

The function tune_parameters tunes the parameters of the implemented MERF method. It is essentially a modified wrapper around train from the package caret that treats MERFs as a custom method.

Usage

tune_parameters(
  Y,
  X,
  data,
  dName,
  trControl,
  tuneGrid,
  seed = 11235,
  gg_theme = theme_minimal(),
  plot_res = TRUE,
  return_plot = FALSE,
  na.rm = TRUE,
  ...
)

Arguments

Y

Continuous input values of the target variable.

X

Matrix or data.frame of predictive covariates.

data

data.frame of survey sample data including the specified elements of Y and X.

dName

Character specifying the name of the domain identifier for which random intercepts are modeled.

trControl

Control parameters passed to train. The most important parameters are method ("repeatedcv" for repeated k-fold cross-validation), number (the number of folds) and repeats (the number of repetitions). For further details see trainControl and the example below.

tuneGrid

A data.frame with possible tuning values. The columns must have the same names as the tuning parameters. For this tuning function the grid must comprise entries for the following parameters: num.trees, mtry, min.node.size, splitrule.

seed

Seed that makes cross-validation and tuning reproducible. Defaults to 11235.

gg_theme

Specify a predefined theme from ggplot2. Defaults to theme_minimal.

plot_res

Optional logical. If TRUE, the plot with results of cross-validation and tuning is shown. Defaults to TRUE.

return_plot

Logical. If set to TRUE, a list containing the comparative plot produced by ggplot2 is returned for further customization and processing. Defaults to FALSE.

na.rm

Logical. Whether missing values should be removed. Defaults to TRUE.

...

Additional parameters passed directly to the random forest function ranger and/or the training function train. For further details on possible parameters and examples see ranger or train.

Details

Tuning can be performed on the following four parameters: num.trees (the number of trees in a forest), mtry (the number of variables considered as split candidates at each node), min.node.size (the minimal individual node size) and splitrule (the general splitting rule). For details see ranger.
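
As a minimal sketch of how a grid over these four parameters could be set up, the following uses base R's expand.grid to build one row per candidate combination. The object name merfGrid and all values are illustrative assumptions, not recommended settings; each extra value multiplies the number of models fitted during cross-validation.

```r
# Illustrative tuning grid; the values below are assumptions, not recommendations.
# The column names must match the four tuning parameters exactly.
merfGrid <- expand.grid(
  num.trees = c(50, 100),      # number of trees per forest
  mtry = c(3, 7, 9),           # split candidates per node
  min.node.size = c(5, 10),    # minimal node size
  splitrule = "variance"       # splitting rule passed on to ranger
)

# One row per combination: 2 * 3 * 2 * 1 = 12 candidates
nrow(merfGrid)
```

A grid built this way can be supplied as the tuneGrid argument. With 5-fold cross-validation and one repetition, 12 candidates imply 60 model fits, so small grids are advisable for a first run.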

Value

Prints requested optimal tuning parameters and (if requested) an additional comparative plot produced by ggplot2.

See Also

SAEforest, MERFranger, train, ggplot

Examples


# Load the packages and the example data
library(SAEforest)
library(caret)
data("eusilcA_pop")
data("eusilcA_smp")

income <- eusilcA_smp$eqIncome
X_covar <- eusilcA_smp[, -c(1, 16, 17, 18)]

# Specify the cross-validation settings
fitControl <- trainControl(method = "repeatedcv", number = 5,
                           repeats = 1)

# Define a tuning grid (must contain all four tuning parameters)
merfGrid <- expand.grid(num.trees = 50, mtry = c(3, 7, 9),
                        min.node.size = 10, splitrule = "variance")

tune_parameters(Y = income, X = X_covar, data = eusilcA_smp,
                dName = "district", trControl = fitControl,
                tuneGrid = merfGrid)



SAEforest documentation built on Sept. 8, 2022, 1:05 a.m.