tune_parameters: Tuning and cross-validation of MERF parameters
In SAEforest: Mixed Effect Random Forests for Small Area Estimation

tune_parameters

R Documentation

Tuning and cross-validation of MERF parameters

Description

The function tune_parameters tunes the parameters of the implemented MERF method. In essence, it is a modified wrapper for train from the package caret that treats MERFs as a custom method.

Usage

tune_parameters(
  Y,
  X,
  data,
  dName,
  trControl,
  tuneGrid,
  seed = 11235,
  gg_theme = theme_minimal(),
  plot_res = TRUE,
  return_plot = FALSE,
  na.rm = TRUE,
  ...
)

Arguments

`Y`	Continuous input value of target variable.
`X`	Matrix or data.frame of predictive covariates.
`data`	data.frame of survey sample data including the specified elements of `Y` and `X`.
`dName`	Character specifying the name of domain identifier, for which random intercepts are modeled.
`trControl`	Control parameters passed to train. The most important parameters are `method` (e.g. "repeatedcv" for repeated k-fold cross-validation), `number` (the number of folds) and `repeats` (the number of repetitions). For further details see trainControl and the example below.
`tuneGrid`	A data.frame with possible tuning values. The columns must have the same names as the tuning parameters. For this tuning function the grid must comprise entries for the following parameters: `num.trees, mtry, min.node.size, splitrule`.
`seed`	Seed that makes cross-validation and tuning reproducible. Defaults to `11235`.
`gg_theme`	Specify a predefined theme from ggplot2. Defaults to `theme_minimal`.
`plot_res`	Optional logical. If `TRUE`, the plot with results of cross-validation and tuning is shown. Defaults to `TRUE`.
`return_plot`	Logical. If `TRUE`, the comparative plot produced by ggplot2 is returned in a list for further customization and processing. Defaults to `FALSE`.
`na.rm`	Logical. Whether missing values should be removed. Defaults to `TRUE`.
`...`	Additional parameters are directly passed to the random forest ranger and/or the training function train. For further details on possible parameters and examples see ranger or train.
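As an illustration of the `trControl` argument, the following minimal sketch builds a control object for repeated cross-validation with caret's trainControl. The values 5 and 3 are assumptions chosen for illustration, not recommendations from the package authors.

```r
library(caret)

# Repeated k-fold cross-validation: 5 folds, repeated 3 times
# (fold and repetition counts are illustrative assumptions)
fitControl <- trainControl(method = "repeatedcv", number = 5, repeats = 3)

# The settings are stored as list elements and can be inspected
fitControl$method
fitControl$number
fitControl$repeats
```

More folds and repetitions give more stable error estimates at the cost of a proportionally larger number of model fits.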

Details

Tuning can be performed on the following four parameters: num.trees (the number of trees in a forest), mtry (the number of variables considered as split candidates at each node), min.node.size (the minimal node size) and splitrule (the general splitting rule). For details see ranger.
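A grid covering all four parameters can be sketched with base R's expand.grid, which returns one row per combination of candidate values. The candidate values below are assumptions for illustration; only the column names and the "variance" split rule are taken from this documentation.

```r
# Candidate values are illustrative assumptions, not recommendations
merfGrid <- expand.grid(
  num.trees     = c(50, 100),
  mtry          = c(3, 7, 9),
  min.node.size = c(5, 10),
  splitrule     = "variance"
)

# One row per parameter combination: 2 * 3 * 2 * 1 = 12
nrow(merfGrid)

# Column names must match the names of the tuning parameters
names(merfGrid)
```

Because every row is evaluated in every resample, the grid size multiplies the run time: with 5-fold cross-validation and a single repetition, this grid would require at least 12 * 5 = 60 model fits.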

Value

Prints requested optimal tuning parameters and (if requested) an additional comparative plot produced by ggplot2.

Examples


# Loading packages and data
library(SAEforest)
library(caret)

data("eusilcA_pop")
data("eusilcA_smp")

income <- eusilcA_smp$eqIncome
X_covar <- eusilcA_smp[, -c(1, 16, 17, 18)]

# Specific characteristics of Cross-validation
fitControl <- trainControl(method = "repeatedcv", number = 5,
                           repeats = 1)

# Define a tuning-grid
merfGrid <- expand.grid(num.trees = 50, mtry = c(3, 7, 9),
                        min.node.size = 10, splitrule = "variance")

tune_parameters(Y = income, X = X_covar, data = eusilcA_smp,
                dName = "district", trControl = fitControl,
                tuneGrid = merfGrid)

SAEforest documentation built on Sept. 8, 2022, 1:05 a.m.