run_gb: Apply gradient boosting classifier to MrP.

View source: R/run_gb.R

run_gb R Documentation

Apply gradient boosting classifier to MrP.

Description

run_gb is a wrapper function that applies the gradient boosting classifier to data provided by the user, evaluates prediction performance, and chooses the best-performing model.

Usage

run_gb(
  y,
  L1.x,
  L2.x,
  L2.eval.unit,
  L2.unit,
  L2.reg,
  loss.unit,
  loss.fun,
  interaction.depth,
  shrinkage,
  n.trees.init,
  n.trees.increase,
  n.trees.max,
  cores = cores,
  n.minobsinnode,
  data,
  verbose
)
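The tuning arguments interaction.depth, shrinkage, and the n.trees.* settings jointly define the grid of candidate models that run_gb evaluates by cross-validation. As a conceptual sketch only (not the package's internal code), the grid implied by the default values can be written as:

  grid <- expand.grid(
    interaction.depth = c(1, 2, 3),
    shrinkage = c(0.04, 0.01, 0.008, 0.005, 0.001),
    n.trees = seq(from = 50, to = 1000, by = 50)
  )
  nrow(grid)  # number of candidate models compared via cross-validation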

Arguments

y

Outcome variable. A character scalar containing the column name of the outcome variable in survey.

L1.x

Individual-level covariates. A character vector containing the column names of the individual-level variables in survey and census used to predict outcome y. Note that the geographic unit is specified in argument L2.unit.

L2.x

Context-level covariates. A character vector containing the column names of the context-level variables in survey and census used to predict outcome y.

L2.eval.unit

Geographic unit for the loss function. A character scalar containing the column name of the geographic unit in survey and census.

L2.unit

Geographic unit. A character scalar containing the column name of the geographic unit in survey and census at which outcomes should be aggregated.

L2.reg

Geographic region. A character scalar containing the column name of the geographic region in survey and census by which geographic units are grouped (L2.unit must be nested within L2.reg). Default is NULL.

loss.unit

Loss function unit. A character-valued scalar indicating whether performance loss should be evaluated at the level of individual respondents (individuals) or geographic units (L2 units). Default is individuals.

loss.fun

Loss function. A character-valued scalar indicating whether prediction loss should be measured by the mean squared error (MSE) or the mean absolute error (MAE). Default is MSE.

interaction.depth

GB interaction depth. An integer-valued vector whose values specify the interaction depth of GB. The interaction depth defines the maximum depth of each tree grown (i.e., the maximum level of variable interactions). Default is c(1, 2, 3).

shrinkage

GB learning rate. A numeric vector whose values specify the learning rate or step-size reduction of GB. Values between 0.001 and 0.1 usually work, but a smaller learning rate typically requires more trees. Default is c(0.04, 0.01, 0.008, 0.005, 0.001).

n.trees.init

GB initial total number of trees. An integer-valued scalar specifying the initial number of total trees to fit by GB. Default is 50.

n.trees.increase

GB increase in total number of trees. An integer-valued scalar specifying by how many trees the total number of trees to fit should be increased (until n.trees.max is reached) or an integer-valued vector of length length(shrinkage) with each of its values being associated with a learning rate in shrinkage. Default is 50.

n.trees.max

GB maximum number of trees. An integer-valued scalar specifying the maximum number of trees to fit by GB or an integer-valued vector of length length(shrinkage) with each of its values being associated with a learning rate and an increase in the total number of trees. Default is 1000.

cores

The number of cores to be used. An integer indicating the number of processor cores used for parallel computing. Default is 1.

n.minobsinnode

GB minimum number of observations in the terminal nodes. An integer-valued scalar specifying the minimum number of observations that each terminal node of the trees must contain. Default is 5.

data

Data for cross-validation. A list of k data.frames, one for each fold to be used in k-fold cross-validation (see the sketch following this argument list for one way to construct such a list).

verbose

Verbose output. A logical argument indicating whether verbose output should be printed. Default is TRUE.
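The data argument expects the cross-validation folds to be pre-constructed. A minimal sketch of one way to build such a list, assuming a survey data.frame named my_survey (a hypothetical name) and five folds:

  set.seed(123)
  k <- 5
  fold_id <- sample(rep(seq_len(k), length.out = nrow(my_survey)))
  cv_data <- lapply(
    seq_len(k),
    function(i) my_survey[fold_id == i, , drop = FALSE]
  )
  # cv_data is a list of k data.frames, one per fold, as expected by data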

Value

The tuned gradient boosting parameters. A list with three elements: interaction_depth contains the interaction depth parameter, shrinkage contains the learning rate, and n_trees contains the number of trees to be grown.
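run_gb is a tuning wrapper that is normally invoked from the package's main function auto_MrP() rather than called directly. For illustration only, a hedged sketch of a direct call, assuming the fold list cv_data from the sketch above and hypothetical column names:

  gb_out <- run_gb(
    y = "YES",                    # hypothetical outcome column
    L1.x = c("L1x1", "L1x2"),     # hypothetical individual-level covariates
    L2.x = c("L2.x1", "L2.x2"),   # hypothetical context-level covariates
    L2.eval.unit = "state",
    L2.unit = "state",
    L2.reg = "region",
    loss.unit = "individuals",
    loss.fun = "MSE",
    interaction.depth = c(1, 2, 3),
    shrinkage = c(0.04, 0.01, 0.008, 0.005, 0.001),
    n.trees.init = 50,
    n.trees.increase = 50,
    n.trees.max = 1000,
    cores = 1,
    n.minobsinnode = 5,
    data = cv_data,
    verbose = TRUE
  )
  # Tuned values are returned as a list:
  # gb_out$interaction_depth, gb_out$shrinkage, gb_out$n_trees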

