metb: Boosted decision trees with random effects


View source: R/metb.R

Description

At each iteration, a single decision tree is fit using gbm.fit, and the terminal node means are allowed to vary by group using lmer.
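
To make this concrete, here is a minimal conceptual sketch of one boosting iteration, using rpart and lme4 in place of the package's internal gbm.fit machinery; the data, the formula, and the random-effects structure are illustrative assumptions, not the actual implementation:

library(rpart)   # stands in for gbm.fit in this sketch
library(lme4)

set.seed(1)
n    <- 500
id   <- factor(sample(1:20, n, replace = TRUE))      # grouping variable
d    <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y    <- d$x1 + rnorm(20)[id] + rnorm(n)              # outcome with group effects
r    <- y - mean(y)                                  # residuals from the initial fit

tree <- rpart(r ~ x1 + x2, data = d,
              control = rpart.control(maxdepth = 2)) # one shallow tree
node <- factor(tree$where)                           # terminal node membership

# Terminal node means plus group-level deviations; a single random
# intercept per group is a simplification of the package's approach.
mod  <- lmer(r ~ 0 + node + (1 | id))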

Usage

metb(y, X, id, n.trees = 5, interaction.depth = 3, n.minobsinnode = 20,
  shrinkage = 0.01, bag.fraction = 0.5, train.fraction = NULL,
  cv.folds = 1, subset = NULL, indep = TRUE, save.mods = FALSE,
  mc.cores = 1, num_threads = 1, verbose = TRUE, ...)

metb.fit(y, X, id, n.trees = 5, interaction.depth = 3,
  n.minobsinnode = 20, shrinkage = 0.01, bag.fraction = 0.5,
  train.fraction = NULL, subset = NULL, indep = TRUE, num_threads = 1,
  save.mods = FALSE, verbose = TRUE, ...)

Arguments

y

outcome vector (continuous)

X

matrix or data frame of predictors

id

name or index of grouping variable

n.trees

the total number of trees to fit (iterations).

interaction.depth

The maximum depth of trees. 1 implies a single split (stump), 2 implies a tree with 2 splits, etc.

n.minobsinnode

minimum number of observations in the terminal nodes of each tree

shrinkage

a shrinkage parameter applied to each tree. Also known as the learning rate or step-size reduction.

bag.fraction

the fraction of the training set observations randomly selected to propose the next tree. This introduces randomness into the model fit. If bag.fraction < 1, running the same model twice will result in similar but not identical fits. Using set.seed ensures reproducibility (see the sketch after this list).

train.fraction

the fraction of the sample used for training

cv.folds

number of cross-validation folds. If greater than 1, cross-validation over a grid of meta-parameters is performed in addition to the usual fit (see Details).

subset

index of observations to use for training

indep

whether random effects are independent or allowed to covary (default is TRUE, for speed)

save.mods

whether the lmer models fit at each iteration are saved (required to use predict)

mc.cores

number of parallel cores used for cross-validation (forking via mclapply)

num_threads

number of threads

verbose

If TRUE, progress is printed every 10 trees/iterations during the final model fit.

...

arguments passed to gbm.fit
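
As noted under bag.fraction, subsampling makes the fit stochastic and a seed pins it down. A minimal sketch, where y, X, and the grouping column "school" are hypothetical placeholder data:

set.seed(123)
fit1 <- metb(y = y, X = X, id = "school", bag.fraction = 0.5)
set.seed(123)
fit2 <- metb(y = y, X = X, id = "school", bag.fraction = 0.5)
identical(fit1$yhat, fit2$yhat)   # TRUE: same seed, same subsamples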

Details

Meta-parameter tuning is handled by passing vectors of possible values for n.trees, shrinkage, indep, interaction.depth, and n.minobsinnode and setting cv.folds > 1. Setting mc.cores > 1 will carry out the tuning in parallel by forking via mclapply. Tuning is only done within the training set.
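
A tuning run might look like the following sketch; y, X, and the grouping column "school" are placeholders for your own data:

fit <- metb(y = y, X = X, id = "school",
            n.trees = 1000,
            shrinkage = c(0.005, 0.01, 0.1),
            interaction.depth = c(1, 3),
            n.minobsinnode = c(10, 20),
            cv.folds = 5,
            mc.cores = 4)   # grid evaluated in parallel via mclapply

fit$best.params   # meta-parameter values chosen by CV
fit$params        # full grid with the CV error of each combination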

Prediction is most easily carried out by passing the entire X matrix to metb, and specifying the training set using subset. Otherwise, set save.mods=TRUE and use predict.
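
A sketch of both approaches, again with placeholder data; the newdata argument to predict is an assumption, since its exact signature is not documented here:

# Approach 1: pass all rows of X, train on a subset; predictions for
# every row come back in the fitted object.
train <- sample(nrow(X), floor(0.8 * nrow(X)))
fit   <- metb(y = y, X = X, id = "school", subset = train)
pred  <- fit$yhat   # predictions at the best iteration

# Approach 2: keep the per-iteration lmer fits and call predict().
fit2  <- metb(y = y, X = X, id = "school", save.mods = TRUE)
pred2 <- predict(fit2, newdata = Xnew)   # 'Xnew' is hypothetical new data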

Value

An metb object consisting of the following list elements:

yhat

Vector of predictions at the best iteration (fixed + ranef)

ranef

Vector of random effects at the best iteration

fixed

Vector of fixed effect predictions at the best iteration

shrinkage

Amount of shrinkage

subset

Vector of observations used for training

best.trees

The best number of trees according to training, test, out-of-bag, and cross-validation error

best.params

The best set of meta-parameter values given by CV

params

A data frame of all meta-parameter combinations and the corresponding CV error

sigma

The variance due to the grouping variable at each iteration

xnames

Column names of X

mods

List of lmer models (if save.mods=TRUE)

id

name or index of the grouping variable

trees

List of trees fit at each iteration

init

The initial prediction

var.type

Type of variables (gbm.fit)

c.split

List of categorical splits (gbm.fit)

train.err

Training error at each iteration

oob.err

Out-of-bag error at each iteration

test.err

Test error at each iteration

cv.err

Cross-validation error at each iteration

Functions

metb: fits the model, with optional meta-parameter tuning by cross-validation (cv.folds > 1), in parallel if mc.cores > 1.

metb.fit: the underlying routine that fits a single model for one set of meta-parameter values, without cross-validation or parallelization.
