ab_cvRF: Select optimal tree depth for SL.randomForest() using...

Description

ab_cvRF is an internal tuning function called by agecurveAb and tmleAb. It selects the optimal random forest tree depth by tuning the nodesize parameter.

Usage

ab_cvRF(Y, X, id = NULL, family = gaussian(), SL.library,
  cvControl = list(), print = FALSE, RFnodesize = seq(15, 40, by = 5))

Arguments

Y

The outcome. Must be a numeric vector.

X

A matrix of features that predict Y, usually supplied as a data.frame.

id

An optional cluster or repeated measures id variable. For cross-validation splits, id forces observations in the same cluster or for the same individual to be in the same validation fold.

family

Model family: gaussian for continuous outcomes, binomial for binary outcomes.

SL.library

SuperLearner library: the set of candidate algorithms passed to SuperLearner (see SuperLearner for details).

cvControl

Optional list to control cross-validation (see SuperLearner for details).

print

Logical. Print messages? Defaults to FALSE.

RFnodesize

A sequence of candidate node sizes (minimum terminal node sizes) for the random forest algorithm to evaluate. Defaults to seq(15, 40, by = 5), i.e., 15, 20, ..., 40.
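The id argument above constrains how cross-validation folds are formed: all observations from the same cluster or individual must fall in the same validation fold. The base-R sketch below illustrates that constraint. It is not the package's code (SuperLearner handles id internally), and the helper name make_cluster_folds and its V and seed arguments are hypothetical.

```r
# Hypothetical helper, not part of tmleAb or SuperLearner:
# assign V cross-validation folds at the cluster level so that every
# observation sharing an id lands in the same validation fold.
make_cluster_folds <- function(id, V = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  ids <- unique(id)
  if (length(ids) < V) stop("need at least V distinct ids")
  # randomly permute a balanced vector of fold labels, one label per distinct id
  fold_of_id <- rep_len(seq_len(V), length(ids))[sample.int(length(ids))]
  names(fold_of_id) <- as.character(ids)
  # map each observation to the fold of its cluster
  unname(fold_of_id[as.character(id)])
}

# Usage: repeated measures on 6 individuals split into 3 folds
id <- c(1, 1, 2, 2, 2, 3, 4, 4, 5, 6)
folds <- make_cluster_folds(id, V = 3, seed = 1)
# number of distinct folds per id; every entry is 1
tapply(folds, id, function(f) length(unique(f)))
```

Folds are assigned to distinct ids rather than to rows, so repeated measures on one individual never appear in both the training and validation sets of the same split.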

Details

ab_cvRF is an internal function called by agecurveAb or tmleAb when SL.randomForest() is included in the algorithm library. It performs an additional pre-screening step that uses cross-validation to select the optimal node size, and hence tree depth, for random forest. By default, the node sizes evaluated are 15, 20, ..., 40. For age-antibody curves, random forest without this tuning step fits extremely jagged curves that clearly overfit the data; selecting the node size by cross-validation guards against this overfitting. Cross-validated risks are estimated using SuperLearner.
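The selection step described above can be sketched as a grid search over node sizes that scores each candidate by V-fold cross-validated mean squared error and keeps the minimizer. This is an illustrative sketch only: the package itself estimates the risks with SuperLearner, and the function cv_select_nodesize, its fit_predict argument, and the output names best_nodesize and cv_risk are hypothetical. The usage part assumes the randomForest package is installed, whose nodesize argument sets the minimum terminal node size.

```r
# Hypothetical sketch, not the tmleAb implementation:
# pick a random forest node size by V-fold cross-validated MSE.
# fit_predict(Y_train, X_train, X_valid, nodesize) must return
# predictions for the rows of X_valid.
cv_select_nodesize <- function(Y, X, folds, nodesizes, fit_predict) {
  X <- as.data.frame(X)
  cv_risk <- vapply(nodesizes, function(ns) {
    sq_err <- numeric(length(Y))
    for (v in unique(folds)) {
      valid <- folds == v
      pred <- fit_predict(Y[!valid], X[!valid, , drop = FALSE],
                          X[valid, , drop = FALSE], ns)
      sq_err[valid] <- (Y[valid] - pred)^2  # held-out squared errors
    }
    mean(sq_err)  # cross-validated risk for this node size
  }, numeric(1))
  names(cv_risk) <- nodesizes
  list(best_nodesize = nodesizes[which.min(cv_risk)], cv_risk = cv_risk)
}

# Usage with randomForest (assumed installed; skipped otherwise)
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf_fit_predict <- function(Y_train, X_train, X_valid, nodesize) {
    fit <- randomForest::randomForest(x = X_train, y = Y_train,
                                      nodesize = nodesize)
    predict(fit, newdata = X_valid)
  }
  set.seed(42)
  age <- runif(200, 0, 10)
  Y <- log(1 + age) + rnorm(200, sd = 0.3)  # simulated smooth age curve
  folds <- sample(rep_len(1:5, length(Y)))
  res <- cv_select_nodesize(Y, data.frame(age = age), folds,
                            seq(15, 40, by = 5), rf_fit_predict)
  print(res$best_nodesize)
}
```

Larger node sizes stop splitting earlier, so trees are shallower and the fitted curve is smoother. Scoring each candidate on held-out folds is what lets the grid search trade that smoothness against fit.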

Value

Returns a list containing the updated SuperLearner library, the optimal node size, and cvRisks (the cross-validated risk for each node size evaluated).

Examples

# TBD

ben-arnold/tmleAb documentation built on May 12, 2019, 10:55 a.m.