rcDT.select: Optimal rcDT model selection


View source: R/rcDT.select.R

Description

Performs k-fold cross-validation to tune the risk and tree size penalty parameters and select the optimal rcDT model.
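As a rough illustration of the k-fold idea, the base R sketch below assigns observations to folds and builds the train/test index sets. It is not the package's internal code: the names `n`, `k`, `fold.id` and `splits` are hypothetical, and how rcDT.select actually partitions and scores the folds may differ.

```r
# Sketch of k-fold partitioning in base R; not rcDT.select's internal code.
set.seed(123)
n <- 100   # hypothetical number of observations
k <- 5     # number of folds (plays the role of `nfolds`)

# Randomly assign each observation to one of k equally sized folds.
fold.id <- sample(rep(seq_len(k), length.out = n))

# For each fold, hold it out for testing and keep the rest for training.
splits <- lapply(seq_len(k), function(j) {
  list(train = which(fold.id != j), test = which(fold.id == j))
})

# Every observation appears in exactly one test fold.
table(fold.id)
```

In a tuning loop, a model would be fit on each `train` set for every candidate penalty value and scored on the matching `test` set.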

Usage

rcDT.select(data, split.var, N0 = 20, n0 = 5, efficacy = "y",
  risk = "r", col.trt = "trt", col.prtx = "prtx", lambda.seq = NA,
  lambda.length = 50, risk.control = TRUE, risk.threshold = NA,
  nfolds = 10, AIPWE = FALSE, sort = TRUE, ctg = NA,
  mtry = length(split.var), max.depth = 15, stabilize.type = c("linear",
  "rf"), stabilize = TRUE, use.other.nodes = TRUE, use.bootstrap = FALSE,
  extremeRandomized = FALSE, verbose = TRUE)

Arguments

data

data.frame. Data used to construct the rcDT model. Must contain an efficacy variable (y), a risk variable (r), a binary treatment indicator coded as 0/1 (trt), a propensity score (prtx), and the candidate splitting covariates (split.var).

split.var

numeric vector. Column indices of the splitting variables.

N0

numeric specifying minimum number of observations required to call a node terminal. Defaults to 20.

n0

numeric specifying minimum number of treatment/control observations needed in a split to declare a node terminal. Defaults to 5.

efficacy

char. Efficacy outcome column. Defaults to 'y'.

risk

char. Risk outcome column. Defaults to 'r'.

col.trt

char. Treatment indicator column name. The column should be coded as 0/1 or -1/+1. Defaults to 'trt'.

col.prtx

char. Propensity score column name. Defaults to 'prtx'.

lambda.seq

numeric vector. Sequence of risk penalty parameters to be considered. Defaults to NA, in which case the function attempts to identify a reasonable range.

lambda.length

numeric indicating the number of risk penalty parameters to use in tuning. Larger values make model selection slower. Defaults to 50.

risk.control

logical. Should risk be controlled? Defaults to TRUE.

risk.threshold

numeric. Desired level of risk control. Defaults to NA.

AIPWE

logical. Should AIPWE (TRUE) or IPWE (FALSE) be used? Defaults to FALSE. Not available yet.

sort

internal use.

ctg

numeric vector of column indices for the categorical input columns. Defaults to NA. Not available yet.

mtry

numeric specifying the number of randomly selected splitting variables to be included. Defaults to number of splitting variables.

max.depth

numeric specifying maximum depth of the tree. Defaults to 15 levels.

stabilize.type

character specifying method used for estimating residuals. Current options are 'linear' for linear model (default) and 'rf' for random forest.

stabilize

logical indicating if efficacy should be modeled using residuals. Defaults to TRUE.

use.other.nodes

logical. Should the global estimator of the objective function be used? Defaults to TRUE.

use.bootstrap

logical. Should a bootstrap resampling be done? Defaults to FALSE.

extremeRandomized

logical. Experimental for randomly selecting cutpoints in a random forest model. Defaults to FALSE and users should change this at their own peril.

verbose

logical. Should a tuning progress bar be displayed? Defaults to TRUE.

nfolds

numeric. Number of folds to use in k-fold cross-validation. Defaults to 10.

test

data.frame of testing observations. Should be formatted the same as 'data'.
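To make the expected layout of `data` concrete, the sketch below simulates a data frame with the columns described above. All values are made up, the name `dat.sketch` is hypothetical, and the column order is an assumption; the package's own `generateData()` may produce a different structure.

```r
# Simulated input in the layout described for `data`; all values are made up.
set.seed(123)
n <- 200
p <- 10

# Candidate splitting covariates in columns 1..p, so split.var = 1:p.
X <- as.data.frame(matrix(runif(n * p), nrow = n))

dat.sketch <- cbind(
  X,
  trt  = rbinom(n, size = 1, prob = 0.5),  # binary treatment coded 0/1
  prtx = rep(0.5, n),                      # propensity under 1:1 randomization
  y    = rnorm(n),                         # efficacy outcome
  r    = rnorm(n, mean = 2)                # risk outcome
)

split.var <- seq_len(p)
str(dat.sketch[, c("trt", "prtx", "y", "r")])
```

With the default column names (`efficacy = "y"`, `risk = "r"`, `col.trt = "trt"`, `col.prtx = "prtx"`), a data frame shaped like this would not need any of the column arguments overridden.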

Value

A summary of the cross validation including optimal penalty parameter and the optimal model.

best.tree

optimal rcDT model

alpha

tree size penalty

lambda

risk penalty

full.tree

unpruned tree

pruned.tree

output from pruning of 'full.tree'

subtrees

sequence of optimally pruned subtrees

best.tree.summaries

summary across trees

in.train

training samples from splits

in.test

testing samples from splits

elapsed.time

time elapsed during model tuning
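The sketch below shows one way to check a result against the component names listed above. It assumes the returned object is a list carrying those names, which this page does not state explicitly. `missing_components` is a hypothetical helper and `mock.fit` is a stand-in object, not real output.

```r
# Component names as listed in the Value section.
expected <- c("best.tree", "alpha", "lambda", "full.tree", "pruned.tree",
              "subtrees", "best.tree.summaries", "in.train", "in.test",
              "elapsed.time")

# Hypothetical helper: report which documented components an object lacks.
missing_components <- function(fit) setdiff(expected, names(fit))

# Stand-in object for illustration only; a real fit comes from rcDT.select().
mock.fit <- setNames(vector("list", length(expected)), expected)
mock.fit$lambda <- 1.5   # made-up risk penalty

missing_components(mock.fit)   # character(0): nothing missing
mock.fit$lambda                # the stored risk penalty
```

On a real fit, `str(fit, max.level = 1)` gives a quick overview of the same components.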

Examples

# Select the optimal rcDT model using 5-fold cross-validation
set.seed(123)
dat <- generateData()
fit <- rcDT.select(data = dat, 
                   split.var = 1:10, 
                   nfolds = 5,
                   risk.threshold = 2.75)

kdoub5ha/rcITR documentation built on Aug. 5, 2020, 9:05 p.m.