rcDT.select: Optimal rcDT model selection
In kdoub5ha/rcITR: Risk Controlled ITR Discovery


Performs k-fold cross-validation to tune the risk and tree size penalty parameters and select the optimal rcDT model.
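As a rough illustration of the k-fold idea only (this is not the package's internal code), observations can be partitioned into folds as sketched below in base R. The helper `make_folds` and the sample size are hypothetical and exist only for this example.

```r
# Hypothetical helper: assign each of n observations to one of k folds.
make_folds <- function(n, k) {
  # Shuffle balanced fold labels so fold sizes differ by at most one.
  sample(rep(seq_len(k), length.out = n))
}

set.seed(1)
folds <- make_folds(n = 103, k = 10)

# Each fold serves once as the held-out set; the rest form the training set.
for (f in sort(unique(folds))) {
  test_idx  <- which(folds == f)
  train_idx <- which(folds != f)
  # A model would be fit on train_idx and evaluated on test_idx here.
}

# Fold sizes
table(folds)
```

Larger values of `nfolds` leave more data for training in each round but require more model fits.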

rcDT.select(data, split.var, N0 = 20, n0 = 5, efficacy = "y",
  risk = "r", col.trt = "trt", col.prtx = "prtx", lambda.seq = NA,
  lambda.length = 50, risk.control = TRUE, risk.threshold = NA,
  nfolds = 10, AIPWE = FALSE, sort = TRUE, ctg = NA,
  mtry = length(split.var), max.depth = 15, stabilize.type = c("linear",
  "rf"), stabilize = TRUE, use.other.nodes = TRUE, use.bootstrap = FALSE,
  extremeRandomized = FALSE, verbose = TRUE)

`data`	data.frame. Data used to construct rcDT model. Must contain efficacy variable (y), risk variable (r), binary treatment indicator coded as 0 / 1 (trt), propensity score (prtx), candidate splitting covariates (split.var).
`split.var`	numeric vector. Column indices of the splitting variables.
`N0`	numeric specifying minimum number of observations required to call a node terminal. Defaults to 20.
`n0`	numeric specifying minimum number of treatment/control observations needed in a split to declare a node terminal. Defaults to 5.
`efficacy`	char. Efficacy outcome column. Defaults to 'y'.
`risk`	char. Risk outcome column. Defaults to 'r'.
`col.trt`	char. Treatment indicator column name. Should be of form 0/1 or -1/+1. Defaults to 'trt'.
`col.prtx`	char. Propensity score column name. Defaults to 'prtx'.
`lambda.seq`	numeric vector. Identifies sequence of risk penalty parameters to be considered. Defaults to NA and will attempt to identify reasonable range.
`lambda.length`	numeric indicating number of risk penalty parameters to use in tuning. Larger values will cause model selection to be slower. Defaults to 50.
`risk.control`	logical. Should risk be controlled? Defaults to TRUE.
`risk.threshold`	numeric. Desired level of risk control. Defaults to NA.
`AIPWE`	logical. Should AIPWE (TRUE) or IPWE (FALSE) be used? Not available yet.
`sort`	internal use.
`ctg`	numeric vector corresponding to the categorical input columns. Defaults to NA. Not available yet.
`mtry`	numeric specifying the number of randomly selected splitting variables to be included. Defaults to number of splitting variables.
`max.depth`	numeric specifying maximum depth of the tree. Defaults to 15 levels.
`stabilize.type`	character specifying method used for estimating residuals. Current options are 'linear' for linear model (default) and 'rf' for random forest.
`stabilize`	logical indicating if efficacy should be modeled using residuals. Defaults to TRUE.
`use.other.nodes`	logical. Should a global estimator of the objective function be used? Defaults to TRUE.
`use.bootstrap`	logical. Should bootstrap resampling be performed? Defaults to FALSE.
`extremeRandomized`	logical. Experimental option for randomly selecting cutpoints in a random forest model. Defaults to FALSE; change at your own risk.
`verbose`	logical. Should a tuning progress bar be displayed? Defaults to TRUE.
`nfolds`	numeric. Number of folds to use in k-fold cross-validation. Defaults to 10.
`test`	data.frame of testing observations. Should be formatted the same as 'data'.
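The column requirements above can be made concrete with a small simulated data frame. This is a minimal sketch: the covariate names `x1` to `x10`, the sample size, the constant propensity of 0.5, and the outcome formulas are all made up for illustration and are not taken from the package.

```r
set.seed(42)
n <- 200

# Ten hypothetical covariates occupy columns 1 to 10 (candidate splitting variables).
X <- as.data.frame(matrix(runif(n * 10), nrow = n))
names(X) <- paste0("x", 1:10)

# Randomized 0/1 treatment with a known propensity of 0.5 (assumption).
trt  <- rbinom(n, size = 1, prob = 0.5)
prtx <- rep(0.5, n)

# Made-up efficacy and risk outcomes, for layout purposes only.
y <- 1 + X$x1 + trt * (X$x2 > 0.5) + rnorm(n)
r <- 2 + trt * X$x3 + rnorm(n, sd = 0.5)

# Column names match the defaults of efficacy, risk, col.trt and col.prtx.
dat <- data.frame(X, y = y, r = r, trt = trt, prtx = prtx)
split.var <- 1:10  # column indices of the splitting covariates

str(dat)
```

With this layout, the default values of `efficacy`, `risk`, `col.trt`, and `col.prtx` already point at the right columns, so only `data` and `split.var` would need to be supplied explicitly.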

A summary of the cross-validation, including the optimal penalty parameters and the optimal model.

`best.tree`	optimal rcDT model
`alpha`	tree size penalty
`lambda`	risk penalty
`full.tree`	unpruned tree
`pruned.tree`	output from pruning of 'full.tree'
`subtrees`	sequence of optimally pruned subtrees
`best.tree.summaries`	summary across trees
`in.train`	training samples from splits
`in.test`	testing samples from splits
`elapsed.time`	time elapsed during model tuning

# Select the optimal rcDT model using 5-fold cross-validation
library(rcITR)
set.seed(123)
dat <- generateData()
fit <- rcDT.select(data = dat,
                   split.var = 1:10,
                   nfolds = 5,
                   risk.threshold = 2.75)