Description
Performs k-fold cross-validation to tune the risk penalty and tree-size penalty parameters and select the optimal rcDT model.
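The tuning loop described above can be pictured with a generic base-R sketch of k-fold cross-validation over a grid of penalty values. It is an illustration under stated assumptions, not rcDT's implementation: it tunes a single penalty over a fixed grid and assumes a higher held-out score is better, whereas rcDT.select also tunes the tree-size penalty and uses its own internal criterion. The helper cv_select and its fit_fun and score_fun arguments are hypothetical and are not part of rcDT.

```r
# Hypothetical sketch of k-fold cross-validation over a grid of penalty
# values. `fit_fun` and `score_fun` are user-supplied placeholders; they are
# not functions exported by rcDT.
cv_select <- function(data, lambdas, nfolds, fit_fun, score_fun) {
  n <- nrow(data)
  # Randomly assign each row to one of `nfolds` (near) equal-sized folds
  folds <- sample(rep(seq_len(nfolds), length.out = n))
  scores <- matrix(NA_real_, nrow = nfolds, ncol = length(lambdas))
  for (k in seq_len(nfolds)) {
    train <- data[folds != k, , drop = FALSE]  # fit on the other k - 1 folds
    test  <- data[folds == k, , drop = FALSE]  # evaluate on the held-out fold
    for (j in seq_along(lambdas)) {
      model <- fit_fun(train, lambdas[j])
      scores[k, j] <- score_fun(model, test)
    }
  }
  # Average the held-out scores and keep the best penalty (higher is better)
  mean_scores <- colMeans(scores)
  list(best.lambda = lambdas[which.max(mean_scores)],
       mean.scores = mean_scores,
       folds = folds)
}

# Toy usage: the "model" is just the penalty value, scored by negative
# squared error on the held-out fold
set.seed(123)
toy <- data.frame(y = rep(1, 20))
res <- cv_select(toy, lambdas = c(0, 1, 2), nfolds = 5,
                 fit_fun = function(train, lambda) lambda,
                 score_fun = function(model, test) -mean((test$y - model)^2))
res$best.lambda  # returns 1
```

Averaging the held-out scores across folds, rather than scoring on the training data, is what protects the selected penalty from overfitting.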
Usage
rcDT.select(data, split.var, N0 = 20, n0 = 5, efficacy = "y",
  risk = "r", col.trt = "trt", col.prtx = "prtx", lambda.seq = NA,
  lambda.length = 50, risk.control = TRUE, risk.threshold = NA,
  nfolds = 10, AIPWE = FALSE, sort = TRUE, ctg = NA,
  mtry = length(split.var), max.depth = 15,
  stabilize.type = c("linear", "rf"), stabilize = TRUE,
  use.other.nodes = TRUE, use.bootstrap = FALSE,
  extremeRandomized = FALSE, verbose = TRUE)
Arguments
data
data.frame. Data used to construct the rcDT model. Must contain the efficacy variable (y), the risk variable (r), a binary treatment indicator coded as 0/1 (trt), the propensity score (prtx), and the candidate splitting covariates (split.var).
split.var
numeric vector. Column indices of the candidate splitting variables.
N0
numeric specifying the minimum number of observations required to call a node terminal. Defaults to 20.
n0
numeric specifying the minimum number of treatment and control observations needed in a split to declare a node terminal. Defaults to 5.
efficacy
char. Name of the efficacy outcome column. Defaults to 'y'.
risk
char. Name of the risk outcome column. Defaults to 'r'.
col.trt
char. Name of the treatment indicator column. The indicator should be coded 0/1 or -1/+1. Defaults to 'trt'.
col.prtx
char. Name of the propensity score column. Defaults to 'prtx'.
lambda.seq
numeric vector. Sequence of risk penalty parameters to consider. Defaults to NA, in which case the function attempts to identify a reasonable range.
lambda.length
numeric. Number of risk penalty parameters to use in tuning. Larger values make model selection slower. Defaults to 50.
risk.control
logical. Should risk be controlled? Defaults to TRUE.
risk.threshold
numeric. Desired level of risk control (the risk threshold). Defaults to NA.
AIPWE
logical. Should the augmented inverse probability weighted estimator (AIPWE, TRUE) or the inverse probability weighted estimator (IPWE, FALSE) be used? Defaults to FALSE. Not available yet.
sort
For internal use.
ctg
numeric vector. Indices of the categorical input columns. Defaults to NA. Not available yet.
mtry
numeric specifying the number of randomly selected splitting variables to be included. Defaults to the number of splitting variables, length(split.var).
max.depth
numeric specifying the maximum depth of the tree. Defaults to 15 levels.
stabilize.type
character specifying the method used to estimate residuals. Current options are 'linear' for a linear model (default) and 'rf' for a random forest.
stabilize
logical. Should efficacy be modeled using residuals? Defaults to TRUE.
use.other.nodes
logical. Should a global estimator of the objective function be used? Defaults to TRUE.
use.bootstrap
logical. Should bootstrap resampling be performed? Defaults to FALSE.
extremeRandomized
logical. Experimental option for randomly selecting cutpoints in a random forest model. Defaults to FALSE; change it at your own risk.
verbose
logical. Should a progress bar be displayed during tuning? Defaults to TRUE.
nfolds
numeric. Number of folds to use in k-fold cross-validation. Defaults to 10.
test
data.frame of testing observations, formatted the same as 'data'.
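The data, col.trt, col.prtx, and AIPWE entries together describe an inverse probability weighted setup. As a rough illustration only, the sketch below assumes the standard (unnormalized) IPWE definition; the estimator rcDT actually uses may differ, for example by normalizing the weights. It estimates the mean efficacy and mean risk under a treatment rule from a toy data.frame that uses the default column names y, r, trt, and prtx. The function ipwe and the toy values are hypothetical and are not part of rcDT; the package's own generateData() is what the example below uses to build real input.

```r
# Hypothetical illustration, not rcDT internals: inverse probability
# weighted estimate (IPWE) of the mean of an outcome under a treatment rule.
# If trt is coded -1/+1, recode it to 0/1 first, e.g. as.integer(trt > 0).
ipwe <- function(outcome, trt, prtx, rule) {
  # Probability of receiving the treatment each subject actually received,
  # where prtx is P(trt = 1 | covariates)
  p_received <- ifelse(trt == 1, prtx, 1 - prtx)
  # Indicator that the observed treatment agrees with the rule
  agree <- as.numeric(trt == rule)
  mean(agree * outcome / p_received)
}

# Toy data using the default column names (y, r, trt, prtx)
dat <- data.frame(y    = c(2, 4, 6, 8),
                  r    = c(3, 1, 2, 1),
                  trt  = c(1, 0, 1, 0),
                  prtx = 0.5)
treat_all <- rep(1, nrow(dat))
ipwe(dat$y, dat$trt, dat$prtx, treat_all)  # efficacy under "treat all": 4
ipwe(dat$r, dat$trt, dat$prtx, treat_all)  # risk under "treat all": 2.5
```

Weighting each agreeing subject by the inverse of the probability of the treatment received is what lets observational data stand in for a population in which everyone followed the rule.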
Value
A summary of the cross-validation, including the optimal penalty parameters and the optimal model, with the following elements:
best.tree
optimal rcDT model
alpha
tree-size penalty
lambda
risk penalty
full.tree
unpruned tree
pruned.tree
output from pruning 'full.tree'
subtrees
sequence of optimally pruned subtrees
best.tree.summaries
summary across trees
in.train
training samples for each cross-validation split
in.test
testing samples for each cross-validation split
elapsed.time
time elapsed during model tuning
Examples
# Select the optimal rcDT model with 5-fold cross-validation
set.seed(123)
dat <- generateData()
fit <- rcDT.select(data = dat,
                   split.var = 1:10,
                   nfolds = 5,
                   risk.threshold = 2.75)