rcRF.select: rcRF model selection
In kdoub5ha/rcITR: Risk Controlled ITR Discovery

Performs model selection for an rcRF model by choosing the best risk penalty parameter (lambda).

rcRF.select(data, split.var, test = NULL, N0 = 20, n0 = 5,
  efficacy = "y", risk = "r", col.trt = "trt", col.prtx = "prtx",
  ntree = 500, lambda.upper = NA, risk.control = TRUE,
  risk.threshold = NA, AIPWE = FALSE, ctg = NA,
  mtry = max(floor(length(split.var)/3), 1), avoid.nul.tree = FALSE,
  max.depth = 15, stabilize.type = c("linear", "rf"), stabilize = TRUE,
  verbose = FALSE, use.other.nodes = TRUE, extremeRandomized = FALSE,
  importance = FALSE, order.importances = TRUE, max.iter = 10,
  risk.tolerance = c(0.995, 1.005))
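
The default for `mtry` in the signature above is an expression rather than a fixed number, so its value depends on how many splitting columns are supplied. The sketch below evaluates that expression for a few inputs; the wrapper name `default_mtry` is hypothetical and not part of rcITR.

```r
# Hypothetical wrapper around the default `mtry` expression from the usage
# signature; it is not a function exported by rcITR.
default_mtry <- function(split.var) {
  max(floor(length(split.var) / 3), 1)
}

m10 <- default_mtry(1:10)  # floor(10 / 3) = 3
m2  <- default_mtry(1:2)   # floor(2 / 3) = 0, so the lower bound of 1 applies
m30 <- default_mtry(1:30)  # floor(30 / 3) = 10
```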

`data`	data.frame. Data used to construct the rcRF model. Must contain an efficacy variable (y), a risk variable (r), a binary treatment indicator coded as 0 / 1 (trt), a propensity score (prtx), and the candidate splitting covariates.
`split.var`	numeric vector. Column indices of the splitting variables.
`test`	data.frame of testing observations. Should be formatted the same as 'data'.
`N0`	numeric specifying minimum number of observations required to call a node terminal. Defaults to 20.
`n0`	numeric specifying minimum number of treatment/control observations needed in a split to declare a node terminal. Defaults to 5.
`efficacy`	char. Efficacy outcome column. Defaults to 'y'.
`risk`	char. Risk outcome column. Defaults to 'r'.
`col.trt`	char. Treatment indicator column name. Should be of form 0/1 or -1/+1.
`col.prtx`	char. Propensity score column name.
`ntree`	numeric. Number of trees to construct.
`lambda.upper`	numeric. Upper bound for the risk penalty. Defaults to NA; an attempt at a reasonable selection is made automatically.
`risk.control`	logical. Should risk be controlled? Defaults to TRUE.
`risk.threshold`	numeric. Desired level of risk control.
`AIPWE`	logical. Should AIPWE (TRUE) or IPWE (FALSE) be used. Not available yet.
`ctg`	numeric vector corresponding to the categorical input columns. Defaults to NA. Not available yet.
`mtry`	numeric specifying the number of randomly selected splitting variables to be included. Defaults to the greater of 1 and floor(length(split.var)/3).
`avoid.nul.tree`	logical. Should null trees be discarded?
`max.depth`	numeric specifying maximum depth of the tree. Defaults to 15 levels.
`stabilize.type`	character specifying method used for estimating residuals. Current options are 'linear' for linear model (default) and 'rf' for random forest.
`stabilize`	logical indicating if efficacy should be modeled using residuals. Defaults to TRUE.
`verbose`	logical. Give updates about forest progression?
`use.other.nodes`	logical. Should the global estimator of the objective function be used? Defaults to TRUE.
`extremeRandomized`	logical. Experimental option for randomly selecting cutpoints in a random forest model. Defaults to FALSE and users should change this at their own peril.
`importance`	logical. Indicates if variable importance measures should be estimated and returned. Defaults to FALSE.
`order.importances`	logical. Should importances be ordered (if requested)?
`max.iter`	numeric. Indicates the maximum number of forest iterations to perform. Defaults to 10.
`risk.tolerance`	numeric. Two-component vector giving the bound on risk that is acceptable (the acceptable risk range is calculated as risk.threshold * risk.tolerance). Defaults to c(0.995, 1.005), i.e. 0.5% tolerance.
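
As a rough illustration of the layout that `data` and `split.var` describe, the sketch below builds a toy data frame with the default column names (`y`, `r`, `trt`, `prtx`) plus ten covariates. The covariate names `x1` to `x10`, the sample size, and the simulated values are all assumptions made for illustration; they are not what the package's `generateData()` produces.

```r
set.seed(1)
n <- 200  # illustrative sample size
p <- 10   # illustrative number of candidate splitting covariates

# Hypothetical covariates; the names x1..x10 are placeholders
covariates <- as.data.frame(matrix(runif(n * p), nrow = n))
names(covariates) <- paste0("x", seq_len(p))

toy <- data.frame(
  covariates,
  trt  = rbinom(n, size = 1, prob = 0.5),  # binary treatment coded 0 / 1
  prtx = rep(0.5, n),                      # propensity score (constant here)
  y    = rnorm(n),                         # efficacy outcome, simulated noise
  r    = rnorm(n, mean = 2)                # risk outcome, simulated noise
)

# Column positions of the covariates, in the form `split.var` expects
split.var <- which(names(toy) %in% names(covariates))
```

With this layout the covariates occupy the first ten columns, so `split.var` matches the `1:10` used in the package example.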

A summary of the cross validation including optimal penalty parameter and the optimal model.

`best.fit`	optimal rcRF model
`lambda`	optimal lambda value selected
`oob.risk`	out-of-bag risk from best model
`converged`	max number of iterations reached?
`importances`	importance measures, if requested
`risks`	vector of risk scores obtained over tuning procedure
`lambdas`	vector of lambda values tried over tuning procedure
`time.elapsed`	elapsed time for model tuning
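
One way to read the returned `oob.risk` is to compare it against the acceptable range described for `risk.tolerance`, that is, `risk.threshold * risk.tolerance`. The sketch below does this with a hypothetical helper, `risk_within_tolerance`, which is not part of rcITR; it assumes `oob.risk` is a single number, and the `mock_fit` list is a made-up stand-in for a real result.

```r
# Hypothetical helper, not exported by rcITR. Assumes `fit$oob.risk`
# is a single numeric value.
risk_within_tolerance <- function(fit, risk.threshold,
                                  risk.tolerance = c(0.995, 1.005)) {
  bounds <- risk.threshold * risk.tolerance
  list(
    lower  = bounds[1],
    upper  = bounds[2],
    within = fit$oob.risk >= bounds[1] && fit$oob.risk <= bounds[2]
  )
}

# Placeholder standing in for a fitted result; these numbers are invented
mock_fit <- list(lambda = 1.2, oob.risk = 2.76)
res <- risk_within_tolerance(mock_fit, risk.threshold = 2.75)
```

With a threshold of 2.75 and the default tolerance, the acceptable range works out to 2.73625 through 2.76375. On a real result, the same helper would be called as `risk_within_tolerance(fit, risk.threshold = 2.75)`.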

# Select the risk penalty for an rcRF model
set.seed(123)
dat <- generateData()
fit <- rcRF.select(data = dat, 
                   split.var = 1:10,
                   risk.threshold = 2.75)