treeCV: Cross Validation for Optimal rcDT Model Selection for Given...


View source: R/treeCV.R

Description

Performs k-fold cross validation for an rcDT model to select the best subtree from the set of optimally pruned subtrees generated by the 'prune' function.
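
As a rough illustration of the k-fold splitting step only, the sketch below assigns observations to folds in base R. It is not the package's internal code; the names `n`, `fold_id`, `in_train` and `in_test` are hypothetical, and the grow, prune and score steps are left as a comment.

```r
# Sketch of k-fold splitting; an assumption, not the package's implementation.
set.seed(1)
n <- 100        # hypothetical sample size
nfolds <- 5     # mirrors the nfolds argument

# Randomly assign each observation to one of nfolds folds of near-equal size
fold_id <- sample(rep(seq_len(nfolds), length.out = n))

for (k in seq_len(nfolds)) {
  in_test  <- which(fold_id == k)   # held-out observations for fold k
  in_train <- which(fold_id != k)   # observations used for fitting in fold k
  # A tree would be grown and pruned on in_train,
  # and each candidate subtree scored on in_test.
}
```

Each observation is held out exactly once, so every candidate subtree is scored on data that was not used to fit it.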

Usage

treeCV(dat, split.var, N0 = 20, n0 = 5, efficacy = "y", risk = "r",
  col.trt = "trt", col.prtx = "prtx", lambda = 0, risk.control = FALSE,
  risk.threshold = NA, nfolds = 10, AIPWE = FALSE, sort = TRUE,
  ctgs = NA, stabilize.type = c("linear", "rf"), stabilize = TRUE,
  use.other.nodes = TRUE, use.bootstrap = FALSE,
  extremeRandomized = FALSE)

Arguments

dat

data.frame. Data used to construct the rcDT model. Must contain an efficacy variable (y), a risk variable (r), a binary treatment indicator coded as 0 / 1 (trt), a propensity score (prtx), and the candidate splitting covariates.

split.var

numeric vector. Column indices of the splitting variables.

N0

numeric specifying minimum number of observations required to call a node terminal. Defaults to 20.

n0

numeric specifying minimum number of treatment/control observations needed in a split to declare a node terminal. Defaults to 5.

efficacy

char. Efficacy outcome column. Defaults to 'y'.

risk

char. Risk outcome column. Defaults to 'r'.

col.trt

char. Treatment indicator column name. Should be of form 0/1 or -1/+1. Defaults to 'trt'.

col.prtx

char. Propensity score column name. Defaults to 'prtx'.

lambda

numeric. Penalty parameter for risk scores. Defaults to 0, i.e. no constraint.

risk.control

logical. Should risk be controlled? Defaults to FALSE.

risk.threshold

numeric. Desired level of risk control. Defaults to NA.

AIPWE

logical. Should AIPWE (TRUE) or IPWE (FALSE) be used? Defaults to FALSE. Not available yet.

sort

For internal use.

stabilize.type

character specifying method used for estimating residuals. Current options are 'linear' for linear model (default) and 'rf' for random forest.

stabilize

logical indicating if efficacy should be modeled using residuals. Defaults to TRUE.

use.other.nodes

logical. Should global estimator of objective function be used. Defaults to TRUE.

use.bootstrap

logical. Should a bootstrap resampling be done? Defaults to FALSE.

extremeRandomized

logical. Experimental option for randomly selecting cutpoints in a random forest model. Defaults to FALSE and users should change this at their own peril.

test

data.frame of testing observations. Should be formatted the same as 'dat'.

max.depth

numeric specifying maximum depth of the tree. Defaults to 15 levels.

mtry

numeric specifying the number of randomly selected splitting variables to be included. Defaults to number of splitting variables.

ctg

numeric vector corresponding to the categorical input columns. Defaults to NULL. Not available yet.
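
To make the expected layout of 'dat' concrete, the sketch below builds a data frame with the default column names from the Usage section (y, r, trt, prtx) and ten covariates in columns 1 to 10. The simulated values are hypothetical, and estimating the propensity score with a logistic regression as P(trt = 1 | covariates) is an assumption, not something the package prescribes.

```r
# Sketch of an input data frame; all simulated values are hypothetical.
set.seed(1)
n <- 200

# Ten candidate splitting covariates, placed in columns 1 to 10
x <- as.data.frame(matrix(rnorm(n * 10), nrow = n))
names(x) <- paste0("x", 1:10)

# Binary treatment coded 0 / 1, with assignment depending on x1
trt <- rbinom(n, size = 1, prob = plogis(0.5 * x$x1))

# Efficacy (y) and risk (r) outcomes
y <- 1 + x$x1 + trt * x$x2 + rnorm(n)
r <- 2 + 0.5 * trt + rnorm(n)

dat <- cbind(x, y = y, r = r, trt = trt)

# Assumption: propensity score taken as P(trt = 1 | covariates),
# estimated here with a logistic regression
ps_fit <- glm(trt ~ ., data = cbind(x, trt = trt), family = binomial())
dat$prtx <- fitted(ps_fit)

# Column indices of the splitting covariates, as expected by split.var
split.var <- 1:10
```

With this layout the defaults efficacy = "y", risk = "r", col.trt = "trt" and col.prtx = "prtx" would match the column names without further arguments.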

Value

A summary of the cross validation, including the optimal penalty parameter and the optimal model, with the following components:

best.tree.size

optimal rcDT model based on size

best.tree.alpha

optimal rcDT model based on alpha parameter selection

best.alpha

optimal penalty parameter (alpha) selected from the cross validation procedure

full.tree

unpruned tree

pruned.tree

output from pruning of 'full.tree'

data

input data

details

summary of model performance

subtrees

sequence of optimally pruned subtrees of 'full.tree'

in.train

training samples from splits

in.test

testing samples from splits

Examples

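# Cross-validate an rcDT model on simulated data
library(mvITR)
set.seed(1)
dat <- generateData()
# 5-fold cross validation with risk controlled at a threshold of 2.75
fit <- treeCV(dat, split.var = 1:10, nfolds = 5, lambda = 1,
              risk.control = TRUE, risk.threshold = 2.75)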

kdoub5ha/mvITR documentation built on April 7, 2020, 3:59 a.m.