rcDT: Constructs an rcDT model


View source: R/rcDT.R

Description

Constructs a risk controlled decision tree (rcDT) given an efficacy and risk outcome.

Usage

rcDT(data, split.var, test = NULL, ctg = NULL, efficacy = "y",
  risk = "r", col.trt = "trt", col.prtx = "prtx", risk.control = TRUE,
  risk.threshold = NA, lambda = 0, min.ndsz = 20, n0 = 5,
  stabilize = TRUE, stabilize.type = c("linear", "rf"),
  use.other.nodes = TRUE, mtry = length(split.var), max.depth = 15,
  AIPWE = FALSE, extremeRandomized = FALSE, print.summary = TRUE)

Arguments

data

data.frame. Data used to construct the rcDT model. Must contain the efficacy variable (y), the risk variable (r), a binary treatment indicator coded as 0/1 (trt), the propensity score (prtx), and the candidate splitting covariates (split.var).
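As a rough illustration of the expected layout, the sketch below simulates a small data set with four covariates followed by the y, r, trt and prtx columns. The data-generating model (logistic propensity on x1, linear outcomes) is a hypothetical assumption chosen only to produce columns of the right shape; it is not the package's generateData().

```r
# Hypothetical simulated data in the layout described for 'data':
# covariates x1..x4, efficacy y, risk r, 0/1 treatment trt, propensity prtx.
set.seed(1)
n <- 200
covs <- as.data.frame(matrix(rnorm(n * 4), ncol = 4,
                             dimnames = list(NULL, paste0("x", 1:4))))
# Assumed propensity model: treatment probability depends on x1 only.
prtx <- plogis(0.5 * covs$x1)
trt <- rbinom(n, size = 1, prob = prtx)
# Assumed outcome models: larger y is better, smaller r is better.
y <- 1 + covs$x2 + trt * covs$x1 + rnorm(n)
r <- 2 + trt * (1 + covs$x3) + rnorm(n, sd = 0.5)
dat <- data.frame(covs, y = y, r = r, trt = trt, prtx = prtx)
# The covariates occupy columns 1-4, so they would be passed as split.var = 1:4.
split.var <- 1:4
```

Because the default column names are 'y', 'r', 'trt' and 'prtx', a data set built this way would not need efficacy, risk, col.trt or col.prtx to be set explicitly.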

split.var

numeric vector. Column indices of the candidate splitting variables.

test

data.frame of testing observations. Should be formatted the same as 'data'. Defaults to NULL.

ctg

numeric vector of the column indices of categorical covariates. Defaults to NULL. Not yet available.

efficacy

char. Efficacy outcome column. Assumes larger values are desirable. Defaults to 'y'.

risk

char. Risk outcome column. Assumes smaller values are desirable. Defaults to 'r'.

col.trt

char. Treatment indicator column name. Values should be coded 0/1 or -1/+1. Defaults to 'trt'.

col.prtx

char. Propensity score column name. Defaults to 'prtx'.

risk.control

logical. Should risk be controlled? Defaults to TRUE.

risk.threshold

numeric. Desired level of risk control. Defaults to NA.

lambda

numeric. Penalty parameter for risk scores. Defaults to 0, i.e. no constraint.
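To make the role of lambda concrete, here is a minimal sketch assuming a Lagrangian-style score, efficacy value minus lambda times (risk value minus risk.threshold). This functional form and the helper name penalized_score are assumptions for illustration, not the package's internal code.

```r
# Hypothetical helper: combine an efficacy value estimate v.y and a risk
# value estimate v.r, assuming the form v.y - lambda * (v.r - risk.threshold).
penalized_score <- function(v.y, v.r, lambda = 0, risk.threshold = NA) {
  # lambda = 0 means no constraint: return the efficacy value alone
  # (this also avoids NA arithmetic when risk.threshold is left as NA).
  if (lambda == 0) return(v.y)
  v.y - lambda * (v.r - risk.threshold)
}

penalized_score(3.0, 2.9)                                     # returns 3
penalized_score(3.0, 2.9, lambda = 1, risk.threshold = 2.75)  # returns 2.85
```

Under this assumed form a rule whose risk exceeds the threshold loses score in proportion to lambda, while a rule below the threshold gains a little; larger lambda makes the risk term matter more.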

Optional arguments

min.ndsz

numeric specifying the minimum number of observations a node must contain to be split further; smaller nodes are terminal. Defaults to 20.

n0

numeric specifying the minimum number of treated and of control observations required in a split; a node with no split meeting this requirement is declared terminal. Defaults to 5.
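One way to read min.ndsz and n0 together is as an admissibility check on a candidate split. The sketch below assumes min.ndsz applies to the parent node and n0 to each arm within each child node; the helper admissible_split and its go.left argument are hypothetical.

```r
# Hypothetical admissibility check for a candidate split.
# trt: 0/1 treatment in the current node; go.left: logical, TRUE if sent left.
admissible_split <- function(trt, go.left, min.ndsz = 20, n0 = 5) {
  if (length(trt) < min.ndsz) return(FALSE)  # node too small: terminal
  # Each child must contain at least n0 treated and n0 control observations.
  arm_counts_ok <- function(t) sum(t == 1) >= n0 && sum(t == 0) >= n0
  arm_counts_ok(trt[go.left]) && arm_counts_ok(trt[!go.left])
}

trt <- rep(c(0, 1), times = 20)                      # 40 obs, alternating arms
admissible_split(trt, rep(c(TRUE, FALSE), each = 20))  # TRUE: 10/10 per child
admissible_split(trt, seq_along(trt) <= 36)            # FALSE: right child has 2/2
```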

stabilize

logical indicating if efficacy should be modeled using residuals. Defaults to TRUE.

stabilize.type

character specifying the method used to estimate residuals. Current options are 'linear' for a linear model (default) and 'rf' for a random forest.
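A minimal sketch of what a 'linear' stabilization step might look like, assuming the efficacy outcome is regressed on the splitting covariates alone and the residuals replace y. Whether the package also includes treatment in this model is not stated here, so treat the covariate set as an assumption; the helper name stabilize_linear is hypothetical, and its output names simply mirror the 'y' and 'fit.y' components listed under Value.

```r
# Hypothetical 'linear' stabilization: regress efficacy on the splitting
# covariates and keep the residuals as the outcome used for tree building.
stabilize_linear <- function(data, efficacy = "y", split.var) {
  df <- data.frame(.y = data[[efficacy]], data[, split.var, drop = FALSE])
  fit <- lm(.y ~ ., data = df)          # '.' = all splitting covariates
  list(y = residuals(fit), fit.y = fit)
}

set.seed(3)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
d$y <- 2 + 3 * d$x1 + rnorm(50)
s <- stabilize_linear(d, "y", split.var = 1:2)
```

Because the fit includes an intercept, the residuals sum to zero, which removes the covariate-driven main effect from y before searching for splits.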

use.other.nodes

logical. Should a global estimator of the objective function be used? Defaults to TRUE.

mtry

numeric specifying the number of randomly selected splitting variables to be considered. Defaults to the number of splitting variables.
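The usual random-forest reading of mtry is that only a random subset of the candidate variables is searched at each split. A sketch of that sampling step, assuming this behavior (the helper candidate_vars is hypothetical):

```r
# Hypothetical: draw mtry candidate columns from split.var for one split search.
candidate_vars <- function(split.var, mtry = length(split.var)) {
  if (mtry >= length(split.var)) return(split.var)  # default: use all variables
  # sample.int avoids sample()'s surprising behavior on length-one vectors.
  split.var[sample.int(length(split.var), mtry)]
}

set.seed(4)
candidate_vars(1:10, mtry = 3)  # three distinct column indices from 1:10
```

With the default mtry = length(split.var) every variable is searched, which corresponds to a single, non-randomized tree.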

max.depth

numeric specifying maximum depth of the tree. Defaults to 15 levels.

AIPWE

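logical. Should the augmented inverse probability weighted estimator (AIPWE, TRUE) or the inverse probability weighted estimator (IPWE, FALSE) be used? Defaults to FALSE. Not yet available.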
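For reference, an IPWE of the mean outcome under a treatment rule weights each observation whose received treatment agrees with the rule by the inverse probability of that treatment. The sketch below uses the normalized (weights summing to one) form and the propensity column prtx = P(trt = 1 | x); whether rcDT normalizes the weights is an assumption, and ipwe_value is a hypothetical name.

```r
# Normalized IPWE of the mean outcome under a 0/1 treatment rule d.
ipwe_value <- function(outcome, trt, prtx, d) {
  p.obs <- ifelse(trt == 1, prtx, 1 - prtx)  # prob. of treatment actually received
  w <- as.numeric(trt == d) / p.obs          # zero weight where rule disagrees
  sum(w * outcome) / sum(w)
}

# Randomized toy data (prtx = 0.5): "treat all" averages obs 1 and 3.
ipwe_value(c(1, 2, 3, 4), trt = c(1, 0, 1, 0), prtx = rep(0.5, 4), d = rep(1, 4))  # 2
```

Applied to the efficacy column this gives a value estimate, and applied to the risk column it gives a risk estimate for the same rule.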

extremeRandomized

logical. Experimental option for randomly selecting cutpoints in a random forest model. Defaults to FALSE; users should change this at their own peril.
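In extremely randomized trees, a cutpoint is drawn at random rather than optimized, typically uniformly between the smallest and largest observed value of the variable in the node. A sketch of that idea, assuming rcDT follows the same scheme (random_cutpoint is hypothetical):

```r
# Hypothetical random cutpoint in the style of extremely randomized trees:
# draw uniformly between the node's minimum and maximum of x.
random_cutpoint <- function(x) {
  rng <- range(x, na.rm = TRUE)
  runif(1, min = rng[1], max = rng[2])
}

set.seed(5)
random_cutpoint(c(3, 7, 1, 9))  # some value between 1 and 9
```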

print.summary

logical. Should a summary of the tree building be printed? Defaults to TRUE for single trees.

Value

Summary of the rcDT model, a list with the following components:

tree

data.frame describing the tree. Each 'node' label begins with "0", indicating the root node, and each subsequent "1" or "2" indicates the less-than (left) or greater-than (right) child node. The number of observations 'size', the number treated 'n.1', the number on control 'n.0', and the treatment effect 'trt.effect' are also provided. The splitting information includes the column of the chosen splitting variable 'var', the variable name 'vname', the direction in which treatment is sent 'cut.1' ("r" for the right child node and "l" for the left), the chosen split value 'cut.2', and the estimated value function 'score'.
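Because node labels are built by appending "1" (left) or "2" (right) to the parent's label, depth and parent/child relations can be read directly from the string. The helpers below are hypothetical, and assign_trt additionally assumes that the left child holds x <= cut.2 and that cut.1 names the child sent to treatment.

```r
node_depth  <- function(node) nchar(node) - 1L             # "0" is the root, depth 0
left_child  <- function(node) paste0(node, "1")
right_child <- function(node) paste0(node, "2")
parent_node <- function(node) substr(node, 1, nchar(node) - 1)

# Hypothetical: treatment (1) or control (0) from one split row's cut.1/cut.2.
assign_trt <- function(x, cut.1, cut.2) {
  goes.right <- x > cut.2                   # assumed: left child is x <= cut.2
  as.integer(if (cut.1 == "r") goes.right else !goes.right)
}

node_depth("012")              # 2
assign_trt(c(1, 5, 10), "r", 5)  # 0 0 1
```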

y

efficacy values used in modeling. These will likely differ from the original input 'y' if stabilization was used.

risk.threshold

value of risk control used

data

input dataset

fit.y

fitted model for the residuals, if 'stabilize' was used

split.var

splitting covariates used

Examples

set.seed(123)
dat <- generateData()
# Generates a tree using simulated EMR data with splitting variables located in columns 1-10.
tree <- rcDT(data = dat, 
             split.var = 1:10, 
             risk.threshold = 2.75, 
             lambda = 1)

kdoub5ha/rcITR documentation built on Aug. 5, 2020, 9:05 p.m.