minNodePruningCompRisks: Minimal Node Size Pruning in the Presence of Competing Risks
In discSurv: Discrete Time Survival Analysis

minNodePruningCompRisks

R Documentation

Description

Computes the optimal minimal node size of a discrete survival tree by cross-validation over a given vector of candidate node sizes. Laplace smoothing can be applied to the estimated hazards.

Usage

minNodePruningCompRisks(
  formula,
  data,
  treetype = "rpart",
  splitruleranger = "gini",
  sizes,
  indexList,
  timeColumn,
  eventColumns,
  lambda = 1,
  logOut = FALSE
)

Arguments

`formula`	Model formula for tree fitting ("class formula").
`data`	Discrete survival data in short format for which a survival tree is to be fitted ("class data.frame").
`treetype`	Type of tree to be fitted. Possible values are "rpart" or "ranger" ("character vector"). The default is to fit an rpart tree; when "ranger" is chosen, a ranger forest with a single tree is fitted.
`splitruleranger`	String specifying the splitting rule of the ranger tree ("character vector"). Possible values are either "gini" or "extratrees". Default is "gini".
`sizes`	Vector of different node sizes to try ("integer vector"). Values need to be non-negative.
`indexList`	List of data partitioning indices for cross-validation ("class list"). Each element represents the test indices of one fold ("integer vector").
`timeColumn`	Column name of the observed times in the "data" argument ("character vector").
`eventColumns`	Column names of the event indicators (excluding the censoring column) in the "data" argument ("character vector").
`lambda`	Parameter for Laplace smoothing ("numeric vector"). A value of 0 corresponds to no Laplace smoothing.
`logOut`	Logical value ("logical vector"). If TRUE, computation progress is written to the console.
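
To illustrate what the lambda argument controls, here is a minimal sketch of Laplace smoothing applied to the category counts of a single terminal node. It assumes the usual additive form, (count + lambda) / (total + lambda * number of categories); the exact formula used inside discSurv is not stated on this page. The helper smoothHazards and the counts are hypothetical and not part of the package.

```r
# Hypothetical helper (not part of discSurv): Laplace-smoothed category
# probabilities within one terminal node, assuming additive smoothing.
# counts: observed counts per response category in the node
#         (first entry = "no event", remaining entries = competing events)
# lambda: smoothing parameter; 0 returns the raw relative frequencies
smoothHazards <- function(counts, lambda = 1) {
  stopifnot(all(counts >= 0), lambda >= 0)
  (counts + lambda) / (sum(counts) + lambda * length(counts))
}

# Made-up node counts: the second event type was never observed in this node
nodeCounts <- c(noEvent = 18, event1 = 2, event2 = 0)

rawHaz    <- smoothHazards(nodeCounts, lambda = 0)  # contains an exact zero
smoothHaz <- smoothHazards(nodeCounts, lambda = 1)  # all entries positive
```

With lambda = 0 the unobserved event type gets an estimated hazard of exactly zero; any lambda > 0 keeps all estimates strictly positive while they still sum to one.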

Details

For each node size in sizes, computes the out-of-sample log likelihood over all data partitionings in indexList and returns the node size for which the log likelihood was minimal. Also returns a tree of the type given by treetype, fitted with the optimal minimal node size on the entire data set.
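
As a rough illustration of the quantity being computed, the sketch below evaluates a multinomial log likelihood on held-out observations from a matrix of predicted category probabilities. This is the generic definition, not a copy of the package's internal code; the helper oosLogLik and the numbers are hypothetical.

```r
# Hypothetical helper (not part of discSurv): multinomial log likelihood of
# held-out observations.
# probs:    matrix, one row per held-out observation, one column per
#           response category (e.g. no event, event 1, event 2)
# observed: integer vector giving the column index of the observed category
oosLogLik <- function(probs, observed) {
  stopifnot(nrow(probs) == length(observed))
  # Pick, for each row, the predicted probability of the observed category
  picked <- probs[cbind(seq_len(nrow(probs)), observed)]
  sum(log(picked))
}

# Made-up predictions for three held-out observations
probs <- rbind(c(0.8, 0.1, 0.1),
               c(0.5, 0.3, 0.2),
               c(0.6, 0.2, 0.2))
observed <- c(1L, 2L, 1L)

foldLogLik <- oosLogLik(probs, observed)  # log(0.8) + log(0.3) + log(0.6)

# A predicted probability of exactly zero for an observed category gives -Inf
zeroLogLik <- oosLogLik(rbind(c(1, 0, 0)), 2L)
```

In a cross-validation, one such value per fold would be summed to give the total for a candidate node size. The -Inf case shows why smoothing the estimated hazards can be useful when a category was never observed in a node of the training data.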

Value

A list containing two items:

Optimal minimal node size - Node size with lowest out-of-sample log-likelihood
tree - a tree object with type corresponding to treetype argument with the optimal minimal node size

Examples

# Example unemployment data
library(discSurv)
library(Ecdat)
library(caret)
data(UnempDur)

# Select training and testing subsample
subUnempDur <- UnempDur[which(UnempDur$spell < 10), ]
subUnempDur <- subUnempDur[1:250, ]

# Create a status variable for data partitioning
subUnempDur$status <- ifelse(subUnempDur$censor1, 1,
  ifelse(subUnempDur$censor2, 2,
    ifelse(subUnempDur$censor3, 3,
      ifelse(subUnempDur$censor4, 4, 0))))

# Build 5 cross-validation folds from a combined status/time index
indexList <- createFolds(
  subUnempDur$status * max(subUnempDur$spell) + subUnempDur$spell,
  k = 5
)

# Perform minimal node size pruning
formula <- responses ~ timeInt + age + logwage
sizes <- 1:10
timeColumn <- "spell"
eventColumns <- c("censor1", "censor2", "censor3", "censor4")
optiTree <- minNodePruningCompRisks(
  formula, subUnempDur,
  treetype = "rpart", sizes = sizes, indexList = indexList,
  timeColumn = timeColumn, eventColumns = eventColumns,
  lambda = 1, logOut = TRUE
)
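
The example above builds indexList with caret::createFolds. For readers without caret, the following base R sketch produces a list of the same shape: one integer vector of test indices per fold. The values of n and k are made up, and unlike createFolds this simple version does not balance the folds with respect to the outcome.

```r
# Base R sketch of a fold list with the structure described for indexList:
# a list with one integer vector of test indices per fold.
set.seed(1)
n <- 20  # hypothetical number of observations (rows of the data)
k <- 5   # number of folds

# Assign every row index to one of k folds at random, with equal fold sizes
foldId <- sample(rep(seq_len(k), length.out = n))
indexList <- split(seq_len(n), foldId)
```

Each row index appears in exactly one fold, so every observation is used once as a test observation.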

discSurv documentation built on March 18, 2022, 7:12 p.m.