minNodePruningCompRisks: Minimal Node Size Pruning For Competing Risks

View source: R/minNodeSizePruningCompRisks.R

minNodePruningCompRisks    R Documentation

Minimal Node Size Pruning For Competing Risks

Description

Computes the optimal minimal node size of a discrete survival tree for competing risks by cross-validation, choosing from a given vector of candidate node sizes. Laplace smoothing can be applied to the estimated hazards.

Usage

minNodePruningCompRisks(
  formulaVariable,
  data,
  treetype = "rpart",
  splitruleranger = "gini",
  sizes,
  indexList,
  timeColumn,
  eventColumns,
  alpha = 1,
  logOut = FALSE,
  eventColumnsAsFactor = FALSE,
  ...
)

Arguments

formulaVariable

Model formula for tree fitting (class "formula") of the form "~ x1 + x2 + ...", i.e. without a response.

data

Discrete survival data in short format for which a survival tree is to be fitted (class "data.frame").

treetype

Type of tree to be fitted (class "character"). Possible values are "rpart" or "ranger". The default fits an rpart tree; when "ranger" is chosen, a ranger forest consisting of a single tree is fitted.

splitruleranger

String specifying the splitting rule of the ranger tree (class "character"). Possible values are "gini" or "extratrees". Default is "gini". Only relevant when treetype = "ranger".

sizes

Vector of candidate minimal node sizes to try (class "integer"). Values need to be non-negative.

indexList

List of data partitioning indices for cross-validation (class "list"). Each element represents the test indices of one fold (class "integer").
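The Examples section builds this list with caret::createFolds, which stratifies by the factor it is given. If caret is not at hand, a list of the same shape (one integer vector of test indices per fold) can be produced in base R. The sketch below is an unstratified alternative under that assumption; makeFolds is a hypothetical helper and not part of discSurv.

```r
# Hypothetical helper: k-fold test-index list in base R (no stratification,
# unlike caret::createFolds applied to a factor).
makeFolds <- function(n, k = 5) {
  # Randomly assign each row index to one of k folds of (nearly) equal size
  foldId <- sample(rep(seq_len(k), length.out = n))
  # split() returns a list; each element holds the test indices of one fold
  split(seq_len(n), foldId)
}

set.seed(1)
folds <- makeFolds(250, k = 5)
lengths(folds)  # number of test indices per fold
```

Each index appears in exactly one fold, so every observation is used once for testing.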

timeColumn

Character string giving the column name of the observed times in the "data" argument (class "character").

eventColumns

Character vector giving the column names of the event indicators (excluding the censoring column) in the "data" argument (class "character").

alpha

Parameter for Laplace smoothing (class "numeric"). A value of 0 corresponds to no Laplace smoothing. Default is 1.
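To illustrate the role of alpha, the sketch below applies additive (Laplace) smoothing to the counts of a single terminal node. It assumes the node stores counts for "no event" plus J causes and that alpha is added to every category; the exact formula used inside discSurv may differ. laplaceHazards is a hypothetical helper.

```r
# Sketch: additive (Laplace) smoothing of cause-specific hazards in one node.
# Assumption: counts[1] = "no event", counts[-1] = events of causes 1..J,
# and alpha is added to each of the J + 1 categories.
laplaceHazards <- function(counts, alpha = 1) {
  stopifnot(all(counts >= 0), alpha >= 0)
  probs <- (counts + alpha) / (sum(counts) + alpha * length(counts))
  probs[-1]  # cause-specific hazards; probs[1] is the "no event" probability
}

counts <- c(noEvent = 7, cause1 = 3, cause2 = 0)
laplaceHazards(counts, alpha = 0)  # raw relative frequencies: 0.3 and 0
laplaceHazards(counts, alpha = 1)  # 4/13 and 1/13
```

Without smoothing (alpha = 0), a cause never observed in a node receives a hazard of exactly 0, so a test observation falling into that node and experiencing this cause would contribute log(0) = -Inf to the out-of-sample log-likelihood.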

logOut

Logical value (class "logical"). If TRUE, computation progress is written to the console. Default is FALSE.

eventColumnsAsFactor

Should the argument eventColumns be interpreted as the column name of a factor variable (class "logical")? Default is FALSE.
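With eventColumnsAsFactor = TRUE, eventColumns names a single factor column rather than several indicator columns. The sketch below shows one way to derive such a factor from 0/1 indicator columns in base R. indicatorsToFactor, the "censored" label and the level order are illustrative assumptions, not requirements documented for discSurv; the helper also assumes that each row has at most one indicator set.

```r
# Hypothetical helper: collapse 0/1 event indicator columns into one factor.
# Assumptions: rows with no indicator set are censored, at most one
# indicator per row is 1, and the censoring level comes first.
indicatorsToFactor <- function(data, eventColumns, censLabel = "censored") {
  ind <- as.matrix(data[, eventColumns, drop = FALSE])
  stopifnot(all(ind %in% c(0, 1)), all(rowSums(ind) <= 1))
  # max.col() finds the column holding the 1; all-zero rows are fixed below
  cause <- eventColumns[max.col(ind, ties.method = "first")]
  cause[rowSums(ind) == 0] <- censLabel
  factor(cause, levels = c(censLabel, eventColumns))
}

d <- data.frame(e1 = c(1, 0, 0), e2 = c(0, 1, 0))
indicatorsToFactor(d, c("e1", "e2"))
```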

...

Additional arguments passed to the estimation function, which is either rpart or ranger (see argument treetype).

Details

Computes the out-of-sample log-likelihood for all data partitionings for each node size in sizes and returns the node size for which the log-likelihood was minimal. Also returns a tree with the optimal minimal node size fitted on the entire data set; its type (rpart or ranger) corresponds to the treetype argument.
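The cross-validation loop can be illustrated with a self-contained toy. This is not discSurv's implementation: it ignores the time dimension and treats every row as one long-format observation, replaces the rpart/ranger tree by a one-split stump on a binary covariate x whose split is only allowed if both children keep at least `size` training rows (mimicking a minimal node size), and lets each node predict Laplace-smoothed class frequencies. smoothFreq, stumpPredict and cvNodeSize are hypothetical helpers. The toy picks the size with the largest summed out-of-sample log-likelihood, equivalently the smallest negative log-likelihood.

```r
# Toy miniature of cross-validated minimal-node-size selection.
# Classes: 1 = noEvent, 2 = cause1, 3 = cause2 (assumed coding).

# Laplace-smoothed relative frequencies of integer class labels 1..nClass
smoothFreq <- function(cls, nClass, alpha) {
  counts <- tabulate(cls, nbins = nClass)
  (counts + alpha) / (sum(counts) + alpha * nClass)
}

# Fit the stump on trainIdx; return a length(testIdx) x nClass probability matrix
stumpPredict <- function(x, cls, trainIdx, testIdx, size, nClass, alpha) {
  xTr <- x[trainIdx]
  cTr <- cls[trainIdx]
  if (min(sum(xTr == 0), sum(xTr == 1)) >= size) {
    # Split allowed: each child keeps at least `size` training rows
    p0 <- smoothFreq(cTr[xTr == 0], nClass, alpha)
    p1 <- smoothFreq(cTr[xTr == 1], nClass, alpha)
    t(vapply(x[testIdx], function(xi) if (xi == 0) p0 else p1, numeric(nClass)))
  } else {
    # Split rejected: every test row gets the root-node frequencies
    matrix(smoothFreq(cTr, nClass, alpha), nrow = length(testIdx),
           ncol = nClass, byrow = TRUE)
  }
}

# Sum the out-of-sample log-likelihood over folds for each candidate size
cvNodeSize <- function(x, cls, sizes, indexList, nClass = 3, alpha = 1) {
  n <- length(x)
  logLik <- vapply(sizes, function(size) {
    sum(vapply(indexList, function(testIdx) {
      trainIdx <- setdiff(seq_len(n), testIdx)
      p <- stumpPredict(x, cls, trainIdx, testIdx, size, nClass, alpha)
      # Log-probability assigned to the observed class of each test row
      sum(log(p[cbind(seq_along(testIdx), cls[testIdx])]))
    }, numeric(1)))
  }, numeric(1))
  list(optimSize = sizes[which.max(logLik)], logLik = logLik)
}

# Simulated data: cause1 is more frequent when x == 1
set.seed(42)
n <- 200
x <- rbinom(n, 1, 0.5)
cls <- ifelse(runif(n) < ifelse(x == 1, 0.4, 0.1), 2L,
              ifelse(runif(n) < 0.1, 3L, 1L))
folds <- split(seq_len(n), sample(rep(1:5, length.out = n)))
res <- cvNodeSize(x, cls, sizes = c(5, 50, 150), indexList = folds)
res$logLik
res$optimSize
```

With 200 rows, sizes 5 and 50 will typically both allow the split in every fold and therefore tie, while 150 always forces a root-only model; on ties, which.max returns the first candidate.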

Value

A list containing two items:

  • OptimNodeSize - Node size with the lowest out-of-sample log-likelihood

  • OptimTree - A tree object, of the type given by the treetype argument, fitted on the entire data set with the optimal minimal node size

Note

Depending on argument treetype, some arguments of the underlying fitting function are set internally and cannot be changed through the ... argument:

  • treetype="rpart": formula, data, method, minbucket

  • treetype="ranger": formula, data, num.trees, mtry, classification, splitrule, replace, sample.fraction, min.node.size

Examples

# Example unemployment data
library(Ecdat)
library(caret)
data(UnempDur)

# Select a subsample (spells shorter than 10, first 250 observations)
subUnempDur <- UnempDur[which(UnempDur$spell < 10),]
subUnempDur <- subUnempDur[1:250,]

# Creating status variable for data partitioning
subUnempDur$status <- ifelse(subUnempDur$censor1, 1,
                      ifelse(subUnempDur$censor2, 2,
                      ifelse(subUnempDur$censor3, 3,
                      ifelse(subUnempDur$censor4, 4, 0))))

# Create cross validation sets
# Stratified by events and time distribution
set.seed(1972)
indexList <- createFolds(factor(paste(subUnempDur$status,
                                      subUnempDur$spell, sep = "_")), k = 5)

# Perform minimal node size pruning
formula1 <- ~ timeInt + age + logwage
sizes <- 1:10
timeColumn <- "spell"
eventColumns <- c("censor1", "censor2", "censor3","censor4")
optiTree <- minNodePruningCompRisks(formula1, subUnempDur, treetype = "rpart",
                                    sizes = sizes, indexList = indexList,
                                    timeColumn = timeColumn,
                                    eventColumns = eventColumns, alpha = 1,
                                    logOut = TRUE)
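# The result is a list: inspect the chosen size and plot the fitted rpart tree
optiTree$OptimNodeSize
plot(optiTree$OptimTree)
text(optiTree$OptimTree)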


discSurv documentation built on April 29, 2026, 9:07 a.m.