View source: R/minNodeSizePruning.R
| plot.discSurvMinNodeSizePrune | R Documentation |
Computes the optimal minimal node size of a discrete survival tree from a given vector of candidate node sizes by cross-validation. Laplace smoothing can be applied to the estimated hazards.
## S3 method for class 'discSurvMinNodeSizePrune'
plot(x, ...)
minNodePruning(
formulaVariable,
dataShort,
treetype = "rpart",
splitruleranger = "hellinger",
sizes,
indexList,
timeColumn,
eventColumn,
alpha = 1,
logOut = FALSE,
...
)
x |
Object of class "discSurvMinNodeSizePrune" |
... |
Additional arguments passed to the estimation function, which is either rpart or ranger (see argument treetype). |
formulaVariable |
Model formula for tree fitting (class "formula") of the form "~ x1 + x2 + ..." without response. |
dataShort |
Discrete survival data in short format for which a survival tree is to be fitted (class "data.frame"). |
treetype |
Type of tree to be fitted (class "character"). Possible values are "rpart" or "ranger". The default is to fit an rpart tree; when "ranger" is chosen, a ranger forest with a single tree is fitted. |
splitruleranger |
String specifying the splitting rule of the ranger tree (class "character"). Possible values are either "gini", "extratrees" or "hellinger". Default is "hellinger". |
sizes |
Vector of different node sizes to try (class "integer"). Values should be non-negative. |
indexList |
List of data partitioning indices for cross-validation (class "list"). Each element represents the test indices of one fold (class "integer"). |
timeColumn |
Column name of the observed times in dataShort (class "character"). |
eventColumn |
Column name of the event indicator in dataShort (class "character"). |
alpha |
Parameter for Laplace smoothing (class "numeric"). A value of 0 corresponds to no smoothing. |
logOut |
Logical value (class "logical"). If set to TRUE, computation progress is written to the console. |
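The effect of the alpha argument can be illustrated with a small sketch. It assumes the usual additive (Laplace) smoothing of a binary proportion, (events + alpha) / (n + 2 * alpha); the exact formula used inside the package is not stated on this page, and smoothedHazard is a hypothetical helper, not a discSurv function.

```r
# Hypothetical helper for illustration only; not part of discSurv.
# Assumes additive smoothing of a binary proportion within one tree node.
smoothedHazard <- function(events, atRisk, alpha = 1) {
  (events + alpha) / (atRisk + 2 * alpha)
}

# A node with 10 observations at risk and no events:
smoothedHazard(events = 0, atRisk = 10, alpha = 0)  # 0, so log(hazard) is -Inf
smoothedHazard(events = 0, atRisk = 10, alpha = 1)  # 1/12, log-likelihood stays finite
```

With alpha = 0 a node without events yields a hazard of exactly 0, which makes the log-likelihood of any test observation with an event in that node infinite; a positive alpha avoids this.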
Computes the out-of-sample log-likelihood over all data partitionings for each node size in sizes and returns the node size for which the log-likelihood was minimal. Also returns a tree of the type given by treetype, fitted with the optimal minimal node size on the entire data set.
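The selection procedure described above can be sketched on simulated binary data. This is an illustration under stated assumptions, not the package's implementation: the data, the variable names and the clipping constant are made up, an rpart regression tree on a 0/1 response stands in for the hazard model, and the criterion is taken to be the negative log-likelihood summed over folds (one reading of "minimal" above).

```r
library(rpart)

set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-1 + dat$x1))   # simulated binary event indicator

# Test indices per fold, same structure as the indexList argument
folds <- split(sample(seq_len(n)), rep(1:5, length.out = n))
sizes <- c(5, 10, 20, 40)

# Negative out-of-sample log-likelihood, summed over folds, per node size
negLogLik <- sapply(sizes, function(s) {
  sum(sapply(folds, function(testIdx) {
    fit <- rpart(y ~ x1 + x2, data = dat[-testIdx, ], method = "anova",
                 control = rpart.control(minbucket = s, cp = 0))
    # Node means of a 0/1 response are the estimated event probabilities
    haz <- predict(fit, newdata = dat[testIdx, ])
    # Clipping replaces Laplace smoothing here to keep the logs finite
    haz <- pmin(pmax(haz, 1e-6), 1 - 1e-6)
    yTest <- dat$y[testIdx]
    -sum(yTest * log(haz) + (1 - yTest) * log(1 - haz))
  }))
})

bestSize <- sizes[which.min(negLogLik)]

# Refit on the entire data set with the selected node size
finalTree <- rpart(y ~ x1 + x2, data = dat, method = "anova",
                   control = rpart.control(minbucket = bestSize, cp = 0))
```

Here bestSize and finalTree play the roles of the two items returned by the function.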
A list containing the two items
OptimNodeSize - Node size with lowest out-of-sample log-likelihood
OptimTree - A tree object with type corresponding to treetype argument with the optimal minimal node size
Note that, depending on the argument treetype, some arguments are fixed and cannot be changed:
treetype="rpart": formula, data, method, minbucket
treetype="ranger": formula, data, num.trees, mtry, classification, splitrule, replace, sample.fraction, min.node.size
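Why such arguments cannot be supplied through ... can be seen from a hypothetical wrapper. The sketch below is not discSurv code; it only assumes that the fixed arguments are set inside the function and that everything else is forwarded to rpart.

```r
library(rpart)

# Hypothetical wrapper, not discSurv code: it sets formula, data, method and
# minbucket itself and forwards all remaining arguments to rpart().
fitWithSize <- function(data, size, ...) {
  rpart(y ~ ., data = data, method = "anova", minbucket = size, ...)
}

set.seed(1)
toy <- data.frame(x = rnorm(100))
toy$y <- rbinom(100, 1, plogis(toy$x))

# An argument that is not fixed (here cp) is passed through without problems
fitOk <- fitWithSize(toy, size = 10, cp = 0)

# Supplying a fixed argument again clashes with the value set in the wrapper
fitBad <- tryCatch(fitWithSize(toy, size = 10, minbucket = 2),
                   error = function(e) e)
```

In this sketch the second call fails because minbucket reaches the control settings twice, so tuning parameters other than the fixed ones are the only ones worth passing via the dots.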
library(pec)
library(caret)
data(cost)
# Convert time to years and take a subsample
cost$time <- ceiling(cost$time / 365)
costSub <- cost[1:50, ]
# Specify column names for data augmentation
timeColumn <- "time"
eventColumn <- "status"
# Create cross validation sets
# Stratified by event and time distribution
indexList <- createFolds(factor(paste(costSub$status,
                                      costSub$time, sep = "_")), k = 5)
# Perform minimal node size pruning
formula1 <- ~ timeInt + prevStroke + age + sex
sizes <- 1:10
optiTree <- minNodePruning(formula1, costSub, treetype = "rpart", sizes = sizes,
                           indexList = indexList, timeColumn = timeColumn,
                           eventColumn = eventColumn, alpha = 1, logOut = TRUE)
plot(optiTree)
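If caret is not available, a list with the structure expected by indexList (one integer vector of test indices per fold) can be built in base R. The sketch below is an unstratified alternative to the createFolds call above; the object name indexListBase and the fixed n = 50 are chosen for illustration.

```r
set.seed(42)
n <- 50                                      # number of rows in the short-format data
foldId <- sample(rep(1:5, length.out = n))   # random fold label for every row
indexListBase <- split(seq_len(n), foldId)   # test indices of each of the 5 folds
```

Unlike the stratified folds in the example, these folds do not balance the event and time distribution, which may matter for small samples.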