ORSF: Grow an oblique random survival forest (ORSF)

View source: R/oblique_survival_forest_fit.R

ORSF R Documentation

Description

Grow an oblique random survival forest (ORSF)

Usage

ORSF(
  data,
  alpha = 0.5,
  ntree = 100,
  time = "time",
  status = "status",
  eval_times = NULL,
  features = NULL,
  min_events_to_split_node = 5,
  min_obs_to_split_node = 10,
  min_obs_in_leaf_node = 5,
  min_events_in_leaf_node = 1,
  nsplit = 25,
  gamma = 0.5,
  max_pval_to_split_node = 0.5,
  mtry = ceiling(sqrt(ncol(data) - 2)),
  dfmax = mtry,
  use.cv = FALSE,
  verbose = TRUE,
  compute_oob_predictions = FALSE,
  random_seed = NULL
)

Arguments

data

The data used to grow the forest.

alpha

The elastic net mixing parameter. A value of 1 gives the lasso penalty, and a value of 0 gives the ridge penalty. If multiple values of alpha are given, then a penalized model is fit using each alpha value prior to splitting a node.

ntree

The number of trees to grow.

time

A character value indicating the name of the column in the data that measures time.

status

A character value indicating the name of the column in the data that measures participant status. A value of zero indicates censoring and a value of 1 indicates that the event occurred.

eval_times

A numeric vector holding the time values where ORSF out-of-bag predictions should be computed and evaluated.

features

A character vector giving the names of columns in the data set that will be used as features. If NULL, then all of the variables in the data apart from the time and status variables are treated as features. None of these names should contain special characters or spaces.

min_events_to_split_node

The minimum number of events required to split a node.

min_obs_to_split_node

The minimum number of observations required to split a node.

min_obs_in_leaf_node

The minimum number of observations in child nodes.

min_events_in_leaf_node

The minimum number of events in child nodes.

nsplit

The number of random cut-points assessed for each variable.

gamma

A numeric value that must be greater than 0. This parameter penalizes complexity in the linear combinations. Higher values of gamma lead to more conservative linear combinations of input variables.

max_pval_to_split_node

The maximum p-value corresponding to the log-rank test for splitting a node. If the p-value exceeds this cut-point, the node will not be split.

mtry

The number of variables randomly selected as candidates for splitting a node. The default is the square root of the number of columns in the data other than the time and status columns, rounded up.

dfmax

Maximum number of variables used in a linear combination for node splitting.

use.cv

If TRUE, cross-validation is used to identify optimal values of lambda, a hyper-parameter in penalized regression. If FALSE, a set of candidate lambda values is used. The set of candidate lambda values is built by picking the maximum value of lambda such that the penalized regression model has k degrees of freedom, where k is between 1 and mtry.

verbose

If TRUE, the ORSF function prints output to the console while it grows the forest.

compute_oob_predictions

If TRUE, then out-of-bag predictions will be included in the ORSF object.

random_seed

If a number is given, then that number is used as a random seed prior to growing the forest. Use this seed to replicate a forest if needed.
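
The defaults for features, mtry, and dfmax follow from the Usage signature and the argument descriptions above. As a rough sketch in base R, they can be reproduced by hand. The toy data frame and the names default_features, default_mtry, and default_dfmax are made up for illustration and are not part of the package; the code mirrors the documented defaults and is not the package's internal implementation.

```r
# Hypothetical toy data: two outcome columns plus seven candidate features
toy <- data.frame(
  time   = c(5, 8, 12, 3),
  status = c(1, 0, 1, 0),
  x1 = 1:4, x2 = 4:1, x3 = c(2, 2, 3, 3), x4 = c(0, 1, 0, 1),
  x5 = c(9, 7, 5, 3), x6 = c(1, 1, 2, 2), x7 = c(3, 1, 4, 1)
)

# features = NULL: every column except the time and status columns
default_features <- setdiff(names(toy), c("time", "status"))

# mtry default from the Usage signature: ceiling(sqrt(ncol(data) - 2))
default_mtry <- ceiling(sqrt(ncol(toy) - 2))

# dfmax defaults to mtry
default_dfmax <- default_mtry

print(default_features)
print(default_mtry)
```

Judging from the signature alone, the default for mtry is computed from ncol(data), so it would not shrink when features names only a subset of the columns. Setting mtry explicitly may be worth considering in that case.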

Value

An oblique random survival forest.

Examples

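library(obliqueRSF)

data("pbc", package = "survival")
# Recode status from 0/1/2 (censored/transplant/dead) to 0/0/1 so that 1 marks death
pbc$status[pbc$status >= 1] <- pbc$status[pbc$status >= 1] - 1
pbc$id <- NULL  # drop the participant identifier so it is not treated as a feature
# Convert categorical variables to factors
fctrs <- c("trt", "ascites", "spiders", "edema", "hepato", "stage")
for (f in fctrs) pbc[[f]] <- as.factor(pbc[[f]])
pbc <- na.omit(pbc)  # keep complete cases only

# Grow a small forest; 5 trees keeps the example fast
orsf <- ORSF(data = pbc, ntree = 5)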
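
The status recoding in the example above depends on how the survival package codes pbc$status (0 = censored, 1 = transplant, 2 = dead), while ORSF expects 0 for censoring and 1 for the event. As a minimal sketch on a made-up vector (the values below are illustrative, not taken from pbc), the same expression maps transplant to censored and death to the event:

```r
# Hypothetical status values using the pbc coding:
# 0 = censored, 1 = transplant, 2 = dead
status <- c(0, 1, 2, 2, 0, 1)

# Subtract 1 from every non-zero value: 1 becomes 0 and 2 becomes 1
status[status >= 1] <- status[status >= 1] - 1

# Count how many observations fall in each recoded category
print(table(status))
```

After this step, transplant recipients are treated as censored, so the forest models time to death only.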


obliqueRSF documentation built on Aug. 29, 2022, 1:07 a.m.