View source: R/oblique_survival_forest_fit.R
ORSF | R Documentation |
Grow an oblique random survival forest (ORSF)
ORSF(
  data,
  alpha = 0.5,
  ntree = 100,
  time = "time",
  status = "status",
  eval_times = NULL,
  features = NULL,
  min_events_to_split_node = 5,
  min_obs_to_split_node = 10,
  min_obs_in_leaf_node = 5,
  min_events_in_leaf_node = 1,
  nsplit = 25,
  gamma = 0.5,
  max_pval_to_split_node = 0.5,
  mtry = ceiling(sqrt(ncol(data) - 2)),
  dfmax = mtry,
  use.cv = FALSE,
  verbose = TRUE,
  compute_oob_predictions = FALSE,
  random_seed = NULL
)
data |
The data used to grow the forest. |
alpha |
The elastic net mixing parameter. A value of 1 gives the lasso penalty, and a value of 0 gives the ridge penalty. If multiple values of alpha are given, then a penalized model is fit using each alpha value prior to splitting a node. |
ntree |
The number of trees to grow. |
time |
A character value indicating the name of the column in the data that measures time. |
status |
A character value indicating the name of the column in the data that measures participant status. A value of zero indicates censoring and a value of 1 indicates that the event occurred. |
eval_times |
A numeric vector holding the time values where ORSF out-of-bag predictions should be computed and evaluated. |
features |
A character vector giving the names of columns in the data set that will be used as features. If NULL, then all of the variables in the data apart from the time and status variable are treated as features. None of these names should contain special characters or spaces. |
min_events_to_split_node |
The minimum number of events required to split a node. |
min_obs_to_split_node |
The minimum number of observations required to split a node. |
min_obs_in_leaf_node |
The minimum number of observations in child nodes. |
min_events_in_leaf_node |
The minimum number of events in child nodes. |
nsplit |
The number of random cut-points assessed for each variable. |
gamma |
A numeric value that must be greater than 0. This parameter penalizes complexity in the linear combinations. Higher values of gamma lead to more conservative linear combinations of input variables. |
max_pval_to_split_node |
The maximum p-value corresponding to the log-rank test for splitting a node. If the p-value exceeds this cut-point, the node will not be split. |
mtry |
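Number of variables randomly selected as candidates for splitting a node. The default is the square root of the number of features, rounded up, where the number of features is taken to be ncol(data) - 2 (every column except the time and status columns). |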
dfmax |
Maximum number of variables used in a linear combination for node splitting. |
use.cv |
If TRUE, cross-validation is used to identify optimal values of lambda, a hyper-parameter in penalized regression. If FALSE, a set of candidate lambda values is used. The set of candidate lambda values is built by picking the maximum value of lambda such that the penalized regression model has k degrees of freedom, where k is between 1 and mtry. |
verbose |
If verbose = TRUE, then the ORSF function will print output to the console while it grows the forest. |
compute_oob_predictions |
If TRUE, then out-of-bag predictions will be included in the ORSF object. |
random_seed |
If a number is given, then that number is used as a random seed prior to growing the forest. Use this seed to replicate a forest if needed. |
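The defaults for features, mtry, and dfmax all follow from the columns of data, so a small worked example may help. The sketch below uses base R only and a made-up data frame (the columns age, bili, and sex are hypothetical). It mirrors the documented behavior rather than the package's internal code, which may compute these values differently.

```r
# Toy data frame; every column other than "time" and "status" is hypothetical.
toy <- data.frame(
  time   = c(5, 8, 12, 20),
  status = c(1, 0, 1, 0),
  age    = c(50, 61, 47, 70),
  bili   = c(1.1, 0.8, 3.2, 1.4),
  sex    = factor(c("f", "m", "f", "f"))
)

# With features = NULL, every column except time and status is a feature.
default_features <- setdiff(names(toy), c("time", "status"))

# Default mtry as written in the usage: ceiling(sqrt(ncol(data) - 2)).
default_mtry <- ceiling(sqrt(ncol(toy) - 2))

# dfmax defaults to mtry.
default_dfmax <- default_mtry

print(default_features)
print(default_mtry)
```

Note that the mtry default subtracts 2 from ncol(data) regardless of what is passed to features, so when only a subset of columns is supplied as features it may be worth setting mtry explicitly.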
An oblique random survival forest.
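library(obliqueRSF) # assumed source of ORSF()
data("pbc", package = "survival")
# Recode status 0/1/2 to 0/0/1 (1 = death)
pbc$status[pbc$status >= 1] <- pbc$status[pbc$status >= 1] - 1
pbc$id <- NULL
fctrs <- c("trt", "ascites", "spiders", "edema", "hepato", "stage")
for (f in fctrs) pbc[[f]] <- as.factor(pbc[[f]])
pbc <- na.omit(pbc)
orsf <- ORSF(data = pbc, ntree = 5)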
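The status recoding in the example above is compact and easy to misread. In the pbc data, status is coded 0 (censored), 1 (transplant), and 2 (dead), whereas ORSF expects 0 for censoring and 1 for the event. The following base R sketch applies the same subtraction to a made-up vector (status_raw is hypothetical) to show the mapping it produces.

```r
# Hypothetical status values using the pbc coding:
# 0 = censored, 1 = transplant, 2 = dead.
status_raw <- c(0, 1, 2, 2, 0, 1)

# Same operation as in the example: subtract 1 wherever status >= 1.
status_01 <- status_raw
idx <- status_raw >= 1
status_01[idx] <- status_raw[idx] - 1

# Cross-tabulate old against new codes to inspect the mapping.
print(table(raw = status_raw, recoded = status_01))
```

After recoding, transplant is treated as censoring (0) and only death counts as the event (1). A different event definition would need a different recoding before calling ORSF.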