View source: R/create.classifier.multivariate.R
create.classifier.multivariate    R Documentation
Trains a model on the training datasets and predicts the risk score for each training and validation dataset independently. The function also predicts the risk score for the combined training cohort and for the combined validation cohort. Risk scores are estimated by multivariate models fit by fit.survivalmodel. The function also predicts risk scores for each of the top.n.features independently.
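As a rough illustration of the two kinds of score described above, the sketch below assumes a Cox-type model in which a sample's risk score is the linear predictor, the sum of coefficient times feature value. The helper names `multivariate.risk.score` and `univariate.risk.scores`, the coefficients and the toy matrix are all hypothetical. The internals of fit.survivalmodel are not reproduced here.

```r
# Hypothetical helpers (assumption: Cox-type linear predictor as risk score).
multivariate.risk.score <- function(x, beta) {
  # x: samples x features matrix; beta: named coefficients from one joint model
  stopifnot(all(names(beta) %in% colnames(x)))
  drop(x[, names(beta), drop = FALSE] %*% beta)
}

univariate.risk.scores <- function(x, betas) {
  # One column per feature: each feature scored with its own univariate coefficient
  sweep(x[, names(betas), drop = FALSE], 2, betas, `*`)
}

# Toy data for illustration only
x <- matrix(c(1, 0, 2,
              0, 1, 1), nrow = 2, byrow = TRUE,
            dimnames = list(c("s1", "s2"), c("f1", "f2", "f3")))
beta <- c(f1 = 0.5, f3 = -0.25)

scores <- multivariate.risk.score(x, beta)
scores
univariate.risk.scores(x, beta)  # reuses the toy coefficients for illustration
```

In the univariate case each feature would normally come with a coefficient from its own single-feature fit. The same toy vector is reused here only to keep the example short.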
create.classifier.multivariate(
  data.directory = ".",
  output.directory = ".",
  feature.selection.datasets = NULL,
  feature.selection.p.threshold = 0.05,
  training.datasets = NULL,
  validation.datasets = NULL,
  top.n.features = 25,
  models = c("1", "2", "3"),
  learning.algorithms = c("backward", "forward"),
  alpha.glm = c(1),
  k.fold.glm = 10,
  seed.value = 51214,
  cores.glm = 1,
  rf.ntree = 1000,
  rf.mtry = NULL,
  rf.nodesize = 15,
  rf.samptype = "swor",
  rf.sampsize = function(x) { x * 0.66 },
  ...
)
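The random forest defaults in the usage signature can be evaluated in plain R to see what they imply for a given cohort. This is a sketch under assumptions. The function passed as rf.sampsize is assumed to receive the number of training cases. The sqrt(features) default for rf.mtry is assumed to be rounded up to an integer. The cohort size of 200 and the 25 features are hypothetical.

```r
# Evaluating the default random forest arguments (assumptions noted inline).
rf.sampsize <- function(x) { x * 0.66 }  # default from the usage signature
n.samples  <- 200                        # hypothetical training cohort size
n.features <- 25                         # e.g. the default top.n.features

bag.size <- rf.sampsize(n.samples)       # cases drawn per tree (assumed input: n)
mtry     <- ceiling(sqrt(n.features))    # assumed rounding of sqrt(features)

# 'swor' versus 'swr': the same draw size, without and with replacement
set.seed(51214)                          # default seed.value
swor <- sample(n.samples, size = round(bag.size), replace = FALSE)
set.seed(51214)
swr  <- sample(n.samples, size = round(bag.size), replace = TRUE)

length(unique(swor))  # every draw is a distinct case
length(unique(swr))   # usually fewer distinct cases, because of repeats
```

With 'swor' each tree sees 66% of the cases exactly once. With 'swr' some cases are drawn more than once and others are left out.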
data.directory |
Path to the directory containing datasets as specified
by |
output.directory |
Path to the output folder where intermediate and result files will be saved |
feature.selection.datasets |
A vector containing names of datasets used
for feature selection in function |
feature.selection.p.threshold |
One of the P values that were used for
feature selection in function |
training.datasets |
A vector containing names of training datasets |
validation.datasets |
A vector containing names of validation datasets |
top.n.features |
A numeric value specifying how many top-ranked features will be used for univariate survival modelling |
models |
A character vector specifying which of the models ('1' = N+E, '2' = N, '3' = E) to run |
learning.algorithms |
A character vector specifying which learning algorithms to use for model fitting and feature selection. Defaults to c('backward', 'forward'). Available options are: c('backward', 'forward', 'glm', 'randomforest') |
alpha.glm |
A numeric vector specifying the elastic-net mixing parameter alpha, which takes values in [0, 1]: 1 gives LASSO (the default) and 0 gives ridge. If several values of alpha are supplied, the best one is selected by cross-validation on the training set |
k.fold.glm |
A numeric value specifying the number of folds k for k-fold cross-validation if glm
was chosen in |
seed.value |
A numeric value specifying the random seed for glm k-fold cross-validation or for random forest, if either
was chosen in |
cores.glm |
An integer value specifying the number of cores to use for
glm if it was chosen in |
rf.ntree |
An integer value specifying the number of trees in the random forest. Defaults to 1000. This value should be tuned. Start with a large forest (such as 1000 trees) in the initial run and inspect output/OOB_error__TRAINING_* to see where the OOB error rate stabilises. Then rerun with rf.ntree set to that stabilised value |
rf.mtry |
An integer value specifying the number of features randomly selected
as split candidates at each node. Defaults to sqrt(number of features), as in the
underlying random survival forest R package |
rf.nodesize |
An integer value specifying the number of unique cases in a terminal
node. Defaults to 15, as in the underlying random survival
forest R package |
rf.samptype |
A character string specifying the sampling type. Defaults to sampling without replacement ('swor'). Available options are: c('swor', 'swr'), where 'swr' is sampling with replacement |
rf.sampsize |
A function specifying sampling size when |
... |
Other parameters passed on to the random forest call in the underlying
random survival forest R package |
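The glm arguments above (alpha.glm, k.fold.glm, seed.value) can be illustrated in a short sketch. It assumes the glm option follows the glmnet parameterisation of the elastic net, where the penalty is lambda times (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1. The lambda multiplier is left out. It also assumes folds are assigned at random in the way cv.glmnet assigns fold ids. The helpers `elastic.net.penalty` and `make.folds` and the coefficient values are hypothetical.

```r
# Hypothetical helper: elastic-net penalty (glmnet convention assumed, lambda omitted)
elastic.net.penalty <- function(beta, alpha) {
  stopifnot(alpha >= 0, alpha <= 1)
  (1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta))
}

beta <- c(0.8, -0.3, 0)  # hypothetical coefficients
penalties <- sapply(c(lasso = 1, mix = 0.5, ridge = 0),
                    function(a) elastic.net.penalty(beta, a))
penalties

# Hypothetical helper: reproducible assignment of n samples to k folds
make.folds <- function(n, k = 10, seed = 51214) {
  set.seed(seed)                            # fixed seed => same folds every run
  sample(rep(seq_len(k), length.out = n))   # balanced labels, randomly permuted
}

folds <- make.folds(n = 57, k = 10)
table(folds)  # fold sizes differ by at most one
```

Calling set.seed inside the helper resets the global random number generator. That is what makes the fold assignment repeatable for a fixed seed.value.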
The output files are stored under output.directory/output/.
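To pick rf.ntree from the OOB error curve, one option is to find the smallest forest size after which the error stays within a tolerance of its final value. The sketch below does this. The helper `stable.ntree`, the tolerance of 0.005 and the synthetic curve are all assumptions. The code assumes element i holds the OOB error for a forest of i trees. Reading the OOB_error__TRAINING_* files is not shown, because their format is not described here.

```r
# Hypothetical helper: first ntree after which the OOB error stays within
# `tol` of its final value. Assumes oob.error[i] is the error for i trees.
stable.ntree <- function(oob.error, tol = 0.005) {
  final <- oob.error[length(oob.error)]
  outside <- which(abs(oob.error - final) > tol)
  if (length(outside) == 0) return(1L)  # already stable from the first tree
  max(outside) + 1L                     # first index after the last excursion
}

# Synthetic, smoothly decaying error curve for illustration only (not package output)
oob <- 0.30 + 0.10 * exp(-seq_len(1000) / 80)
stable.ntree(oob)
```

Real OOB curves are noisy. A somewhat larger tolerance, or smoothing the curve first, may therefore give a more robust choice.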
Syed Haider & Vincent Stimper
# see package's main documentation