consensus_nestedCV | R Documentation
consensus_nestedCV: Consensus nested cross-validation for feature selection and parameter tuning
consensus_nestedCV(
train.ds = NULL,
validation.ds = NULL,
label = "class",
method.model = "classification",
is.simulated = TRUE,
ncv_folds = c(10, 10),
param.tune = FALSE,
learning_method = "rf",
xgb.obj = "binary:logistic",
importance.algorithm = "ReliefFequalK",
wrapper = "relief",
inner_selection_percent = NULL,
inner_selection_positivescores = TRUE,
tune.inner_selection_percent = NULL,
tune.k = FALSE,
tuneGrid = NULL,
relief.k.method = "k_half_sigma",
num_tree = 500,
covars_vec = NULL,
covars.pval.adj = 0.05,
verbose = FALSE
)
train.ds
A training data frame with the outcome in the last column.
validation.ds
A validation data frame with the outcome in the last column.
label
A character string giving the column name of the outcome variable.
method.model
The analysis goal: "classification" or "regression". For classification, make the outcome column a factor; for regression, make it numeric.
is.simulated
Logical (TRUE or FALSE); whether the data are simulated.
ncv_folds
A numeric vector giving the nested CV folds: c(k_outer, k_inner).
param.tune
Logical (TRUE or FALSE); whether to tune parameters.
learning_method
Name of the learning method: glmnet, xgbTree, or rf.
xgb.obj
Objective function for xgboost (default "binary:logistic", as shown in the usage); used when learning_method = "xgbTree".
importance.algorithm
A character string naming a specific importance algorithm subtype.
wrapper
Feature selection algorithm: rf, glmnet, t.test, centrality methods (PageRank, Katz, EpistasisRank, and EpistasisKatz from the Rinbix package), the ReliefF family, etc.
inner_selection_percent
Percentage of features to select in each inner fold.
inner_selection_positivescores
Logical (TRUE or FALSE); select features with positive scores. If FALSE, use the percentage method.
tune.inner_selection_percent
A sequence vector of candidate percentages for tuning.
tune.k
Logical; whether to tune k, the number of nearest neighbors in the relief method. If TRUE, the default grid is seq(1, kmax), where kmax = floor((m - 1)/2) and m is the number of samples. This kmax applies to balanced data; if the data are imbalanced, with m_minority + m_majority = m, then kmax = floor(m_minority - 1). Default is FALSE.
tuneGrid
A data frame of candidate tuning values, with columns named after the tuning parameters. This is a caret library parameter; for more information see http://topepo.github.io/caret/available-models.html.
relief.k.method
A character string or numeric giving the number of nearest neighbors for the relief algorithm. Possible strings are: k_half_sigma (floor((num.samp - 1) * 0.154)), m6 (floor(num.samp/6)), myopic (floor((num.samp - 1)/2)), and m4 (floor(num.samp/4)).
num_tree
Number of trees for the random forest and xgboost methods.
verbose
A logical flag indicating whether verbose output should be sent to stdout.
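The kmax bounds described for tune.k can be sketched in a few lines of base R; the sample counts here are hypothetical:

```r
# kmax for the default tune.k grid seq(1, kmax), per the tune.k
# argument description. Sample counts are hypothetical.
m <- 100                                  # total number of samples
kmax_balanced <- floor((m - 1) / 2)       # balanced classes
m_minority <- 30                          # minority class size (imbalanced case)
kmax_imbalanced <- floor(m_minority - 1)  # imbalanced classes
k_grid <- seq(1, kmax_balanced)           # default tuning grid (balanced data)
```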
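Each relief.k.method option maps the sample size to a neighbor count k via the formulas in the argument description; a quick base R illustration with a hypothetical num.samp:

```r
# Neighbor counts implied by each relief.k.method option, using the
# formulas from the argument description. num.samp is hypothetical.
num.samp <- 100
k_half_sigma <- floor((num.samp - 1) * 0.154)  # 15
m6           <- floor(num.samp / 6)            # 16
myopic       <- floor((num.samp - 1) / 2)      # 49
m4           <- floor(num.samp / 4)            # 25
```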
A list with:
Training data accuracy
Validation data accuracy
Number of variables detected correctly in nested cross-validation
Trained model to use for validation
Total elapsed time
num.samples <- 100
num.variables <- 100
pct.signals <- 0.1
label <- "class"
sim.data <- createSimulation(num.samples = num.samples,
                             num.variables = num.variables,
                             pct.signals = pct.signals,
                             sim.type = "mainEffect",
                             label = label,
                             verbose = FALSE)
cnCV.results <- consensus_nestedCV(train.ds = sim.data$train,
                                   validation.ds = sim.data$holdout,
                                   label = label,
                                   is.simulated = TRUE,
                                   ncv_folds = c(10, 10),
                                   param.tune = FALSE,
                                   learning_method = "rf",
                                   importance.algorithm = "ReliefFbestK",
                                   num_tree = 500,
                                   verbose = FALSE)
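When param.tune = TRUE, a tuneGrid can be supplied following the caret convention of one column per tuning parameter; caret's "rf" model tunes mtry, and "glmnet" tunes alpha and lambda. A hedged sketch (the candidate values below are illustrative, not recommendations):

```r
# Candidate grids in the caret convention (columns named after the
# tuning parameters). Values are illustrative only.
rf.grid <- data.frame(mtry = c(2, 5, 10))
glmnet.grid <- expand.grid(alpha = c(0, 0.5, 1),
                           lambda = 10^seq(-3, 0, length.out = 4))
# A grid would be passed as, e.g.:
#   consensus_nestedCV(..., param.tune = TRUE, learning_method = "rf",
#                      tuneGrid = rf.grid)
```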