consensus_nestedCV: Consensus nested cross-validation for feature selection and parameter tuning

View source: R/cncv.R


consensus_nestedCV: Consensus nested cross-validation for feature selection and parameter tuning

Description

Consensus nested cross-validation for feature selection and parameter tuning.

Usage

consensus_nestedCV(
  train.ds = NULL,
  validation.ds = NULL,
  label = "class",
  method.model = "classification",
  is.simulated = TRUE,
  ncv_folds = c(10, 10),
  param.tune = FALSE,
  learning_method = "rf",
  xgb.obj = "binary:logistic",
  importance.algorithm = "ReliefFequalK",
  wrapper = "relief",
  inner_selection_percent = NULL,
  inner_selection_positivescores = TRUE,
  tune.inner_selection_percent = NULL,
  tune.k = FALSE,
  tuneGrid = NULL,
  relief.k.method = "k_half_sigma",
  num_tree = 500,
  covars_vec = NULL,
  covars.pval.adj = 0.05,
  verbose = FALSE
)

Arguments

train.ds

A training data frame with the outcome as the last column.

validation.ds

A validation data frame with the outcome as the last column.

label

A character string giving the column name of the outcome variable.

method.model

Type of analysis (string): "classification" or "regression". For classification, make the outcome column a factor; for regression, make the outcome column numeric (see the sketch after the argument list).

is.simulated

Logical (TRUE or FALSE) indicating whether the data are simulated.

ncv_folds

A numeric vector giving the nested CV fold counts: c(k_outer, k_inner).

param.tune

Logical (TRUE or FALSE) indicating whether to tune model parameters.

learning_method

Name of the learning method: glmnet, xgbTree, or rf.

importance.algorithm

A character vector containing a specific importance algorithm subtype.

wrapper

Feature selection algorithm; options include rf, glmnet, t.test, centrality methods (PageRank, Katz, EpistasisRank, and EpistasisKatz from the Rinbix package), the ReliefF family, etc.

inner_selection_percent

Percentage of features to be selected in each inner fold.

inner_selection_positivescores

Logical (TRUE or FALSE); if TRUE, select features with positive importance scores; if FALSE, use the percentage method (inner_selection_percent).

tune.inner_selection_percent

A sequence vector of candidate percentages for tuning inner_selection_percent.

tune.k

A sequence vector (or logical) for tuning the number of nearest neighbors k in the relief method. If TRUE, the default grid is seq(1, kmax), where kmax = floor((m - 1)/2) and m is the number of samples. This kmax is for balanced data; if the data are imbalanced, with m_minority + m_majority = m, then kmax = floor(m_minority - 1) (see the sketch after the argument list). Default is FALSE.

tuneGrid

A data frame of candidate tuning values, with columns named after the tuning parameters. This is a caret parameter; for more information see http://topepo.github.io/caret/available-models.html, and the sketch after the argument list.

relief.k.method

A character string or numeric value indicating the number of nearest neighbors for the relief algorithm. Possible character values are: k_half_sigma (floor((num.samp - 1) * 0.154)), m6 (floor(num.samp/6)), myopic (floor((num.samp - 1)/2)), and m4 (floor(num.samp/4)); see the sketch after the argument list.

num_tree

Number of trees in random forest and xgboost methods

verbose

A flag indicating whether verbose output should be sent to stdout.
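
The following is a minimal sketch of preparing the outcome column for method.model; the data frame dat and its values are hypothetical, not part of the package.

dat <- data.frame(x1 = rnorm(6), class = c(0, 1, 0, 1, 0, 1))
# classification: make the outcome column a factor
dat$class <- as.factor(dat$class)
# regression: the outcome column should instead remain numeric, e.g.
# dat$class <- c(2.3, 1.7, 3.1, 2.8, 1.9, 2.4)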
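
The kmax formulas described for tune.k can be computed directly; this sketch uses assumed sample counts purely for illustration.

m <- 100                                  # total number of samples
kmax_balanced <- floor((m - 1) / 2)       # 49: kmax for balanced data
m_minority <- 30                          # assumed minority-class size
kmax_imbalanced <- floor(m_minority - 1)  # 29: kmax for imbalanced data
k_grid <- seq(1, kmax_balanced)           # default tuning grid when tune.k = TRUE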
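
tuneGrid follows the caret convention of a data frame whose columns are named after the tuning parameters. A minimal sketch for learning_method = "rf" is shown below; the mtry values are arbitrary, and pairing it with param.tune = TRUE is an assumption, not documented package behavior.

rf.grid <- expand.grid(mtry = c(2, 5, 10))
# consensus_nestedCV(..., param.tune = TRUE, learning_method = "rf", tuneGrid = rf.grid)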
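
The relief.k.method formulas above can be written out as follows; the helper relief_k and the example sample count are illustrative assumptions, not package code.

# map a relief.k.method name to the number of nearest neighbors
relief_k <- function(num.samp, method = "k_half_sigma") {
  switch(method,
         k_half_sigma = floor((num.samp - 1) * 0.154),
         m6           = floor(num.samp / 6),
         myopic       = floor((num.samp - 1) / 2),
         m4           = floor(num.samp / 4),
         stop("unknown relief.k.method"))
}
relief_k(100, "k_half_sigma")  # 15 neighbors for 100 samples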

Value

A list with:

cv.acc

Training data accuracy

Validation

Validation data accuracy

Features

Number of variables detected correctly in nested cross-validation

Train_model

Trained model to use for validation

Elapsed

Total elapsed time

Examples

num.samples <- 100
num.variables <- 100
pct.signals <- 0.1
label <- "class"
sim.data <- createSimulation(num.samples = num.samples,
                             num.variables = num.variables,
                             pct.signals = pct.signals,
                             sim.type = "mainEffect",
                             label = label,
                             verbose = FALSE)
cnCV.results <- consensus_nestedCV(train.ds = sim.data$train,
                                   validation.ds = sim.data$holdout,
                                   label = label,
                                   is.simulated = TRUE,
                                   ncv_folds = c(10, 10),
                                   param.tune = FALSE,
                                   learning_method = "rf",
                                   importance.algorithm = "ReliefFbestK",
                                   num_tree = 500,
                                   verbose = FALSE)
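
Assuming the returned list carries exactly the element names listed under Value, the results from the example above can be inspected as follows.

cnCV.results$cv.acc       # training data (nested CV) accuracy
cnCV.results$Validation   # validation data accuracy
cnCV.results$Features     # variables detected in nested cross-validation
cnCV.results$Train_model  # trained model to use for validation
cnCV.results$Elapsed      # total elapsed time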

