BioMMstage2pred: Prediction performance for stage-2 data using supervised machine learning

View source: R/BioMM.R

BioMMstage2pred R Documentation

Prediction performance for stage-2 data using supervised machine learning

Description

Estimates the prediction performance on reconstructed stage-2 data using supervised machine learning, with optional feature selection.

Usage

BioMMstage2pred(
  trainData,
  testData,
  dataMode,
  repeatA = 1,
  repeatB = 1,
  nfolds,
  FSmethod,
  cutP,
  fdr,
  FScore = MulticoreParam(),
  classifier,
  predMode,
  paramlist,
  innerCore = MulticoreParam()
)

Arguments

trainData

The input training dataset (stage-2 data). The first column is the label or output. For binary classes, 0 and 1 indicate class membership.

testData

The input test dataset (stage-2 data). The first column is the label or output. For binary classes, 0 and 1 indicate class membership.
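Both data arguments share the same layout. The following is a minimal base-R sketch of a toy stage-2 data frame in that layout; the column names (`label`, `block1` to `block3`) and the simulated values are hypothetical and only illustrate the structure: the 0/1 label in the first column, followed by the stage-2 features.

```r
set.seed(1)
n <- 20
label <- rep(c(0, 1), each = n / 2)

# Hypothetical toy stage-2 data: column 1 is the binary label (0/1),
# the remaining columns are simulated stage-2 features.
trainData <- data.frame(
  label  = label,
  block1 = label + rnorm(n, sd = 0.5),   # positively associated with the label
  block2 = rnorm(n),                     # unrelated noise
  block3 = -label + rnorm(n, sd = 0.5)   # negatively associated with the label
)

# Check the expected layout: label first, coded as 0/1
stopifnot(names(trainData)[1] == "label", all(trainData[, 1] %in% c(0, 1)))
str(trainData)
```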

dataMode

The data mode to use: either 'subTrain' or 'allTrain'.

repeatA

The number of repeats N used for resampling prediction. The default is 1.

repeatB

The number of repeats N used for test data prediction. The default is 1.

nfolds

The number of folds used for cross-validation.
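To make the role of nfolds concrete, here is a minimal base-R sketch of assigning samples to cross-validation folds. The helper `make_folds` is hypothetical, and the stratification by label is an assumption for illustration; BioMM's internal fold construction may differ.

```r
# Hypothetical helper: assign each sample to one of `nfolds` folds,
# stratified so each class is spread as evenly as possible across folds.
make_folds <- function(y, nfolds) {
  folds <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    fold_ids <- rep_len(seq_len(nfolds), length(idx))
    folds[idx] <- fold_ids[sample.int(length(fold_ids))]  # shuffle within class
  }
  folds
}

set.seed(42)
y <- rep(c(0, 1), times = c(12, 8))
folds <- make_folds(y, nfolds = 5)
table(fold = folds, label = y)
```

In each cross-validation round, the samples in one fold are predicted by a model trained on the remaining folds.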

FSmethod

Feature selection method. Available options are c(NULL, 'positive', 'wilcox.test', 'cor.test', 'chisq.test', 'posWilcox', 'top10pCor', 'posTopCor').

cutP

The cutoff used for p-value thresholding. Commonly used cutoffs are c(0.5, 0.1, 0.05, 0.01, etc.). The default is 0.05. If FSmethod = "posTopCor", cutP instead specifies the number of most correlated features to keep, and 'fdr' should be NULL.
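When cutP is a count rather than a p-value, selection amounts to ranking features by correlation. The base-R sketch below assumes that 'posTopCor' keeps the cutP features most positively correlated with the label; the helper `select_top_pos_cor` and the toy data are hypothetical, not BioMM's implementation.

```r
# Hypothetical helper: keep the `cutP` features most positively
# correlated with the label (here cutP is a count, not a p-value).
select_top_pos_cor <- function(data, cutP = 10) {
  y <- data[, 1]
  x <- data[, -1, drop = FALSE]
  r <- vapply(x, function(f) cor(f, y), numeric(1))
  r <- sort(r[r > 0], decreasing = TRUE)   # positive correlations, strongest first
  names(r)[seq_len(min(cutP, length(r)))]
}

set.seed(7)
y <- rep(c(0, 1), each = 20)
toy <- data.frame(label = y,
                  f1 = y + rnorm(40, sd = 0.1),   # strongly positive
                  f2 = y + rnorm(40, sd = 1),     # weakly positive
                  f3 = -y + rnorm(40, sd = 0.1))  # negative: never selected
select_top_pos_cor(toy, cutP = 1)
```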

fdr

Multiple testing correction method. Available options are c(NULL, 'fdr', 'BH', 'holm', etc). See also p.adjust. The default is NULL.
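The interplay of cutP and fdr can be sketched in base R: compute a p-value per feature, optionally adjust with p.adjust(), then keep features below the cutoff. This is a minimal sketch in the spirit of FSmethod = 'cor.test'; the helper `select_by_cor_test` and the toy data are hypothetical, and BioMM's own procedure may differ in detail.

```r
# Hypothetical helper: Pearson correlation test of each feature against the
# label, optional multiple-testing correction, then p-value thresholding.
select_by_cor_test <- function(data, cutP = 0.05, fdr = NULL) {
  y <- data[, 1]
  x <- data[, -1, drop = FALSE]
  pvals <- vapply(x, function(f) cor.test(f, y)$p.value, numeric(1))
  if (!is.null(fdr)) {
    pvals <- p.adjust(pvals, method = fdr)  # e.g. "BH", "fdr", "holm"
  }
  names(pvals)[pvals < cutP]
}

set.seed(1)
y <- rep(c(0, 1), each = 25)
toy <- data.frame(label = y,
                  signal = y + rnorm(50, sd = 0.3),
                  noise  = rnorm(50))
select_by_cor_test(toy, cutP = 0.05, fdr = "BH")
```

With fdr = NULL the raw p-values are compared against cutP directly.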

FScore

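The parallel back-end used for feature selection if parallel computing is needed, given as a BiocParallel parameter object such as MulticoreParam() (the default), which determines the number of cores.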

classifier

The machine learning classifier to use.

predMode

The prediction mode. Available options are c('probability', 'classification', 'regression').

paramlist

A set of model parameters defined in an R list object.

innerCore

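The parallel back-end used for computation, given as a BiocParallel parameter object such as MulticoreParam() (the default), which determines the number of cores.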

Details

Stage-2 prediction is typically performed using positively correlated features, since negative associations likely reflect random effects in the underlying data.
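The positive-association filter described above can be illustrated with a minimal base-R sketch. The helper `select_positive` is hypothetical and assumes "positively correlated" means a Pearson correlation with the label greater than zero; the exact criterion behind FSmethod = 'positive' may differ.

```r
# Hypothetical helper: keep only stage-2 features whose correlation with
# the label is positive, discarding negatively associated ones.
select_positive <- function(data) {
  y <- data[, 1]
  x <- data[, -1, drop = FALSE]
  r <- vapply(x, function(f) cor(f, y), numeric(1))
  names(r)[r > 0]
}

set.seed(3)
y <- rep(c(0, 1), each = 20)
toy <- data.frame(label = y,
                  pos = y + rnorm(40, sd = 0.1),
                  neg = -y + rnorm(40, sd = 0.1))
select_positive(toy)  # expected: "pos"
```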

Value

The cross-validation (CV) or bootstrap (BS) predicted scores for the stage-2 training data and, if a test set is given, the predicted scores for the stage-2 test data.
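A CV predicted score for training data is an out-of-fold prediction: each sample is scored by a model that did not see it during fitting, which avoids the optimistic bias of in-sample predictions in stacked models. The base-R sketch below shows the idea, averaged over repeated random splits (loosely analogous to repeatA). The helper `cv_predicted_score` is hypothetical, logistic regression via glm() stands in for the classifier, and the bootstrap (BS) variant is not shown.

```r
# Hypothetical helper: out-of-fold (cross-validated) predicted scores,
# averaged over `repeats` random fold splits. Logistic regression (glm)
# stands in for the classifier.
cv_predicted_score <- function(data, nfolds = 5, repeats = 1) {
  n <- nrow(data)
  fml <- reformulate(names(data)[-1], response = names(data)[1])
  scores <- matrix(NA_real_, nrow = n, ncol = repeats)
  for (r in seq_len(repeats)) {
    folds <- sample(rep_len(seq_len(nfolds), n))  # new random split per repeat
    for (k in seq_len(nfolds)) {
      held_out <- which(folds == k)
      # Fit on the other folds only, so each sample is scored by a model
      # that never saw it
      fit <- glm(fml, family = binomial, data = data[-held_out, ])
      scores[held_out, r] <- predict(fit, newdata = data[held_out, ],
                                     type = "response")
    }
  }
  rowMeans(scores)  # average each sample's score over the repeats
}

set.seed(11)
y <- rep(c(0, 1), each = 20)
stage2 <- data.frame(label = y, s1 = y + rnorm(40), s2 = rnorm(40))
score <- cv_predicted_score(stage2, nfolds = 5, repeats = 3)
```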

Author(s)

Junfang Chen

References

Perlich, C., & Swirszcz, G. (2011). On cross-validation and stacking: Building seemingly predictive models on random data. ACM SIGKDD Explorations Newsletter, 12(2), 11-15.


transbioZI/BioMMex documentation built on Jan. 27, 2023, 4:14 a.m.