BioMM: BioMM end-to-end prediction
In BioMM: BioMM: Biological-informed Multi-stage Machine learning framework for phenotype prediction using omics data


The BioMM framework uses two-stage machine learning models to integrate prior biological knowledge for end-to-end phenotype prediction.

BioMM(
  trainData,
  testData,
  pathlistDB,
  featureAnno,
  restrictUp,
  restrictDown,
  minPathSize,
  supervisedStage1 = TRUE,
  typePCA,
  resample1 = "BS",
  resample2 = "CV",
  dataMode = "allTrain",
  repeatA1 = 100,
  repeatA2 = 1,
  repeatB1 = 20,
  repeatB2 = 1,
  nfolds = 10,
  FSmethod1,
  FSmethod2,
  cutP1,
  cutP2,
  fdr2,
  FScore = MulticoreParam(),
  classifier,
  predMode,
  paramlist,
  innerCore = MulticoreParam()
)

`trainData`	The input training dataset. The first column is the label (the output). For binary classes, 0 and 1 indicate class membership.
`testData`	The input test dataset. The first column is the label (the output). For binary classes, 0 and 1 indicate class membership.
`pathlistDB`	A list of pathways with pathway IDs and their corresponding genes ('entrezID' is used). This is only used for pathway-based stratification (only when `stratify` is 'pathway').
`featureAnno`	The annotation data stored in a data.frame for probe mapping. It must have at least two columns named 'ID' and 'entrezID'. If it is NULL, the input probes are taken to come from transcriptomic data. (Default: NULL)
`restrictUp`	The upper bound on the number of probes or genes in each biologically stratified block.
`restrictDown`	The lower bound on the number of probes or genes in each biologically stratified block.
`minPathSize`	The minimum pathway size after mapping your own data to the GO database. This is only used for pathway-based stratification (only when `stratify` is 'pathway').
`supervisedStage1`	A logical value. If TRUE, supervised learning models are applied; if FALSE, unsupervised learning is applied.
`typePCA`	The type of PCA. Available options are c('regular', 'sparse').
`resample1`	The resampling method at stage-1. Valid options are 'CV' and 'BS': 'CV' for cross validation and 'BS' for bootstrap resampling. The default is 'BS'.
`resample2`	The resampling method at stage-2. Valid options are 'CV' and 'BS': 'CV' for cross validation and 'BS' for bootstrap resampling. The default is 'CV'.
`dataMode`	The input training data mode for model training. It is used only if 'testData' is present. It can be a subset of the training data or the entire training data: 'subTrain' selects a subset and 'allTrain' selects the entire training dataset.
`repeatA1`	The number of repeats N used during the resampling procedure. Repeated cross validation or multiple bootstrapping is performed if N >= 2. One can choose 10 repeats for 'CV' and 100 repeats for 'BS'.
`repeatA2`	The number of repeats N used during resampling prediction. The default is 1 for 'CV'.
`repeatB1`	The number of repeats N used for generating stage-2 test data prediction scores. The default is 20.
`repeatB2`	The number of repeats N used for test data prediction. The default is 1.
`nfolds`	The number of folds used for cross validation. The default is 10.
`FSmethod1`	Feature selection method at stage-1. Available options are c(NULL, 'positive', 'wilcox.test', 'cor.test', 'chisq.test', 'posWilcox').
`FSmethod2`	Feature selection method at stage-2. Features that are positively associated with the outcome will be used.
`cutP1`	The cutoff used for p value thresholding at stage-1. Commonly used cutoffs are c(0.5, 0.1, 0.05, 0.01, etc). If "FSmethod1" is NULL, no cutoff is applied.
`cutP2`	The cutoff used for p value thresholding at stage-2. Commonly used cutoffs are c(0.5, 0.1, 0.05, 0.01, etc). If "FSmethod2" is NULL, no cutoff is applied.
`fdr2`	Multiple testing correction method at stage-2. Available options are c(NULL, 'fdr', 'BH', 'holm', etc). See also `p.adjust`. The default is NULL. This option is particularly useful when large sets of pathways are investigated.
`FScore`	The number of cores used for feature selection. The default in the usage above is MulticoreParam().
`classifier`	Machine learning classifier used at both stages. Available options are c('randForest', 'SVM', 'glmnet').
`predMode`	The prediction mode at both stages. Available options are c('probability', 'classification', 'regression').
`paramlist`	A list of model parameters at both stages. The set of parameters differs for each classifier. Please see the parameters implemented for each individual classifier, e.g., 'baseRandForest()', 'baseSVM()', and 'baseGLMnet()'.
`innerCore`	The number of cores used for computation. It needs to be reconciled with "FScore" depending on the number of cores available.
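
The shapes of `trainData`, `featureAnno` and `pathlistDB`, and the effect of `restrictUp` and `restrictDown`, can be illustrated with a small base R sketch. All data below are made up, and `stratifyByPathway()` is a hypothetical helper written for this illustration; it is not a BioMM function, and the package's own mapping and filtering may differ in detail.

```r
# Toy inputs shaped like the arguments described above (all values are made up).
set.seed(1)
nSamples <- 20
probeIDs <- paste0("cg", sprintf("%03d", 1:12))

# trainData: first column is the 0/1 label, remaining columns are probes.
trainData <- data.frame(
  label = rep(c(0, 1), each = nSamples / 2),
  matrix(rnorm(nSamples * length(probeIDs)),
         nrow = nSamples,
         dimnames = list(NULL, probeIDs))
)

# featureAnno: needs at least the columns 'ID' (probe) and 'entrezID' (gene).
featureAnno <- data.frame(
  ID = probeIDs,
  entrezID = rep(c("101", "102", "103", "104"), each = 3),
  stringsAsFactors = FALSE
)

# pathlistDB: a named list of pathway IDs, each holding Entrez gene IDs.
pathlistDB <- list(
  "GO:0000001" = c("101", "102", "103"),
  "GO:0000002" = c("104"),
  "GO:0000003" = c("101", "102", "103", "104")
)

# Hypothetical helper (not part of BioMM): map each pathway to its probes
# and keep only blocks whose size lies within [restrictDown, restrictUp].
stratifyByPathway <- function(pathlist, anno, restrictUp, restrictDown) {
  blocks <- lapply(pathlist, function(genes) {
    anno$ID[anno$entrezID %in% genes]
  })
  sizes <- lengths(blocks)
  blocks[sizes >= restrictDown & sizes <= restrictUp]
}

blocks <- stratifyByPathway(pathlistDB, featureAnno,
                            restrictUp = 10, restrictDown = 5)
print(lengths(blocks))
```

In this toy setting the three pathways map to 9, 3 and 12 probes, so only the first one falls inside the size bounds and is kept as a block.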

Stage-2 training data can be learned using either bootstrapping or cross validation resampling in the supervised learning setting. Stage-2 test data is learned via independent test set prediction.
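
The two resampling options can be sketched in base R as follows. This is a minimal illustration on simulated data, assuming that held-out (cross validation) or out-of-bag (bootstrapping) predictions are what gets collected as scores; `fitPredict()`, `cvScore()` and `bsScore()` are hypothetical helpers, and a logistic regression stands in for the classifiers the package offers. BioMM's internal implementation may differ.

```r
# Simulated data: a binary outcome y and three numeric features.
set.seed(42)
n <- 60
x <- matrix(rnorm(n * 3), nrow = n, dimnames = list(NULL, c("f1", "f2", "f3")))
y <- rbinom(n, 1, plogis(x[, 1]))
dat <- data.frame(y = y, x)

# Fit on 'train' and return predicted probabilities for 'test'.
fitPredict <- function(train, test) {
  fit <- glm(y ~ ., data = train, family = binomial())
  as.numeric(predict(fit, newdata = test, type = "response"))
}

# 'CV': each sample is predicted once by a model that did not see it.
cvScore <- function(dat, nfolds) {
  folds <- sample(rep(seq_len(nfolds), length.out = nrow(dat)))
  score <- numeric(nrow(dat))
  for (k in seq_len(nfolds)) {
    held <- folds == k
    score[held] <- fitPredict(dat[!held, ], dat[held, ])
  }
  score
}

# 'BS': train on a bootstrap sample, predict the out-of-bag samples,
# and average each sample's out-of-bag predictions over all repeats.
bsScore <- function(dat, repeats) {
  total <- numeric(nrow(dat))
  count <- numeric(nrow(dat))
  for (r in seq_len(repeats)) {
    inBag <- sample(nrow(dat), replace = TRUE)
    oob <- setdiff(seq_len(nrow(dat)), inBag)
    total[oob] <- total[oob] + fitPredict(dat[inBag, ], dat[oob, ])
    count[oob] <- count[oob] + 1
  }
  total / count  # NaN for a sample that was never out-of-bag
}

cvScores <- cvScore(dat, nfolds = 10)
bsScores <- bsScore(dat, repeats = 100)

# Independent test set prediction: fit on all training data, predict new data.
testDat <- data.frame(f1 = rnorm(5), f2 = rnorm(5), f3 = rnorm(5))
testScores <- fitPredict(dat, testDat)
```

Either way, every training sample receives a score from models that were not fitted on it, which is what guards the second stage against learning from overfitted first-stage predictions.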

The CV or BS predicted score for the training data, and the test set predicted score if testData is given.

Chen, J., & Schwarz, E. (2017). BioMM: Biologically-informed Multi-stage Machine learning for identification of epigenetic fingerprints. arXiv preprint arXiv:1712.00336.

Perlich, C., & Swirszcz, G. (2011). On cross-validation and stacking: Building seemingly predictive models on random data. ACM SIGKDD Explorations Newsletter, 12(2), 11-15.

reconBySupervised; reconByUnsupervised; BioMMstage2pred

 
## Load data
methylfile <- system.file('extdata', 'methylData.rds', package='BioMM')
methylData <- readRDS(methylfile)
testData <- NULL
## Annotation file
probeAnnoFile <- system.file('extdata', 'cpgAnno.rds', package='BioMM')
probeAnno <- readRDS(file=probeAnnoFile)
golist <- readRDS(system.file("extdata", "goDB.rds", package="BioMM"))
pathlistDB <- golist[1:100]
supervisedStage1 <- TRUE
classifier <- 'randForest'
predMode <- 'classification'
paramlist <- list(ntree=300, nthreads=30)
library(BiocParallel)
library(ranger)
param1 <- MulticoreParam(workers = 2)
param2 <- MulticoreParam(workers = 20)
## Not Run 
## result <- BioMM(trainData=methylData, testData=NULL,
##                 pathlistDB, featureAnno=probeAnno, 
##                 restrictUp=200, restrictDown=10, minPathSize=10, 
##                 supervisedStage1, typePCA='regular', 
##                 resample1='BS', resample2='CV', dataMode="allTrain",
##                 repeatA1=20, repeatA2=1, repeatB1=20, repeatB2=1, 
##                 nfolds=10, FSmethod1=NULL, FSmethod2=NULL, 
##                 cutP1=0.1, cutP2=0.1, fdr2=NULL, FScore=param1, 
##                 classifier, predMode, paramlist, innerCore=param2)
## if (is.null(testData)) {
##     predY <- result 
##     trainDataY <- methylData[,1]
##     metricCV <- getMetrics(dataY = trainDataY, predY)
##     message("Cross-validation prediction performance:")
##     print(metricCV)
## } else if (!is.null(testData)){
##     trainDataY <- methylData[,1]
##     testDataY <- testData[,1]
##     cvYscore <- result[[1]]
##     testYscore <- result[[2]] 
##     metricCV <- getMetrics(dataY = trainDataY, cvYscore)
##     metricTest <- getMetrics(dataY = testDataY, testYscore)
##     message("Cross-validation performance:")
##     print(metricCV)
##     message("Test set prediction performance:")
##     print(metricTest)
## }

BioMM documentation built on Nov. 8, 2020, 11:04 p.m.
