reconBySupervised: Reconstruct stage-2 data by supervised machine learning...


View source: R/BioMM.R

Description

Reconstruct stage-2 data by supervised machine learning prediction.

Usage

reconBySupervised(
  trainDataList,
  testDataList,
  resample = "BS",
  dataMode,
  repeatA,
  repeatB,
  nfolds,
  FSmethod,
  cutP,
  fdr,
  FScore = MulticoreParam(),
  classifier,
  predMode,
  paramlist,
  innerCore = MulticoreParam(),
  outFileA = NULL,
  outFileB = NULL
)

Arguments

trainDataList

The input training data list containing ordered collections of matrices.

testDataList

The input test data list containing ordered collections of matrices.

resample

The resampling method. Valid options are 'CV' (cross validation) and 'BS' (bootstrapping). The default is 'BS'.

dataMode

The mode of data used: either 'subTrain' or 'allTrain'.

repeatA

The number of repeats N used during the resampling procedure. Repeated cross validation or multiple bootstrapping is performed if N >= 2. For example, one can choose 10 repeats for 'CV' and 100 repeats for 'BS'.

repeatB

The number of repeats N used to generate the test data prediction scores.

nfolds

The number of folds used for cross validation.

FSmethod

The feature selection method. Available options are NULL, 'positive', 'wilcox.test', 'cor.test', 'chisq.test', 'posWilcox' and 'top10pCor'.

cutP

The cutoff used for p value thresholding. Commonly used cutoffs are 0.5, 0.1, 0.05 and 0.01. The default is 0.05.

fdr

The multiple testing correction method. Available options include NULL, 'fdr', 'BH' and 'holm'. See also p.adjust. The default is NULL.

FScore

A BiocParallel parameter object (e.g. MulticoreParam()) specifying the number of cores used for feature selection, if parallel computing is needed.

classifier

The machine learning classifier to use (e.g. 'randForest').

predMode

The prediction mode. Available options are c('probability', 'classification', 'regression').

paramlist

A set of model parameters defined in an R list object.

innerCore

A BiocParallel parameter object (e.g. MulticoreParam()) specifying the number of cores used for model computation.

outFileA

The file name for the stage-2 training data, with the '.rds' file extension. If provided, the result is saved to this file. The default is NULL.

outFileB

The file name for the stage-2 test data, with the '.rds' file extension. If provided, the result is saved to this file. The default is NULL.
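The FSmethod, cutP and fdr arguments together describe p-value-based feature filtering. Below is a minimal base-R sketch of what 'wilcox.test' filtering with a p value cutoff and optional multiple-testing correction could look like. selectByWilcox() is a hypothetical helper, not part of BioMM, and BioMM's own implementation may differ (for instance in how p values are computed or how ties are handled).

```r
## Hypothetical sketch, NOT BioMM's code: p-value-based feature selection
## in the spirit of FSmethod = 'wilcox.test' combined with cutP and fdr.
selectByWilcox <- function(x, y, cutP = 0.05, fdr = NULL) {
  # Two-sample Wilcoxon rank-sum p value for each feature (column)
  pvals <- apply(x, 2, function(f)
    wilcox.test(f[y == 1], f[y == 0], exact = FALSE)$p.value)
  # Optional multiple-testing correction, e.g. fdr = "BH" or "holm"
  if (!is.null(fdr)) pvals <- p.adjust(pvals, method = fdr)
  # Keep only the features whose (adjusted) p value is below the cutoff
  x[, pvals < cutP, drop = FALSE]
}

## Toy data: one feature shifted by class, two pure-noise features
set.seed(2)
n <- 40
y <- rep(0:1, each = n / 2)
x <- cbind(signal = rnorm(n) + 2 * y, noise1 = rnorm(n), noise2 = rnorm(n))
selected <- selectByWilcox(x, y, cutP = 0.05, fdr = "BH")
colnames(selected)
```

Setting fdr = NULL skips the correction and thresholds the raw p values, mirroring the NULL option listed for fdr.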

Details

Stage-2 training data can be learned using either bootstrapping or cross-validation resampling. Stage-2 test data are obtained by prediction on the independent test set.
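The resampling idea above can be sketched in base R. This is a hypothetical illustration, not BioMM's implementation: logistic regression via glm() stands in for classifier = 'randForest', and predictBlock() and reconStage2Sketch() are made-up names. For each block (for example, one pathway matrix from trainDataList), a sample's stage-2 value is its held-out predicted probability of class 1, averaged over repeats: out-of-fold for 'CV' and out-of-bag for 'BS'.

```r
## Hypothetical sketch, NOT BioMM's implementation: glm() logistic regression
## stands in for the classifier, and all function names are made up.

## Held-out predicted probability of class 1 for one block (samples x features),
## averaged over `repeats` rounds of k-fold CV or bootstrapping (out-of-bag).
predictBlock <- function(x, y, resample = c("CV", "BS"), repeats = 10, nfolds = 5) {
  resample <- match.arg(resample)
  n <- nrow(x)
  dat <- data.frame(y = y, x)
  scoreSum <- numeric(n)  # running sum of held-out predictions per sample
  scoreCnt <- numeric(n)  # number of times each sample was held out
  for (r in seq_len(repeats)) {
    if (resample == "CV") {
      # Random fold assignment; each sample is held out exactly once per repeat
      folds <- sample(rep(seq_len(nfolds), length.out = n))
      splits <- lapply(seq_len(nfolds), function(k)
        list(train = which(folds != k), test = which(folds == k)))
    } else {
      # Train on a bootstrap sample; predict the out-of-bag samples
      boot <- sample.int(n, n, replace = TRUE)
      splits <- list(list(train = boot, test = setdiff(seq_len(n), boot)))
    }
    for (s in splits) {
      if (length(s$test) == 0) next
      fit <- suppressWarnings(
        glm(y ~ ., data = dat[s$train, , drop = FALSE], family = binomial()))
      p <- predict(fit, newdata = dat[s$test, , drop = FALSE], type = "response")
      scoreSum[s$test] <- scoreSum[s$test] + p
      scoreCnt[s$test] <- scoreCnt[s$test] + 1
    }
  }
  # Average per sample; NA if a sample was never held out (possible with BS)
  ifelse(scoreCnt > 0, scoreSum / scoreCnt, NA_real_)
}

## Stage-2 data: one column per block (e.g. per pathway), one row per sample
reconStage2Sketch <- function(blockList, y, ...) {
  sapply(blockList, predictBlock, y = y, ...)
}

## Toy usage with two blocks: one informative, one pure noise
set.seed(1)
n <- 60
y <- rep(0:1, each = n / 2)
blocks <- list(
  pathA = matrix(rnorm(n * 3), n, 3) + y,  # features shifted by class
  pathB = matrix(rnorm(n * 3), n, 3)       # no class signal
)
stage2 <- reconStage2Sketch(blocks, y, resample = "CV", repeats = 5, nfolds = 5)
dim(stage2)  # 60 rows (samples) x 2 columns (blocks)
```

Averaging held-out predictions means no sample's stage-2 value comes from a model fitted on that same sample, which is the usual motivation for building stage-2 (stacked) data by resampling.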

Value

The predicted stage-2 training data, and also the stage-2 test data if 'testDataList' is provided. If outFileA and outFileB are provided, the results are also saved to these files.

Author(s)

Junfang Chen

Examples

library(BioMM)
## Load the example methylation data
methylfile <- system.file('extdata', 'methylData.rds', package='BioMM')
methylData <- readRDS(methylfile)
## Load the CpG annotation file
probeAnnoFile <- system.file('extdata', 'cpgAnno.rds', package='BioMM')
featureAnno <- readRDS(file=probeAnnoFile)
head(featureAnno)
## Map CpGs into pathways
pathlistDB <- readRDS(system.file('extdata', 'goDB.rds', package='BioMM'))
dataList <- omics2pathlist(data=methylData, pathlistDB, featureAnno,
                           restrictUp=100, restrictDown=10, minPathSize=10)
length(dataList)
library(ranger)
library(BiocParallel)
## param1 is passed as FScore (feature selection), param2 as innerCore
param1 <- MulticoreParam(workers = 1)
param2 <- MulticoreParam(workers = 20)
## Not run: this takes a while
## stage2data <- reconBySupervised(trainDataList=dataList, testDataList=NULL,
##                             resample='CV', dataMode='allTrain',
##                             repeatA=50, repeatB=20, nfolds=10,
##                             FSmethod=NULL, cutP=0.1,
##                             fdr=NULL, FScore=param1,
##                             classifier='randForest',
##                             predMode='classification',
##                             paramlist=list(ntree=500, nthreads=20),
##                             innerCore=param2, outFileA=NULL, outFileB=NULL)
## print(dim(stage2data))
## print(head(stage2data[,1:5]))

BioMM documentation built on Nov. 8, 2020, 11:04 p.m.