Reconstruct stage-2 data by supervised machine learning prediction.
reconBySupervised(
  trainDataList,
  testDataList,
  resample = "BS",
  dataMode,
  repeatA,
  repeatB,
  nfolds,
  FSmethod,
  cutP,
  fdr,
  FScore = MulticoreParam(),
  classifier,
  predMode,
  paramlist,
  innerCore = MulticoreParam(),
  outFileA = NULL,
  outFileB = NULL
)
trainDataList
The input training data list containing ordered collections of matrices.
testDataList
The input test data list containing ordered collections of matrices.
resample
The resampling method. Valid options are 'CV' (cross validation) and 'BS' (bootstrapping). The default is 'BS'.
dataMode
The mode of data used: 'subTrain' or 'allTrain'.
repeatA
The number of repeats N used during the resampling procedure. Repeated cross validation or multiple bootstrapping is performed if N >= 2. One can choose, for example, 10 repeats for 'CV' and 100 repeats for 'BS'.
repeatB
The number of repeats N used for generating test data prediction scores.
nfolds
The number of folds used for cross validation.
FSmethod
Feature selection method. Available options are NULL, 'positive', 'wilcox.test', 'cor.test', 'chisq.test', 'posWilcox', or 'top10pCor'.
cutP
The cutoff used for p value thresholding. Commonly used cutoffs are 0.5, 0.1, 0.05, 0.01, etc. The default is 0.05.
fdr
Multiple testing correction method. Available options are NULL, 'fdr', 'BH', 'holm', etc.
See also
FScore
The parallel back-end, given as a BiocParallel parameter object such as MulticoreParam(), that sets the number of cores used for feature selection if parallel computing is needed.
classifier
Machine learning classifiers.
predMode
The prediction mode. Available options are 'probability', 'classification', or 'regression'.
paramlist
A set of model parameters defined in an R list object.
innerCore
The parallel back-end, given as a BiocParallel parameter object such as MulticoreParam(), that sets the number of cores used for computation.
outFileA
The file name of stage-2 training data with the '.rds' file extension. If it is provided, the result will be saved in this file. The default is NULL.
outFileB
The file name of stage-2 test data with the '.rds' file extension. If it is provided, the result will be saved in this file. The default is NULL.
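The FSmethod, cutP and fdr arguments together describe a filter-style feature selection. As a rough illustration only (this is not BioMM's internal code), the base R sketch below runs a Wilcoxon rank-sum test per feature on simulated data, optionally adjusts the p values with p.adjust, and keeps the features that fall below the cutoff. The helper name selectByWilcox and the simulated data are hypothetical.

```r
## Hypothetical helper for illustration only; not part of BioMM.
set.seed(1)
n <- 40
label <- rep(c(0, 1), each = n / 2)

## 20 simulated features; the first 3 are shifted in the group with label 1.
x <- matrix(rnorm(n * 20), nrow = n)
x[label == 1, 1:3] <- x[label == 1, 1:3] + 2
colnames(x) <- paste0("f", seq_len(ncol(x)))

selectByWilcox <- function(x, label, cutP = 0.05, fdr = NULL) {
  ## One two-sample Wilcoxon test per column (feature).
  pvals <- apply(x, 2, function(v) {
    wilcox.test(v[label == 1], v[label == 0])$p.value
  })
  ## Optional multiple testing correction, e.g. fdr = "BH".
  if (!is.null(fdr)) {
    pvals <- p.adjust(pvals, method = fdr)
  }
  ## Names of the features passing the p value cutoff.
  names(pvals)[pvals < cutP]
}

selRaw <- selectByWilcox(x, label, cutP = 0.05)
selBH  <- selectByWilcox(x, label, cutP = 0.05, fdr = "BH")
```

Because adjusted p values are never smaller than the raw ones, the features kept with fdr = "BH" are always a subset of those kept without correction.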
Stage-2 training data can be learned either using bootstrapping or cross validation resampling methods. Stage-2 test data is learned via independent test set prediction.
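To make the two resampling options concrete, here is a minimal base R sketch. It assumes that 'CV' splits the samples into nfolds disjoint folds and that 'BS' draws samples with replacement, with the samples never drawn serving as the held-out set. BioMM's actual implementation may differ, and all variable names are illustrative.

```r
set.seed(42)
nSamples <- 30
nfolds <- 5

## Cross validation: assign each sample to exactly one fold.
foldID <- sample(rep(seq_len(nfolds), length.out = nSamples))
cvTestIdx <- split(seq_len(nSamples), foldID)

## Bootstrapping: draw training samples with replacement; samples that
## were never drawn ("out-of-bag") can serve as the held-out set.
bsTrainIdx <- sample(seq_len(nSamples), size = nSamples, replace = TRUE)
bsOobIdx <- setdiff(seq_len(nSamples), bsTrainIdx)

table(foldID)      # fold sizes
length(bsOobIdx)   # number of out-of-bag samples in this draw
```

Repeating either procedure several times, as repeatA suggests, would simply mean rerunning the fold assignment or the bootstrap draw with fresh random numbers.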
The predicted stage-2 training data and, if 'testDataList' is provided, the stage-2 test data. If outFileA and outFileB are provided, the results are also stored in those files.
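Results written to an '.rds' file can be read back with readRDS. A small sketch of that round trip follows, using a made-up matrix as a stand-in for real stage-2 data and a temporary file path; the object and file names are assumptions for illustration.

```r
## Stand-in for stage-2 data; a real object would come from reconBySupervised().
stage2demo <- matrix(runif(12), nrow = 4,
                     dimnames = list(paste0("s", 1:4), paste0("path", 1:3)))

## Write to a temporary '.rds' file and read it back.
outFile <- tempfile(fileext = ".rds")
saveRDS(stage2demo, file = outFile)
reloaded <- readRDS(outFile)

## Check that the reloaded object matches the original.
identical(stage2demo, reloaded)
```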
Junfang Chen
## Load data
methylfile <- system.file('extdata', 'methylData.rds', package='BioMM')
methylData <- readRDS(methylfile)
## Annotation file
probeAnnoFile <- system.file('extdata', 'cpgAnno.rds', package='BioMM')
featureAnno <- readRDS(file=probeAnnoFile)
## Mapping CpGs into Pathways
pathlistDB <- readRDS(system.file("extdata", "goDB.rds", package="BioMM"))
head(featureAnno)
dataList <- omics2pathlist(data=methylData, pathlistDB, featureAnno,
                           restrictUp=100, restrictDown=10, minPathSize=10)
length(dataList)
library(ranger)
library(BiocParallel)
param1 <- MulticoreParam(workers = 1)
param2 <- MulticoreParam(workers = 20)
## Not run: this can take a long time
## stage2data <- reconBySupervised(trainDataList=dataList, testDataList=NULL,
##                 resample='CV', dataMode='allTrain',
##                 repeatA=50, repeatB=20, nfolds=10,
##                 FSmethod=NULL, cutP=0.1,
##                 fdr=NULL, FScore=param1,
##                 classifier='randForest',
##                 predMode='classification',
##                 paramlist=list(ntree=500, nthreads=20),
##                 innerCore=param2, outFileA=NULL, outFileB=NULL)
## print(dim(stage2data))
## print(head(stage2data[,1:5]))