predByCV: Cross validation prediction by supervised machine learning...


View source: R/BioMM.R

Description

Prediction by supervised machine learning models using cross validation along with feature selection methods.

Usage

predByCV(
  data,
  repeats,
  nfolds,
  FSmethod,
  cutP,
  fdr,
  FScore = MulticoreParam(),
  classifier,
  predMode,
  paramlist,
  innerCore = MulticoreParam()
)

Arguments

data

The input dataset. The first column is the label (the output variable). For binary classification, 0 and 1 indicate class membership.

repeats

The number of repeats used for cross validation. Repeated cross validation is performed if repeats >= 2.

nfolds

The number of folds used for cross validation.

FSmethod

Feature selection method. Available options are NULL, 'positive', 'wilcox.test', 'cor.test', 'chisq.test', 'posWilcox' and 'top10pCor'.

cutP

The cutoff used for p value thresholding. Commonly used cutoffs include 0.5, 0.1, 0.05 and 0.01. The default is 0.05.

fdr

Multiple testing correction method. Available options include NULL, 'fdr', 'BH' and 'holm'; see p.adjust for the full list. The default is NULL (no correction).

FScore

The number of cores used for feature selection, if parallel computing is needed. It is passed as a BiocParallel parameter object; the default is MulticoreParam().

classifier

The machine learning classifier to use, for example 'randForest'.

predMode

The prediction mode. Available options are c('probability', 'classification', 'regression').

paramlist

A set of model parameters defined in an R list object.

innerCore

The number of cores used for computation. It is passed as a BiocParallel parameter object; the default is MulticoreParam().
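To illustrate how FSmethod, cutP and fdr could interact, here is a minimal base R sketch of p value based feature filtering. It is not BioMM's implementation: selectFeatures and the toy data are hypothetical, and the sketch assumes a binary 0/1 label in the first column, one Wilcoxon test per feature, and an optional p.adjust correction before thresholding.

```r
## Hypothetical helper (not part of BioMM): keep features whose
## (optionally adjusted) p value falls below cutP.
selectFeatures <- function(data, cutP = 0.05, fdr = NULL) {
  y <- data[, 1]                  # first column is the 0/1 label
  x <- data[, -1, drop = FALSE]   # remaining columns are features
  ## One Wilcoxon rank-sum test per feature, class 0 vs class 1
  pvals <- vapply(x, function(f) {
    wilcox.test(f[y == 0], f[y == 1], exact = FALSE)$p.value
  }, numeric(1))
  ## Optional multiple testing correction via stats::p.adjust
  if (!is.null(fdr)) pvals <- p.adjust(pvals, method = fdr)
  names(pvals)[pvals < cutP]
}

## Toy data (assumption): f1 is informative, f2 and f3 are noise
set.seed(1)
n <- 40
toy <- data.frame(label = rep(c(0, 1), each = n / 2),
                  f1 = c(rnorm(n / 2, 0), rnorm(n / 2, 3)),
                  f2 = rnorm(n),
                  f3 = rnorm(n))
selUnadj <- selectFeatures(toy, cutP = 0.05)
selBH <- selectFeatures(toy, cutP = 0.05, fdr = "BH")
```

Because adjusted p values are never smaller than the raw ones, the corrected selection is always a subset of the uncorrected one.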

Value

The predicted cross validation output.
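As a rough picture of what a cross validation prediction looks like, the following base R sketch assigns samples to folds, fits a model on the training folds and predicts the held-out fold, so that every sample receives an out-of-fold prediction. It is a sketch under stated assumptions, not BioMM's code: cvPredict is a hypothetical helper, glm() stands in for the classifier, and averaging probabilities across repeats is an assumption made for illustration only.

```r
## Hypothetical helper (not part of BioMM): repeated k-fold CV with glm()
cvPredict <- function(data, repeats = 1, nfolds = 5) {
  names(data)[1] <- "label"       # first column is the 0/1 label
  n <- nrow(data)
  pred <- matrix(NA_real_, nrow = n, ncol = repeats)
  for (r in seq_len(repeats)) {
    ## Randomly assign each sample to one of nfolds folds
    fold <- sample(rep(seq_len(nfolds), length.out = n))
    for (k in seq_len(nfolds)) {
      test <- fold == k
      ## Fit on the training folds only
      fit <- glm(label ~ ., data = data[!test, , drop = FALSE],
                 family = binomial())
      ## Predict class 1 probability for the held-out fold
      pred[test, r] <- predict(fit, newdata = data[test, , drop = FALSE],
                               type = "response")
    }
  }
  ## Assumption: average probabilities over repeats, threshold at 0.5
  as.integer(rowMeans(pred) > 0.5)
}

## Toy data (assumption): f1 is weakly informative, f2 is noise
set.seed(42)
n <- 60
label <- rep(c(0, 1), each = n / 2)
toy <- data.frame(label = label,
                  f1 = rnorm(n, mean = 1.5 * label),
                  f2 = rnorm(n))
predToy <- cvPredict(toy, repeats = 2, nfolds = 5)
```

The result is one predicted class per input row, in the original row order, which is the shape needed to compare predictions against the labels.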

Examples

## Load packages
library(BioMM)
library(ranger)
library(BiocParallel)
## Load data
methylfile <- system.file('extdata', 'methylData.rds', package='BioMM')
methylData <- readRDS(methylfile)
## The first column holds the labels
dataY <- methylData[,1]
## Select a subset of the genome-wide methylation data (the first 2000 features)
methylSub <- data.frame(label=dataY, methylData[,c(2:2001)])
## Parallel settings: adjust the number of workers to the available cores
param1 <- MulticoreParam(workers = 1)
param2 <- MulticoreParam(workers = 20)
## 10-fold cross validation with a random forest and no feature selection
predY <- predByCV(methylSub, repeats=1, nfolds=10,
                  FSmethod=NULL, cutP=0.1,
                  fdr=NULL, FScore=param1,
                  classifier='randForest',
                  predMode='classification',
                  paramlist=list(ntree=300, nthreads=1),
                  innerCore=param2)
## Compare the cross validation predictions with the true labels
accuracy <- classifiACC(dataY=dataY, predY=predY)
print(accuracy)
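For readers without the package at hand, classification accuracy can be sketched in base R as the fraction of predictions that match the true labels. simpleACC below is a hypothetical stand-in written for illustration; whether classifiACC computes exactly this quantity is an assumption.

```r
## Hypothetical helper (not part of BioMM): fraction of correct predictions
simpleACC <- function(dataY, predY) {
  stopifnot(length(dataY) == length(predY))
  mean(dataY == predY)
}

## Three of the four toy predictions match the labels
accToy <- simpleACC(dataY = c(0, 1, 1, 0), predY = c(0, 1, 0, 0))
```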

BioMM documentation built on Nov. 8, 2020, 11:04 p.m.