predByBS: Bootstrap resampling prediction via supervised machine...

View source: R/BioMM.R

predByBSR Documentation

Bootstrap resampling prediction via supervised machine learning with feature selection

Description

Prediction via supervised machine learning using bootstrap resampling along with feature selection methods.

Usage

predByBS(
  trainData,
  testData,
  dataMode,
  repeats,
  FSmethod,
  cutP,
  fdr,
  FScore = MulticoreParam(),
  classifier,
  predMode,
  paramlist,
  innerCore = MulticoreParam()
)

Arguments

trainData

The input training dataset. The first column is the label or the output. For binary classes, 0 and 1 are used to indicate the class member.

testData

The input test dataset. The first column is the label or the output. For binary classes, 0 and 1 are used to indicate the class member.

dataMode

The input training data mode for model training. It is used only if 'testData' is present. It can be a subset of the whole training data or the entire training data. 'subTrain' is the given for subsetting and 'allTrain' for the entire training dataset.

repeats

The number of repeats used for boostrapping.

FSmethod

Feature selection methods. Available options are c(NULL, 'positive', 'wilcox.test', 'cor.test', 'chisq.test', 'posWilcox', or 'top10pCor').

cutP

The cutoff used for p value thresholding. Commonly used cutoffs are c(0.5, 0.1, 0.05, 0.01, etc). The default is 0.05.

fdr

Multiple testing correction method. Available options are c(NULL, 'fdr', 'BH', 'holm', etc). See also p.adjust. The default is NULL. If FSmethod = "posTopCor", cutP is defined as the number of most correlated features with 'fdr' = NULL.

FScore

The number of cores used for feature selection if parallel computing needed.

classifier

Machine learning classifiers.

predMode

The prediction mode. Available options are c('probability', 'classification', 'regression').

paramlist

A set of model parameters defined in an R list object.

innerCore

The number of cores used for computation.

Value

The predicted output for the test data.

Examples

 
## Load data  
methylfile <- system.file('extdata', 'methylData.rds', package='BioMM')  
methylData <- readRDS(methylfile)  
dataY <- methylData[,1]
## select a subset of genome-wide methylation data at random
methylSub <- data.frame(label=dataY, methylData[,c(2:2001)])  
trainIndex <- sample(nrow(methylSub), 16)
trainData = methylSub[trainIndex,]
testData = methylSub[-trainIndex,]
library(ranger) 
library(BiocParallel)
param1 <- MulticoreParam(workers = 1)
param2 <- MulticoreParam(workers = 20)
predY <- predByBS(trainData, testData, 
                  dataMode='allTrain', repeats=50,
                  FSmethod=NULL, cutP=0.1, 
                  fdr=NULL, FScore=param1, 
                  classifier='randForest',
                  predMode='classification', 
                  paramlist=list(ntree=300, nthreads=10),
                  innerCore=param2)  
testY <- testData[,1]
accuracy <- classifiACC(dataY=testY, predY=predY)
print(accuracy)  

transbioZI/BioMM documentation built on Jan. 12, 2023, 2:18 p.m.