
View source: R/BioMM.R

predByFS    R Documentation

Prediction by supervised machine learning along with feature selection

Description

Prediction by supervised machine learning along with feature selection.

Usage

predByFS(
  trainData,
  testData,
  FSmethod,
  cutP,
  fdr,
  FScore = MulticoreParam(),
  classifier,
  predMode,
  paramlist
)

Arguments

trainData

The input training dataset. The first column holds the label (the outcome) and the remaining columns are the features. For binary classification, the classes are coded as 0 and 1.

testData

The input test dataset, with the same layout as trainData: the first column holds the label (the outcome) and the remaining columns are the features. For binary classification, the classes are coded as 0 and 1.

FSmethod

Feature selection method. Available options are NULL, 'positive', 'wilcox.test', 'cor.test', 'chisq.test', 'posWilcox' and 'top10pCor'.

cutP

The cutoff used for p-value thresholding. Commonly used cutoffs are 0.5, 0.1 and 0.05. The default is 0.05. If FSmethod = "posTopCor", cutP is instead the number of most correlated features to select, and 'fdr' should be NULL.

fdr

Multiple testing correction method. Available options are NULL, 'fdr', 'BH', 'holm', etc.; see p.adjust for the full list. The default is NULL (no correction).

FScore

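The BiocParallel back-end used for feature selection, e.g. MulticoreParam(workers = 10); it determines the number of cores used. The default is MulticoreParam().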

classifier

Machine learning classifier. Available options are 'randForest', 'SVM' and 'glmnet'.

predMode

The prediction mode. Available options are 'probability', 'classification' and 'regression'.

paramlist

A list of model parameters for the chosen classifier, e.g. list(ntree=300, nthreads=20) for 'randForest'.
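
The FSmethod, cutP and fdr arguments together describe p-value-based feature filtering on the training data. The following is a minimal base-R sketch of that idea, assuming one univariate test per feature followed by an optional p.adjust correction. filterByPvalue is a hypothetical helper written for illustration, not BioMM's internal implementation (see getDataByFilter under See Also), and it only mimics the 'wilcox.test' and 'cor.test' options.

```r
## Hypothetical helper (not part of BioMM): keep the features whose univariate
## test p-value, optionally corrected with p.adjust(), is below cutP.
filterByPvalue <- function(trainData, FSmethod = c("wilcox.test", "cor.test"),
                           cutP = 0.05, fdr = NULL) {
  FSmethod <- match.arg(FSmethod)
  y <- trainData[, 1]                   # first column holds the 0/1 label
  X <- trainData[, -1, drop = FALSE]    # remaining columns are features
  pvals <- vapply(X, function(x) {
    if (FSmethod == "wilcox.test") {
      # compare the feature's values between class 0 and class 1
      suppressWarnings(wilcox.test(x[y == 0], x[y == 1])$p.value)
    } else {
      # test the correlation between the feature and the label
      suppressWarnings(cor.test(x, y)$p.value)
    }
  }, numeric(1))
  if (!is.null(fdr)) {
    pvals <- p.adjust(pvals, method = fdr)   # e.g. 'BH', 'holm'
  }
  keep <- which(pvals < cutP)                # NA p-values are dropped
  trainData[, c(1, keep + 1), drop = FALSE]  # label column plus kept features
}

## Toy usage: one informative feature and one pure-noise feature
set.seed(1)
y <- rep(c(0, 1), each = 10)
toy <- data.frame(label = y,
                  informative = 3 * y + rnorm(20, sd = 0.5),
                  noise = rnorm(20))
sel <- filterByPvalue(toy, "cor.test", cutP = 0.05, fdr = "BH")
print(names(sel))
```

The label column is always kept in first position, so the filtered data keeps the layout that trainData and testData expect.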

Details

If no feature is selected, or only one feature is selected, then top 10

Value

The predicted output for the test data.
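
With predMode = 'classification', the returned vector can be compared directly with the true test labels. A minimal base-R sketch, assuming predY holds 0/1 class labels in the same row order as testData; the two vectors below are hypothetical stand-ins, and this plain proportion-correct is not necessarily how classifiACC defines its result.

```r
## Hypothetical true and predicted 0/1 labels for six test samples
testY <- c(0, 0, 1, 1, 1, 0)
predY <- c(0, 1, 1, 1, 0, 0)
## Cross-tabulate observed versus predicted classes
confusion <- table(observed = testY, predicted = predY)
print(confusion)
## Accuracy: proportion of samples whose predicted class matches the label
accuracy <- mean(predY == testY)
print(accuracy)
```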

Author(s)

Junfang Chen

See Also

getDataByFilter

Examples

 
## Load the example methylation data shipped with BioMM
library(BioMM)
methylfile <- system.file('extdata', 'methylData.rds', package='BioMM')
methylData <- readRDS(methylfile)
dataY <- methylData[,1]
## Keep the label plus a subset of the methylation features (columns 2 to 501)
methylSub <- data.frame(label=dataY, methylData[,c(2:501)])
## Randomly split the samples: 16 for training, the rest for testing
set.seed(123)
trainIndex <- sample(nrow(methylSub), 16)
trainData <- methylSub[trainIndex,]
testData <- methylSub[-trainIndex,]
library(ranger)
library(BiocParallel)
## Parallel back-end used for feature selection
param <- MulticoreParam(workers = 10)
predY <- predByFS(trainData, testData,
                  FSmethod='cor.test', cutP=0.1,
                  fdr=NULL, FScore=param,
                  classifier='randForest',
                  predMode='classification',
                  paramlist=list(ntree=300, nthreads=20))
## Compare the predictions with the true test labels
testY <- testData[,1]
accuracy <- classifiACC(dataY=testY, predY=predY)
print(accuracy)

transbioZI/BioMM documentation built on Jan. 12, 2023, 2:18 p.m.