getDataByFilter: Return the data by feature filtering
In transbioZI/BioMMex: BioMM: Biological-informed Multi-stage Machine learning framework for phenotype prediction using omics data

getDataByFilter

R Documentation

Return the data by feature filtering

Description

Identify and select a subset of outcome-associated or predictive features in the training data based on filtering methods. Return the same set of selected features for the test data if it is available.

Usage

getDataByFilter(
  trainData,
  testData,
  FSmethod,
  cutP = 0.1,
  fdr = NULL,
  FScore = MulticoreParam()
)

Arguments

`trainData`	The input training dataset. The first column is the label.
`testData`	The input test dataset. The first column is the label.
`FSmethod`	Feature selection method. Available options are NULL, 'positive', 'wilcox.test', 'cor.test', 'chisq.test', 'posWilcox', 'top10pCor', and 'posTopCor'. 'positive' selects positively outcome-associated features using the Pearson correlation method. 'posWilcox' selects positively outcome-associated features using the Pearson correlation method together with the 'wilcox.test' method. 'top10pCor' selects the top 10 outcome-associated features; this is helpful when no features can be picked during a stringent feature selection procedure. 'posTopCor' selects a given number of the most correlated features.
`cutP`	The cutoff used for p value thresholding. It can be any value between 0 and 1; commonly used cutoffs include 0.5, 0.1, 0.05, and 0.01. The default is 0.1. If FSmethod = "posTopCor", cutP is instead interpreted as the number of most correlated features to select, with 'fdr' = NULL.
`fdr`	Multiple testing correction method. Available options include NULL, 'fdr', 'BH', and 'holm'. See `p.adjust` for the full list. The default is NULL.
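`FScore`	The parallel backend, which sets the number of cores used for some feature selection methods. As in the usage above, this is a BiocParallel parameter object such as MulticoreParam(). If it is NULL, no parallel computing is applied.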
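
The interplay of 'cutP' and 'fdr' can be illustrated with a small base R sketch. This is not BioMM's internal code: the simulated data, the object names (`toyData`, `pvals`, `selected`) and the exact filtering steps are assumptions made for illustration. It computes one Wilcoxon p value per feature, optionally adjusts the p values with `p.adjust`, and keeps the features below the cutoff.

```r
# Illustrative sketch only; simulated data, not BioMM's implementation.
set.seed(1)
n <- 40
label <- rep(c(0, 1), each = n / 2)
features <- matrix(rnorm(n * 50), nrow = n)            # 50 simulated features
# Shift the first 5 features in one group so they are truly outcome-associated.
features[label == 1, 1:5] <- features[label == 1, 1:5] + 2
colnames(features) <- paste0("f", seq_len(ncol(features)))
toyData <- data.frame(label = label, features)         # first column is the label

# One two-sided Wilcoxon rank-sum p value per feature (columns 2 onward).
pvals <- vapply(toyData[, -1], function(x) {
  wilcox.test(x[toyData$label == 0], x[toyData$label == 1])$p.value
}, numeric(1))

cutP <- 0.1
fdr <- "BH"                                            # set to NULL to skip the correction
if (!is.null(fdr)) pvals <- p.adjust(pvals, method = fdr)

# Names of the features that pass the (adjusted) p value cutoff.
selected <- names(pvals)[pvals < cutP]
print(selected)
```

With `fdr <- NULL` the raw p values are compared against 'cutP' directly, which generally lets more features through than the adjusted comparison.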

Details

Parallel computing is helpful if your input data is high dimensional. For 'cutP', a soft threshold of 0.1 may be preferable to a more stringent p value cutoff, because features with small effect sizes can then be taken into consideration for downstream analysis. However, high dimensional data (e.g. p > 10,000 features) may contain many false positive features, so a more rigorous p value threshold should be applied. The choice of feature selection method depends on the characteristics of the input data.
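
The false positive concern can be made concrete with a simulation in base R. The sketch below is an illustration under stated assumptions, not part of BioMM: it generates 10,000 pure-noise features, computes two-sided Pearson correlation p values in vectorized form (the same t statistic that `cor.test` uses), and counts how many features pass different thresholds.

```r
# Illustrative simulation; all features are noise, so every selection is a false positive.
set.seed(42)
n <- 30
p <- 10000
label <- rep(c(0, 1), each = n / 2)
x <- matrix(rnorm(n * p), nrow = n)

# Pearson correlation of each feature with the label, converted to a two-sided p value.
r <- as.vector(cor(x, label))
tstat <- r * sqrt((n - 2) / (1 - r^2))
pvals <- 2 * pt(-abs(tstat), df = n - 2)

nRaw <- sum(pvals < 0.1)                           # soft threshold, no correction
nStrict <- sum(pvals < 0.01)                       # more stringent threshold
nBH <- sum(p.adjust(pvals, method = "BH") < 0.1)   # soft threshold after BH correction
print(c(raw = nRaw, strict = nStrict, BH = nBH))
```

Under the null hypothesis the p values are uniformly distributed, so about 10% of the noise features are expected to pass an uncorrected cutoff of 0.1. A stricter cutoff or a multiple testing correction reduces that count.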

Value

If a feature selection method is applied, a list is returned containing the training data and, if provided, the test data, each restricted to the selected features. If no feature can be selected during the feature selection procedure, the output is NULL.
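
Because the result can be NULL, calling code may want to check for that case before indexing. The sketch below uses a hypothetical helper, `subsetBySelected`, which is not part of BioMM. It assumes the list layout suggested by the example in this page (training data first, test data second) and shows how the same feature set can be applied to both datasets.

```r
# Hypothetical helper, not part of BioMM: keep the label column plus the selected features.
subsetBySelected <- function(trainData, testData = NULL, selected) {
  if (length(selected) == 0) return(NULL)          # mirrors the documented NULL result
  keep <- c(colnames(trainData)[1], selected)      # first column is the label
  out <- list(trainData[, keep, drop = FALSE])
  if (!is.null(testData)) out[[2]] <- testData[, keep, drop = FALSE]
  out
}

toyTrain <- data.frame(label = c(0, 1, 0, 1), a = 1:4, b = 4:1, c = c(2, 2, 3, 3))
toyTest <- data.frame(label = c(1, 0), a = 5:6, b = 0:1, c = c(1, 4))

res <- subsetBySelected(toyTrain, toyTest, selected = c("a", "c"))
if (is.null(res)) {
  message("No feature passed the filter; consider a looser cutoff.")
} else {
  trainSub <- res[[1]]
  testSub <- res[[2]]
}

# An empty selection yields NULL rather than a list.
empty <- subsetBySelected(toyTrain, toyTest, selected = character(0))
```

The same guard (`is.null(res)`) can be placed around the output of a filtering call before its elements are extracted.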

Author(s)

Junfang Chen

Examples

 
## Load data
methylfile <- system.file('extdata', 'methylData.rds', package='BioMM')
methylData <- readRDS(methylfile)
## Randomly assign 20 samples to the training set; the rest form the test set
trainIndex <- sample(nrow(methylData), 20)
trainData <- methylData[trainIndex,]
testData <- methylData[-trainIndex,]
## Feature selection
library(BiocParallel)
## Parallel backend; adjust 'workers' to the number of available cores
param <- MulticoreParam(workers = 10)
## Select outcome-associated features based on the Wilcoxon test (P<0.1)
datalist <- getDataByFilter(trainData, testData, FSmethod="wilcox.test",
                           cutP=0.1, fdr=NULL, FScore=param)
## The first element is the filtered training data, the second the filtered test data
trainDataSub <- datalist[[1]]
testDataSub <- datalist[[2]]
## Compare dimensions before and after filtering
print(dim(trainData))
print(dim(trainDataSub))

transbioZI/BioMMex documentation built on Jan. 27, 2023, 4:14 a.m.
