
View source: R/main.R

ensembleFS    R Documentation

Run an end-to-end ensemble for comparison of feature selection methods.

Description

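Runs an end-to-end ensemble that compares feature selection methods. Each method is applied within repeated validation (k-fold cross-validation or random sampling), and the methods are compared by the stability of the selected features and by the performance of random forest models built on them.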

Usage

ensembleFS(
  x,
  y,
  methods = c("fs.utest"),
  method.cv = "kfoldcv",
  params.cv = list(k = 3, niter = 5),
  level.cor = 1,
  params = list(adjust = "holm", feature.number = 10, alpha = 0.05, use.cuda = FALSE,
    cutoff.method = c("kmeans")),
  asm = c("fs.utest"),
  model = c("fs.utest")
)

Arguments

x

Input data, where columns are variables and rows are observations (all values must be numeric).

y

Decision variable: a boolean vector whose length equals the number of observations (rows of x).
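The requirements on x and y can be checked before calling ensembleFS. The sketch below uses a hypothetical helper, check_inputs(), which is not part of this package, and assumes that "boolean" means an R logical vector.

```r
# Hypothetical helper (not part of the package): checks that x and y meet
# the requirements stated for the x and y arguments.
check_inputs <- function(x, y) {
  x <- as.data.frame(x)
  # every column (variable) of x must be numeric
  if (!all(vapply(x, is.numeric, logical(1)))) {
    stop("all columns of x must be numeric")
  }
  # assumption: "boolean" means an R logical vector
  if (!is.logical(y)) stop("y must be a logical (boolean) vector")
  # one decision per observation (row of x)
  if (length(y) != nrow(x)) stop("length(y) must equal nrow(x)")
  invisible(TRUE)
}

# Toy data: 6 observations (rows), 3 numeric variables (columns)
set.seed(1)
x <- data.frame(g1 = rnorm(6), g2 = rnorm(6), g3 = rnorm(6))
y <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
check_inputs(x, y)
```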

methods

A vector of names of the feature selection methods available in this package to be compared, e.g. "fs.utest" or "fs.mrmr"

method.cv

Validation method: "kfoldcv" for k-fold cross-validation or "rsampling" for random sampling

params.cv

A list with the following fields:

  • k – the number of folds into which the data are split for k-fold cross-validation; must be at least 3

  • test.size – testing set size for random sampling validation

  • iter – the number of validation repetitions
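The two validation schemes can be pictured as sets of train/test indices. This is a base R sketch, not the package's internal code; the helper names kfold_splits() and rsampling_splits() are hypothetical, and test.size is assumed here to be a fraction of the observations.

```r
# Hypothetical sketch (not the package's code).
# "kfoldcv": split n observations into k folds, repeated iter times.
kfold_splits <- function(n, k = 3, iter = 5) {
  stopifnot(k >= 3, k <= n)
  splits <- list()
  for (i in seq_len(iter)) {
    # randomly assign fold labels 1..k as evenly as possible
    folds <- sample(rep(seq_len(k), length.out = n))
    for (j in seq_len(k)) {
      test <- which(folds == j)
      splits[[length(splits) + 1]] <- list(train = setdiff(seq_len(n), test),
                                           test = test)
    }
  }
  splits
}

# "rsampling": iter independent random train/test splits.
# Assumption: test.size is the fraction of observations used for testing.
rsampling_splits <- function(n, test.size = 0.3, iter = 5) {
  lapply(seq_len(iter), function(i) {
    test <- sample(n, size = round(test.size * n))
    list(train = setdiff(seq_len(n), test), test = test)
  })
}

set.seed(42)
cv <- kfold_splits(n = 30, k = 3, iter = 5)
length(cv)  # 15 train/test splits: 3 folds x 5 repetitions
```

With k = 3 and iter = 5, k-fold validation yields 15 train/test splits, and every observation lands in a test fold exactly once per repetition.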

level.cor

Cutoff level for correlated variables. If equal to 1, the correlation filtering is not performed.
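One common way to apply such a cutoff is a greedy filter that keeps a variable only if its absolute correlation with every variable already kept is at most level.cor. The sketch assumes Pearson correlation and uses column order to decide which variable of a correlated pair survives; the helper filter_correlated() is hypothetical, and the package may apply a different rule.

```r
# Hypothetical sketch (not the package's code): greedy removal of variables
# whose absolute Pearson correlation with an already-kept variable exceeds
# level.cor. With level.cor = 1 the step is skipped.
filter_correlated <- function(x, level.cor = 1) {
  if (level.cor >= 1) return(x)
  cm <- abs(cor(x))
  keep <- character(0)
  for (v in colnames(x)) {
    # keep v only if it is not too correlated with any variable kept so far
    if (length(keep) == 0 || all(cm[v, keep] <= level.cor)) {
      keep <- c(keep, v)
    }
  }
  x[, keep, drop = FALSE]
}

set.seed(2)
a <- rnorm(50)
x <- data.frame(a = a, a_copy = a + rnorm(50, sd = 0.01), b = rnorm(50))
colnames(filter_correlated(x, level.cor = 0.75))  # "a" "b"
```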

params

A list with the following fields:

  • adjust – p-value adjustment method, as accepted by p.adjust ("BY" is recommended for FDR control; see the p.adjust documentation), used by MDFS1D, MDFS2D and U-test

  • feature.number – number of attributes to select. Must not exceed ncol(x)

  • alpha – significance level for MDFS1D, MDFS2D and U-test

  • use.cuda – whether to use CUDA acceleration for the MDFS2D method (requires a build compiled with CUDA support)

  • cutoff.method – cutoff method for MCFS: "permutations", "criticalAngle", "kmeans", "mean" or "contrast"
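For a U-test filter, the adjust, alpha and feature.number fields plausibly combine as follows: a Mann-Whitney U test (wilcox.test) for each variable, p-values adjusted with p.adjust using the adjust method, variables with adjusted p below alpha kept, and at most feature.number of them returned. utest_select() is a hypothetical sketch, not the package's fs.utest implementation.

```r
# Hypothetical sketch of a U-test filter (not the package's fs.utest code).
utest_select <- function(x, y, adjust = "holm", alpha = 0.05,
                         feature.number = 10) {
  # Mann-Whitney U test comparing each variable between the two classes
  p <- vapply(x, function(col) {
    wilcox.test(col[y], col[!y], exact = FALSE)$p.value
  }, numeric(1))
  # multiple-testing correction, e.g. "holm", "BY" or "fdr"
  p.adj <- p.adjust(p, method = adjust)
  # keep significant variables, most significant first, capped in number
  selected <- sort(p.adj[p.adj < alpha])
  head(selected, feature.number)
}

set.seed(3)
y <- rep(c(TRUE, FALSE), each = 20)
x <- data.frame(informative = rnorm(40, mean = ifelse(y, 2, 0)),
                noise1 = rnorm(40), noise2 = rnorm(40))
names(utest_select(x, y, adjust = "BY"))
```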

asm

A vector of feature selection methods for which to calculate Lustgarten’s adjusted stability measure (ASM)
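For two feature subsets S_i and S_j of sizes k_i and k_j, drawn from m variables and sharing r variables, Lustgarten’s similarity is (r - k_i k_j / m) / (min(k_i, k_j) - max(0, k_i + k_j - m)), and the ASM is its average over all pairs of subsets (for example, one subset per validation iteration). The base R sketch below is for illustration only; asm_lustgarten() is a hypothetical name and the package’s own implementation may differ in details.

```r
# Illustrative sketch of Lustgarten's adjusted stability measure (ASM).
# subsets: list of character vectors (selected features per iteration)
# m: total number of candidate variables
asm_lustgarten <- function(subsets, m) {
  n <- length(subsets)
  stopifnot(n >= 2)
  sa <- numeric(0)
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      ki <- length(subsets[[i]])
      kj <- length(subsets[[j]])
      r <- length(intersect(subsets[[i]], subsets[[j]]))
      expected <- ki * kj / m                     # overlap expected by chance
      denom <- min(ki, kj) - max(0, ki + kj - m)  # range of possible overlaps
      sa <- c(sa, (r - expected) / denom)         # NaN/Inf if denom is 0
    }
  }
  mean(sa)  # average over all n * (n - 1) / 2 pairs
}

subsets <- list(c("a", "b", "c"), c("a", "b", "d"), c("a", "b", "c"))
asm_lustgarten(subsets, m = 10)
```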

model

A vector of feature selection methods for which to train and test a random forest model

Details

An ensemble for comparing feature selection methods, dedicated to high-throughput sequencing data.

Value

  • selected.feature – A list with the features selected by each chosen feature selection method

  • ranking.feature – A list ranking the variables by how often they were selected across the cross-validation iterations

  • stability – A data.frame with the stability of feature selection for each chosen method

  • model – A data.frame with the results of the random forest models built for each chosen feature selection method
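The ranking.feature idea, ordering variables by how often they were selected across iterations, can be sketched with table(). rank_by_frequency() is a hypothetical helper, and the package’s actual output format may differ.

```r
# Hypothetical sketch (not the package's code): count how many iterations
# selected each variable and sort from most to least frequent.
rank_by_frequency <- function(selections) {
  counts <- table(unlist(selections))
  sort(counts, decreasing = TRUE)
}

# one character vector of selected variables per validation iteration
sel <- list(c("g1", "g2", "g5"), c("g1", "g5"), c("g1", "g3"))
rank_by_frequency(sel)
```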

Examples

## Not run: 

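# `data` is assumed to be a data.frame of numeric variables with a
# `class` column holding the decision for each observation
decisions <- data$class
data$class <- NULL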

ensembleFS(data,
           decisions,
           methods = c('fs.utest', 'fs.mrmr'),
           method.cv = 'kfoldcv',
           params.cv = list(k = 3, iter = 10),
           level.cor = 0.75,
           params = list(adjust = 'fdr', feature.number = 10, alpha = 0.05),
           asm = c('fs.utest', 'fs.mrmr'),
           model = c('fs.utest', 'fs.mrmr')
           )


## End(Not run)


biocsuwb/EnsembleFS-package documentation built on Dec. 9, 2024, 5:32 p.m.