combineFS: combineFS
In enriquea/feseR: Feature Selection workflow in R

Description

The main function controlling the Feature Selection (FS) workflow. It chains different FS steps sequentially, following the structure: univariate filter -> multivariate filter -> wrapper method. An internal cross-validation step is used in the last (wrapper) step. In addition, it is possible to set up an external loop that repeats the whole process over randomized, class-balanced training/test splits. The function returns information from both the training and testing phases.
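To make the first two stages of this chain concrete, here is a minimal base-R sketch of a univariate correlation filter followed by a correlation-matrix filter. The helper names `univariate_corr_filter()` and `multivariate_corr_filter()` are hypothetical and are not part of feseR. The sketch assumes the univariate filter keeps features whose absolute Pearson correlation with the class is at least `mincorr`. The matrix filter shown is a simplified greedy variant that drops the later feature of any pair correlated above `maxcorr`, which is not necessarily the rule feseR applies internally.

```r
# Hypothetical helper (not feseR): keep features whose absolute Pearson
# correlation with the class (coerced to numeric) is at least `mincorr`.
univariate_corr_filter <- function(features, class, mincorr = 0.3) {
  r <- apply(features, 2, function(x) cor(x, as.numeric(class)))
  features[, abs(r) >= mincorr, drop = FALSE]
}

# Hypothetical helper (not feseR): simplified greedy redundancy filter.
# Walks the columns in order; for each retained feature, drops every later
# feature whose absolute correlation with it exceeds `maxcorr`.
multivariate_corr_filter <- function(features, maxcorr = 0.75) {
  cm <- abs(cor(features))
  p <- ncol(features)
  keep <- rep(TRUE, p)
  for (j in seq_len(p)) {
    if (!keep[j]) next
    later <- seq_len(p) > j
    keep[later & cm[j, ] > maxcorr] <- FALSE
  }
  features[, keep, drop = FALSE]
}

# Synthetic example: f1 carries signal, f2 is a near-copy of f1, f3 is noise.
set.seed(1)
n <- 60
class <- rep(c(0, 1), each = n / 2)
signal <- class + rnorm(n, sd = 0.5)
features <- cbind(
  f1 = signal,
  f2 = signal + rnorm(n, sd = 0.05),
  f3 = rnorm(n)
)
step1 <- univariate_corr_filter(features, class, mincorr = 0.3)
step2 <- multivariate_corr_filter(step1, maxcorr = 0.75)
colnames(step2)
```

In this example `f1` passes both filters, while `f2` is removed as redundant with `f1`. The surviving features would then be handed to the wrapper step.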

Usage

combineFS(
  features,
  class,
  univariate = "corr",
  mincorr = 0.3,
  n.percent = 0.75,
  zero.gain.out = TRUE,
  multivariate = "mcorr",
  maxcorr = 0.75,
  cum.var.cutoff = 1,
  wrapper = "rfe.rf",
  number.cv = 10,
  group.sizes = c(1:10, seq(15, 100, 5)),
  extfolds = 10,
  partition = 2/3,
  metric = "Accuracy",
  tolerance = 0,
  verbose = TRUE
)

Arguments

`features`	A numeric matrix as input.
`class`	Response variable as a numeric vector; it will be coerced to a factor.
`univariate`	Description of the univariate filter to be used. Set 'corr' (default) for the correlation filter, 'gain' for information gain, or none.
`mincorr`	The threshold controlling the univariate correlation filter (default 0.3).
`n.percent`	If 'gain' is selected, the fraction of features with the highest information gain to be retained (default 0.75).
`zero.gain.out`	If TRUE (default), zero-gain features are filtered out and n.percent is ignored.
`multivariate`	Multivariate filter to be used. Set 'mcorr' (default) for the correlation-matrix filter, 'pca' for Principal Component Analysis, or none.
`maxcorr`	The threshold controlling the correlation-matrix filter (default 0.75).
`cum.var.cutoff`	If 'pca' is selected, the cumulative variance cutoff used by the PCA filter (default 1). See function `filter.pca`.
`wrapper`	Wrapper method to be used. Set 'rfe.rf' (default) for recursive feature elimination wrapped around a random forest classifier.
`number.cv`	See `rfeRF` for description.
`group.sizes`	See `rfeRF` for description.
`extfolds`	Number of times (default 10) the entire FS process is repeated, each time on a new random training/test split.
`partition`	Parameter controlling how the data are split into training and test sets (default 2/3). The split is random and class-balanced.
`metric`	Metric to evaluate performance ('Accuracy' (default), 'Kappa' or 'ROC').
`tolerance`	Tolerance allowed for the evaluation metric (default zero).
`verbose`	Make the output verbose.
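The `extfolds` and `partition` arguments describe an external loop over random, class-balanced splits. The following base-R sketch shows one way to draw such a stratified split. It assumes that `partition` is the fraction of each class assigned to training. The helper `balanced_partition()` is hypothetical, not a feseR function. Packages such as caret provide `createDataPartition()` for the same purpose.

```r
# Hypothetical helper (not feseR): stratified (class-balanced) split.
# For each class level, draw round(p * n_level) indices for training.
balanced_partition <- function(class, p = 2/3) {
  class <- as.factor(class)
  per_class <- lapply(split(seq_along(class), class), function(idx) {
    # index via sample.int() to avoid sample()'s length-1 vector pitfall
    idx[sample.int(length(idx), size = round(p * length(idx)))]
  })
  sort(unlist(per_class, use.names = FALSE))
}

# External loop sketch: 3 repetitions on an imbalanced 30/15 class vector.
set.seed(42)
class <- factor(rep(c("A", "B"), times = c(30, 15)))
extfolds <- 3
for (k in seq_len(extfolds)) {
  train <- balanced_partition(class, p = 2/3)
  test <- setdiff(seq_along(class), train)
  # ... the FS steps would be trained on `train` and evaluated on `test` ...
  print(table(class[train]))  # A: 20, B: 10 in every repetition
}
```

Because sampling is done per class, every training set keeps the original 2:1 class ratio, while the selected samples change from one repetition to the next.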

Value

A list with the following elements.

`opt.variables`	Vector with the names of the optimal (final) features.
`training`	Data frame with the metrics from the training phase.
`testing`	Data frame with the metrics from the testing phase.
`best.model`	The best model obtained (maximum accuracy and minimum number of final features).
`runtime`	The workflow runtime (seconds).
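The selection rule for the best model combines the chosen metric with a preference for fewer features. The following base-R sketch illustrates one reading of that rule together with `tolerance`: among all candidate subset sizes, keep those whose metric is within `tolerance` of the best value, then pick the smallest. It assumes an absolute tolerance. caret's `pickSizeTolerance()`, by contrast, expresses tolerance as a percentage. The helper `pick_best()` and the `Variables` column name are illustrative assumptions, and the results table is made-up example input, not feseR output.

```r
# Hypothetical helper (not feseR): among rows whose metric is within an
# absolute `tolerance` of the best value, return the one with fewest features.
pick_best <- function(results, metric = "Accuracy", tolerance = 0) {
  best <- max(results[[metric]])
  ok <- results[results[[metric]] >= best - tolerance, ]
  ok[which.min(ok$Variables), ]
}

# Made-up example input: one row per candidate feature-subset size.
results <- data.frame(
  Variables = c(5, 10, 20, 40),
  Accuracy  = c(0.80, 0.88, 0.90, 0.90)
)
pick_best(results)                     # row with Variables == 20
pick_best(results, tolerance = 0.05)   # row with Variables == 10
```

With zero tolerance, the smallest subset reaching the maximum accuracy wins. A positive tolerance trades a little accuracy for a more compact feature set.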

enriquea/feseR documentation built on Feb. 25, 2025, 12:20 a.m.