combineFS: combineFS

View source: R/fs_functions.R

Description

The main function controlling the Feature Selection (FS) workflow. It combines different FS steps sequentially, in the following order: Univariate filter -> Multivariate filter -> Wrapper method. The last FS step includes an internal cross-validation. In addition, it is possible to set up an external loop that operates over randomized and class-balanced test data. The process returns information from both the training and testing phases.

Usage

combineFS(
  features,
  class,
  univariate = "corr",
  mincorr = 0.3,
  n.percent = 0.75,
  zero.gain.out = TRUE,
  multivariate = "mcorr",
  maxcorr = 0.75,
  cum.var.cutoff = 1,
  wrapper = "rfe.rf",
  number.cv = 10,
  group.sizes = c(1:10, seq(15, 100, 5)),
  extfolds = 10,
  partition = 2/3,
  metric = "Accuracy",
  tolerance = 0,
  verbose = TRUE
)
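
As an illustration of the signature above, the following is a minimal sketch of a call on synthetic data. The data, the feature names f1 to f20, and the reduced values of number.cv, group.sizes and extfolds are assumptions chosen only to keep the example small; the call itself is skipped when the feseR package is not installed.

```r
# Hypothetical synthetic data: 60 samples, 20 features, two classes.
set.seed(10)
n <- 60
class <- rep(c(0, 1), each = n / 2)
features <- matrix(rnorm(n * 20), nrow = n,
                   dimnames = list(NULL, paste0("f", 1:20)))
# Shift the first five columns by the class label so they are informative.
features[, 1:5] <- features[, 1:5] + 2 * class

# Run the workflow only if feseR is available; argument names follow
# the Usage section, the values are illustrative assumptions.
if (requireNamespace("feseR", quietly = TRUE)) {
  res <- feseR::combineFS(
    features = features,
    class = class,
    univariate = "corr", mincorr = 0.3,
    multivariate = "mcorr", maxcorr = 0.75,
    wrapper = "rfe.rf", number.cv = 5,
    group.sizes = 1:5,
    extfolds = 2, partition = 2/3,
    metric = "Accuracy", verbose = FALSE
  )
  print(res$opt.variables)
}
```

Smaller values of extfolds and number.cv shorten the run at the cost of noisier performance estimates.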

Arguments

features

A numeric matrix as input.

class

Response variable as numeric vector. It will be coerced to factor.

univariate

Univariate filter to be used. Set 'corr' (default) for correlation filter, 'gain' for information gain or none.

mincorr

The threshold controlling the Univariate correlation filter.
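
The idea behind such a threshold can be sketched in base R. This is an illustration and not the package source: it assumes that a feature is kept when the absolute Pearson correlation between its values and the numeric class labels is at least mincorr. The feature names are hypothetical.

```r
# Sketch of a univariate correlation filter (assumed behaviour, see lead-in).
set.seed(1)
n <- 100
class <- rep(c(0, 1), each = n / 2)
features <- cbind(
  informative = class + rnorm(n, sd = 0.5),  # tracks the class labels
  noise = rep(c(-1, 1), times = n / 2)       # balanced within each class
)

mincorr <- 0.3
# Absolute correlation of every column with the class, as a named vector.
cors <- abs(cor(features, class))[, 1]
kept <- names(cors)[cors >= mincorr]
filtered <- features[, kept, drop = FALSE]
print(kept)
```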

n.percent

If 'gain' is selected, this parameter controls the percentage of features (those with the highest gain) to be returned.

zero.gain.out

If TRUE (default), zero-gain features will be filtered out (n.percent will be ignored).

multivariate

Multivariate filter to be used. Set 'mcorr' (default) for correlation filter, 'pca' for Principal Component Analysis or none.

maxcorr

The threshold controlling the matrix correlation filter (default value 0.75).
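
A matrix correlation filter of this kind can be sketched as follows. The sketch is a simplification and not the package source: it assumes that, whenever two features have an absolute pairwise correlation above maxcorr, the later column of the pair is dropped. The feature names are hypothetical.

```r
# Sketch of a pairwise (matrix) correlation filter (assumed behaviour).
set.seed(2)
n <- 200
a <- rnorm(n)
features <- cbind(
  a = a,
  a_copy = a + rnorm(n, sd = 0.05),  # nearly duplicates column `a`
  b = rnorm(n)                       # independent of `a`
)

maxcorr <- 0.75
cm <- abs(cor(features))
# Zero the lower triangle and diagonal so each pair is seen once,
# in the column of the later feature.
cm[lower.tri(cm, diag = TRUE)] <- 0
to_drop <- colnames(cm)[apply(cm, 2, function(col) any(col > maxcorr))]
filtered <- features[, setdiff(colnames(features), to_drop), drop = FALSE]
print(colnames(filtered))
```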

cum.var.cutoff

If 'pca' is selected, this parameter controls the PCA process. See function filter.pca.
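
The exact role of the cutoff is documented in filter.pca. As a rough illustration only, the sketch below assumes that it is a cumulative proportion of explained variance and that components are retained until that proportion is reached; under this assumption, the default of 1 would retain all components. A cutoff of 0.9 and the synthetic data are chosen here for illustration.

```r
# Sketch of a cumulative-variance cutoff on principal components
# (assumed interpretation of cum.var.cutoff, see lead-in).
set.seed(3)
features <- matrix(rnorm(100 * 5), nrow = 100,
                   dimnames = list(NULL, paste0("f", 1:5)))

pca <- prcomp(features, center = TRUE, scale. = TRUE)
# Cumulative proportion of variance explained by the components.
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

cum.var.cutoff <- 0.9
# First component at which the cumulative proportion reaches the cutoff.
n_comp <- which(cum_var >= cum.var.cutoff)[1]
reduced <- pca$x[, seq_len(n_comp), drop = FALSE]
print(n_comp)
```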

wrapper

Wrapper method to be used. Set 'rfe.rf' (default) for recursive feature elimination wrapped with random forest.

number.cv

See rfeRF for description.

group.sizes

See rfeRF for description.

extfolds

Number of times (default 10) to repeat the entire FS process, each time on a new random test/training split of the dataset.

partition

Parameter controlling the partition of the data into test and training datasets. It generates random and class-balanced datasets.
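
A random split of this kind can be sketched in base R. Two assumptions are made here that the description does not confirm: that partition is the fraction of samples assigned to training, and that "class-balanced" means the class proportions are preserved in both subsets. The class labels are hypothetical.

```r
# Sketch of a random, class-stratified train/test split (assumed behaviour).
set.seed(4)
class <- factor(rep(c("a", "b"), times = c(60, 30)))
partition <- 2/3

# Sample the training fraction separately within each class.
train_idx <- unlist(
  lapply(split(seq_along(class), class), function(idx) {
    sample(idx, size = round(partition * length(idx)))
  }),
  use.names = FALSE
)
test_idx <- setdiff(seq_along(class), train_idx)

print(table(class[train_idx]))
print(table(class[test_idx]))
```

Sampling within each class keeps the 2:1 ratio between the two classes in both the training and the test subset.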

metric

Metric to evaluate performance ('Accuracy' (default), 'Kappa' or 'ROC').

tolerance

Tolerance allowed for the evaluation metric (default 0).

verbose

Make the output verbose.

Value

A list with the following elements.

opt.variables

Vector with the names of the optimal (final) features.

training

Data frame with the metrics from the training phase.

testing

Data frame with the metrics from the testing phase.

best.model

The best model obtained (max. accuracy and min. number of final features).

runtime

The workflow runtime (secs).


enriquea/feseR documentation built on March 30, 2021, 4:12 p.m.