stab.fs: Function to quantify stability of feature selection.

Description

This function computes several indices to quantify the stability of feature selection. Stability is usually estimated by perturbing the original dataset to generate multiple sets of selected features.

Usage

stab.fs(fsets, N, method = c("kuncheva", "davis"), ...)

Arguments

fsets

list of sets of selected features; the sets may differ in size

N

total number of features on which feature selection is performed

method

stability index (see details section)

...

additional parameters passed to the stability index (penalty, a numeric value for the Davis stability index; see details section)

Details

Stability indices may use different parameters. In this version, only the Davis index requires an additional parameter, penalty, a numeric value used as a penalty term.

Kuncheva index (kuncheva) lies in [-1, 1]. Negative values indicate less overlap between sets of selected features than expected by chance (the minimum is reached when the sets do not intersect), +1 means that the same features are always selected, and 0 is the expected stability of a random feature selection.
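
For intuition, the pairwise consistency index described in Kuncheva (2007) can be sketched in a few lines of base R. This is an illustrative sketch only and not necessarily how stab.fs is implemented: kuncheva_sketch is a hypothetical helper name, and both the requirement of equal-size sets and the averaging over all pairs of sets are assumptions made for the example.

```r
## Hypothetical helper (not part of the package): Kuncheva-style consistency
## index, averaged over all pairs of feature sets.
## Assumes every set has the same size k, with 0 < k < N.
kuncheva_sketch <- function(fsets, N) {
  stopifnot(length(fsets) >= 2)
  k <- unique(lengths(fsets))
  stopifnot(length(k) == 1, k > 0, k < N)
  pairs <- combn(length(fsets), 2)            # one column per pair of sets
  idx <- apply(pairs, 2, function(p) {
    r <- length(intersect(fsets[[p[1]]], fsets[[p[2]]]))  # shared features
    (r * N - k^2) / (k * (N - k))             # overlap corrected for chance
  })
  mean(idx)
}

## identical sets are perfectly stable
kuncheva_sketch(list(1:5, 1:5, 1:5), N = 20)  # 1
```

The term k^2 / N is the overlap expected by chance between two random sets of size k, which is why a random feature selection scores close to 0.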

Davis index (davis) lies in [0, 1]. With a penalty term equal to 0, an index of 0 means no intersection between sets of selected features and +1 means that the same features are always selected. A penalty of 1 is usually used so that a feature selection performed with no or all features has a Davis stability index equal to 0. No estimate of the expected Davis stability index of a random feature selection has been published.

Value

A numeric value: the stability index

Author(s)

Benjamin Haibe-Kains

References

Davis CA, Gerick F, Hintermair V, Friedel CC, Fundel K, Kuffner R, Zimmer R (2006) "Reliable gene signatures for microarray classification: assessment of stability and performance", Bioinformatics, 22(19):2356-2363.

Kuncheva LI (2007) "A stability index for feature selection", AIAP'07: Proceedings of the 25th IASTED International Multi-Conference, pages 390–395.

See Also

stab.fs.ranking

Examples

set.seed(54321)
## 100 random selections of 50 features from a set of 10,000 features
fsets <- lapply(as.list(1:100), function(x, size=50, N=10000) {
  return(sample(1:N, size, replace=FALSE))} )
names(fsets) <- paste("fsel", 1:length(fsets), sep=".")

## Kuncheva index
stab.fs(fsets=fsets, N=10000, method="kuncheva")
## close to 0 as expected for a random feature selection

## Davis index
stab.fs(fsets=fsets, N=10000, method="davis", penalty=1)
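
The Description mentions perturbing the original dataset to obtain multiple sets of selected features. A minimal sketch of one way to do this with bootstrap resampling is given below. The simulated data, the helper select_top and its correlation-based ranking are all hypothetical choices made for illustration; any feature selection method could be substituted.

```r
set.seed(1)
n <- 60; p <- 200; k <- 10
## simulated data: only features f1 and f2 carry signal (assumption for the demo)
x <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("f", seq_len(p))))
y <- x[, 1] + x[, 2] + rnorm(n)

## hypothetical selector: top-k features by absolute correlation with y
select_top <- function(x, y, k) {
  score <- abs(cor(x, y))[, 1]
  names(sort(score, decreasing = TRUE))[seq_len(k)]
}

## perturb the dataset: 50 bootstrap resamples, one feature set per resample
fsets_boot <- lapply(seq_len(50), function(b) {
  i <- sample(n, replace = TRUE)
  select_top(x[i, , drop = FALSE], y[i], k)
})
names(fsets_boot) <- paste("boot", seq_along(fsets_boot), sep = ".")
```

The resulting list has the shape described for the fsets argument, with N = ncol(x) as the total number of features.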
