View source: R/stability_functions_corrected.R
| stabilityLustgarten | R Documentation | 
The stability of feature selection is defined as the robustness of the sets of selected features with respect to small variations in the data on which the feature selection is conducted. To quantify stability, several datasets from the same data generating process can be used. Alternatively, a single dataset can be split into parts by resampling. Either way, all datasets used for feature selection must contain exactly the same features. The feature selection method of interest is applied on all of the datasets and the sets of chosen features are recorded. The stability of the feature selection is assessed based on the sets of chosen features using stability measures.
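The resampling workflow described above can be sketched in base R as follows. This is an illustration only: the simulated data, the 80% subsample size, and the helper `select_top_k` (a simple filter that ranks features by absolute correlation with the response) are assumptions made for the example and are not part of the package.

```r
# Minimal sketch: subsample one dataset m times and record the chosen features.
set.seed(1)
n <- 100  # number of observations (assumed for illustration)
p <- 10   # total number of features
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- X[, 1] + X[, 2] + rnorm(n)

# Hypothetical feature selection method: keep the k features with the
# largest absolute correlation with the response.
select_top_k <- function(X, y, k = 3) {
  scores <- abs(cor(X, y))[, 1]
  sort(order(scores, decreasing = TRUE)[seq_len(k)])
}

m <- 5  # number of subsamples
features <- lapply(seq_len(m), function(i) {
  idx <- sample(n, size = floor(0.8 * n))  # subsample without replacement
  select_top_k(X[idx, , drop = FALSE], y[idx])
})
```

Every subsample contains the same p features, so the resulting list `features` has the form expected by the `features` argument and could be passed on together with `p`.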
stabilityLustgarten(features, p, impute.na = NULL)
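| features | list (length m, at least 2) of the chosen feature sets, one per dataset, given as feature names or indices. |
| p | numeric(1) Total number of features in the datasets. |
| impute.na | numeric(1) or NULL. Value used to replace stability results that cannot be computed (NA), for example when a feature set is empty. The default NULL means no imputation. |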
The stability measure is defined as (see Notation)
\frac{2}{m (m - 1)} \sum_{i=1}^{m-1} \sum_{j = i+1}^m
\frac{|V_i \cap V_j| - \frac{|V_i| \cdot |V_j|}{p}}
{\min \{|V_i|, |V_j|\} - \max \{ 0, |V_i| + |V_j| - p \}}.
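To make the formula concrete, the following base R sketch evaluates it directly as a double sum over all pairs of feature sets. The function name `lustgarten_sketch` is hypothetical and the code is not the package's implementation; in particular, it does no input checking and does not handle pairs with a zero denominator (for example an empty feature set), for which it returns NaN.

```r
# Hand-written evaluation of the pairwise formula (illustrative sketch).
lustgarten_sketch <- function(features, p) {
  m <- length(features)
  stopifnot(m >= 2)
  total <- 0
  for (i in seq_len(m - 1)) {
    for (j in (i + 1):m) {
      vi <- unique(features[[i]])
      vj <- unique(features[[j]])
      ni <- length(vi)
      nj <- length(vj)
      # |V_i and V_j in common| minus the overlap expected by chance
      numerator <- length(intersect(vi, vj)) - ni * nj / p
      # largest possible overlap minus smallest possible overlap
      denominator <- min(ni, nj) - max(0, ni + nj - p)
      total <- total + numerator / denominator
    }
  }
  2 * total / (m * (m - 1))
}

feats <- list(1:3, 1:4, 1:5)
result <- lustgarten_sketch(feats, p = 10)
result  # (0.6 + 0.5 + 0.5) / 3, approximately 0.533
```

For these nested sets the three pairwise terms are 1.8/3, 1.5/3 and 2/4, so the average is 1.6/3.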
numeric(1) Stability value.
For the definition of all stability measures in this package,
the following notation is used:
Let V_1, \ldots, V_m denote the sets of chosen features
for the m datasets, i.e. features has length m and
V_i is the set of features given by the i-th entry of features.
Furthermore, let h_j denote the number of sets that contain feature
X_j so that h_j is the absolute frequency with which feature X_j
is chosen.
Analogously, let h_{ij} denote the number of sets that include both X_i and X_j.
Also, let q = \sum_{j=1}^p h_j = \sum_{i=1}^m |V_i| and V = \bigcup_{i=1}^m V_i.
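The quantities in this notation can be computed from a list of feature sets with a few lines of base R. The example sets and all variable names below are assumptions chosen for illustration; features are assumed to be given as integer indices between 1 and p.

```r
# Illustrative computation of h_j, h_{ij}, q and V.
features <- list(c(1L, 2L, 3L), c(1L, 2L, 4L), c(2L, 3L, 5L))
p <- 6L
m <- length(features)

# p x m indicator matrix: entry (j, i) is 1 if feature X_j is in V_i
membership <- 1 * sapply(features, function(v) seq_len(p) %in% v)

h <- rowSums(membership)             # h_j: number of sets containing X_j
h_pair <- tcrossprod(membership)     # h_{ij}: sets containing both X_i and X_j
q <- sum(h)                          # equals the sum of the set sizes |V_i|
V <- sort(unique(unlist(features)))  # union of all chosen feature sets
```

Here h is (2, 3, 2, 1, 1, 0), q is 9, and V contains the features 1 to 5; the diagonal of `h_pair` reproduces h.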
Lustgarten JL, Gopalakrishnan V, Visweswaran S (2009). “Measuring stability of feature selection in biomedical datasets.” In AMIA Annual Symposium Proceedings, volume 2009, 406. American Medical Informatics Association.
Bommert A, Rahnenführer J, Lang M (2017). “A Multicriteria Approach to Find Predictive and Sparse Models with Stable Feature Selection for High-Dimensional Data.” Computational and Mathematical Methods in Medicine, 2017, 1–18. doi:10.1155/2017/7907163.
Bommert A (2020). Integration of Feature Selection Stability in Model Fitting. Ph.D. thesis, TU Dortmund University, Germany. doi:10.17877/DE290R-21906.
listStabilityMeasures
feats = list(1:3, 1:4, 1:5)
stabilityLustgarten(features = feats, p = 10)