
.parse_variable_importance_settings    R Documentation

Internal function for parsing settings related to variable importance computation.

Description

Internal function for parsing settings related to variable importance computation.

Usage

.parse_variable_importance_settings(
  config = NULL,
  data,
  parallel,
  outcome_type,
  vimp_method = waiver(),
  vimp_method_parameter = waiver(),
  vimp_aggregation_method = waiver(),
  vimp_aggregation_rank_threshold = waiver(),
  parallel_vimp = waiver(),
  ...
)

Arguments

config

A list of settings, e.g. from an xml file.

data

Data set as loaded using the .load_data function.

parallel

Logical value that determines whether familiar uses parallelisation. If FALSE, it overrides parallel_vimp.

outcome_type

Type of outcome found in the data set.

vimp_method

(required) Variable importance method. familiar implements various variable importance methods. Please refer to the vignette on variable importance methods for more details.

More than one variable importance method can be chosen. The experiment will then be repeated for each chosen method.

Variable importance methods determine the ranking of features. Actual selection of features is done by optimising the signature size model hyperparameter during the hyperparameter optimisation step.
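To illustrate the split between ranking and selection, the sketch below ranks features by made-up importance scores and then keeps the top `signature_size` features, roughly what choosing a signature size during hyperparameter optimisation amounts to. The helper `select_signature()` and the feature names are hypothetical; this is not familiar's internal code.

```r
# Hypothetical helper: keep the `signature_size` highest-ranked features.
# `vimp_score` is a named numeric vector; larger values mean more important.
select_signature <- function(vimp_score, signature_size) {
  # Ranking step: order features from most to least important.
  ranked <- names(sort(vimp_score, decreasing = TRUE))
  # Selection step: driven by the signature size, not by the ranking itself.
  head(ranked, n = min(signature_size, length(ranked)))
}

# Made-up scores for three features.
scores <- c(age = 0.10, tumour_size = 0.45, grade = 0.30)
select_signature(scores, signature_size = 2)  # "tumour_size" "grade"
```

With several entries in vimp_method, this ranking would be produced once per method, and each ranking would be used in its own repetition of the experiment.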

vimp_method_parameter

(optional) List of lists containing parameters for feature selection methods. Each sublist should have the name of the feature selection method it corresponds to.

Most feature selection methods do not have parameters that can be set. Please refer to the vignette on feature selection methods for more details. Note that if the feature selection method is based on a learner (e.g. lasso regression), hyperparameter optimisation may be performed prior to assessing variable importance.
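The expected shape of vimp_method_parameter is a named list of lists, where each name matches a method in vimp_method. The sketch below builds such a list and checks the names. The method names `method_a` and `method_b`, the parameter `param_a` and the helper `check_vimp_method_parameter()` are hypothetical placeholders, not familiar's actual methods, parameters or validation code.

```r
# Hypothetical selection of variable importance methods.
vimp_method <- c("method_a", "method_b")

# List of lists: one sublist per method, named after that method.
# `param_a` is a placeholder, not a real familiar parameter.
vimp_method_parameter <- list(
  method_a = list(param_a = 0.5)
)

# Hypothetical check that every sublist is named after a selected method.
check_vimp_method_parameter <- function(vimp_method_parameter, vimp_method) {
  if (!is.list(vimp_method_parameter)) stop("vimp_method_parameter must be a list.")
  parameter_names <- names(vimp_method_parameter)
  if (length(vimp_method_parameter) > 0 &&
      (is.null(parameter_names) || any(parameter_names == ""))) {
    stop("Each sublist must be named after a variable importance method.")
  }
  unknown <- setdiff(parameter_names, vimp_method)
  if (length(unknown) > 0) {
    stop("Parameters given for unselected methods: ", paste(unknown, collapse = ", "))
  }
  invisible(TRUE)
}

check_vimp_method_parameter(vimp_method_parameter, vimp_method)
```

Methods without an entry, such as `method_b` here, simply run without extra parameters in this sketch.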

vimp_aggregation_method

(optional) The method used to aggregate variable importances over different data subsets, e.g. bootstraps. The following methods can be selected:

  • none: Don't aggregate ranks, but rather aggregate the variable importance scores themselves.

  • mean: Use the mean rank of a feature over the subsets to determine the aggregated feature rank.

  • median: Use the median rank of a feature over the subsets to determine the aggregated feature rank.

  • best: Use the best rank the feature obtained in any subset to determine the aggregated feature rank.

  • worst: Use the worst rank the feature obtained in any subset to determine the aggregated feature rank.

  • stability: Use the frequency with which the feature appears in the subset of highly ranked features as a measure for the aggregated feature rank (Meinshausen and Buehlmann, 2010).

  • exponential: Use a rank-weighted frequency of occurrence in the subset of highly ranked features as a measure for the aggregated feature rank (Haury et al., 2011).

  • borda (default): Use the Borda count as a measure for the aggregated feature rank (Wald et al., 2012).

  • enhanced_borda: Use an occurrence frequency-weighted Borda count as a measure for the aggregated feature rank (Wald et al., 2012).

  • truncated_borda: Use the Borda count computed only over features within the subset of highly ranked features.

  • enhanced_truncated_borda: Apply both the enhanced Borda and the truncated Borda methods and use the resulting Borda count as the aggregated feature rank.
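The simple rank-based options (mean, median, best, worst) can be sketched with a rank matrix that has one row per feature and one column per data subset (e.g. bootstrap), with rank 1 being the most important. This is a minimal illustration, not familiar's implementation; the helpers `aggregate_ranks()` and `aggregate_scores()` are hypothetical. For none, the sketch assumes the raw importance scores are averaged before ranking, which is only one way of aggregating scores directly.

```r
# Hypothetical helper for the simple rank aggregation options.
# `rank_matrix`: features in rows, subsets in columns, rank 1 = most important.
aggregate_ranks <- function(rank_matrix, method = c("mean", "median", "best", "worst")) {
  method <- match.arg(method)
  aggregated <- switch(method,
    mean   = rowMeans(rank_matrix),
    median = apply(rank_matrix, 1, stats::median),
    best   = apply(rank_matrix, 1, min),  # lowest rank number is best
    worst  = apply(rank_matrix, 1, max)   # highest rank number is worst
  )
  # Convert aggregated values back to ranks 1, 2, ... (ties share the lowest rank).
  rank(aggregated, ties.method = "min")
}

# Assumed behaviour for `none`: average raw scores (higher = more important), then rank.
aggregate_scores <- function(score_matrix) {
  rank(-rowMeans(score_matrix), ties.method = "min")
}

# Made-up ranks of three features over three bootstraps.
rank_matrix <- matrix(
  c(1, 2, 1,
    2, 1, 3,
    3, 3, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("f1", "f2", "f3"), c("b1", "b2", "b3"))
)
aggregate_ranks(rank_matrix, "mean")   # f1 = 1, f2 = 2, f3 = 3
aggregate_ranks(rank_matrix, "best")   # f1 = 1, f2 = 1, f3 = 3
aggregate_ranks(rank_matrix, "worst")  # f1 = 1, f2 = 2, f3 = 2
```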

The feature selection methods vignette provides additional information.
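The Borda-type options can be sketched under one common formulation: in each subset, a feature ranked r among n features earns n - r + 1 points, and points are summed over subsets. The truncated variant awards points only to features ranked within the threshold k, and the enhanced variant multiplies the score by how often the feature occurs within the top k. These scoring details are assumptions for illustration and may differ from familiar's implementation; `borda_aggregate()` is a hypothetical helper.

```r
# Hypothetical sketch of Borda-type aggregation.
# `rank_matrix`: features in rows, subsets in columns, rank 1 = most important.
# `k`: rank threshold (used by the truncated and enhanced variants only).
borda_aggregate <- function(rank_matrix, k = 5,
                            method = c("borda", "truncated_borda",
                                       "enhanced_borda", "enhanced_truncated_borda")) {
  method <- match.arg(method)
  n_features <- nrow(rank_matrix)
  # Points per subset: rank 1 earns n_features points, the last rank earns 1.
  points <- n_features - rank_matrix + 1
  in_top_k <- rank_matrix <= k
  if (method %in% c("truncated_borda", "enhanced_truncated_borda")) {
    # Truncation: features outside the top k earn no points in that subset.
    points[!in_top_k] <- 0
  }
  score <- rowSums(points)
  if (method %in% c("enhanced_borda", "enhanced_truncated_borda")) {
    # Enhancement: weight by the fraction of subsets with the feature in the top k.
    score <- score * rowMeans(in_top_k)
  }
  # Higher score = more important; convert to ranks (1 = best, ties share rank).
  rank(-score, ties.method = "min")
}

# Made-up ranks of four features over two subsets.
rank_matrix <- matrix(
  c(1, 4,
    2, 2,
    3, 1,
    4, 3),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"), c("s1", "s2"))
)
borda_aggregate(rank_matrix, k = 1, method = "borda")            # A = 3, B = 1, C = 1, D = 4
borda_aggregate(rank_matrix, k = 1, method = "truncated_borda")  # A = 1, B = 3, C = 1, D = 3
```

In this toy example, truncation rewards features that reach the very top of at least one subset (A and C) over a feature that is consistently second (B).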

vimp_aggregation_rank_threshold

(optional) The threshold used to define the subset of highly important features. If set to NULL, the threshold is set to the subset size that maximises the variance of the occurrence values across all features. The default value is 5.
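The NULL case can be sketched as a search over candidate subset sizes k: for each k, compute the fraction of subsets in which each feature ranks within the top k, take the variance of these occurrence values across features, and keep the k with the largest variance. The helper `select_rank_threshold()` is hypothetical and only illustrates the idea; familiar's internal search may differ in its details, such as the candidate range or tie handling.

```r
# Hypothetical sketch: choose the rank threshold k when it is set to NULL.
# `rank_matrix`: features in rows, subsets in columns, rank 1 = most important.
select_rank_threshold <- function(rank_matrix) {
  stopifnot(nrow(rank_matrix) >= 2)  # a variance needs at least two features
  candidate_k <- seq_len(nrow(rank_matrix))
  occurrence_variance <- vapply(candidate_k, function(k) {
    # Fraction of subsets in which each feature ranks within the top k.
    occurrence <- rowMeans(rank_matrix <= k)
    stats::var(occurrence)
  }, numeric(1))
  # which.max returns the smallest k when several k share the maximum.
  candidate_k[which.max(occurrence_variance)]
}

# Made-up ranks of four features over two subsets.
rank_matrix <- matrix(
  c(1, 4,
    2, 2,
    3, 1,
    4, 3),
  nrow = 4, byrow = TRUE
)
select_rank_threshold(rank_matrix)  # 2
```

When k equals the number of features, every feature occurs in every subset and the variance drops to zero, so very large thresholds are never preferred.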

This parameter is only relevant for stability, exponential, enhanced_borda, truncated_borda and enhanced_truncated_borda methods.
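The two frequency-based options can be sketched as follows. For stability, each occurrence within the top k counts equally. For exponential, this sketch assumes a weight of exp(-r / k) for a feature ranked r within the top k, so better ranks contribute more; the exact weighting used by familiar or by Haury et al. (2011) may differ. The helper `frequency_aggregate()` is hypothetical.

```r
# Hypothetical sketch of the threshold-based frequency options.
# `rank_matrix`: features in rows, subsets in columns, rank 1 = most important.
# `k`: rank threshold defining the subset of highly ranked features.
frequency_aggregate <- function(rank_matrix, k = 5, method = c("stability", "exponential")) {
  method <- match.arg(method)
  in_top_k <- rank_matrix <= k
  weight <- if (method == "stability") {
    # Stability: every occurrence within the top k counts as 1.
    in_top_k * 1
  } else {
    # Exponential (assumed weighting): occurrences within the top k count exp(-rank / k).
    in_top_k * exp(-rank_matrix / k)
  }
  # Average over subsets, then rank: higher score = more important.
  score <- rowMeans(weight)
  rank(-score, ties.method = "min")
}

# Made-up ranks of four features over two subsets.
rank_matrix <- matrix(
  c(1, 4,
    2, 2,
    3, 1,
    4, 3),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"), c("s1", "s2"))
)
frequency_aggregate(rank_matrix, k = 3, method = "stability")    # A = 3, B = 1, C = 1, D = 3
frequency_aggregate(rank_matrix, k = 3, method = "exponential")  # A = 3, B = 2, C = 1, D = 4
```

Here stability cannot separate B and C, which both occur in the top 3 of every subset, whereas the rank weighting favours C because it is ranked first in one subset.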

parallel_vimp

(optional) Enable parallel processing for variable importance tasks. Defaults to TRUE. When set to FALSE, parallel processing is disabled during feature selection, regardless of the setting of the parallel parameter. parallel_vimp is ignored if parallel = FALSE.
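The interaction between parallel and parallel_vimp reduces to a logical AND, with parallel_vimp defaulting to TRUE. In the usage above, unset arguments are marked with waiver(); this sketch uses NULL instead so that it runs without familiar. The helper `resolve_parallel_vimp()` is hypothetical.

```r
# Hypothetical sketch of how the two parallelisation switches interact.
# NULL stands in for an unset parallel_vimp, which defaults to TRUE.
resolve_parallel_vimp <- function(parallel, parallel_vimp = NULL) {
  if (is.null(parallel_vimp)) parallel_vimp <- TRUE  # documented default
  # parallel = FALSE always wins; otherwise parallel_vimp decides.
  isTRUE(parallel) && isTRUE(parallel_vimp)
}

resolve_parallel_vimp(parallel = TRUE)                         # TRUE
resolve_parallel_vimp(parallel = TRUE, parallel_vimp = FALSE)  # FALSE
resolve_parallel_vimp(parallel = FALSE, parallel_vimp = TRUE)  # FALSE
```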

...

Unused arguments.

Value

List of parameters related to variable importance computation.

References

  1. Wald, R., Khoshgoftaar, T. M., Dittman, D., Awada, W. & Napolitano, A. An extensive comparison of feature ranking aggregation techniques in bioinformatics. in 2012 IEEE 13th International Conference on Information Reuse & Integration (IRI) 377–384 (2012).

  2. Meinshausen, N. & Buehlmann, P. Stability selection. J. R. Stat. Soc. Series B Stat. Methodol. 72, 417–473 (2010).

  3. Haury, A.-C., Gestraud, P. & Vert, J.-P. The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures. PLoS One 6, e28210 (2011).


familiar documentation built on May 23, 2026, 1:07 a.m.