plot_variable_importance-methods: Plot variable importance scores of features during feature...
In familiar: End-to-End Automated Machine Learning and Model Evaluation

plot_variable_importance

R Documentation

Plot variable importance scores of features during feature selection or after training a model.

Description

This function plots variable importance data obtained during feature selection or after training a model. These data are stored in a familiarCollection object.

Usage

plot_variable_importance(
  object,
  type,
  feature_cluster_method = waiver(),
  feature_linkage_method = waiver(),
  feature_cluster_cut_method = waiver(),
  feature_similarity_threshold = waiver(),
  aggregation_method = waiver(),
  rank_threshold = waiver(),
  draw = FALSE,
  dir_path = NULL,
  split_by = NULL,
  color_by = NULL,
  facet_by = NULL,
  facet_wrap_cols = NULL,
  show_cluster = TRUE,
  ggtheme = NULL,
  discrete_palette = NULL,
  gradient_palette = waiver(),
  x_label = "feature",
  rotate_x_tick_labels = waiver(),
  y_label = waiver(),
  legend_label = waiver(),
  plot_title = waiver(),
  plot_sub_title = waiver(),
  caption = NULL,
  y_range = NULL,
  y_n_breaks = 5,
  y_breaks = NULL,
  width = waiver(),
  height = waiver(),
  units = waiver(),
  export_collection = FALSE,
  ...
)

## S4 method for signature 'ANY'
plot_variable_importance(
  object,
  type,
  feature_cluster_method = waiver(),
  feature_linkage_method = waiver(),
  feature_cluster_cut_method = waiver(),
  feature_similarity_threshold = waiver(),
  aggregation_method = waiver(),
  rank_threshold = waiver(),
  draw = FALSE,
  dir_path = NULL,
  split_by = NULL,
  color_by = NULL,
  facet_by = NULL,
  facet_wrap_cols = NULL,
  show_cluster = TRUE,
  ggtheme = NULL,
  discrete_palette = NULL,
  gradient_palette = waiver(),
  x_label = "feature",
  rotate_x_tick_labels = waiver(),
  y_label = waiver(),
  legend_label = waiver(),
  plot_title = waiver(),
  plot_sub_title = waiver(),
  caption = NULL,
  y_range = NULL,
  y_n_breaks = 5,
  y_breaks = NULL,
  width = waiver(),
  height = waiver(),
  units = waiver(),
  export_collection = FALSE,
  ...
)

## S4 method for signature 'familiarCollection'
plot_variable_importance(
  object,
  type,
  feature_cluster_method = waiver(),
  feature_linkage_method = waiver(),
  feature_cluster_cut_method = waiver(),
  feature_similarity_threshold = waiver(),
  aggregation_method = waiver(),
  rank_threshold = waiver(),
  draw = FALSE,
  dir_path = NULL,
  split_by = NULL,
  color_by = NULL,
  facet_by = NULL,
  facet_wrap_cols = NULL,
  show_cluster = TRUE,
  ggtheme = NULL,
  discrete_palette = NULL,
  gradient_palette = waiver(),
  x_label = "feature",
  rotate_x_tick_labels = waiver(),
  y_label = waiver(),
  legend_label = waiver(),
  plot_title = waiver(),
  plot_sub_title = waiver(),
  caption = NULL,
  y_range = NULL,
  y_n_breaks = 5,
  y_breaks = NULL,
  width = waiver(),
  height = waiver(),
  units = waiver(),
  export_collection = FALSE,
  ...
)

plot_feature_selection_occurrence(...)

plot_feature_selection_variable_importance(...)

plot_model_signature_occurrence(...)

plot_model_signature_variable_importance(...)

Arguments

`object`	A `familiarCollection` object, or other objects from which a `familiarCollection` can be extracted. See details for more information.
`type`	Determines which variable importance is shown. Can be `feature_selection` or `model` for the variable importance after the feature selection step and after the model training step, respectively.
`feature_cluster_method`	The method used to perform clustering. These are the same methods as for the `cluster_method` configuration parameter: `none`, `hclust`, `agnes`, `diana` and `pam`. `none` cannot be used when extracting data regarding mutual correlation or feature expressions. If not provided explicitly, this parameter is read from settings used at creation of the underlying `familiarModel` objects.
`feature_linkage_method`	The method used for agglomerative clustering in `hclust` and `agnes`. These are the same methods as for the `cluster_linkage_method` configuration parameter: `average`, `single`, `complete`, `weighted`, and `ward`. If not provided explicitly, this parameter is read from settings used at creation of the underlying `familiarModel` objects.
`feature_cluster_cut_method`	The method used to divide features into separate clusters. The available methods are the same as for the `cluster_cut_method` configuration parameter: `silhouette`, `fixed_cut` and `dynamic_cut`. `silhouette` is available for all cluster methods, but `fixed_cut` only applies to methods that create hierarchical trees (`hclust`, `agnes` and `diana`). `dynamic_cut` requires the `dynamicTreeCut` package and can only be used with `agnes` and `hclust`. If not provided explicitly, this parameter is read from settings used at creation of the underlying `familiarModel` objects.
`feature_similarity_threshold`	The threshold level for pair-wise similarity that is required to form feature clusters with the `fixed_cut` method. If not provided explicitly, this parameter is read from settings used at creation of the underlying `familiarModel` objects.
`aggregation_method`	(optional) The method used to aggregate variable importances over different data subsets, e.g. bootstraps. The following methods can be selected: `mean` (default): Use the mean rank of a feature over the subsets to determine the aggregated feature rank. `median`: Use the median rank of a feature over the subsets to determine the aggregated feature rank. `best`: Use the best rank the feature obtained in any subset to determine the aggregated feature rank. `worst`: Use the worst rank the feature obtained in any subset to determine the aggregated feature rank. `stability`: Use the frequency of the feature being in the subset of highly ranked features as measure for the aggregated feature rank (Meinshausen and Buehlmann, 2010). `exponential`: Use a rank-weighted frequency of occurrence in the subset of highly ranked features as measure for the aggregated feature rank (Haury et al., 2011). `borda`: Use the Borda count as measure for the aggregated feature rank (Wald et al., 2012). `enhanced_borda`: Use an occurrence frequency-weighted Borda count as measure for the aggregated feature rank (Wald et al., 2012). `truncated_borda`: Use the Borda count computed only on features within the subset of highly ranked features. `enhanced_truncated_borda`: Apply both the enhanced Borda method and the truncated Borda method and use the resulting Borda count as the aggregated feature rank.
`rank_threshold`	(optional) The threshold used to define the subset of highly important features. If not set, this threshold is determined by maximising, over the subset size, the variance in the occurrence value across all features. This parameter is only relevant for the `stability`, `exponential`, `enhanced_borda`, `truncated_borda` and `enhanced_truncated_borda` methods.
`draw`	(optional) Draws the plot if TRUE.
`dir_path`	(optional) Path to the directory where created figures are saved. Output is saved in the `variable_importance` subdirectory. If `NULL`, figures are not saved but are returned instead.
`split_by`	(optional) Splitting variables. This refers to column names on which datasets are split. A separate figure is created for each split. See details for available variables.
`color_by`	(optional) Variables used to determine fill colour of plot objects. The variables cannot overlap with those provided to the `split_by` argument, but may overlap with other arguments. See details for available variables.
`facet_by`	(optional) Variables used to determine how and if facets of each figure appear. In case the `facet_wrap_cols` argument is `NULL`, the first variable is used to define columns, and the remaining variables are used to define rows of facets. The variables cannot overlap with those provided to the `split_by` argument, but may overlap with other arguments. See details for available variables.
`facet_wrap_cols`	(optional) Number of columns to generate when facet wrapping. If NULL, a facet grid is produced instead.
`show_cluster`	(optional) Show which features were clustered together. Currently not available in combination with variable importance obtained during feature selection.
`ggtheme`	(optional) `ggplot` theme to use for plotting.
`discrete_palette`	(optional) Palette to use for coloring bar plots, in case a non-singular variable was provided to the `color_by` argument.
`gradient_palette`	(optional) Palette to use for filling the bars in case the `color_by` argument is not set. The bars are then coloured according to the occurrence of features. By default, no gradient is used, and the bars are not filled according to occurrence. Use `NULL` to fill the bars using the default palette in `familiar`.
`x_label`	(optional) Label to provide to the x-axis. If NULL, no label is shown.
`rotate_x_tick_labels`	(optional) Rotate tick labels on the x-axis by 90 degrees. Defaults to `TRUE`. Rotation of x-axis tick labels may also be controlled through the `ggtheme`. In this case, `FALSE` should be provided explicitly.
`y_label`	(optional) Label to provide to the y-axis. If NULL, no label is shown.
`legend_label`	(optional) Label to provide to the legend. If NULL, the legend will not have a name.
`plot_title`	(optional) Label to provide as figure title. If NULL, no title is shown.
`plot_sub_title`	(optional) Label to provide as figure subtitle. If NULL, no subtitle is shown.
`caption`	(optional) Label to provide as figure caption. If NULL, no caption is shown.
`y_range`	(optional) Value range for the y-axis.
`y_n_breaks`	(optional) Number of breaks to show on the y-axis of the plot. `y_n_breaks` is used to determine the `y_breaks` argument in case it is unset.
`y_breaks`	(optional) Break points on the y-axis of the plot.
`width`	(optional) Width of the plot. A default value is derived from the number of facets and the number of features.
`height`	(optional) Height of the plot. A default value is derived from the number of facets, and the length of the longest feature name (if `rotate_x_tick_labels` is `TRUE`).
`units`	(optional) Plot size unit. Either `cm` (default), `mm` or `in`.
`export_collection`	(optional) Exports the collection if TRUE.
`...`	Arguments passed on to `as_familiar_collection`, `ggplot2::ggsave` and `extract_fs_vimp`. These are: `familiar_data_names`: Names of the dataset(s). Only used if the `object` parameter is one or more `familiarData` objects. `collection_name`: Name of the collection. `device`: Device to use. Can either be a device function (e.g. png), or one of "eps", "ps", "tex" (pictex), "pdf", "jpeg", "tiff", "png", "bmp", "svg" or "wmf" (windows only). If `NULL` (default), the device is guessed based on the `filename` extension. `scale`: Multiplicative scaling factor. `dpi`: Plot resolution. Also accepts a string input: "retina" (320), "print" (300), or "screen" (72). Applies only to raster output types. `limitsize`: When `TRUE` (the default), `ggsave()` will not save images larger than 50x50 inches, to prevent the common error of specifying dimensions in pixels. `bg`: Background colour. If `NULL`, uses the `plot.background` fill value from the plot theme. `create.dir`: Whether to create new directories if a non-existing directory is specified in the `filename` or `path` (`TRUE`) or return an error (`FALSE`, default). If `FALSE` and run in an interactive session, a prompt will appear asking to create a new directory when necessary. `verbose`: Flag to indicate whether feedback should be provided on the computation and extraction of various data elements. `message_indent`: Number of indentation steps for messages shown during computation and extraction of various data elements.
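
To make the `aggregation_method` and `rank_threshold` arguments more concrete, the sketch below computes a few of the listed aggregations on a small, invented rank matrix using base R only. It is an illustration under stated assumptions, not the implementation used by `familiar`: the Borda score follows one common definition (a feature at rank r among n features scores n - r + 1 per subset), and the threshold search is only one possible reading of "maximising the variance in the occurrence value". The object names (`ranks`, `occurrence`, `scores`) are hypothetical.

```r
# Toy ranks: rows are features, columns are data subsets (e.g. bootstraps).
# Rank 1 is the most important feature. All values are invented.
ranks <- matrix(
  c(1, 2, 1, 1,
    2, 1, 3, 2,
    3, 4, 2, 4,
    4, 3, 4, 3),
  nrow = 4,
  byrow = TRUE,
  dimnames = list(
    c("feat_a", "feat_b", "feat_c", "feat_d"),
    paste0("subset_", 1:4)
  )
)
n_features <- nrow(ranks)

# Occurrence: fraction of subsets in which a feature ranks within the top k.
occurrence <- function(k) rowMeans(ranks <= k)

# Assumed reading of the default rank_threshold: pick the subset size k
# for which the occurrence values vary most across features.
occurrence_variance <- sapply(
  seq_len(n_features),
  function(k) stats::var(occurrence(k))
)
rank_threshold <- which.max(occurrence_variance)

# Per-feature scores for several aggregation methods.
scores <- data.frame(
  mean = rowMeans(ranks),
  median = apply(ranks, 1, stats::median),
  best = apply(ranks, 1, min),
  worst = apply(ranks, 1, max),
  stability = occurrence(rank_threshold),
  borda = rowSums(n_features - ranks + 1),  # assumed Borda definition
  row.names = rownames(ranks)
)

# Lower is better for rank-based scores; higher is better for
# stability and Borda, hence the sign flip before ranking.
aggregated_rank <- data.frame(
  mean = rank(scores$mean, ties.method = "min"),
  stability = rank(-scores$stability, ties.method = "min"),
  borda = rank(-scores$borda, ties.method = "min"),
  row.names = rownames(scores)
)

print(scores)
print(aggregated_rank)
```

In this toy example the methods agree on the feature order. With real data they can differ, which is why the choice of `aggregation_method` affects the plotted ranking.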

Details

This function generates a barplot based on variable importance of features.

The only allowed values for split_by, color_by or facet_by are fs_method and learner, but note that learner has no effect when plotting variable importance of features acquired during feature selection.
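
As a rough sketch of how these grouping arguments combine, the code below assembles an argument list, checks it against the constraints described above, and wraps the call in a small helper. The helper name `plot_vimp` and the `collection` argument are hypothetical; a `familiarCollection` object has to be created elsewhere, and the helper is only defined here, not called.

```r
# Grouping variables allowed by this function, as stated in the text above.
allowed_vars <- c("fs_method", "learner")

# Hypothetical choice: one figure per feature selection method,
# with one facet per learner within each figure.
plot_args <- list(
  type = "model",
  split_by = "fs_method",
  facet_by = "learner",
  draw = TRUE
)

# Collect whichever grouping arguments were set (unset ones are dropped).
grouping <- unlist(plot_args[c("split_by", "color_by", "facet_by")])

# Only fs_method and learner are allowed.
stopifnot(all(grouping %in% allowed_vars))

# Variables in split_by may not be reused in color_by or facet_by.
stopifnot(!any(plot_args$split_by %in% c(plot_args$color_by, plot_args$facet_by)))

# Hypothetical helper: `collection` stands for an existing familiarCollection.
# Requires the familiar package to be installed when it is actually called.
plot_vimp <- function(collection, args = plot_args) {
  do.call(
    familiar::plot_variable_importance,
    c(list(object = collection), args)
  )
}
```

Because `learner` has no effect for variable importance obtained during feature selection, a call with `type = "feature_selection"` would typically group by `fs_method` only.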

Available palettes for discrete_palette and gradient_palette are those listed by grDevices::palette.pals() (requires R >= 4.0.0), grDevices::hcl.pals() (requires R >= 3.6.0) and rainbow, heat.colors, terrain.colors, topo.colors and cm.colors, which correspond to the palettes of the same name in grDevices. If not specified, a default palette based on palettes in Tableau is used. You may also specify your own palette by using colour names listed by grDevices::colors() or through hexadecimal RGB strings.
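
The palette sources mentioned above can be inspected with base R before choosing a value. The sketch below lists the available palette names, previews one named palette, and builds a custom palette from colour names and hexadecimal RGB strings. The variable names and the chosen colours are illustrative assumptions.

```r
# Palette names known to base R; availability depends on the R version.
hcl_names <- grDevices::hcl.pals()  # requires R >= 3.6.0
pal_names <- if (getRversion() >= "4.0.0") {
  grDevices::palette.pals()         # requires R >= 4.0.0
} else {
  character(0)
}

# Preview five colours from a named HCL palette.
viridis_five <- grDevices::hcl.colors(5, palette = "Viridis")

# A custom palette mixing a named colour and hexadecimal RGB strings.
custom_palette <- c("steelblue", "#E69F00", "#009E73")

# col2rgb() raises an error for strings that are not valid colours,
# so this doubles as a validity check.
custom_rgb <- grDevices::col2rgb(custom_palette)

print(head(hcl_names))
print(viridis_five)
```

A palette name such as "Viridis", or a vector such as `custom_palette`, would then be supplied through the discrete_palette or gradient_palette argument.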

Labeling methods such as set_feature_names or set_fs_method_names can be applied to the familiarCollection object to update labels, and order the output in the figure.