plot_variable_importance-methods: Plot variable importance scores of features during feature...

plot_variable_importanceR Documentation

Plot variable importance scores of features during feature selection or after training a model.

Description

This function plots variable importance based data obtained during feature selection or after training a model, which are stored in a familiarCollection object.

Usage

plot_variable_importance(
  object,
  type,
  feature_cluster_method = waiver(),
  feature_linkage_method = waiver(),
  feature_cluster_cut_method = waiver(),
  feature_similarity_threshold = waiver(),
  aggregation_method = waiver(),
  rank_threshold = waiver(),
  draw = FALSE,
  dir_path = NULL,
  split_by = NULL,
  color_by = NULL,
  facet_by = NULL,
  facet_wrap_cols = NULL,
  show_cluster = TRUE,
  ggtheme = NULL,
  discrete_palette = NULL,
  gradient_palette = waiver(),
  x_label = "feature",
  rotate_x_tick_labels = waiver(),
  y_label = waiver(),
  legend_label = waiver(),
  plot_title = waiver(),
  plot_sub_title = waiver(),
  caption = NULL,
  y_range = NULL,
  y_n_breaks = 5,
  y_breaks = NULL,
  width = waiver(),
  height = waiver(),
  units = waiver(),
  export_collection = FALSE,
  ...
)

## S4 method for signature 'ANY'
plot_variable_importance(
  object,
  type,
  feature_cluster_method = waiver(),
  feature_linkage_method = waiver(),
  feature_cluster_cut_method = waiver(),
  feature_similarity_threshold = waiver(),
  aggregation_method = waiver(),
  rank_threshold = waiver(),
  draw = FALSE,
  dir_path = NULL,
  split_by = NULL,
  color_by = NULL,
  facet_by = NULL,
  facet_wrap_cols = NULL,
  show_cluster = TRUE,
  ggtheme = NULL,
  discrete_palette = NULL,
  gradient_palette = waiver(),
  x_label = "feature",
  rotate_x_tick_labels = waiver(),
  y_label = waiver(),
  legend_label = waiver(),
  plot_title = waiver(),
  plot_sub_title = waiver(),
  caption = NULL,
  y_range = NULL,
  y_n_breaks = 5,
  y_breaks = NULL,
  width = waiver(),
  height = waiver(),
  units = waiver(),
  export_collection = FALSE,
  ...
)

## S4 method for signature 'familiarCollection'
plot_variable_importance(
  object,
  type,
  feature_cluster_method = waiver(),
  feature_linkage_method = waiver(),
  feature_cluster_cut_method = waiver(),
  feature_similarity_threshold = waiver(),
  aggregation_method = waiver(),
  rank_threshold = waiver(),
  draw = FALSE,
  dir_path = NULL,
  split_by = NULL,
  color_by = NULL,
  facet_by = NULL,
  facet_wrap_cols = NULL,
  show_cluster = TRUE,
  ggtheme = NULL,
  discrete_palette = NULL,
  gradient_palette = waiver(),
  x_label = "feature",
  rotate_x_tick_labels = waiver(),
  y_label = waiver(),
  legend_label = waiver(),
  plot_title = waiver(),
  plot_sub_title = waiver(),
  caption = NULL,
  y_range = NULL,
  y_n_breaks = 5,
  y_breaks = NULL,
  width = waiver(),
  height = waiver(),
  units = waiver(),
  export_collection = FALSE,
  ...
)

plot_feature_selection_occurrence(...)

plot_feature_selection_variable_importance(...)

plot_model_signature_occurrence(...)

plot_model_signature_variable_importance(...)

Arguments

object

A familiarCollection object, or other other objects from which a familiarCollection can be extracted. See details for more information.

type

Determine what variable importance should be shown. Can be feature_selection or model for the variable importance after the feature selection step and after the model training step, respectively.

feature_cluster_method

The method used to perform clustering. These are the same methods as for the cluster_method configuration parameter: none, hclust, agnes, diana and pam.

none cannot be used when extracting data regarding mutual correlation or feature expressions.

If not provided explicitly, this parameter is read from settings used at creation of the underlying familiarModel objects.

feature_linkage_method

The method used for agglomerative clustering in hclust and agnes. These are the same methods as for the cluster_linkage_method configuration parameter: average, single, complete, weighted, and ward.

If not provided explicitly, this parameter is read from settings used at creation of the underlying familiarModel objects.

feature_cluster_cut_method

The method used to divide features into separate clusters. The available methods are the same as for the cluster_cut_method configuration parameter: silhouette, fixed_cut and dynamic_cut.

silhouette is available for all cluster methods, but fixed_cut only applies to methods that create hierarchical trees (hclust, agnes and diana). dynamic_cut requires the dynamicTreeCut package and can only be used with agnes and hclust.

If not provided explicitly, this parameter is read from settings used at creation of the underlying familiarModel objects.

feature_similarity_threshold

The threshold level for pair-wise similarity that is required to form feature clusters with the fixed_cut method.

If not provided explicitly, this parameter is read from settings used at creation of the underlying familiarModel objects.

aggregation_method

(optional) The method used to aggregate variable importances over different data subsets, e.g. bootstraps. The following methods can be selected:

  • mean (default): Use the mean rank of a feature over the subsets to determine the aggregated feature rank.

  • median: Use the median rank of a feature over the subsets to determine the aggregated feature rank.

  • best: Use the best rank the feature obtained in any subset to determine the aggregated feature rank.

  • worst: Use the worst rank the feature obtained in any subset to determine the aggregated feature rank.

  • stability: Use the frequency of the feature being in the subset of highly ranked features as measure for the aggregated feature rank (Meinshausen and Buehlmann, 2010).

  • exponential: Use a rank-weighted frequence of occurrence in the subset of highly ranked features as measure for the aggregated feature rank (Haury et al., 2011).

  • borda: Use the borda count as measure for the aggregated feature rank (Wald et al., 2012).

  • enhanced_borda: Use an occurrence frequency-weighted borda count as measure for the aggregated feature rank (Wald et al., 2012).

  • truncated_borda: Use borda count computed only on features within the subset of highly ranked features.

  • enhanced_truncated_borda: Apply both the enhanced borda method and the truncated borda method and use the resulting borda count as the aggregated feature rank.

rank_threshold

(optional) The threshold used to define the subset of highly important features. If not set, this threshold is determined by maximising the variance in the occurrence value over all features over the subset size.

This parameter is only relevant for stability, exponential, enhanced_borda, truncated_borda and enhanced_truncated_borda methods.

draw

(optional) Draws the plot if TRUE.

dir_path

(optional) Path to the directory where created figures are saved to. Output is saved in the variable_importance subdirectory. If NULL no figures are saved, but are returned instead.

split_by

(optional) Splitting variables. This refers to column names on which datasets are split. A separate figure is created for each split. See details for available variables.

color_by

(optional) Variables used to determine fill colour of plot objects. The variables cannot overlap with those provided to the split_by argument, but may overlap with other arguments. See details for available variables.

facet_by

(optional) Variables used to determine how and if facets of each figure appear. In case the facet_wrap_cols argument is NULL, the first variable is used to define columns, and the remaing variables are used to define rows of facets. The variables cannot overlap with those provided to the split_by argument, but may overlap with other arguments. See details for available variables.

facet_wrap_cols

(optional) Number of columns to generate when facet wrapping. If NULL, a facet grid is produced instead.

show_cluster

(optional) Show which features were clustered together. Currently not available in combination with variable importance obtained during feature selection.

ggtheme

(optional) ggplot theme to use for plotting.

discrete_palette

(optional) Palette to use for coloring bar plots, in case a non-singular variable was provided to the color_by argument.

gradient_palette

(optional) Palette to use for filling the bars in case the color_by argument is not set. The bars are then coloured according to the occurrence of features. By default, no gradient is used, and the bars are not filled according to occurrence. Use NULL to fill the bars using the default palette in familiar.

x_label

(optional) Label to provide to the x-axis. If NULL, no label is shown.

rotate_x_tick_labels

(optional) Rotate tick labels on the x-axis by 90 degrees. Defaults to TRUE. Rotation of x-axis tick labels may also be controlled through the ggtheme. In this case, FALSE should be provided explicitly.

y_label

(optional) Label to provide to the y-axis. If NULL, no label is shown.

legend_label

(optional) Label to provide to the legend. If NULL, the legend will not have a name.

plot_title

(optional) Label to provide as figure title. If NULL, no title is shown.

plot_sub_title

(optional) Label to provide as figure subtitle. If NULL, no subtitle is shown.

caption

(optional) Label to provide as figure caption. If NULL, no caption is shown.

y_range

(optional) Value range for the y-axis.

y_n_breaks

(optional) Number of breaks to show on the y-axis of the plot. y_n_breaks is used to determine the y_breaks argument in case it is unset.

y_breaks

(optional) Break points on the y-axis of the plot.

width

(optional) Width of the plot. A default value is derived from the number of facets and the number of features.

height

(optional) Height of the plot. A default value is derived from number of facets, and the length of the longest feature name (if rotate_x_tick_labels is TRUE).

units

(optional) Plot size unit. Either cm (default), mm or ⁠in⁠.

export_collection

(optional) Exports the collection if TRUE.

...

Arguments passed on to as_familiar_collection, ggplot2::ggsave, extract_fs_vimp

familiar_data_names

Names of the dataset(s). Only used if the object parameter is one or more familiarData objects.

collection_name

Name of the collection.

device

Device to use. Can either be a device function (e.g. png), or one of "eps", "ps", "tex" (pictex), "pdf", "jpeg", "tiff", "png", "bmp", "svg" or "wmf" (windows only). If NULL (default), the device is guessed based on the filename extension.

scale

Multiplicative scaling factor.

dpi

Plot resolution. Also accepts a string input: "retina" (320), "print" (300), or "screen" (72). Applies only to raster output types.

limitsize

When TRUE (the default), ggsave() will not save images larger than 50x50 inches, to prevent the common error of specifying dimensions in pixels.

bg

Background colour. If NULL, uses the plot.background fill value from the plot theme.

create.dir

Whether to create new directories if a non-existing directory is specified in the filename or path (TRUE) or return an error (FALSE, default). If FALSE and run in an interactive session, a prompt will appear asking to create a new directory when necessary.

verbose

Flag to indicate whether feedback should be provided on the computation and extraction of various data elements.

message_indent

Number of indentation steps for messages shown during computation and extraction of various data elements.

Details

This function generates a barplot based on variable importance of features.

The only allowed values for split_by, color_by or facet_by are fs_method and learner, but note that learner has no effect when plotting variable importance of features acquired during feature selection.

Available palettes for discrete_palette and gradient_palette are those listed by grDevices::palette.pals() (requires R >= 4.0.0), grDevices::hcl.pals() (requires R >= 3.6.0) and rainbow, heat.colors, terrain.colors, topo.colors and cm.colors, which correspond to the palettes of the same name in grDevices. If not specified, a default palette based on palettes in Tableau are used. You may also specify your own palette by using colour names listed by grDevices::colors() or through hexadecimal RGB strings.

Labeling methods such as set_feature_names or set_fs_method_names can be applied to the familiarCollection object to update labels, and order the output in the figure.

Value

NULL or list of plot objects, if dir_path is NULL.


familiar documentation built on Sept. 30, 2024, 9:18 a.m.