plot_confusion_matrix-methods: Plot confusion matrix.
In familiar: End-to-End Automated Machine Learning and Model Evaluation

plot_confusion_matrix

R Documentation

Plot confusion matrix.

Description

This method creates confusion matrices based on data in a familiarCollection object.
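The base-R sketch below builds a single confusion matrix from made-up class labels, to show what each plotted matrix tabulates. `observed` and `predicted` are illustrative data, not familiar output. Placing observed classes in rows and predicted classes in columns is a choice made for this illustration. The diagonal cells hold the correctly classified instances, which the plot colours differently from off-diagonal cells.

```r
# Hypothetical observed and predicted class labels for six instances.
observed  <- factor(c("a", "a", "b", "b", "b", "c"), levels = c("a", "b", "c"))
predicted <- factor(c("a", "b", "b", "b", "c", "c"), levels = c("a", "b", "c"))

# Cross-tabulate: rows are observed classes, columns are predicted classes.
cm <- table(observed = observed, predicted = predicted)

# Logical matrix flagging diagonal cells (observed class == predicted class).
on_diagonal <- row(cm) == col(cm)

# Number of correctly classified instances.
n_correct <- sum(cm[on_diagonal])
print(cm)
```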

Usage

plot_confusion_matrix(
  object,
  draw = FALSE,
  dir_path = NULL,
  split_by = NULL,
  facet_by = NULL,
  facet_wrap_cols = NULL,
  ggtheme = NULL,
  discrete_palette = NULL,
  x_label = waiver(),
  y_label = waiver(),
  legend_label = waiver(),
  plot_title = waiver(),
  plot_sub_title = waiver(),
  caption = NULL,
  rotate_x_tick_labels = waiver(),
  show_alpha = TRUE,
  width = waiver(),
  height = waiver(),
  units = waiver(),
  export_collection = FALSE,
  ...
)

## S4 method for signature 'ANY'
plot_confusion_matrix(
  object,
  draw = FALSE,
  dir_path = NULL,
  split_by = NULL,
  facet_by = NULL,
  facet_wrap_cols = NULL,
  ggtheme = NULL,
  discrete_palette = NULL,
  x_label = waiver(),
  y_label = waiver(),
  legend_label = waiver(),
  plot_title = waiver(),
  plot_sub_title = waiver(),
  caption = NULL,
  rotate_x_tick_labels = waiver(),
  show_alpha = TRUE,
  width = waiver(),
  height = waiver(),
  units = waiver(),
  export_collection = FALSE,
  ...
)

## S4 method for signature 'familiarCollection'
plot_confusion_matrix(
  object,
  draw = FALSE,
  dir_path = NULL,
  split_by = NULL,
  facet_by = NULL,
  facet_wrap_cols = NULL,
  ggtheme = NULL,
  discrete_palette = NULL,
  x_label = waiver(),
  y_label = waiver(),
  legend_label = waiver(),
  plot_title = waiver(),
  plot_sub_title = waiver(),
  caption = NULL,
  rotate_x_tick_labels = waiver(),
  show_alpha = TRUE,
  width = waiver(),
  height = waiver(),
  units = waiver(),
  export_collection = FALSE,
  ...
)

Arguments

`object`	`familiarCollection` object, or one or more `familiarData` objects, that will be internally converted to a `familiarCollection` object. It is also possible to provide a `familiarEnsemble` object or one or more `familiarModel` objects, together with the data from which the required results are computed prior to export. Paths to files containing such objects can also be provided.
`draw`	(optional) Draws the plot if TRUE.
`dir_path`	(optional) Path to the directory where the created confusion matrices are saved. Output is saved in the `performance` subdirectory. If `NULL`, no figures are saved; they are returned instead.
`split_by`	(optional) Splitting variables. This refers to column names on which datasets are split. A separate figure is created for each split. See details for available variables.
`facet_by`	(optional) Variables used to determine whether and how facets of each figure appear. If the `facet_wrap_cols` argument is `NULL`, the first variable defines the columns and the remaining variables define the rows of facets. The variables cannot overlap with those provided to the `split_by` argument, but may overlap with other arguments. See details for available variables.
`facet_wrap_cols`	(optional) Number of columns to generate when facet wrapping. If NULL, a facet grid is produced instead.
`ggtheme`	(optional) `ggplot` theme to use for plotting.
`discrete_palette`	(optional) Palette used to colour the confusion matrix. The colour depends on whether each cell of the confusion matrix is on the diagonal (observed outcome matched expected outcome) or not.
`x_label`	(optional) Label to provide to the x-axis. If NULL, no label is shown.
`y_label`	(optional) Label to provide to the y-axis. If NULL, no label is shown.
`legend_label`	(optional) Label to provide to the legend. If NULL, the legend will not have a name.
`plot_title`	(optional) Label to provide as figure title. If NULL, no title is shown.
`plot_sub_title`	(optional) Label to provide as figure subtitle. If NULL, no subtitle is shown.
`caption`	(optional) Label to provide as figure caption. If NULL, no caption is shown.
`rotate_x_tick_labels`	(optional) Rotate tick labels on the x-axis by 90 degrees. Defaults to `TRUE`. Rotation of x-axis tick labels may also be controlled through the `ggtheme`. In that case, `rotate_x_tick_labels = FALSE` should be provided explicitly.
`show_alpha`	(optional) Interpreting confusion matrices is made easier by setting the opacity of the cells. `show_alpha` takes the following values:
    `none`: Cell opacity is not altered. Diagonal and off-diagonal cells are completely opaque and transparent, respectively. Same as `show_alpha=FALSE`.
    `by_class`: Cell opacity is normalised by the number of instances for each observed outcome class in each confusion matrix.
    `by_matrix` (default): Cell opacity is normalised by the number of instances in the largest observed outcome class in each confusion matrix. Same as `show_alpha=TRUE`.
    `by_figure`: Cell opacity is normalised by the number of instances in the largest observed outcome class across confusion matrices in different facets.
    `by_all`: Cell opacity is normalised by the number of instances in the largest observed outcome class across all confusion matrices.
`width`	(optional) Width of the plot. A default value is derived from the number of facets.
`height`	(optional) Height of the plot. A default value is derived from the number of features and the number of facets.
`units`	(optional) Plot size unit. Either `cm` (default), `mm` or `in`.
`export_collection`	(optional) Exports the collection if TRUE.
`...`	Arguments passed on to `as_familiar_collection`, `ggplot2::ggsave` and `extract_confusion_matrix`:
    `familiar_data_names`: Names of the dataset(s). Only used if the `object` parameter is one or more `familiarData` objects.
    `collection_name`: Name of the collection.
    `device`: Device to use. Can either be a device function (e.g. png), or one of "eps", "ps", "tex" (pictex), "pdf", "jpeg", "tiff", "png", "bmp", "svg" or "wmf" (windows only). If `NULL` (default), the device is guessed based on the `filename` extension.
    `scale`: Multiplicative scaling factor.
    `dpi`: Plot resolution. Also accepts a string input: "retina" (320), "print" (300), or "screen" (72). Applies only to raster output types.
    `limitsize`: When `TRUE` (the default), `ggsave()` will not save images larger than 50x50 inches, to prevent the common error of specifying dimensions in pixels.
    `bg`: Background colour. If `NULL`, uses the `plot.background` fill value from the plot theme.
    `create.dir`: Whether to create new directories if a non-existing directory is specified in the `filename` or `path` (`TRUE`) or return an error (`FALSE`, default). If `FALSE` and run in an interactive session, a prompt will appear asking to create a new directory when necessary.
    `data`: A `dataObject` object, `data.table` or `data.frame` that constitutes the data that are assessed.
    `is_pre_processed`: Flag that indicates whether the data was already pre-processed externally, e.g. normalised and clustered. Only used if the `data` argument is a `data.table` or `data.frame`.
    `cl`: Cluster created using the `parallel` package. This cluster is then used to speed up computation through parallelisation.
    `ensemble_method`: Method for ensembling predictions from models for the same sample. Available methods are:
        `median` (default): Use the median of the predicted values as the ensemble value for a sample.
        `mean`: Use the mean of the predicted values as the ensemble value for a sample.
    `verbose`: Flag to indicate whether feedback should be provided on the computation and extraction of various data elements.
    `message_indent`: Number of indentation steps for messages shown during computation and extraction of various data elements.
    `detail_level`: (optional) Sets the level at which results are computed and aggregated.
        `ensemble`: Results are computed at the ensemble level, i.e. over all models in the ensemble. This means that, for example, bias-corrected estimates of model performance are assessed by creating (at least) 20 bootstraps and computing the model performance of the ensemble model for each bootstrap.
        `hybrid` (default): Results are computed at the level of models in an ensemble. This means that, for example, bias-corrected estimates of model performance are directly computed using the models in the ensemble. If there are at least 20 trained models in the ensemble, performance is computed for each model, in contrast to `ensemble` where performance is computed for the ensemble of models. If there are fewer than 20 trained models in the ensemble, bootstraps are created so that at least 20 point estimates can be made.
        `model`: Results are computed at the model level. This means that, for example, bias-corrected estimates of model performance are assessed by creating (at least) 20 bootstraps and computing the performance of the model for each bootstrap.
      Note that each level of detail has a different interpretation for bootstrap confidence intervals. For `ensemble` and `model`, these are the confidence intervals for the ensemble and an individual model, respectively. That is, the confidence interval describes the range where an estimate produced by a respective ensemble or model trained on a repeat of the experiment may be found with the probability of the confidence level. For `hybrid`, it represents the range where any single model trained on a repeat of the experiment may be found with the probability of the confidence level. By definition, confidence intervals obtained using `hybrid` are at least as wide as those for `ensemble`. `hybrid` offers the correct interpretation if the goal of the analysis is to assess the result of a single, unspecified model.
      `hybrid` is generally computationally less expensive than `ensemble`, which in turn is somewhat less expensive than `model`.
      A non-default `detail_level` parameter can be specified for separate evaluation steps by providing a parameter value in a named list with data elements, e.g. `list("auc_data"="ensemble", "model_performance"="hybrid")`. This parameter can be set for the following data elements: `auc_data`, `decision_curve_analyis`, `model_performance`, `permutation_vimp`, `ice_data`, `prediction_data` and `confusion_matrix`.
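The `by_class` and `by_matrix` options of `show_alpha` differ only in the number by which each cell count is divided. The base-R function below is a hypothetical sketch of that arithmetic for a single matrix. `cell_alpha` is not part of familiar, and it assumes that rows hold the observed outcome classes. `by_figure` and `by_all` would apply the same division as `by_matrix`, but with the largest observed class taken across several matrices, so they are not shown.

```r
# Hypothetical helper, not part of familiar: scale confusion-matrix counts to
# opacities in [0, 1], following the show_alpha descriptions.
# Assumes rows of `cm` correspond to observed outcome classes.
cell_alpha <- function(cm, method = c("by_matrix", "by_class", "none")) {
  method <- match.arg(method)
  cm <- as.matrix(cm)
  if (method == "none") {
    # Diagonal cells fully opaque (1), off-diagonal cells fully transparent (0).
    return((row(cm) == col(cm)) * 1)
  }
  class_sizes <- rowSums(cm)  # number of instances per observed class
  if (method == "by_class") {
    # Divide each row by the size of its own observed class.
    alpha <- cm / class_sizes
  } else {
    # Divide every cell by the size of the largest observed class.
    alpha <- cm / max(class_sizes)
  }
  alpha[!is.finite(alpha)] <- 0  # guard against empty classes (0 / 0)
  alpha
}

cm <- matrix(c(8, 2, 1, 4), nrow = 2, byrow = TRUE,
             dimnames = list(observed = c("a", "b"), predicted = c("a", "b")))
cell_alpha(cm, "by_class")
cell_alpha(cm, "by_matrix")
```

With this example matrix, `by_class` gives both correctly classified cells an opacity of 0.8, whereas `by_matrix` renders the smaller class `b` fainter overall (0.1 and 0.4), because its counts are divided by the size of the larger class `a`.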

Details

This function generates confusion matrix plots.

Available splitting variables are: fs_method, learner and data_set. By default, the data is split by fs_method and learner, with facetting by data_set.
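The base-R sketch below illustrates the default grouping on a hypothetical table of results. The method, learner and data set labels are placeholders. Each combination of `fs_method` and `learner` becomes a separate figure, and the `data_set` values become facets within each figure. It mirrors only the grouping logic, not how familiar assembles its plots internally.

```r
# Hypothetical result rows: one per fs_method x learner x data_set combination.
results <- expand.grid(
  fs_method = c("fs_method_1", "fs_method_2"),
  learner   = c("learner_1", "learner_2"),
  data_set  = c("development", "validation"),
  stringsAsFactors = FALSE
)

# Default split_by: one figure per fs_method/learner combination.
figures <- split(results, results[c("fs_method", "learner")], drop = TRUE)

# Default facet_by: within each figure, one facet per data set.
facets_per_figure <- vapply(figures, function(x) length(unique(x$data_set)), integer(1))
length(figures)
facets_per_figure
```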

Available palettes for discrete_palette are those listed by grDevices::palette.pals() (requires R >= 4.0.0), grDevices::hcl.pals() (requires R >= 3.6.0) and rainbow, heat.colors, terrain.colors, topo.colors and cm.colors, which correspond to the palettes of the same name in grDevices. If not specified, a default palette based on palettes in Tableau is used. You may also specify your own palette by using colour names listed by grDevices::colors() or through hexadecimal RGB strings.
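The accepted forms of discrete_palette can be checked in base R before plotting. The helper below is hypothetical: familiar performs its own validation, which may differ, for example in case sensitivity. It accepts either a single palette name from the lists above, or a vector of colour names and hexadecimal RGB (optionally RGBA) strings.

```r
# Hypothetical helper, not part of familiar: check whether a discrete_palette
# value matches one of the forms described above (exact, case-sensitive match).
is_valid_discrete_palette <- function(x) {
  named_palettes <- c(
    grDevices::hcl.pals(),
    "rainbow", "heat.colors", "terrain.colors", "topo.colors", "cm.colors"
  )
  # palette.pals() only exists in R >= 4.0.0.
  if (exists("palette.pals", envir = asNamespace("grDevices"))) {
    named_palettes <- c(named_palettes, grDevices::palette.pals())
  }
  if (length(x) == 1 && x %in% named_palettes) return(TRUE)
  # Otherwise every element must be a colour name or a hexadecimal string.
  is_colour_name <- x %in% grDevices::colors()
  is_hex <- grepl("^#[0-9A-Fa-f]{6}([0-9A-Fa-f]{2})?$", x)
  length(x) > 0 && all(is_colour_name | is_hex)
}

is_valid_discrete_palette("Viridis")
is_valid_discrete_palette(c("steelblue", "#E69F00"))
```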

Labelling methods such as set_fs_method_names or set_data_set_names can be applied to the familiarCollection object to update labels and to change the order in which they appear in the figure.