plot_sample_clustering-methods: Plot heatmaps for pairwise similarity between features.

plot_sample_clustering    R Documentation

Plot heatmaps for pairwise similarity between features.

Description

This method creates a heatmap based on data stored in a familiarCollection object. Features in the heatmap are ordered so that more similar features appear together.

Usage

plot_sample_clustering(
  object,
  feature_cluster_method = waiver(),
  feature_linkage_method = waiver(),
  sample_cluster_method = waiver(),
  sample_linkage_method = waiver(),
  sample_limit = waiver(),
  draw = FALSE,
  dir_path = NULL,
  split_by = NULL,
  x_axis_by = NULL,
  y_axis_by = NULL,
  facet_by = NULL,
  facet_wrap_cols = NULL,
  ggtheme = NULL,
  gradient_palette = NULL,
  gradient_palette_range = waiver(),
  outcome_palette = NULL,
  outcome_palette_range = waiver(),
  x_label = waiver(),
  x_label_shared = "column",
  y_label = waiver(),
  y_label_shared = "row",
  legend_label = waiver(),
  outcome_legend_label = waiver(),
  plot_title = waiver(),
  plot_sub_title = waiver(),
  caption = NULL,
  x_range = NULL,
  x_n_breaks = 3,
  x_breaks = NULL,
  y_range = NULL,
  y_n_breaks = 3,
  y_breaks = NULL,
  rotate_x_tick_labels = waiver(),
  show_feature_dendrogram = TRUE,
  show_sample_dendrogram = TRUE,
  show_normalised_data = TRUE,
  show_outcome = TRUE,
  dendrogram_height = grid::unit(1.5, "cm"),
  outcome_height = grid::unit(0.3, "cm"),
  evaluation_times = waiver(),
  width = waiver(),
  height = waiver(),
  units = waiver(),
  export_collection = FALSE,
  verbose = TRUE,
  ...
)

## S4 method for signature 'ANY'
plot_sample_clustering(
  object,
  feature_cluster_method = waiver(),
  feature_linkage_method = waiver(),
  sample_cluster_method = waiver(),
  sample_linkage_method = waiver(),
  sample_limit = waiver(),
  draw = FALSE,
  dir_path = NULL,
  split_by = NULL,
  x_axis_by = NULL,
  y_axis_by = NULL,
  facet_by = NULL,
  facet_wrap_cols = NULL,
  ggtheme = NULL,
  gradient_palette = NULL,
  gradient_palette_range = waiver(),
  outcome_palette = NULL,
  outcome_palette_range = waiver(),
  x_label = waiver(),
  x_label_shared = "column",
  y_label = waiver(),
  y_label_shared = "row",
  legend_label = waiver(),
  outcome_legend_label = waiver(),
  plot_title = waiver(),
  plot_sub_title = waiver(),
  caption = NULL,
  x_range = NULL,
  x_n_breaks = 3,
  x_breaks = NULL,
  y_range = NULL,
  y_n_breaks = 3,
  y_breaks = NULL,
  rotate_x_tick_labels = waiver(),
  show_feature_dendrogram = TRUE,
  show_sample_dendrogram = TRUE,
  show_normalised_data = TRUE,
  show_outcome = TRUE,
  dendrogram_height = grid::unit(1.5, "cm"),
  outcome_height = grid::unit(0.3, "cm"),
  evaluation_times = waiver(),
  width = waiver(),
  height = waiver(),
  units = waiver(),
  export_collection = FALSE,
  verbose = TRUE,
  ...
)

## S4 method for signature 'familiarCollection'
plot_sample_clustering(
  object,
  feature_cluster_method = waiver(),
  feature_linkage_method = waiver(),
  sample_cluster_method = waiver(),
  sample_linkage_method = waiver(),
  sample_limit = waiver(),
  draw = FALSE,
  dir_path = NULL,
  split_by = NULL,
  x_axis_by = NULL,
  y_axis_by = NULL,
  facet_by = NULL,
  facet_wrap_cols = NULL,
  ggtheme = NULL,
  gradient_palette = NULL,
  gradient_palette_range = waiver(),
  outcome_palette = NULL,
  outcome_palette_range = waiver(),
  x_label = waiver(),
  x_label_shared = "column",
  y_label = waiver(),
  y_label_shared = "row",
  legend_label = waiver(),
  outcome_legend_label = waiver(),
  plot_title = waiver(),
  plot_sub_title = waiver(),
  caption = NULL,
  x_range = NULL,
  x_n_breaks = 3,
  x_breaks = NULL,
  y_range = NULL,
  y_n_breaks = 3,
  y_breaks = NULL,
  rotate_x_tick_labels = waiver(),
  show_feature_dendrogram = TRUE,
  show_sample_dendrogram = TRUE,
  show_normalised_data = TRUE,
  show_outcome = TRUE,
  dendrogram_height = grid::unit(1.5, "cm"),
  outcome_height = grid::unit(0.3, "cm"),
  evaluation_times = waiver(),
  width = waiver(),
  height = waiver(),
  units = waiver(),
  export_collection = FALSE,
  verbose = TRUE,
  ...
)

Arguments

object

A familiarCollection object, or other objects from which a familiarCollection can be extracted. See details for more information.

feature_cluster_method

The method used to perform clustering. These are the same methods as for the cluster_method configuration parameter: none, hclust, agnes, diana and pam.

none cannot be used when extracting data regarding mutual correlation or feature expressions.

If not provided explicitly, this parameter is read from settings used at creation of the underlying familiarModel objects.

feature_linkage_method

The method used for agglomerative clustering in hclust and agnes. These are the same methods as for the cluster_linkage_method configuration parameter: average, single, complete, weighted, and ward.

If not provided explicitly, this parameter is read from settings used at creation of the underlying familiarModel objects.

sample_cluster_method

The method used to perform clustering based on distance between samples. These are the same methods as for the cluster_method configuration parameter: hclust, agnes, diana and pam.

none cannot be used when extracting data for feature expressions.

If not provided explicitly, this parameter is read from settings used at creation of the underlying familiarModel objects.

sample_linkage_method

The method used for agglomerative clustering in hclust and agnes. These are the same methods as for the cluster_linkage_method configuration parameter: average, single, complete, weighted, and ward.

If not provided explicitly, this parameter is read from settings used at creation of the underlying familiarModel objects.
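The cluster and linkage method arguments above can be illustrated with base R. This is a minimal sketch that uses stats::hclust directly on made-up data; it is not the code path used inside familiar, and the object names (x, tree, sample_order, groups) are hypothetical.

```r
# Toy data: 6 samples (rows) by 3 features (columns). Values are made up:
# samples 1-3 lie near 0 and samples 4-6 lie near 5.
set.seed(1)
x <- rbind(
  matrix(rnorm(9, mean = 0, sd = 0.1), nrow = 3),
  matrix(rnorm(9, mean = 5, sd = 0.1), nrow = 3)
)
rownames(x) <- paste0("sample_", 1:6)

# Agglomerative clustering (hclust) on Euclidean distance with average linkage.
tree <- stats::hclust(stats::dist(x, method = "euclidean"), method = "average")

# Order in which samples would appear along a heatmap axis,
# so that similar samples end up next to each other.
sample_order <- rownames(x)[tree$order]

# Cutting the tree into two clusters recovers the two groups.
groups <- stats::cutree(tree, k = 2)
```

Because hclust produces a tree, a dendrogram can be drawn alongside the heatmap; a partitioning method such as pam yields cluster assignments only.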

sample_limit

(optional) Set the upper limit of the number of samples that are used during evaluation steps. Cannot be less than 20.

This setting can be specified per data element by providing a parameter value in a named list with data elements, e.g. list("sample_similarity"=100, "ice_data"=1000).

This parameter can be set for the following data elements: sample_similarity and ice_data.

draw

(optional) Draws the plot if TRUE.

dir_path

(optional) Path to the directory where the created plots are saved. Output is saved in the feature_similarity subdirectory. If NULL, figures are not saved but are returned instead.

split_by

(optional) Splitting variables. This refers to column names on which datasets are split. A separate figure is created for each split. See details for available variables.

x_axis_by

(optional) Variable plotted along the x-axis of a plot. The variable cannot overlap with variables provided to the split_by and y_axis_by arguments (if used), but may overlap with other arguments. Only one variable is allowed for this argument. See details for available variables.

y_axis_by

(optional) Variable plotted along the y-axis of a plot. The variable cannot overlap with variables provided to the split_by and x_axis_by arguments (if used), but may overlap with other arguments. Only one variable is allowed for this argument. See details for available variables.

facet_by

(optional) Variables used to determine whether and how facets of each figure appear. If the facet_wrap_cols argument is NULL, the first variable is used to define columns, and the remaining variables are used to define rows of facets. The variables cannot overlap with those provided to the split_by argument, but may overlap with other arguments. See details for available variables.

facet_wrap_cols

(optional) Number of columns to generate when facet wrapping. If NULL, a facet grid is produced instead.

ggtheme

(optional) ggplot theme to use for plotting.

gradient_palette

(optional) Sequential or divergent palette used to colour the similarity or distance between features in a heatmap.

gradient_palette_range

(optional) Numerical range used to span the gradient. This should be a range of two values, e.g. c(0, 1). Lower or upper boundary can be unset by using NA. If not set, the full metric-specific range is used.

outcome_palette

(optional) Sequential (continuous, count outcomes) or qualitative (other outcome types) palette used to show outcome values. This argument is ignored if the outcome is not shown.

outcome_palette_range

(optional) Numerical range used to span the gradient of numeric (continuous, count) outcome values. This argument is ignored for other outcome types or if the outcome is not shown.

x_label

(optional) Label to provide to the x-axis. If NULL, no label is shown.

x_label_shared

(optional) Sharing of x-axis labels between facets. One of three values:

  • overall: A single label is placed at the bottom of the figure. Tick text (but not the ticks themselves) is removed for all but the bottom facet plot(s).

  • column: A label is placed at the bottom of each column. Tick text (but not the ticks themselves) is removed for all but the bottom facet plot(s).

  • individual: A label is placed below each facet plot. Tick text is kept.

y_label

(optional) Label to provide to the y-axis. If NULL, no label is shown.

y_label_shared

(optional) Sharing of y-axis labels between facets. One of three values:

  • overall: A single label is placed to the left of the figure. Tick text (but not the ticks themselves) is removed for all but the left-most facet plot(s).

  • row: A label is placed to the left of each row. Tick text (but not the ticks themselves) is removed for all but the left-most facet plot(s).

  • individual: A label is placed to the left of each facet plot. Tick text is kept.

legend_label

(optional) Label to provide to the legend. If NULL, the legend will not have a name.

outcome_legend_label

(optional) Label to provide to the legend for outcome data. If NULL, the legend will not have a name. By default, class, value and event are used for binomial and multinomial, continuous and count, and survival outcome types, respectively.

plot_title

(optional) Label to provide as figure title. If NULL, no title is shown.

plot_sub_title

(optional) Label to provide as figure subtitle. If NULL, no subtitle is shown.

caption

(optional) Label to provide as figure caption. If NULL, no caption is shown.

x_range

(optional) Value range for the x-axis.

x_n_breaks

(optional) Number of breaks to show on the x-axis of the plot. x_n_breaks is used to determine the x_breaks argument in case it is unset.

x_breaks

(optional) Break points on the x-axis of the plot.

y_range

(optional) Value range for the y-axis.

y_n_breaks

(optional) Number of breaks to show on the y-axis of the plot. y_n_breaks is used to determine the y_breaks argument in case it is unset.

y_breaks

(optional) Break points on the y-axis of the plot.

rotate_x_tick_labels

(optional) Rotate tick labels on the x-axis by 90 degrees. Defaults to TRUE. Rotation of x-axis tick labels may also be controlled through the ggtheme. In this case, FALSE should be provided explicitly.

show_feature_dendrogram

(optional) Show feature dendrogram around the main panel. Can be TRUE, FALSE, NULL, or a position, i.e. top, bottom, left and right.

If a position is specified, it should be appropriate with regard to the x_axis_by or y_axis_by argument. If x_axis_by is feature (default), the only valid positions are top (default) and bottom. Alternatively, if y_axis_by is feature, the only valid positions are right (default) and left.

A dendrogram can only be drawn from cluster methods that produce dendrograms, such as hclust. For example, a dendrogram cannot be constructed using the partitioning around medoids method (pam).

show_sample_dendrogram

(optional) Show sample dendrogram around the main panel. Can be TRUE, FALSE, NULL, or a position, i.e. top, bottom, left and right.

If a position is specified, it should be appropriate with regard to the x_axis_by or y_axis_by argument. If y_axis_by is sample (default), the only valid positions are right (default) and left. Alternatively, if x_axis_by is sample, the only valid positions are top (default) and bottom.

A dendrogram can only be drawn from cluster methods that produce dendrograms, such as hclust. For example, a dendrogram cannot be constructed using the partitioning around medoids method (pam).

show_normalised_data

(optional) Flag that determines whether the data shown in the main heatmap is normalised using the same settings as within the analysis (fixed; default), using a standardisation method (set_normalisation) that is applied separately to each dataset, or not at all (none), which shows the data at the original scale, albeit with batch-corrections.

Categorical variables are plotted to span 90% of the entire numerical value range, i.e. the levels of categorical variables with 2 levels are represented at 5% and 95% of the range, with 3 levels at 5%, 50%, and 95%, etc.
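The placement of categorical levels can be worked out with a few lines of base R. This is a sketch of the arithmetic only, assuming levels are spaced evenly between 5% and 95% of the range; level_positions is a hypothetical helper and not a function exported by familiar.

```r
# Hypothetical helper: positions of n_levels categorical levels, spaced evenly
# so that together they span `span` (90% by default) of the value range.
level_positions <- function(n_levels, value_range = c(0, 1), span = 0.90) {
  stopifnot(n_levels >= 2, length(value_range) == 2)

  # Margin left free at each end of the range (5% for a 90% span).
  margin <- (1 - span) / 2

  # Evenly spaced fractions between the lower and upper margin.
  fractions <- seq(margin, 1 - margin, length.out = n_levels)

  # Map the fractions onto the actual value range.
  value_range[1] + fractions * diff(value_range)
}

two_levels <- level_positions(2)
three_levels <- level_positions(3)
```

Under these assumptions, two levels sit at 5% and 95% of the range, and three levels sit at 5%, 50% and 95%.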

show_outcome

(optional) Show outcome column(s) or row(s) in the graph. Can be TRUE, FALSE, NULL or a position, i.e. top, bottom, left and right.

If a position is specified, it should be appropriate with regard to the x_axis_by or y_axis_by argument. If y_axis_by is sample (default), the only valid positions are left (default) and right. Alternatively, if x_axis_by is sample, the only valid positions are top (default) and bottom.

The outcome data will be drawn between the main panel and the sample dendrogram (if any).

dendrogram_height

(optional) Height of the dendrogram. The height is 1.5 cm by default. Height is expected to be a grid unit (see grid::unit), which also allows for specifying relative heights.

outcome_height

(optional) Height of an outcome data column/row. The height is 0.3 cm by default. Height is expected to be a grid unit (see grid::unit), which also allows for specifying relative heights. In case of survival outcome data with multiple evaluation_times, this height is multiplied by the number of time points.
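Heights for dendrogram_height and outcome_height are grid units. The sketch below shows an absolute and a relative unit, and the arithmetic for survival outcomes with several evaluation times. The time points are made up, and the multiplication is written out here only for illustration, since the documentation states that it is applied to the supplied height automatically.

```r
# Absolute height: the documented default for dendrogram_height.
dendrogram_height <- grid::unit(1.5, "cm")

# Relative height: 10% of the parent viewport ("npc" units).
relative_height <- grid::unit(0.1, "npc")

# Survival outcome with three (made-up) evaluation times.
evaluation_times <- c(1, 3, 5)
outcome_height_cm <- 0.3

# Total space taken by the outcome rows/columns: 0.3 cm per time point.
total_outcome_cm <- outcome_height_cm * length(evaluation_times)
total_outcome_height <- grid::unit(total_outcome_cm, "cm")
```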

evaluation_times

(optional) Times at which the event status of time-to-event survival outcomes is determined. Only used for survival outcomes. If not specified, the values used when creating the underlying familiarData objects are used.

width

(optional) Width of the plot. A default value is derived from the number of facets.

height

(optional) Height of the plot. A default value is derived from the number of features and the number of facets.

units

(optional) Plot size unit. Either cm (default), mm or in.

export_collection

(optional) Exports the collection if TRUE.

verbose

Flag to indicate whether feedback should be provided for the plotting.

...

Arguments passed on to as_familiar_collection, ggplot2::ggsave, extract_feature_expression

familiar_data_names

Names of the dataset(s). Only used if the object parameter is one or more familiarData objects.

collection_name

Name of the collection.

device

Device to use. Can either be a device function (e.g. png), or one of "eps", "ps", "tex" (pictex), "pdf", "jpeg", "tiff", "png", "bmp", "svg" or "wmf" (windows only). If NULL (default), the device is guessed based on the filename extension.

scale

Multiplicative scaling factor.

dpi

Plot resolution. Also accepts a string input: "retina" (320), "print" (300), or "screen" (72). Applies only to raster output types.

limitsize

When TRUE (the default), ggsave() will not save images larger than 50x50 inches, to prevent the common error of specifying dimensions in pixels.

bg

Background colour. If NULL, uses the plot.background fill value from the plot theme.

create.dir

Whether to create new directories if a non-existing directory is specified in the filename or path (TRUE) or return an error (FALSE, default). If FALSE and run in an interactive session, a prompt will appear asking to create a new directory when necessary.

feature_similarity

Table containing pairwise distance between samples. This is used to determine cluster information, and indicate which samples are similar. The table is created by the extract_sample_similarity method.

data

A dataObject object, data.table or data.frame that constitutes the data that are assessed.

feature_similarity_metric

Metric to determine pairwise similarity between features. Similarity is computed in the same manner as for clustering, and feature_similarity_metric therefore has the same options as cluster_similarity_metric: mcfadden_r2, cox_snell_r2, nagelkerke_r2, spearman, kendall and pearson.

If not provided explicitly, this parameter is read from settings used at creation of the underlying familiarModel objects.

sample_similarity_metric

Metric to determine pairwise similarity between samples. Similarity is computed in the same manner as for clustering, but sample_similarity_metric has different options that are better suited to computing distance between samples instead of between features: gower, euclidean.

The underlying feature data is scaled to the [0, 1] range (for numerical features) using the feature values across the samples. The normalisation parameters required can optionally be computed from feature data with the outer 5% (on both sides) of feature values trimmed or winsorised. To do so, append _trim (trimming) or _winsor (winsorising) to the metric name. This reduces the effect of outliers somewhat.

If not provided explicitly, this parameter is read from settings used at creation of the underlying familiarModel objects.
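The effect of trimming or winsorising on the scaling parameters can be sketched in base R. This is an illustration under stated assumptions and not the implementation used by familiar: scale_unit_range is a hypothetical helper, the 5% bounds are taken from stats::quantile with its default settings, and how familiar treats values that fall outside the [0, 1] range after scaling is not stated in this documentation.

```r
# Hypothetical helper: min-max scaling where the minimum and maximum are
# derived from the raw, trimmed, or winsorised feature values.
scale_unit_range <- function(x, method = c("none", "trim", "winsor"), fraction = 0.05) {
  method <- match.arg(method)

  # Lower and upper bounds that cut off the outer 5% on both sides.
  bounds <- stats::quantile(x, probs = c(fraction, 1 - fraction),
                            na.rm = TRUE, names = FALSE)

  # Reference values from which the scaling parameters are computed.
  reference <- switch(method,
    none = x,
    trim = x[x >= bounds[1] & x <= bounds[2]],      # drop extreme values
    winsor = pmin(pmax(x, bounds[1]), bounds[2])    # clamp extreme values
  )

  lower <- min(reference, na.rm = TRUE)
  upper <- max(reference, na.rm = TRUE)

  # Scale all values; outliers may end up outside [0, 1] in this sketch.
  (x - lower) / (upper - lower)
}

# Made-up feature with one extreme outlier.
values <- c(1:19, 1000)
scaled_plain <- scale_unit_range(values, "none")
scaled_winsor <- scale_unit_range(values, "winsor")
```

Without trimming or winsorising, the outlier compresses the other 19 values into a narrow band near 0. With winsorising, those values are spread over a wider part of the range, which is the reduced outlier effect the text describes.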

message_indent

Number of indentation steps for messages shown during computation and extraction of various data elements.

Details

This function generates heatmaps of feature values, in which features and samples are ordered by similarity.

Available splitting variables are: fs_method, learner, and data_set. By default, the data is split by fs_method, learner and data_set, since the number of samples will typically differ between data sets, even for the same feature selection method and learner.

The x_axis_by and y_axis_by arguments determine what data are shown along which axis. Each argument takes one of feature and sample, and both arguments should be unique. By default, features are shown along the x-axis and samples along the y-axis.

Note that similarity is determined based on the underlying data. Hence the ordering of features may differ between facets, and tick labels are maintained for each panel.

Available palettes for gradient_palette are those listed by grDevices::palette.pals() (requires R >= 4.0.0), grDevices::hcl.pals() (requires R >= 3.6.0) and rainbow, heat.colors, terrain.colors, topo.colors and cm.colors, which correspond to the palettes of the same name in grDevices. If not specified, a default palette based on palettes in Tableau is used. You may also specify your own palette by using colour names listed by grDevices::colors() or through hexadecimal RGB strings.
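The palette sources named above can be inspected with grDevices alone. The snippet below is a sketch for exploring candidates; the object names are hypothetical, and the exact form in which familiar expects a palette name is an assumption to verify against the package.

```r
# Named HCL palettes (requires R >= 3.6.0).
hcl_names <- grDevices::hcl.pals()

# Named palettes from palette.pals() (requires R >= 4.0.0), guarded by version.
if (getRversion() >= "4.0.0") {
  palette_names <- grDevices::palette.pals()
}

# Five colours from a sequential HCL palette, returned as hexadecimal strings.
sequential <- grDevices::hcl.colors(5, palette = "Viridis")

# A custom palette that mixes a colour name with hexadecimal RGB strings.
custom <- c("navy", "#FFFFFF", "#B2182B")

# Colour names should appear in grDevices::colors().
name_is_known <- "navy" %in% grDevices::colors()
```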

Labeling methods such as set_fs_method_names or set_data_set_names can be applied to the familiarCollection object to update labels, and order the output in the figure.

Value

A list of plot objects if dir_path is NULL; otherwise NULL, with plots saved to disk.
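A minimal sketch of a call, using only arguments documented above. It assumes that the familiar package is installed and that a familiarCollection object is available. The wrapper plot_clusters and its collection argument are hypothetical names, introduced so that the call can be defined without data at hand.

```r
# Hypothetical wrapper around plot_sample_clustering.
# `collection` is assumed to be an existing familiarCollection object.
plot_clusters <- function(collection) {
  familiar::plot_sample_clustering(
    object = collection,
    sample_cluster_method = "hclust",   # hierarchical clustering of samples
    sample_linkage_method = "average",  # average linkage
    x_axis_by = "feature",              # features along the x-axis (default)
    y_axis_by = "sample",               # samples along the y-axis (default)
    show_sample_dendrogram = "right",   # valid position when y_axis_by is sample
    show_outcome = TRUE,
    draw = FALSE,                       # do not draw immediately
    dir_path = NULL                     # return plots instead of saving them
  )
}
```

With dir_path = NULL the call is expected to return a list of plot objects, per the Value section; setting dir_path to a directory would save the figures instead.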


familiar documentation built on Sept. 30, 2024, 9:18 a.m.