nmatlist2heatmaps: Make multiple coverage heatmaps
In jmw86069/platjam: Platform Jam, biological platform importers.

nmatlist2heatmaps

R Documentation

Make multiple coverage heatmaps

Description

Make multiple coverage heatmaps

Usage

nmatlist2heatmaps(
  nmatlist,
  panel_groups = NULL,
  title = NULL,
  title_gp = grid::gpar(fontsize = 16),
  caption = NULL,
  upstream_length = NULL,
  downstream_length = NULL,
  k_clusters = 0,
  min_rows_per_k = 100,
  k_subset = NULL,
  k_colors = NULL,
  k_width = grid::unit(5, "mm"),
  k_method = c("correlation", "euclidean", "pearson", "spearman"),
  k_heatmap = main_heatmap,
  partition = NULL,
  row_title_rot = 0,
  partition_counts = TRUE,
  partition_count_template = "{partition_name}\n({counts} rows)",
  rows = NULL,
  row_order = NULL,
  nmat_colors = NULL,
  middle_color = "white",
  nmat_names = NULL,
  main_heatmap = NULL,
  anno_df = NULL,
  byCols = NULL,
  color_sub = NULL,
  anno_row_marks = NULL,
  anno_row_labels = NULL,
  anno_row_gp = grid::gpar(fontsize = 14),
  recenter_heatmap = NULL,
  summit_names = NULL,
  recenter_range = NULL,
  recenter_invert = FALSE,
  restrand_heatmap = NULL,
  restrand_range = NULL,
  restrand_buffer = NULL,
  restrand_invert = FALSE,
  top_annotation = NULL,
  top_anno_height = grid::unit(3, "cm"),
  top_axis_side = c("right"),
  legend_max_ncol = 2,
  legend_base_nrow = 12,
  legend_max_labels = 40,
  show_heatmap_legend = TRUE,
  heatmap_legend_param = NULL,
  heatmap_legend_direction = "horizontal",
  annotation_legend_param = NULL,
  hm_nrow = 1,
  transform = "none",
  transform_label = NULL,
  signal_ceiling = NULL,
  axis_name = NULL,
  axis_name_gp = grid::gpar(fontsize = 10),
  axis_name_rot = 90,
  column_title_gp = grid::gpar(fontsize = 14),
  lens = -2,
  anno_lens = 8,
  pos_line = FALSE,
  seed = 123,
  ht_gap = grid::unit(4, "mm"),
  row_anno_padding = grid::unit(4, "mm"),
  column_anno_padding = grid::unit(4, "mm"),
  legend_padding = grid::unit(1, "cm"),
  profile_value = c("mean", "sum", "abs_mean", "abs_sum"),
  profile_linetype = c(1, 5, 3),
  profile_linewidth = 1.5,
  ylims = NULL,
  border = TRUE,
  iter.max = 20,
  use_raster = TRUE,
  raster_quality = 1,
  raster_by_magick = jamba::check_pkg_installed("magick"),
  do_plot = TRUE,
  do_caption = TRUE,
  legend_fontsize = 10,
  legend_width = grid::unit(3, "cm"),
  trim_legend_title = TRUE,
  padding = grid::unit(c(0.1, 0.1, 0.1, 0.1), "cm"),
  return_type = c("heatmaplist", "grid"),
  show_error = FALSE,
  verbose = FALSE,
  ...
)

Arguments

`nmatlist`	`list` containing `normalizedMatrix` objects, usually the output from `coverage_matrix2nmat()`.
`panel_groups`	`character` vector with values for each `nmatlist` entry, which defines groups of heatmap panels. Each panel group shares: (1) the numeric range for the heatmap color gradient, defined by the first `signal_ceiling` value for the group, following the standard rules where values below 1 represent a quantile signal threshold and values above 1 represent a fixed numeric threshold; (2) one color key, labeled by `names(panel_groups)` to represent all panels in the group; (3) the `ylim` y-axis range for the profile plot, either determined dynamically or by the first `ylim` provided for the panel group. When `nmat_colors` is not defined, each panel group is assigned one categorical color which is applied to all heatmaps in the group. When `nmat_colors` is defined, each panel uses the color as defined, however the color key only uses the color gradient from the first panel in the group.
`title`, `caption`	`character` string used as an overall title or caption, respectively, displayed at the top of all heatmap output.
`title_gp`	`grid::gpar` object to customize the title fontsize, fontface, color (col), etc.
`upstream_length`, `downstream_length`	`numeric` (optional) range of coordinates to display across all heatmaps. This argument is intended when the input `nmatlist` contains a wider range of coordinates than should be displayed. The columns in `nmatlist` are subset to retain only those columns within the range `downstream_length` to `upstream_length`, assuming the middle coordinate is zero. This step calls `zoom_nmatlist()`. Note this step does not expand the displayed region.
`k_clusters`	`integer` number of k-means clusters to use to partition each heatmap. Use `0` or `NULL` for no clustering (default). Note `k_clusters` can be a `numeric` vector, in which case it is applied across unique groups defined by `partition` if provided. If `names(k_clusters)` match values in `partition` they will be applied by name, otherwise they are applied in the order the clusters are defined by `partition`. Each group is clustered to that many k clusters, provided it also meets the threshold `min_rows_per_k`, which is intended to prevent clustering 10 rows into 10 k-means clusters.
`min_rows_per_k`	`numeric` minimum rows required per k-means cluster, used only when `k_clusters` is greater than 1. With default `min_rows_per_k=100`, a partition with 100 or fewer rows can only have `k=1`, and a partition with 101 rows can have `k=2`. This limit protects from k-means clustering small partitions.
`k_subset`	`integer` vector of k-means clusters to retain. This argument is intended to "zoom in" (or "drill down") to one or more k-means clusters of interest. When both `k_clusters` and `partition` are provided, this argument must exactly match the row title as displayed in the heatmap.
`k_colors`	`character` vector of R colors, or `NULL` to use the output of `colorjam::rainbowJam(k_clusters)`. These colors are applied to `k_clusters` and/or `partition`: When `partition` is provided, `names(k_colors)` are used when present, otherwise colors are assigned in order of `partition` groups. When `k_clusters` is also defined, each partition color is split into a light-to-dark gradient based upon the number of k_clusters. When `partition` is not provided, `k_colors` are applied to k-means clusters in the order the colors are provided.
`k_width`	`unit` width of the k-means cluster color bar, used with `k_clusters`, default is 5 mm width.
`k_method`	`character` string indicating the distance used by k-means. The usual k-means distance is `"euclidean"`, however a useful alternative for sequence coverage data is `"correlation"` as implemented in `amap::Kmeans()`. Available methods: `"euclidean"` calculates the typical Euclidean distance, which tends to emphasize total signal more so than the specific shape of the signal; it is used when the R package `amap` is not available. `"correlation"` (the first value in the argument default, requires the R package `amap`) emphasizes the shape of signal profiles, and is particularly effective. It is also called "centered Pearson" since data is centered prior to calculating correlation. `"pearson"` (requires `amap`) is also called "not centered Pearson" since data is not centered prior to calculating correlation. `"spearman"` (requires `amap`) computes distance based upon rank differences. It has not been tested much in this context.
`k_heatmap`	`integer` with one or more values indicating which `nmatlist` entries to use for k-means clustering, default uses `main_heatmap`. This value is only used when `k_clusters` is greater than 1. This argument is useful for clustering multiple coverage heatmaps together.
`partition`	`character` or `factor` vector used to split rows of each matrix in `nmatlist`, and must be named by rownames in `nmatlist`. This value is converted to `factor`, and will honor provided factor levels if already defined. When `partition` and `k_clusters` are both defined, the data is first grouped by `partition` then each partition group is separately k-means clustered, using rules described for `k_clusters` and `min_rows_per_k`. Colors from `k_colors` are assigned to each partition value, then colors are split to a light-to-dark gradient using `jamba::color2gradient()`.
`row_title_rot`	`numeric` value in degrees, to rotate the partition labels on the left, when either `partition` or `k_clusters` are provided. The default `0` uses horizontal text. For long labels, it may be better to use `30` or `60`.
`partition_counts`	`logical` indicating whether to include the number of rows in each partition, default `TRUE`. Note that this setting is active if `k_clusters` and/or `partition` are supplied. Any situation where rows are split, the number of rows will be displayed.
`partition_count_template`	`character` format used when `partition_counts=TRUE`, used together with `glue::glue()` to format each row partition. The default: `"{partition_name}\n({counts} rows)"` will print for example: `"A\n(125 rows)"`
`rows`	optional vector to define subset rows, or specific row order: `character` vector of rownames in `nmatlist`, or `integer` vector with row numbers (row index) values. Note that even when using a subset of `rows` the data may also be subset based upon available `names(partition)` and `rownames(anno_df)`.
`row_order`	`integer` vector used to order rows, intended to allow ordering data based upon a specific heatmap, or using different logic than the default. When `row_order=NULL` (default) or `row_order=TRUE` it calls `EnrichedHeatmap::enriched_score()` using data from `main_heatmap`. When there are multiple values for `main_heatmap` (which is default), then scores are calculated for each matrix, then the average score is used per row. The `enriched_score()` function generates a weighted score with highest weight at the center position, with progressively lower weight working outward where the maximum distance has zero weight. The technique sorts signal which emphasizes highest enriched signal at the center of the matrix. When `row_order=FALSE` the data is ordered in the same order they appear in `nmatlist`, or when `anno_df` and `byCols` are supplied, the rows in `anno_df` are sorted using `jamba::mixedSort(anno_df, byCols=byCols)` and the resulting row order is used.
`nmat_colors`	`character` vector of R colors, to colorize each heatmap. When `nmat_colors=NULL` (default) and `panel_groups` is not defined, `colorjam::rainbowJam()` is used to assign one unique color to each heatmap panel. When `nmat_colors=NULL` and `panel_groups` is defined, `colorjam::rainbowJam()` is used to assign one unique color to each unique panel group, and the same color is applied to each heatmap panel in each panel group.
`middle_color`	`character` R color, default `middle_color="white"`, used as the middle color when creating a divergent color gradient. This color should usually be either `"white"` or `"black"`, but sometimes can be slightly off-white or off-black to apply some distinction from the background color.
`nmat_names`	`character` vector, or `NULL`, optional, used as custom names for each heatmap in `nmatlist`. When `nmat_names=NULL` the `signal_name` values are used from each `nmatlist` entry attribute: `attr(nmat, "signal_name")`
`main_heatmap`	`integer` index to define one or more entries in `nmatlist` as the main heatmap used for clustering and row ordering. Note that `k_heatmap` will override this option when provided. By default `main_heatmap=NULL` will cause all heatmaps to be used for row ordering.
`anno_df`	`data.frame` or object that can be coerced to `data.frame` whose `rownames(anno_df)` must match rownames in the nmatlist data. When `rownames(anno_df)` does not match, this function fails with an error message. Data can optionally be sorted by defining `byCols`. When provided, data in `nmatlist` is automatically subsetted to the matching `rownames(anno_df)` also present in `nmatlist`. When `rows` is also defined, the data will be subsetted by the `rows` and by the `rownames(anno_df)` present in `nmatlist`.
`byCols`	`character` vector of `colnames(anno_df)` used to sort the `data.frame`. This argument is passed to `jamba::mixedSortDF()` and follows its rules, for example prefix `"-"` causes the column to be sorted in reverse. Multiple columns can be sorted, in the order they are provided, and factor levels are honored for factor columns.
`color_sub`	accepts input in two forms: (1) `character` vector of R colors named by `character` values; (2) `list` output from `design2colors()` where each `list` element is named by colnames present in `anno_df`, and each `list` value is either a `character` vector of colors named by `character` value, or a color `function` as defined by `circlize::colorRamp2()`, which takes a `numeric` value and returns a `character` R color. When values for any column in `anno_df` do not have colors assigned by one of the mechanisms above, colors are assigned using `colorjam::group2colors()`. When `partition` is defined, colors are assigned either by matching unique partition values with `names(color_sub)`, or with `attr(color_sub, "color_sub")` if present, which may contain the full set of name-color assignments when `color_sub` is provided as a `list`. Otherwise if `color_sub` is provided as a `list` each entry is compared with `partition` values until values can be fully matched. Failing these steps, colors are assigned to unique `partition` values, then if `k_clusters` is also supplied, the partition colors are then split by `colorjam::color2gradient()` across the k-means clusters for each partition.
`anno_row_marks`	`character` optional vector of `rownames` in `nmatlist` that should be labeled beside the heatmaps using `ComplexHeatmap::anno_mark()`. Note `anno_row_labels` can be used to supply custom labels, or one or more columns in `anno_df`. When `anno_row_labels=NULL` (default) it displays the value in `anno_row_marks` itself.
`anno_row_labels`	`character` vector of optional labels to use when `anno_row_marks` is supplied. When `anno_row_labels=NULL` (default) it uses rownames defined in `anno_row_marks`. It can be a `character` vector of actual labels, with names that match `anno_row_marks` (thus rownames in `nmatlist`). It can be a `character` vector with one or more `colnames(anno_df)`, which creates labels by concatenating values across columns, delimited with space `" "`.
`anno_row_gp`	`grid::gpar` object used to customize the text label displayed when `anno_row_marks` is defined. The default fontsize 14 is intended to be larger than other default values, for legibility.
`recenter_heatmap`, `recenter_range`, `recenter_invert`	arguments are passed to `recenter_nmatlist()` to apply re-centering. Note that recenter will always occur before restrand.
`summit_names`	`character` default NULL, optional colnames to use for recentering, which applies a previously defined set of summit positions to use. It ignores all other recenter arguments.
`restrand_heatmap`, `restrand_range`, `restrand_buffer`, `restrand_invert`	arguments are passed to `restrand_nmatlist()` to apply re-stranding. Note that recenter will always occur before restrand.
`top_annotation`	`HeatmapAnnotation` or `logical` or `list`: `top_annotation=TRUE` (default) uses the default `EnrichedHeatmap::anno_enriched()` to display the signal profile for each row partition and/or k-means cluster; `top_annotation=FALSE` does not display a top annotation; a `HeatmapAnnotation` object as produced by `ComplexHeatmap::HeatmapAnnotation(EnrichedHeatmap::anno_enriched())` or equivalent, a form which is required for the annotation function to be called successfully on each heatmap in `nmatlist`; or a `list` of objects to be applied sequentially to each `nmatlist` coverage heatmap in order, intended to allow custom top annotation for each heatmap.
`top_anno_height`	`unit` object to define the default height of the `top_annotation`. When `top_annotation` is not defined, the default method uses `EnrichedHeatmap::anno_enriched()` with `height=top_anno_height`.
`top_axis_side`	`character` value indicating which side of the top annotation to place the y-axis labels. When only one value is defined, it is recycled across `nmatlist`. Otherwise it is used when `panel_groups` are defined, and the top annotation is labeled for only one panel in each panel group using the side as defined. Labels are displayed for each contiguous set of panel groups, so that heatmaps in the same panel group can be ordered in different subsets. Consider panel groups in this order: A, A, B, B, A, A. It would display one set of axis labels for the first two panels in A, then one axis label for the next two panels in B, then one axis label again for the final two panels in A. Values should be one of: `"left"`, `"right"`: axis labels on this side of each panel group; `"both"`: axis labels on both sides of each panel group, useful when panel groups have a fairly large number of panels; `"none"`: display no axis labels; `"all"`: display axis labels for every panel even within panel group.
`legend_max_ncol`	`integer` number indicating the maximum number of columns allowed for a categorical color legend.
`legend_base_nrow`	`integer` number indicating the base number of rows used for a categorical color legend, before additional columns are added. Once the number of elements exceeds `(legend_max_ncol * legend_base_nrow)` then rows are added, but columns never exceed `legend_max_ncol`.
`legend_max_labels`	`integer` to define the maximum labels to display as a color legend. When any `anno_df` column contains more than this number of categorical colors, the legend is not displayed, in order to prevent the color legend from filling the entire plot device, thus hiding the heatmaps.
`show_heatmap_legend`	`logical` indicating whether to display the color legend for each heatmap entry in `nmatlist`. When `panel_groups` are supplied, color legends are displayed only for the first heatmap in each unique panel group, unless `show_heatmap_legend=FALSE`, or unless `show_heatmap_legend` is already defined for every heatmap.
`heatmap_legend_param`	`list` with optional heatmap legend settings. By default `NULL` causes this argument to be defined internally, however when provided it overrides any internal settings and is used directly. The `list` should be `length(nmatlist)`, or is recycled to that length.
`heatmap_legend_direction`	`character` string used when `show_heatmap_legend=TRUE` and `heatmap_legend_param` is not already provided. By default `heatmap_legend_direction="horizontal"` displays the color gradient in the legend horizontally as a continuous scale, with labels defined in `EnrichedHeatmap::EnrichedHeatmap()`, and width equal to `grid::unit(1, "npc")` which uses the full width of the color legend area. When `heatmap_legend_direction="vertical"` the color legend is displayed vertically, with width `grid::unit(5, "mm")`.
`annotation_legend_param`	`list` optional parameters passed to the annotation legend functions, intended to provide customization. The `list` should be named by each annotation entry to be customized, and any annotation entries not defined in `annotation_legend_param` use the default behavior of `ComplexHeatmap::HeatmapAnnotation()`, which will assign its own set of colors and use default legend parameters. When `annotation_legend_param=NULL` (default) then all colors are defined, and all legends are displayed using this function's defaults. When there are more labels than `legend_max_labels` the color legend will be hidden for that annotation legend entry.
`hm_nrow`	`integer` number of rows used to display the heatmap panels. This mechanism is somewhat experimental, and is used to split a large number of coverage heatmaps into two rows of heatmaps. The matrix data row order is consistent across all heatmap panels. The annotation data is displayed to the left of each row of heatmap panels.
`transform`	one of the following: a `character` string referring to a numeric transformation, or a `function` that applies a numeric transformation. Valid `character` string values: `"log2signed"` calls `jamba::log2signed()`, which applies `log2(1+x)` to the absolute value, then multiplies by the original `sign(x)`; `"sqrt"` applies square root; `"cubert"` applies cube root `x^(1/3)`; `"qrt"` applies fourth root `x^(1/4)`. When there are negative numeric values, the transformation is applied to the absolute value, then multiplied by the original sign. Therefore, the transformation is applied to adjust the magnitude of the values. These values are passed to `get_numeric_transform()` which may have more information.
`transform_label`	`character` optional vector of transformation labels to use. When `transform_label=NULL` (default) it uses `names(transform)` if present, then the `character` string of `transform`, otherwise is left blank. When `transform="none"` no label is displayed. By default, transform labels are surrounded by parentheses, for example `"(log2signed)"` and placed on a new line below each coverage heatmap title. To suppress the transformation in the title, supply `transform_label=""`.
`signal_ceiling`	`numeric` vector whose values are recycled to length `length(nmatlist)`. The `signal_ceiling` defines the maximum numeric value to the color ramp for each matrix in `nmatlist`. The value is passed to `get_nmat_ceiling()`, which recognizes three numeric forms: `signal_ceiling=NULL`: (default) the maximum absolute value is used as the ceiling. `signal_ceiling > 1`: the specific numeric value is applied as a fixed ceiling, even if the value is above or below the maximum absolute value in the data matrix. This setting is useful for defining a fixed meaningful threshold across `nmatlist` entries. `signal_ceiling > 0` and `signal_ceiling <= 1`: the numeric value defines a quantile threshold calculated using signal in the data matrix, excluding values of zero. For example `signal_ceiling=0.75` calculates ceiling `quantile(x, probs=0.75)`, using non-zero values. Note that the ceiling is only applied to the color scale and not to the underlying data. The row clustering and row ordering steps use the full data range, after applying the appropriate `transform` where applicable. To apply a numeric ceiling to the data itself, it should be done at the level of `nmatlist` beforehand.
`axis_name`	`character` string with optional custom label used for the target region label in each heatmap panel. When `axis_name=NULL` (default), the `attr(nmat, "target_name")` label will be used, which is usually "target", along with the upstream and downstream length as stored in `attr(nmat, "extend")`. A `character` vector will be applied as the center (target) label on each heatmap, using the upstream and downstream length as stored in `attr(nmat, "extend")`. A `list` is expected to have three labels per vector element, corresponding to the upstream, target, and downstream axis label. This `list` is recycled to `length(nmatlist)`.
`axis_name_gp`	object of `grid::gpar` applied to the x-axis label graphic parameters. For example, to customize the x-axis font size, use the form: `grid::gpar(fontsize=8)`.
`axis_name_rot`	`numeric` value either `0` or `90` indicating whether to rotate the x-axis names below each heatmap, where `axis_name_rot=90` (default) will rotate labels vertically, and `axis_name_rot=0` will display labels horizontally. Note that `axis_name_rot` also controls the rotation of annotation (`anno_df`) and partition (`partition` or `k_clusters`) annotation labels, below each annotation heatmap.
`column_title_gp`	object `grid::gpar` or `list` of `grid::gpar` objects, applied across entries in `nmatlist` to customize the title displayed above each heatmap panel. For example to alter the font size, use `grid::gpar(fontsize=14)`. This argument is passed to `ComplexHeatmap::Heatmap()`, and can be customized for each heatmap as needed.
`lens`	`numeric` adjustment to the intensity of the color gradient, used only when the corresponding `nmat_colors` entry uses a fixed set of colors. `lens` above zero create more rapid color changes, making the gradient more visually intense, values below zero reduce the intensity. The `lens` values are recycled to `length(nmatlist)` as needed. Note that `signal_ceiling` defines the `numeric` value at which the maximum color is applied, while `lens` adjusts the intensity of the intermediate values in the color gradient.
`anno_lens`	`numeric` value used to scale the annotation heatmap color scales, see `lens` for details. This value is applied to `numeric` columns only when `anno_df` is provided.
`seed`	`numeric` value used with `set.seed()` to set the random seed. Set to `NULL` to avoid running `set.seed()`.
`ht_gap`	`unit` size to specify the gap between multiple heatmaps. This argument is passed to `ComplexHeatmap::draw()`. An example is `grid::unit(8, "mm")` to specify 8 millimeters.
`row_anno_padding`, `column_anno_padding`, `legend_padding`	`grid::unit` to define the padding between heatmap body, and row annotation, column annotation, and heatmap color legend, respectively. The default values are intended to provide more space between heatmap and these features than between heatmap subsections (`row_gap`). The defaults are 4mm for row and column annotations, and 1cm for the color legend. The `legend_padding` is useful to minimize overlap with legend and the y-axis labels from the metaplots at the top of each heatmap.
`profile_value`	`character` string to define the type of numeric profile to display at the top of each heatmap. This argument is passed to `EnrichedHeatmap::anno_enriched()`. Values: `"mean"` the mean profile; `"sum"` the sum; `"abs_sum"` sum of absolute values; `"abs_mean"` the mean of absolute values.
`profile_linetype`	`numeric` or `character` default c(1, 5, 3) passed to `grid::gpar(lty)` to define the metaplot line type. Default lty=1 is a solid line. Values are recycled to the number of profile plots.
`profile_linewidth`	`numeric` line width passed to `grid::gpar(lwd)` to control the line width. Default uses lwd=1.5. Values are recycled to the number of profile plots.
`ylims`	`numeric` vector of maximum y-axis values for each heatmap profile; or `list` of min,max values to apply to each `nmatlist` entry.
`border`	`logical` indicating whether to draw a border around the heatmap, which includes all heatmap panels in the event of splitting by clustering. The `border` can be supplied as a vector, so the `border` can be applied specifically to each heatmap if needed.
`iter.max`	`integer` value indicating the maximum iterations performed by k-means clustering, only relevant when `k_clusters` is non-zero.
`use_raster`	`logical` indicating whether to create heatmaps using raster resizing, almost always recommended `TRUE` otherwise the output will be very sub-optimal.
`raster_quality`	`numeric` passed to `ComplexHeatmap::Heatmap()`, used when `use_raster=TRUE` and defines the level of detail retained, and is used only when `raster_by_magick=FALSE`. Using larger numbers decreases speed substantially.
`raster_by_magick`	`logical` passed to `ComplexHeatmap::Heatmap()`, to enable ImageMagick use during rasterization. By default this option is `TRUE` and is only disabled when the R package `"magick"` is not installed, or not properly configured. If you see a warning "installing 'magick' will improve rasterization" then check the R package with `library(magick)` and see if there are error messages. When `"magick"` is not available, the rasterization is substantially slower, and may produce files much larger than normal.
`do_plot`	`logical` indicating whether to draw the heatmaps, `do_plot=TRUE` (default) renders the plots as normal. `do_plot=FALSE` will return the data used to create heatmaps without drawing the heatmaps.
`do_caption`	`logical` indicating whether to include a small caption at the bottom-right of the plot, describing the number of rows and columns, the partition, k-means clustering, and main heatmap.
`legend_fontsize`	`numeric` fontsize to use for all legend text, default 10. Optionally two values can be defined, the first is used for legend title, the second is used for legend labels.
`padding`	`grid::unit` object used during `ComplexHeatmap::draw()` to add whitespace padding around the boundaries of the overall list of heatmaps. This padding is useful to enforce extra whitespace, or to prevent labels from exceeding the width of the figure.
`return_type`	`character` string indicating the type of data to return: `"heatmaplist"` returns the list of heatmaps, which can separately be arranged together using `ComplexHeatmap::draw()` or `grid::grid.draw()`. `"grid"` returns the `grid` graphical object which may be easier to render using something like the `patchwork` or `cowplot` R packages.
`show_error`	`logical` indicating whether to add error bars to the profile plot at the top of each heatmap. These error bars are calculated by `EnrichedHeatmap::anno_enriched()` using `matrixStats::colSds(x)/nrow(x)`.
`verbose`	`logical` indicating whether to print verbose output.
`...`	additional arguments are passed to `EnrichedHeatmap::EnrichedHeatmap()` to allow greater customization of details. Note that many `...` arguments are also passed to `ComplexHeatmap::Heatmap()`.
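
As a minimal sketch of the sign-preserving transformations described for `transform` above, the following base R code applies the transformation to the absolute value and then restores the original sign. The helper names `log2signed_sketch()` and `sqrt_signed_sketch()` are hypothetical and are not part of platjam or jamba; the package functions may handle additional cases.

```r
# Hypothetical helpers, for illustration only (not platjam or jamba functions).
# Each applies a transformation to abs(x), then multiplies by sign(x).
log2signed_sketch <- function(x) sign(x) * log2(1 + abs(x))
sqrt_signed_sketch <- function(x) sign(x) * sqrt(abs(x))

x <- c(-3, 0, 3, 15)
log2signed_sketch(x)
# -2  0  2  4

sqrt_signed_sketch(c(-4, 0, 9))
# -2  0  3
```

Negative coverage values keep their direction, while the magnitude is compressed in the same way as for positive values.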

Details

This function takes a list of normalizedMatrix objects, usually the output of coverage_matrix2nmat(), and produces multiple heatmaps using EnrichedHeatmap.

This function is intended to be a convenient wrapper to help keep each data matrix in order, to apply consistent clustering and filtering across all data matrices, and to enable optional multi-row heatmap layout.
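
The row filtering described for `anno_df` and `rows` can be pictured with a small base R sketch. It uses plain matrices as stand-ins for `normalizedMatrix` objects, and the object names (`mats`, `shared_rows`) are assumptions for illustration; it is not the internal implementation.

```r
# Plain matrices stand in for normalizedMatrix entries of nmatlist.
mat1 <- matrix(1:8, nrow = 4,
  dimnames = list(c("r1", "r2", "r3", "r4"), NULL))
mat2 <- matrix(1:6, nrow = 3,
  dimnames = list(c("r2", "r3", "r4"), NULL))
mats <- list(signal1 = mat1, signal2 = mat2)

# Annotation rows in a different order, with one row absent from the matrices.
anno_df <- data.frame(group = c("A", "B", "A"),
  row.names = c("r4", "r2", "r5"))

# Keep only rows present in every matrix and in anno_df.
shared_rows <- Reduce(intersect,
  c(lapply(mats, rownames), list(rownames(anno_df))))
shared_rows
# "r2" "r4"

# Apply the same rows, in the same order, to every object.
mats_sync <- lapply(mats, function(m) m[shared_rows, , drop = FALSE])
anno_sync <- anno_df[shared_rows, , drop = FALSE]
```

Every matrix and the annotation table then share one row order, which is the property the wrapper maintains across heatmap panels.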

Value

list with heatmap components that can be reviewed, or optionally rendered into a figure:

"AHM": annotation heatmap, when anno_df is supplied
"PHM": partition heatmap, when partitioning and/or k-means clustering is used
"EH_l": list of ComplexHeatmap::Heatmap objects
"MHM": marked heatmap, containing optional row labels
"HM_drawn": when hm_nrow=1 this is the output after drawing the heatmap, in the form: ComplexHeatmap::HeatmapList. This object can be drawn again if needed, or used to determine exact row orders.
"fn_params": list of useful function parameters, including some calculated during processing such as panel_groups, ylims, signal_ceiling, etc.
"hm_caption": character version of heatmap captions
"adjust_df": data.frame when recenter_heatmap or restrand_heatmap are defined, which contains a summary of each row, with colnames: "summit_name" for recentering; and "restrand" for restranding.

Annotation Data

When anno_df is provided as a data.frame the rows are synchronized alongside the heatmap rows. Column values are color-coded, categorical for character columns, and using color gradient for numeric columns.

Rows can optionally be split by argument partition, which can be a vector of group values associated with rows, or one or more columns in colnames(anno_df) whose values are used to sub-divide the rows.
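
A short sketch of preparing these inputs, assuming hypothetical row names (`peak1` to `peak5`) and a hypothetical annotation column `class`. It builds a named `partition` vector from one `anno_df` column, then formats row-count labels using the default `partition_count_template` shown in Usage. It requires the `glue` package.

```r
# Hypothetical annotation table; rownames must match rownames in nmatlist.
anno_df <- data.frame(
  class = c("promoter", "enhancer", "promoter", "enhancer", "promoter"),
  score = c(2.1, 0.4, 3.3, 1.2, 0.9),
  row.names = paste0("peak", 1:5))

# partition is named by rowname; factor levels control the display order.
partition <- factor(anno_df$class, levels = c("promoter", "enhancer"))
names(partition) <- rownames(anno_df)

# Variable names match the placeholders in the default template.
partition_name <- levels(partition)
counts <- as.integer(table(partition))
labels <- as.character(
  glue::glue("{partition_name}\n({counts} rows)"))
labels
# "promoter\n(3 rows)" "enhancer\n(2 rows)"
```

Setting factor levels explicitly is a simple way to control the order in which partitions appear, since provided factor levels are honored.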

Row Clustering / Partitioning

Rows can be clustered using k-means clustering with argument k_clusters. By default it uses k_method="correlation", which applies a novel and effective correlation metric, clustering row data by the profile shape. The typical default, which is used when the amap R package is not installed, is to use "euclidean" distance, which tends to cluster based upon signal magnitude more so than the shape.

When k-means clustering k_clusters and partition are both enabled, each partition is independently k-means clustered, which improves results compared to applying global k-means before applying partitions. Use min_rows_per_k to adjust the relative number of k clusters based upon the number of observed rows.
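
The per-partition clustering described above can be sketched in base R. This sketch makes two assumptions that are not confirmed by this page: it uses `stats::kmeans()` (Euclidean distance) rather than the `amap` correlation method, and it caps k with a ceiling rule chosen to reproduce the cases described for `min_rows_per_k` (100 or fewer rows gives k=1, 101 rows allows k=2). The label format `"A_1"` is also hypothetical.

```r
set.seed(123)
# Random matrix as a stand-in for one coverage matrix.
mat <- matrix(rnorm(250 * 6), nrow = 250,
  dimnames = list(paste0("row", seq_len(250)), NULL))
partition <- factor(rep(c("A", "B"), times = c(180, 70)))
names(partition) <- rownames(mat)

k_clusters <- 3
min_rows_per_k <- 100

# Cluster each partition independently.
row_groups <- split(names(partition), partition)
cluster_labels <- unlist(lapply(names(row_groups), function(p) {
  rows_p <- row_groups[[p]]
  # Assumed cap on k, based on the number of rows in this partition.
  k <- min(k_clusters, ceiling(length(rows_p) / min_rows_per_k))
  cl <- if (k > 1) {
    stats::kmeans(mat[rows_p, , drop = FALSE],
      centers = k, iter.max = 20)$cluster
  } else {
    rep(1L, length(rows_p))
  }
  # Combine partition name and cluster number, named by rowname.
  setNames(paste(p, cl, sep = "_"), rows_p)
}))
table(cluster_labels)
```

In this sketch partition A (180 rows) is limited to two clusters and partition B (70 rows) is left as a single group, even though `k_clusters=3` was requested.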

Display Layout

Heatmaps are arranged in the following order, dependent upon the data provided:

Annotation heatmap, if anno_df is provided.
- Color assignment can be provided using color_sub either as a named vector of R colors whose names match values in each column, or as a list named by colnames(anno_df), with named color assignments, or a color function for numeric columns.
Partition heatmap, if partition is provided.
Enrichment heatmaps, one for each entry in nmatlist.
- Above each heatmap is the metaplot, drawn using EnrichedHeatmap::anno_enriched().
- When partition and/or k_clusters are defined, the plot will include one profile line for each row grouping.
- When show_error=TRUE each line will also be shaded using the 95% confidence interval.
- The heatmap color gradient is applied starting at zero, extending to signal_ceiling for each heatmap. When signal_ceiling is <=1 it uses the quantile of non-zero values in the matrix data, otherwise it applies a fixed numeric maximum. Numeric values above the signal_ceiling threshold are colored using the maximum color.
- When there are negative values, the color key uses a divergent color scale. When nmat_colors value for the heatmap is a single color, the complementary color is used for negative values; otherwise it is assumed to define a divergent color scale.
- The y-axis range on metaplots is defined by observed values, and when panel_groups is defined, the y-axis ylim is shared among all heatmaps in each panel group.
Marked row heatmap, if anno_row_marks is provided. It uses an empty heatmap, associated with row mark annotations for a subset of row labels, in the same order as the coverage heatmaps.
Color legends are displayed in the same order:
- annotation colors for each column in anno_df
- partition/cluster colors
- color gradients for each coverage heatmap in order, or when panel_groups is provided it displays the color key for only the first heatmap in each panel group.
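
The signal_ceiling rule described for the enrichment heatmaps can be sketched as follows. get_ceiling() is a hypothetical helper written only for this illustration, and the toy vector stands in for matrix data; the actual function may handle negative values and edge cases differently.

```r
# Hypothetical helper (not a platjam function) for the ceiling rule
get_ceiling <- function(x, signal_ceiling) {
   if (signal_ceiling <= 1) {
      # values <= 1 are treated as a quantile of the non-zero values
      unname(stats::quantile(x[x != 0], probs = signal_ceiling))
   } else {
      # values > 1 are used directly as a fixed numeric maximum
      signal_ceiling
   }
}

# Toy data: three zeros, which are ignored by the quantile, and 1 to 10
x <- c(0, 0, 0, 1:10)
ceiling_q <- get_ceiling(x, 0.8)
ceiling_fixed <- get_ceiling(x, 5)

# Values above the ceiling receive the maximum color, which is
# equivalent to capping them before the color gradient is applied
x_capped <- pmin(x, ceiling_q)
```

Here signal_ceiling=0.8 yields a ceiling of 8.2, the 80th percentile of the non-zero values, while signal_ceiling=5 is used as given.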

Examples

## There is a small example file to use for testing
# library(jamba)
cov_file1 <- system.file("data", "tss_coverage.matrix", package="platjam");
cov_file2 <- system.file("data", "h3k4me1_coverage.matrix", package="platjam");
cov_files <- c(cov_file1, cov_file2);
names(cov_files) <- gsub("[.]matrix",
   "",
   basename(cov_files));
nmatlist <- coverage_matrix2nmat(cov_files, verbose=FALSE);
sapply(nmatlist, function(nmat){attr(nmat, "signal_name")})
nmatlist2heatmaps(nmatlist);

# sometimes data transform can be helpful
nmatlist2heatmaps(nmatlist,
   transform=c("log2signed", "sqrt"));

# k-means clusters, default k_method="correlation" (euclidean when amap is not installed)
nmatlist2heatmaps(nmatlist, k_clusters=4,
   transform=c("log2signed", "sqrt"));

# k-means clusters, "correlation" or "pearson" sometimes works better
nmatlist2heatmaps(nmatlist,
   k_clusters=4,
   min_rows_per_k=20,
   k_method="pearson",
   transform=c("log2signed", "sqrt"));

# example showing usage of top_axis_side
# and panel_groups
nmatlist2 <- nmatlist[c(1, 1, 1, 2, 2, 2)];
names(nmatlist2) <- jamba::makeNames(names(nmatlist2))
for (iname in names(nmatlist2)) {
   attr(nmatlist2[[iname]], "signal_name") <- gsub("coverage", "cov", iname);
}
# top_axis_side="left"
# assumes 12x7 figure size
nmatlist2heatmaps(nmatlist2,
   signal_ceiling=0.8,
   nmat_colors=rep(c("firebrick", "tomato"), each=3),
   panel_groups=rep(c("tss", "h3k4me1"), each=3),
   ht_gap=grid::unit(4, "mm"),
   top_axis_side="left",
   transform=rep(c("log2signed", "sqrt"), each=3));

# top_axis_side="both"
nmatlist2heatmaps(nmatlist2,
   panel_groups=rep(c("tss", "h3k4me1"), each=3),
   ht_gap=grid::unit(6, "mm"),
   top_axis_side="both",
   transform=rep(c("log2signed", "sqrt"), each=3));

# multiple heatmap rows
nmatlist2heatmaps(nmatlist2,
   k_clusters=4,
   k_method="pearson",
   hm_nrow=2,
   panel_groups=rep(c("tss", "h3k4me1"), each=3),
   ht_gap=grid::unit(6, "mm"),
   top_axis_side="both",
   top_anno_height=grid::unit(0.8, "cm"),
   transform=rep(c("log2signed", "sqrt"), each=3));

# invent anno_df data.frame of additional annotations
anno_df <- data.frame(
   tss_score=EnrichedHeatmap::enriched_score(jamba::log2signed(nmatlist[[1]])),
   h3k4me1_score=EnrichedHeatmap::enriched_score(jamba::log2signed(nmatlist[[2]])),
   chromosome=paste0("chr", sample(1:4, replace=TRUE, size=nrow(nmatlist[[1]])))
);
rownames(anno_df) <- rownames(nmatlist[[1]]);
nmatlist2heatmaps(nmatlist,
   title="k-means clustering across both heatmaps",
   k_clusters=4,
   k_method="pearson",
   k_heatmap=c(1, 2),
   ht_gap=grid::unit(6, "mm"),
   top_axis_side="left",
   anno_df=anno_df,
   transform=c("log2signed", "sqrt"));

# example showing k-means clustering together with annotation groups
anno_df <- data.frame(
   group=sample(c("A", "B", "B"),
      size=nrow(nmatlist[[1]]),
      replace=TRUE),
   row.names=rownames(nmatlist[[1]]))
# note for this example the color legends are oriented vertically
# showing how the width is adjusted
nmatlist2heatmaps(nmatlist,
   heatmap_legend_direction="vertical",
   k_clusters=2,
   color_sub=c(`A`="firebrick", `B`="darkorchid"),
   k_colors=c("firebrick", "dodgerblue"),
   min_rows_per_k=50,
   ht_gap=grid::unit(1, "cm"),
   k_method="correlation",
   k_heatmap=1:2,
   anno_df=anno_df,
   partition="group",
   row_title_rot=0,
   transform=c("log2signed", "sqrt"));

# same as above, partition and k_clusters together
# except uses multiple values for k_clusters
nmatlist2heatmaps(nmatlist,
   k_clusters=c(1, 4),
   min_rows_per_k=25,
   k_heatmap=1:2,
   k_method="correlation",
   anno_df=anno_df,
   partition="group",
   row_title_rot=0)

jmw86069/platjam documentation built on April 12, 2025, 1:41 p.m.