nmatlist2heatmaps: Make multiple coverage heatmaps

nmatlist2heatmaps    R Documentation

Make multiple coverage heatmaps

Description

Make multiple coverage heatmaps

Usage

nmatlist2heatmaps(
  nmatlist,
  panel_groups = NULL,
  title = NULL,
  title_gp = grid::gpar(fontsize = 16),
  caption = NULL,
  upstream_length = NULL,
  downstream_length = NULL,
  k_clusters = 0,
  min_rows_per_k = 100,
  k_subset = NULL,
  k_colors = NULL,
  k_width = grid::unit(5, "mm"),
  k_method = c("correlation", "euclidean", "pearson", "spearman"),
  k_heatmap = main_heatmap,
  partition = NULL,
  row_title_rot = 0,
  partition_counts = TRUE,
  partition_count_template = "{partition_name}\n({counts} rows)",
  rows = NULL,
  row_order = NULL,
  nmat_colors = NULL,
  middle_color = "white",
  nmat_names = NULL,
  main_heatmap = NULL,
  anno_df = NULL,
  byCols = NULL,
  color_sub = NULL,
  anno_row_marks = NULL,
  anno_row_labels = NULL,
  anno_row_gp = grid::gpar(fontsize = 14),
  recenter_heatmap = NULL,
  recenter_range = NULL,
  recenter_invert = FALSE,
  restrand_heatmap = NULL,
  restrand_range = NULL,
  restrand_buffer = NULL,
  restrand_invert = FALSE,
  top_annotation = NULL,
  top_anno_height = grid::unit(3, "cm"),
  top_axis_side = c("right"),
  legend_max_ncol = 2,
  legend_base_nrow = 12,
  legend_max_labels = 40,
  show_heatmap_legend = TRUE,
  heatmap_legend_param = NULL,
  heatmap_legend_direction = "horizontal",
  annotation_legend_param = NULL,
  hm_nrow = 1,
  transform = "none",
  transform_label = NULL,
  signal_ceiling = NULL,
  axis_name = NULL,
  axis_name_gp = grid::gpar(fontsize = 10),
  axis_name_rot = 90,
  column_title_gp = grid::gpar(fontsize = 14),
  lens = -2,
  anno_lens = 8,
  pos_line = FALSE,
  seed = 123,
  ht_gap = grid::unit(4, "mm"),
  row_anno_padding = grid::unit(4, "mm"),
  column_anno_padding = grid::unit(4, "mm"),
  legend_padding = grid::unit(1, "cm"),
  profile_value = c("mean", "sum", "abs_mean", "abs_sum"),
  profile_linetype = c(1, 5, 3),
  profile_linewidth = 1.5,
  ylims = NULL,
  border = TRUE,
  iter.max = 20,
  use_raster = TRUE,
  raster_quality = 1,
  raster_by_magick = jamba::check_pkg_installed("magick"),
  do_plot = TRUE,
  do_caption = TRUE,
  legend_fontsize = 10,
  legend_width = grid::unit(3, "cm"),
  trim_legend_title = TRUE,
  padding = grid::unit(c(0.1, 0.1, 0.1, 0.1), "cm"),
  return_type = c("heatmaplist", "grid"),
  show_error = FALSE,
  verbose = FALSE,
  ...
)

Arguments

nmatlist

list containing normalizedMatrix objects, usually the output from coverage_matrix2nmat().

panel_groups

character vector with values for each nmatlist entry, which defines groups of heatmap panels. Each panel group shares:

  • numeric range for the heatmap color gradient, defined by the first signal_ceiling value for the group. Standard rules apply, such that values above 0 and at or below 1 represent a quantile signal threshold, and values above 1 represent a fixed numeric threshold.

  • one color key, labeled by names(panel_groups) to represent all panels in the group

  • y-axis range for the profile plot, either determined dynamically or by the first ylims value provided for the panel group

  • When nmat_colors is not defined, each panel group is assigned one categorical color which is applied to all heatmaps in the group.

  • When nmat_colors is defined, each panel uses the color as defined, however the color key only uses the color gradient from the first panel in the group.

title, caption

character string used as an overall title or caption, respectively, displayed at the top of all heatmap output.

title_gp

grid::gpar object to customize the title fontsize, fontface, color (col), etc.

upstream_length, downstream_length

numeric (optional) range of coordinates to display across all heatmaps. This argument is intended when the input nmatlist contains a wider range of coordinates than should be displayed. The columns in nmatlist are subset to retain only those columns within the range downstream_length to upstream_length, assuming the middle coordinate is zero. This step calls zoom_nmatlist(). Note this step does not expand the displayed region.
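As a rough illustration of the column subsetting described above, the sketch below uses a plain numeric matrix instead of a normalizedMatrix, with hypothetical bin positions stored in col_pos. It does not call zoom_nmatlist(), and whether the boundary bins are included is an assumption made for this sketch.

```r
# Hypothetical coverage matrix: 3 rows, 11 bins centered on zero
col_pos <- seq(from = -500, to = 500, by = 100)
mat <- matrix(seq_len(3 * length(col_pos)),
   nrow = 3,
   dimnames = list(paste0("row", 1:3), as.character(col_pos)))

upstream_length <- 200
downstream_length <- 300

# keep only columns inside the requested window around the center;
# columns outside the window are dropped, nothing is expanded
keep_cols <- (col_pos >= -upstream_length & col_pos <= downstream_length)
mat_zoom <- mat[, keep_cols, drop = FALSE]
colnames(mat_zoom)
```

Requesting a window wider than the available columns would leave the matrix unchanged in this sketch, consistent with the note that the displayed region is not expanded.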

k_clusters

integer number of k-means clusters to use to partition each heatmap. Use 0 or NULL for no clustering (default). Note k_clusters can be a numeric vector, in which case it is applied across unique groups defined by partition if provided. If names(k_clusters) match values in partition they will be applied by name, otherwise they are applied in the order the clusters are defined by partition. Each group is clustered to that many k clusters, provided it also meets the threshold min_rows_per_k, which is intended to prevent clustering 10 rows into 10 k-means clusters.

min_rows_per_k

numeric minimum rows required per k-means cluster, used only when k_clusters is greater than 1. With default min_rows_per_k=100, a partition with 100 or fewer rows can only have k=1, and a partition with 101 rows can have k=2. This limit protects from k-means clustering small partitions.
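The limit can be sketched in base R as follows. The rounding rule is an assumption inferred from the 100-row and 101-row examples, using the Usage default min_rows_per_k=100; it is not taken from the function source, and max_k_for_rows() is a hypothetical helper name.

```r
# Assumed rule: allow one k-means cluster per min_rows_per_k rows,
# rounding up, and never more than the requested k_clusters.
max_k_for_rows <- function(n_rows, k_clusters, min_rows_per_k = 100) {
   pmax(1, pmin(k_clusters, ceiling(n_rows / min_rows_per_k)))
}

# partitions with 50, 100, 101, and 450 rows, requesting k_clusters=4
max_k_for_rows(c(50, 100, 101, 450), k_clusters = 4)
```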

k_subset

integer vector of k-means clusters to retain. This argument is intended to "zoom in" (or "drill down") to one or more k-means clusters of interest. When both k_clusters and partition are provided, this argument must exactly match the row title as displayed in the heatmap.

k_colors

character vector of R colors, or NULL to use the output of colorjam::rainbowJam(k_clusters). These colors are applied to k_clusters and/or partition:

  • When partition is provided, names(k_colors) are used when present, otherwise colors are assigned in order of partition groups. When k_clusters is also defined, each partition color is split into a light-to-dark gradient based upon the number of k_clusters.

  • When partition is not provided, k_colors are applied to k-means clusters in the order the colors are provided.

k_width

unit width of the k-means cluster color bar, used with k_clusters, default is 5 mm width.

k_method

character string indicating the distance used by k-means. The common choice for k-means is "euclidean", however a useful alternative for sequence coverage data is "correlation" as implemented in amap::Kmeans(), which is the first value listed in Usage. Available methods:

  • "euclidean" (used when the R package amap is not installed) calculates the typical Euclidean distance, which tends to emphasize total signal more so than the specific shape of the signal.

  • "correlation" when the R package amap is available, this method emphasizes the shape of signal profiles, and is particularly effective. It is also called "centered Pearson" since data is centered prior to calculating correlation.

  • "pearson" when the R package amap is available, this method is also called "not centered Pearson" since data is not centered prior to calculating correlation.

  • "spearman" when the R package amap is available, this method computes distance based upon rank differences. It has not been tested much in this context.

k_heatmap

integer with one or more values indicating which nmatlist entries to use for k-means clustering, default uses main_heatmap. This value is only used when k_clusters is greater than 1. This argument is useful for clustering multiple coverage heatmaps together.

partition

character or factor vector used to split rows of each matrix in nmatlist, and must be named by rownames in nmatlist. This value is converted to factor, and will honor provided factor levels if already defined.

  • When partition and k_clusters are both defined, the data is first grouped by partition then each partition group is separately k-means clustered, using rules described for k_clusters and min_rows_per_k. Colors from k_colors are assigned to each partition value, then colors are split to light-to-dark gradient using jamba::color2gradient().

row_title_rot

numeric value in degrees, to rotate the partition labels on the left, when either partition or k_clusters are provided. The default 0 uses horizontal text. For long labels, it may be better to use 30 or 60.

partition_counts

logical indicating whether to include the number of rows in each partition, default TRUE. Note that this setting is active if k_clusters and/or partition are supplied. Any situation where rows are split, the number of rows will be displayed.

partition_count_template

character format used when partition_counts=TRUE, used together with glue::glue() to format each row partition. The default: "{partition_name}\n({counts} rows)" will print for example: "A\n(125 rows)"
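A minimal sketch of how such a template can be filled with glue::glue(), assuming the glue package is installed. The partition vector is hypothetical, and the function may build its row titles differently.

```r
# counts per partition from a hypothetical partition vector
partition <- factor(c(rep("A", 125), rep("B", 40)))
counts_by_partition <- table(partition)

if (requireNamespace("glue", quietly = TRUE)) {
   template <- "{partition_name}\n({counts} rows)"
   # fill the template once per partition level
   row_titles <- vapply(names(counts_by_partition), function(p) {
      as.character(glue::glue(template,
         partition_name = p,
         counts = counts_by_partition[[p]]))
   }, character(1))
   print(row_titles)
}
```

A custom template only needs to reference the same two placeholders, for example "{partition_name}: n={counts}".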

rows

optional vector to define subset rows, or specific row order:

  • character vector of rownames in nmatlist, or

  • integer vector with row numbers (row index) values.

Note that even when using a subset of rows the data may also be subset based upon available names(partition) and rownames(anno_df).

row_order

integer vector used to order rows, intended to allow ordering data based upon a specific heatmap, or using different logic than the default.

  • When row_order=NULL (default) or row_order=TRUE it calls EnrichedHeatmap::enriched_score() using data from main_heatmap. When there are multiple values for main_heatmap (which is default), then scores are calculated for each matrix, then the average score is used per row.

The enriched_score() function generates a weighted score with highest weight at the center position, with progressively lower weight working outward where the maximum distance has zero weight. The technique sorts signal which emphasizes highest enriched signal at the center of the matrix.

  • When row_order=FALSE the data is ordered in the same order they appear in nmatlist, or when anno_df and byCols are supplied, the rows in anno_df are sorted using jamba::mixedSortDF(anno_df, byCols=byCols) and the resulting row order is used.
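The idea of a center-weighted score can be illustrated with a simplified base R sketch. The linear weights below are an assumption chosen for clarity; this is not the exact formula used by EnrichedHeatmap::enriched_score().

```r
# Toy matrix: rows are regions, columns are bins around the center
mat <- rbind(
   edge_signal = c(5, 0, 0, 0, 0),
   center_signal = c(0, 0, 5, 0, 0),
   broad_signal = c(1, 1, 1, 1, 1))
col_pos <- c(-2, -1, 0, 1, 2)

# weight 1 at the center, decreasing linearly to 0 at the maximum distance
w <- 1 - abs(col_pos) / max(abs(col_pos))
row_score <- as.vector(mat %*% w)
names(row_score) <- rownames(mat)

# rows with the strongest center-weighted signal are placed first
row_order <- order(row_score, decreasing = TRUE)
rownames(mat)[row_order]
```

With several matrices in main_heatmap, the same score could be computed per matrix and averaged per row before ordering, as described above.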

nmat_colors

character vector of R colors, to colorize each heatmap.

  • When nmat_colors=NULL (default) and panel_groups is not defined, colorjam::rainbowJam() is used to assign one unique color to each heatmap panel.

  • When nmat_colors=NULL and panel_groups is defined, colorjam::rainbowJam() is used to assign one unique color to each unique panel group, and the same color is applied to each heatmap panel in each panel group.

middle_color

character R color, default middle_color="white", used as the middle color when creating a divergent color gradient. This color should usually be either "white" or "black", but sometimes can be slightly off-white or off-black to apply some distinction from the background color.

nmat_names

character vector, or NULL, optional, used as custom names for each heatmap in nmatlist. When nmat_names=NULL the signal_name values are used from each nmatlist entry attribute: attr(nmat, "signal_name")

main_heatmap

integer index to define one or more entries in nmatlist as the main heatmap used for clustering and row ordering. Note that k_heatmap will override this option when provided. By default main_heatmap=NULL will cause all heatmaps to be used for row ordering.

anno_df

data.frame or object that can be coerced to data.frame whose rownames(anno_df) must match rownames in the nmatlist data. When rownames(anno_df) do not match, this function fails with an error message.

  • Data can optionally be sorted by defining byCols.

  • When provided, data in nmatlist is automatically subsetted to the matching rownames(anno_df) also present in nmatlist.

  • When rows is also defined, the data will be subsetted by the rows and by the rownames(anno_df) present in nmatlist.
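The combined subsetting can be pictured with plain character vectors. This is only a sketch with hypothetical row names; it assumes the requested order in rows is kept, and it does not reproduce the function's internal checks.

```r
# hypothetical rownames shared by every matrix in nmatlist
nmat_rownames <- paste0("peak", 1:6)
anno_df <- data.frame(
   class = c("promoter", "enhancer", "enhancer", "promoter"),
   row.names = c("peak2", "peak3", "peak5", "peak9"))
rows <- c("peak5", "peak1", "peak2")

# rows retained: requested rows present in both nmatlist and anno_df
rows_use <- rows[rows %in% nmat_rownames & rows %in% rownames(anno_df)]
rows_use
anno_df[rows_use, , drop = FALSE]
```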

byCols

character vector of colnames(anno_df) used to sort the data.frame. This argument is passed to jamba::mixedSortDF() and follows its rules, for example prefix "-" causes the column to be sorted in reverse. Multiple columns can be sorted, in the order they are provided, and factor levels are honored for factor columns.

color_sub

accepts input in two forms:

  1. character vector of R colors named by character values

  2. list output from design2colors() where each list element is named by colnames present in anno_df, and each list value is either:

    • character vector of colors named by character value, or

    • color function as defined by circlize::colorRamp2(), which takes a numeric value and returns a character R color.

  • When values for any column in anno_df do not have colors assigned by one mechanism above, colors are assigned using colorjam::group2colors().

  • When partition is defined, colors are assigned either by matching unique partition values with names(color_sub), or with attr(color_sub, "color_sub") if present, which may contain the full set of name-color assignments when color_sub is provided as a list. Otherwise if color_sub is provided as a list each entry is compared with partition values until values can be fully matched. Failing these steps, colors are assigned to unique partition values, then if k_clusters is also supplied, the partition colors are then split by jamba::color2gradient() across the k-means clusters for each partition.

anno_row_marks

character optional vector of rownames in nmatlist that should be labeled beside the heatmaps using ComplexHeatmap::anno_mark().

  • Note anno_row_labels can be used to supply custom labels, or one or more columns in anno_df.

  • When anno_row_labels=NULL (default) it displays the value in anno_row_marks itself.

anno_row_labels

character vector of optional labels to use when anno_row_marks is supplied.

  • When anno_row_labels=NULL (default) it uses rownames defined in anno_row_marks.

  • It can be a character vector of actual labels, with names that match anno_row_marks (thus rownames in nmatlist).

  • It can be a character vector with one or more colnames(anno_df), which creates labels by concatenating values across columns, delimited with space " ".
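The column-based form of anno_row_labels can be sketched in base R as shown below. The annotation values are hypothetical, and the sketch only demonstrates the space-delimited concatenation described above.

```r
anno_df <- data.frame(
   gene = c("ACTB", "GAPDH", "MYC"),
   class = c("promoter", "promoter", "enhancer"),
   row.names = c("peak1", "peak2", "peak3"))
anno_row_marks <- c("peak1", "peak3")
anno_row_labels <- c("gene", "class")

# concatenate values across the chosen columns, delimited by a space
label_cols <- unname(as.list(
   anno_df[anno_row_marks, anno_row_labels, drop = FALSE]))
labels <- do.call(paste, c(label_cols, sep = " "))
names(labels) <- anno_row_marks
labels
```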

anno_row_gp

grid::gpar object used to customize the text label displayed when anno_row_marks is defined. The default fontsize 14 is intended to be larger than other default values, for legibility.

recenter_heatmap, recenter_range, recenter_invert

arguments are passed to recenter_nmatlist() to apply re-centering.

  • Note that recenter will always occur before restrand.

restrand_heatmap, restrand_range, restrand_buffer, restrand_invert

arguments are passed to restrand_nmatlist() to apply re-stranding.

  • Note that recenter will always occur before restrand.

top_annotation

HeatmapAnnotation or logical or list:

  • top_annotation=TRUE, or the default top_annotation=NULL, uses the default EnrichedHeatmap::anno_enriched() to display the signal profile for each row partition and/or k-means cluster.

  • top_annotation=FALSE does not display a top annotation.

  • object HeatmapAnnotation as produced by ComplexHeatmap::HeatmapAnnotation(EnrichedHeatmap::anno_enriched()) or equivalent. This form is required for the annotation function to be called successfully on each heatmap in nmatlist.

  • a list of objects to be applied sequentially to each nmatlist coverage heatmap in order, intended to allow custom top annotation for each heatmap.

top_anno_height

unit object to define the default height of the top_annotation. When top_annotation is not defined, the default method uses EnrichedHeatmap::anno_enriched() with height=top_anno_height.

top_axis_side

character value indicating which side of the top annotation to place the y-axis labels.

  • When only one value is defined, it is recycled across nmatlist.

  • Otherwise it is used when panel_groups are defined, and the top annotation is labeled for only one panel in each panel group using the side as defined. Labels are displayed for each contiguous set of panel groups, so that heatmaps in the same panel group can be ordered in different subsets. Consider panel groups in this order: A, A, B, B, A, A. It would display one set of axis labels for the first two panels in A, then one axis label for the next two panels in B, then one axis label again for the final two panels in A.

  • Values should be one of:

    • "left","right": axis labels on this side of each panel group

    • "both": axis labels on both sides of each panel group, useful when panel groups have a fairly large number of panels.

    • "none": display no axis labels

    • "all": display axis labels for every panel even within panel group.
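The contiguous panel groups in the A, A, B, B, A, A example can be identified with rle(). In the sketch below, labeling the last panel of each run for top_axis_side="right" is an assumed placement used only for illustration.

```r
panel_groups <- c("A", "A", "B", "B", "A", "A")

# contiguous runs of the same panel group
runs <- rle(panel_groups)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# one set of axis labels per run; here the last panel of each run is
# labeled, an assumed placement for top_axis_side="right"
label_panel <- !duplicated(run_id, fromLast = TRUE)
data.frame(panel_groups, run_id, label_panel)
```

The two separate runs of group A receive different run_id values, which is why each would get its own axis labels.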

legend_max_ncol

integer number indicating the maximum number of columns allowed for a categorical color legend.

legend_base_nrow

integer number indicating the base number of rows used for a categorical color legend, before additional columns are added. Once the number of elements exceeds (legend_max_ncol * legend_base_nrow) then rows are added, but columns never exceed legend_max_ncol.
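One way to read this rule is sketched below, using the Usage defaults. The arithmetic is an assumption inferred from the description, not taken from the function source, and legend_layout() is a hypothetical helper name.

```r
# Assumed layout rule: add columns until legend_max_ncol is reached,
# then grow the number of rows.
legend_layout <- function(n_labels,
   legend_max_ncol = 2,
   legend_base_nrow = 12) {
   ncol <- min(legend_max_ncol, ceiling(n_labels / legend_base_nrow))
   ncol <- max(1, ncol)
   nrow <- ceiling(n_labels / ncol)
   c(ncol = ncol, nrow = nrow)
}

legend_layout(10)   # fits in one column
legend_layout(20)   # uses a second column
legend_layout(30)   # exceeds 2 * 12, so rows are added
```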

legend_max_labels

integer to define the maximum labels to display as a color legend. When any anno_df column contains more than this number of categorical colors, the legend is not displayed, in order to prevent the color legend from filling the entire plot device, thus hiding the heatmaps.

show_heatmap_legend

logical indicating whether to display the color legend for each heatmap entry in nmatlist. When panel_groups are supplied, color legends are displayed only for the first heatmap in each unique panel group, unless show_heatmap_legend=FALSE, or unless show_heatmap_legend is already defined for every heatmap.

heatmap_legend_param

list with optional heatmap legend settings. By default NULL causes this argument to be defined internally, however when provided it overrides any internal settings and is used directly. The list should be length(nmatlist), or is recycled to that length.

heatmap_legend_direction

character string used when show_heatmap_legend=TRUE and heatmap_legend_param is not already provided.

  • By default heatmap_legend_direction="horizontal" displays the color gradient in the legend horizontally as a continuous scale, with labels defined in EnrichedHeatmap::EnrichedHeatmap(), and width equal to grid::unit(1, "npc") which uses the full width of the color legend area.

  • When heatmap_legend_direction="vertical" the color legend is displayed vertically, with width grid::unit(5, "mm").

annotation_legend_param

list of optional parameters passed to the annotation legend functions, intended to provide customization. The list should be named by each annotation entry to be customized, and any annotation entries not defined in annotation_legend_param use the default behavior of ComplexHeatmap::HeatmapAnnotation(), which will assign its own set of colors and use default legend parameters. When annotation_legend_param=NULL (default) then all colors are defined, and all legends are displayed using the defaults of this function. When there are more labels than legend_max_labels the color legend will be hidden for that annotation legend entry.

hm_nrow

integer number of rows used to display the heatmap panels. This mechanism is somewhat experimental, and is used to split a large number of coverage heatmaps into two rows of heatmaps.

  • The matrix data row order is consistent across all heatmap panels.

  • The annotation data is displayed to the left of each row of heatmap panels.

transform

one of the following:

  • character string referring to a numeric transformation, passed to get_numeric_transform(). Commonly used strings:

    • "log2signed" calls jamba::log2signed(), which applies log2(1+x) to the absolute value, multiplied by sign(x)

    • "sqrt" applies square root to the absolute value, multiplied by the sign(x)

    • "cubert" applies cube root x^(1/3)

    • "qrt" applies fourth root x^(1/4) to the absolute value, multiplied by the sign(x)

  • function that applies a numeric transformation.

When there are negative numeric values, the transformation is applied to the absolute value, then multiplied by the original sign. Therefore, the transformation is applied to adjust the magnitude of the values. Character values are passed to get_numeric_transform() which may have more information.
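The sign-preserving behavior can be written out in base R. The functions below are illustrative equivalents with hypothetical names; they are not the implementations in jamba or get_numeric_transform().

```r
# apply the transform to the magnitude, then restore the original sign
log2signed_sketch <- function(x) sign(x) * log2(1 + abs(x))
sqrt_signed_sketch <- function(x) sign(x) * sqrt(abs(x))
cubert_signed_sketch <- function(x) sign(x) * abs(x)^(1/3)

x <- c(-7, -1, 0, 1, 7)
log2signed_sketch(x)
sqrt_signed_sketch(c(-4, 9))
cubert_signed_sketch(c(-8, 27))
```

A function of this form could also be supplied directly to transform, since the argument accepts a function as well as a character string.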

transform_label

character optional vector of transformation labels to use. When transform_label=NULL (default) it uses names(transform) if present, then the character string of transform, otherwise is left blank. When transform="none" no label is displayed. By default, transform labels are surrounded by parentheses, for example "(log2signed)" and placed on a new line below each coverage heatmap title. To suppress the transformation in the title, supply transform_label="".

signal_ceiling

numeric vector whose values are recycled to length(nmatlist). The signal_ceiling defines the maximum numeric value for the color ramp for each matrix in nmatlist. The value is passed to get_nmat_ceiling(), which recognizes three numeric forms:

  1. signal_ceiling=NULL: (default) the maximum absolute value is used as the ceiling.

  2. signal_ceiling > 1: the specific numeric value is applied as a fixed ceiling, even if the value is above or below the maximum absolute value in the data matrix. This setting is useful for defining a fixed meaningful threshold across nmatlist entries.

  3. signal_ceiling > 0 and signal_ceiling <= 1: the numeric value defines a quantile threshold calculated using signal in the data matrix, excluding values of zero. For example signal_ceiling=0.75 calculates ceiling quantile(x, probs=0.75), using non-zero values.

Note that the ceiling is only applied to the color scale and not to the underlying data. The row clustering and row ordering steps use the full data range, after applying the appropriate transform where applicable.

To apply a numeric ceiling to the data itself, it should be done at the level of nmatlist beforehand.
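The three forms can be sketched in base R on a plain numeric vector. ceiling_sketch() is a hypothetical helper written for illustration; it does not reproduce get_nmat_ceiling(), and its handling of negative values in the quantile form is an assumption.

```r
ceiling_sketch <- function(x, signal_ceiling = NULL) {
   if (is.null(signal_ceiling)) {
      # form 1: maximum absolute value
      return(max(abs(x), na.rm = TRUE))
   }
   if (signal_ceiling > 1) {
      # form 2: fixed numeric ceiling
      return(signal_ceiling)
   }
   # form 3: quantile of the non-zero values
   x_nonzero <- x[!is.na(x) & x != 0]
   unname(stats::quantile(x_nonzero, probs = signal_ceiling))
}

x <- c(0, 0, 1, 2, 3, 4)
ceiling_sketch(x)
ceiling_sketch(x, signal_ceiling = 10)
ceiling_sketch(x, signal_ceiling = 0.75)

# capping the data itself is a separate, explicit step
x_capped <- pmin(x, ceiling_sketch(x, signal_ceiling = 0.75))
```

For real input the capping step would be applied to each matrix in nmatlist before calling the function, taking care to keep the normalizedMatrix attributes intact.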

axis_name

character string with optional custom label used for the target region label in each heatmap panel.

  • When axis_name=NULL (default), the attr(nmat, "target_name") label will be used, which is usually "target", along with the upstream and downstream length as stored in attr(nmat, "extend").

  • a character vector will be applied as the center (target) label on each heatmap, using the upstream and downstream length as stored in attr(nmat, "extend").

  • a list is expected to have three labels per vector element, corresponding to the upstream, target, and downstream axis label. This list is recycled to length(nmatlist).

axis_name_gp

object of grid::gpar applied to the x-axis label graphic parameters. For example, to customize the x-axis font size, use the form: grid::gpar(fontsize=8).

axis_name_rot

numeric value either 0 or 90 indicating whether to rotate the x-axis names below each heatmap, where axis_name_rot=90 (default) will rotate labels vertically, and axis_name_rot=0 will display labels horizontally.

  • Note that axis_name_rot also controls the rotation of annotation (anno_df) and partition (partition or k_clusters) annotation labels, below each annotation heatmap.

column_title_gp

object grid::gpar or list of grid::gpar objects, applied across entries in nmatlist to customize the title displayed above each heatmap panel. For example to alter the font size, use grid::gpar(fontsize=14). This argument is passed to ComplexHeatmap::Heatmap(), and can be customized for each heatmap as needed.

lens

numeric adjustment to the intensity of the color gradient, used only when the corresponding nmat_colors entry uses a fixed set of colors. Values of lens above zero create more rapid color changes, making the gradient more visually intense; values below zero reduce the intensity. The lens values are recycled to length(nmatlist) as needed. Note that signal_ceiling defines the numeric value at which the maximum color is applied, while lens adjusts the intensity of the intermediate values in the color gradient.

anno_lens

numeric value used to scale the annotation heatmap color scales, see lens for details. This value is applied to numeric columns only when anno_df is provided.

seed

numeric value used with set.seed() to set the random seed. Set to NULL to avoid running set.seed().

ht_gap

unit size to specify the gap between multiple heatmaps. This argument is passed to ComplexHeatmap::draw(). An example is grid::unit(8, "mm") to specify 8 millimeters.

row_anno_padding, column_anno_padding, legend_padding

grid::unit to define the padding between heatmap body, and row annotation, column annotation, and heatmap color legend, respectively.

  • The default values are intended to provide more space between heatmap and these features than between heatmap subsections (row_gap). The defaults are 4mm for row and column annotations, and 1cm for the color legend.

  • The legend_padding is useful to minimize overlap with legend and the y-axis labels from the metaplots at the top of each heatmap.

profile_value

character string to define the type of numeric profile to display at the top of each heatmap. This argument is passed to EnrichedHeatmap::anno_enriched(). Values: "mean" the mean profile; "sum" the sum; "abs_sum" sum of absolute values; "abs_mean" the mean of absolute values.

profile_linetype

numeric or character default c(1, 5, 3) passed to grid::gpar(lty) to define the metaplot line type. Default lty=1 is a solid line. Values are recycled to the number of profile plots.

profile_linewidth

numeric line width passed to grid::gpar(lwd) to control the line width. Default uses lwd=1.5. Values are recycled to the number of profile plots.

ylims

numeric vector of maximum y-axis values for each heatmap profile; or list of min,max values to apply to each nmatlist entry.

border

logical indicating whether to draw a border around the heatmap, which includes all heatmap panels in the event of splitting by clustering. The border can be supplied as a vector, so the border can be applied specifically to each heatmap if needed.

iter.max

integer value indicating the maximum iterations performed by k-means clustering, only relevant when k_clusters is non-zero.

use_raster

logical indicating whether to create heatmaps using raster resizing, almost always recommended TRUE otherwise the output will be very sub-optimal.

raster_quality

numeric passed to ComplexHeatmap::Heatmap(), used when use_raster=TRUE and defines the level of detail retained, and is used only when raster_by_magick=FALSE. Using larger numbers decreases speed substantially.

raster_by_magick

logical passed to ComplexHeatmap::Heatmap(), to enable ImageMagick use during rasterization. By default this option is TRUE and is only disabled when the R package "magick" is not installed, or not properly configured. If you see a warning "installing 'magick' will improve rasterization" then check the R package with library(magick) and see if there are error messages. When "magick" is not available, the rasterization is substantially slower, and may produce files much larger than normal.

do_plot

logical indicating whether to draw the heatmaps, do_plot=TRUE (default) renders the plots as normal. do_plot=FALSE will return the data used to create heatmaps without drawing the heatmaps.

do_caption

logical indicating whether to include a small caption at the bottom-right of the plot, describing the number of rows and columns, the partition, k-means clustering, and main heatmap.

legend_fontsize

numeric fontsize to use for all legend text, default 10.

  • Optionally two values can be defined, the first is used for legend title, the second is used for legend labels.

padding

grid::unit object used during ComplexHeatmap::draw() to add whitespace padding around the boundaries of the overall list of heatmaps. This padding is useful to enforce extra whitespace, or to prevent labels from exceeding the width of the figure.

return_type

character string indicating the type of data to return:

  • "heatmaplist" returns the list of heatmaps, which can separately be arranged together using ComplexHeatmap::draw() or grid::grid.draw().

  • "grid" returns the grid graphical object which may be easier to render using something like the patchwork or cowplot R packages.

show_error

logical indicating whether to add error bars to the profile plot at the top of each heatmap. These error bars are calculated by EnrichedHeatmap::anno_enriched() using matrixStats::colSds(x)/nrow(x).

verbose

logical indicating whether to print verbose output.

...

additional arguments are passed to EnrichedHeatmap::EnrichedHeatmap() to allow greater customization of details. Note that many ... arguments are also passed to ComplexHeatmap::Heatmap().

Details

This function takes a list of normalizedMatrix objects, usually the output of coverage_matrix2nmat(), and produces multiple heatmaps using EnrichedHeatmap.

This function is intended to be a convenient wrapper to help keep each data matrix in order, to apply consistent clustering and filtering across all data matrices, and to enable optional multi-row heatmap layout.

Value

list with heatmap components that can be reviewed, or optionally rendered into a figure:

  • "AHM": annotation heatmap, when anno_df is supplied

  • "PHM": partition heatmap, when partitioning and/or k-means clustering is used

  • "EH_l": list of ComplexHeatmap::Heatmap objects

  • "MHM": marked heatmap, containing optional row labels

  • "HM_drawn": when hm_nrow=1 this is the output after drawing the heatmap, in the form: ComplexHeatmap::HeatmapList. This object can be drawn again if needed, or used to determine exact row orders.

  • "fn_params": list of useful function parameters, including some calculated during processing such as panel_groups, ylims, signal_ceiling, etc.

  • "hm_caption": character version of heatmap captions

  • "adjust_df": data.frame when recenter_heatmap or restrand_heatmap are defined, which contains a summary of each row, with colnames: "summit_name" for recentering; and "restrand" for restranding.

Annotation Data

When anno_df is provided as a data.frame the rows are synchronized alongside the heatmap rows. Column values are color-coded, categorical for character columns, and using color gradient for numeric columns.

Rows can optionally be split by argument partition, which can be a vector of group values associated with rows, or one or more columns in colnames(anno_df) whose values are used to sub-divide the rows.

Row Clustering / Partitioning

Rows can be clustered using k-means clustering with argument k_clusters. By default it uses k_method="correlation", which applies a novel and effective correlation metric, clustering row data by the profile shape. The typical default, which is used when the amap R package is not installed, is to use "euclidean" distance, which tends to cluster based upon signal magnitude more so than the shape.

When both k_clusters and partition are defined, each partition is independently k-means clustered, which improves results compared to applying global k-means before applying partitions. Use min_rows_per_k to adjust the relative number of k clusters based upon the number of observed rows.
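Clustering each partition independently can be sketched with stats::kmeans(), which uses Euclidean distance and not the "correlation" method from amap::Kmeans(). The data is simulated, the cap on k per partition is an assumed reading of min_rows_per_k, and the "A - 1" label format is hypothetical.

```r
set.seed(123)
mat <- matrix(rnorm(250 * 10),
   nrow = 250,
   dimnames = list(paste0("row", 1:250), NULL))
partition <- factor(rep(c("A", "B"), c(220, 30)))
names(partition) <- rownames(mat)

k_clusters <- 2
min_rows_per_k <- 100

# cluster each partition separately, limiting k for small partitions
row_groups <- split(rownames(mat), partition)
cluster_labels <- unlist(unname(lapply(names(row_groups), function(p) {
   rows_p <- row_groups[[p]]
   k_use <- max(1, min(k_clusters, ceiling(length(rows_p) / min_rows_per_k)))
   if (k_use > 1) {
      km <- stats::kmeans(mat[rows_p, , drop = FALSE],
         centers = k_use,
         iter.max = 20)
      cl <- km$cluster
   } else {
      cl <- rep(1L, length(rows_p))
   }
   setNames(paste0(p, " - ", cl), rows_p)
})))
table(cluster_labels)
```

Partition B has only 30 rows, so it stays as one group, while partition A is split into two k-means clusters.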

Display Layout

Heatmaps are arranged in the following order, dependent upon the data provided:

  • Annotation heatmap, if anno_df is provided.

    • Color assignment can be provided using color_sub either as a named vector of R colors whose names match values in each column, or as a list named by colnames(anno_df), with named color assignments, or a color function for numeric columns.

  • Partition heatmap, if partition is provided.

  • Enrichment heatmaps, one for each entry in nmatlist.

    • Above each heatmap is the metaplot, drawn using EnrichedHeatmap::anno_enriched().

    • When partition and/or k_clusters are defined, the plot will include one profile line for each row grouping.

    • When show_error=TRUE each line will also be shaded using 95% standard deviation.

    • The heatmap color gradient is applied starting at zero, extending to signal_ceiling for each heatmap. When signal_ceiling is <=1 it uses the quantile of non-zero values in the matrix data, otherwise it applies a fixed numeric maximum. Numeric values above the signal_ceiling threshold are colored using the maximum color.

    • When there are negative values, the color key uses a divergent color scale. When nmat_colors value for the heatmap is a single color, the complementary color is used for negative values; otherwise it is assumed to define a divergent color scale.

    • The y-axis range on metaplots is defined by observed values. When panel_groups is defined, the y-axis range (ylim) is shared among all heatmaps in each panel group.

  • Marked row heatmap, if anno_row_marks is provided. It uses an empty heatmap, associated with row mark annotations for a subset of row labels, in the same order as the coverage heatmaps.

  • Color legends are displayed in the same order:

    • annotation colors for each column in anno_df

    • partition/cluster colors

    • color gradients for each coverage heatmap in order, or when panel_groups is provided it displays the color key for only the first heatmap in each panel group.
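The signal_ceiling rule described in the list above (a value <=1 is treated as a quantile of non-zero values, a larger value as a fixed maximum, and anything above the ceiling takes the maximum color) can be sketched in base R. This is a minimal illustration assuming non-negative signal; resolve_ceiling() and to_color() are hypothetical helpers written for this sketch and are not functions from the package.

```r
set.seed(123)

# Hypothetical helper: convert signal_ceiling into a numeric maximum
resolve_ceiling <- function(x, signal_ceiling) {
   if (signal_ceiling <= 1) {
      # treat the value as a quantile of the non-zero, non-NA signal
      nonzero <- x[!is.na(x) & x != 0]
      unname(quantile(nonzero, probs = signal_ceiling))
   } else {
      # treat the value as a fixed numeric maximum
      signal_ceiling
   }
}

# Toy matrix (hypothetical): 50 zeros and 150 positive values
x <- matrix(c(rep(0, 50), rexp(150, rate = 0.5)), nrow = 20)

ceiling_q <- resolve_ceiling(x, 0.8)
ceiling_fixed <- resolve_ceiling(x, 5)

# Hypothetical helper: gradient starts at zero and ends at the ceiling;
# values above the ceiling are clipped to the maximum color
pal <- colorRampPalette(c("white", "firebrick"))(101)
to_color <- function(x, ceiling) {
   scaled <- pmin(pmax(x, 0), ceiling) / ceiling
   pal[round(scaled * 100) + 1]
}

print(c(quantile_ceiling = ceiling_q, fixed_ceiling = ceiling_fixed))
print(to_color(c(0, ceiling_q / 2, ceiling_q, ceiling_q * 10), ceiling_q))
```

A quantile-based ceiling adapts to each matrix, which can help when heatmaps have very different signal ranges, while a fixed ceiling keeps the color scale directly comparable across heatmaps.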

See Also

Other jam coverage heatmap functions: coverage_matrix2nmat(), get_nmat_ceiling(), nmathm_row_order(), recenter_nmatlist(), restrand_nmatlist(), validate_heatmap_params(), zoom_nmatlist(), zoom_nmat()

Examples

## There is a small example file to use for testing
# library(jamba)
cov_file1 <- system.file("data", "tss_coverage.matrix", package="platjam");
cov_file2 <- system.file("data", "h3k4me1_coverage.matrix", package="platjam");
cov_files <- c(cov_file1, cov_file2);
names(cov_files) <- gsub("[.]matrix",
   "",
   basename(cov_files));
nmatlist <- coverage_matrix2nmat(cov_files, verbose=FALSE);
sapply(nmatlist, function(nmat){attr(nmat, "signal_name")})
nmatlist2heatmaps(nmatlist);

# sometimes data transform can be helpful
nmatlist2heatmaps(nmatlist,
   transform=c("log2signed", "sqrt"));

# k-means clusters using the default k_method
# ("correlation"; "euclidean" is used when amap is not installed)
nmatlist2heatmaps(nmatlist, k_clusters=4,
   transform=c("log2signed", "sqrt"));

# k-means clusters, "correlation" or "pearson" sometimes works better
nmatlist2heatmaps(nmatlist,
   k_clusters=4,
   min_rows_per_k=20,
   k_method="pearson",
   transform=c("log2signed", "sqrt"));

# example showing usage of top_axis_side
# and panel_groups
nmatlist2 <- nmatlist[c(1, 1, 1, 2, 2, 2)];
names(nmatlist2) <- jamba::makeNames(names(nmatlist2))
for (iname in names(nmatlist2)) {
   attr(nmatlist2[[iname]], "signal_name") <- gsub("coverage", "cov", iname);
}
# top_axis_side="left"
# assumes 12x7 figure size
nmatlist2heatmaps(nmatlist2,
   signal_ceiling=0.8,
   nmat_colors=rep(c("firebrick", "tomato"), each=3),
   panel_groups=rep(c("tss", "h3k4me1"), each=3),
   ht_gap=grid::unit(4, "mm"),
   top_axis_side="left",
   transform=rep(c("log2signed", "sqrt"), each=3));

# top_axis_side="both"
nmatlist2heatmaps(nmatlist2,
   panel_groups=rep(c("tss", "h3k4me1"), each=3),
   ht_gap=grid::unit(6, "mm"),
   top_axis_side="both",
   transform=rep(c("log2signed", "sqrt"), each=3));

# multiple heatmap rows
nmatlist2heatmaps(nmatlist2,
   k_clusters=4,
   k_method="pearson",
   hm_nrow=2,
   panel_groups=rep(c("tss", "h3k4me1"), each=3),
   ht_gap=grid::unit(6, "mm"),
   top_axis_side="both",
   top_anno_height=grid::unit(0.8, "cm"),
   transform=rep(c("log2signed", "sqrt"), each=3));

# invent anno_df data.frame of additional annotations
anno_df <- data.frame(
   tss_score=EnrichedHeatmap::enriched_score(jamba::log2signed(nmatlist[[1]])),
   h3k4me1_score=EnrichedHeatmap::enriched_score(jamba::log2signed(nmatlist[[2]])),
   chromosome=paste0("chr", sample(1:4, replace=TRUE, size=nrow(nmatlist[[1]])))
);
rownames(anno_df) <- rownames(nmatlist[[1]]);
nmatlist2heatmaps(nmatlist,
   title="k-means clustering across both heatmaps",
   k_clusters=4,
   k_method="pearson",
   k_heatmap=c(1, 2),
   ht_gap=grid::unit(6, "mm"),
   top_axis_side="left",
   anno_df=anno_df,
   transform=c("log2signed", "sqrt"));

# example showing row partitioning by annotation groups
# (k_clusters=0 disables k-means; use k_clusters > 0 to cluster each group)
anno_df <- data.frame(
   group=sample(c("A", "B", "B"),
      size=nrow(nmatlist[[1]]),
      replace=TRUE),
   row.names=rownames(nmatlist[[1]]))
# note for this example the color legends are oriented vertically
# showing how the width is adjusted
nmatlist2heatmaps(nmatlist,
   heatmap_legend_direction="vertical",
   k_clusters=0,
   color_sub=c(`A`="firebrick", `B`="darkorchid"),
   k_colors=c("firebrick", "dodgerblue"),
   min_rows_per_k=50,
   ht_gap=grid::unit(1, "cm"),
   k_method="correlation",
   k_heatmap=1:2,
   anno_df=anno_df,
   partition="group",
   row_title_rot=0,
   transform=c("log2signed", "sqrt"));

# same partition as above, now with k_clusters enabled,
# using multiple values for k_clusters
nmatlist2heatmaps(nmatlist,
   k_clusters=c(1, 4),
   min_rows_per_k=25,
   k_heatmap=1:2,
   k_method="correlation",
   anno_df=anno_df,
   partition="group",
   row_title_rot=0)


jmw86069/platjam documentation built on Sept. 26, 2024, 3:31 p.m.