In dtm2451/DittoSeq: User Friendly Single-Cell and Bulk RNA Sequencing Visualization

dittoDotPlot

R Documentation

Compact plotting of per group summaries for expression of multiple features

Description

Compact plotting of per group summaries for expression of multiple features

Usage

dittoDotPlot(
  object,
  vars,
  group.by,
  scale = TRUE,
  split.by = NULL,
  cells.use = NULL,
  size = 6,
  vars.dir = c("x", "y"),
  categories.split.adjust = TRUE,
  categories.theme.adjust = TRUE,
  split.nrow = NULL,
  split.ncol = NULL,
  split.adjust = list(),
  min.color = "grey90",
  max.color = "#C51B7D",
  min = "make",
  max = NA,
  mid.color = NULL,
  mid = "make",
  summary.fxn.color = function(x) {
     mean(x[x != 0])
 },
  summary.fxn.size = function(x) {
     mean(x != 0)
 },
  min.percent = 0.01,
  max.percent = NA,
  assay = .default_assay(object),
  slot = .default_slot(object),
  adjustment = NULL,
  swap.rownames = NULL,
  do.hover = FALSE,
  main = NULL,
  sub = NULL,
  ylab = group.by,
  y.labels = NULL,
  y.reorder = NULL,
  xlab = NULL,
  x.labels.rotate = vars.dir == "x",
  groupings.drop.unused = TRUE,
  theme = theme_classic(),
  legend.show = TRUE,
  legend.color.breaks = waiver(),
  legend.color.breaks.labels = waiver(),
  legend.color.title = "make",
  legend.size.title = "percent\nexpression",
  data.out = FALSE
)
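The two default summary functions in the Usage block can be tried on their own. This base-R sketch copies them from the Usage block and applies them to a hypothetical vector of expression values for one var within one group (the numbers are made up for illustration). It also shows an edge case: a group with no non-zero values gets `NaN` as its default color summary.

```r
# Default summary functions, copied from the Usage block above.
summary_color <- function(x) mean(x[x != 0])  # mean of the non-zero values
summary_size  <- function(x) mean(x != 0)     # fraction of values that are non-zero

# Hypothetical values for one var in 6 cells of a single group.
expr <- c(0, 0, 2, 4, 0, 6)

summary_color(expr)  # (2 + 4 + 6) / 3 = 4
summary_size(expr)   # 3 of 6 values are non-zero = 0.5

# Edge case: with no non-zero values, mean() receives an empty vector,
# so the color summary is NaN while the size summary is 0.
summary_color(c(0, 0, 0))
summary_size(c(0, 0, 0))
```

Because the size summary is a fraction, its values fall between 0 and 1, the same range used by `min.percent` and `max.percent`.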

Arguments

`object`	A Seurat, SingleCellExperiment, or SummarizedExperiment object.
`vars`	String vector of gene or metadata names which selects the features to summarize and show. Example: `c("gene1","gene2","gene3")`. Alternatively, a named list of string vectors, where names represent category labels, such as associated cell types, and values are the gene or metadata names that you wish to have grouped together. Example: `vars = list('Epithelial Cells' = c("gene1","gene2"), Neuron = c("gene3"))`
`group.by`	String naming the metadata to use for separating the cells/samples into discrete groups.
`scale`	Logical which sets whether the values shown with color (default: mean non-zero expression) should be centered and scaled. Default = `TRUE`.
`split.by`	1 or 2 strings naming discrete metadata to use for splitting the cells/samples into multiple plots with ggplot faceting. When 2 metadata are named, c(row,col), the first is used for rows and the second for columns of the resulting grid. When 1 metadata is named, shape control can be achieved with `split.nrow` and `split.ncol`. Note: when `vars` is provided in list format to group its contents into categories, that grouping is carried out via faceting and takes up one of the `split.by` slots.
`cells.use`	String vector of cells'/samples' names OR an integer vector specifying the indices of cells/samples which should be included. Alternatively, a logical vector, the same length as the number of cells in the object, which sets which cells to include.
`size`	Number which sets the visual dot size associated with the highest value shown by dot size (default: percent non-zero expression).
`vars.dir`	"x" or "y", sets the axis where `vars` will be displayed.
`categories.split.adjust`	Boolean. When `TRUE` (default), and `vars`-categories have been provided, improves category display by: (1) adding `list(switch = "y", scales = "free_y", space = "free_y")` to the default for `split.adjust` (or 'x' counterparts, depending on `vars.dir`), and (2) enforcing that `facet_grid` will be used for faceting, because `facet_wrap` cannot receive the 'space' argument.
`categories.theme.adjust`	Boolean. When `TRUE` (default), and `vars`-categories have been provided, improves category display by adding `theme(strip.placement = "outside", strip.background.y = element_blank())` to the given `theme` (or 'x' counterpart, depending on `vars.dir`).
`split.nrow`, `split.ncol`	Integers which set the dimensions of faceting/splitting when a single metadata is given to `split.by`.
`split.adjust`	A named list which allows extra parameters to be pushed through to the faceting function call. List elements should be valid inputs to the faceting functions, e.g. 'list(scales = "free")'. For options, when giving 1 metadata to `split.by`, see `facet_wrap`, OR when giving 2 metadata to `split.by`, see `facet_grid`.
`min.color`, `max.color`	Colors to use for the minimum and maximum color values. Default = light grey and purple. Ignored if `mid.color` is given as `"ryb"`, `"rwb"`, or `"rgb"`, which will update these to be "blue" and "red", respectively.
`min`, `max`	Numbers which set the values associated with the minimum and maximum colors.
`mid.color`	NULL (default), "ryb", "rwb", "rgb", or a color to use for the midpoint of a three-color color scale. This parameter acts as a switch between using a 2-color scale or a 3-color scale: When left NULL, the 2-color scale runs from `min.color` to `max.color`, using `scale_fill_gradient`. When given a color, the 3-color scale runs from `min.color` to `mid.color` to `max.color`, using `scale_fill_gradient2`. When given `"ryb"`, `"rwb"`, or `"rgb"`, it serves as a single-point, quick switch to a "standard" 3-color scale by also updating the `min.color` and `max.color`. Doing so sets: `max.color` to a red, `min.color` to a blue, and `mid.color` to either a yellow ("ryb"), "white" ("rwb"), or "gray97" ("rgb", gray not green). Actual colors used are inspired by ColorBrewer "RdYlBu" and "RdBu" palettes. Thus, the 3-color scale runs from a blue to one of a yellow, "white", or "gray97" to a red, using `scale_fill_gradient2`.
`mid`	Number or "make" (default) which sets the value associated with the `mid.color` of the three-color scale. Ignored when `mid.color` is left as NULL. When "make", defaults to midway between what dittoSeq expects to be the minimum and maximum values shown in the legend. (Maps to the 'midpoint' parameter of `scale_fill_gradient2`.)
`summary.fxn.color`, `summary.fxn.size`	Functions which set how each var's data are summarized, per group, into the values shown by color and by size, respectively. Any function can be used as long as it takes in a numeric vector and returns a single numeric value.
`min.percent`, `max.percent`	Numbers between 0 and 1 which set the minimum and maximum percent expression to show. When set to NA, the minimum/maximum of the data are used.
`assay`, `slot`	single strings or integers (SCEs and SEs) or an optionally named vector of such values that set which expression data to use. See `GeneTargeting` for specifics and examples – Seurat and SingleCellExperiment objects deal with these differently, and functionality additions in dittoSeq have led to some minimal divergence from the native methodologies.
`adjustment`	Should expression data be used directly (default, `NULL`) or should it be adjusted? Options: "z-score", scaled with the scale() function to produce a relative-to-mean z-score representation; or "relative.to.max", divided by the maximum expression value to give percent-of-max values between [0,1].
`swap.rownames`	optionally named string or string vector. For SummarizedExperiment or SingleCellExperiment objects, its value(s) specifies the column name of rowData(object) to be used to identify features instead of rownames(object). When targeting multiple modalities (alternative experiments), names can be used to specify which level / alternative experiment (use 'main' for the top-level) individual values should be used for. See `GeneTargeting` for more specifics and examples.
`do.hover`	Logical. Default = `FALSE`. If set to `TRUE`, the plot will be converted to an interactive plotly object in which the underlying data for individual dots are displayed when you hover your cursor over them.
`main`	String which sets the plot title.
`sub`	String which sets the plot subtitle.
`ylab`	String which sets the y/grouping-axis label. Default is `group.by` so it defaults to the name of the grouping information. Set to `NULL` to remove.
`y.labels`	String vector, c("label1","label2","label3",...) which overrides the names of the samples/groups.
`y.reorder`	Integer vector. A sequence of numbers, from 1 to the number of groupings, for rearranging the order of groupings. Method: Make a first plot without this input. Then, treating the bottom-most grouping as index 1, and the top-most as index n, values of y.reorder should be these indices, but in the order that you would like them rearranged to be. Recommendation for advanced users: If you find yourself coming back to this input too many times, an alternative solution that can be easier long-term is to make the target data into a factor, and to put its levels in the desired order: `factor(data, levels = c("level1", "level2", ...))`. `metaLevels` can be used to quickly get the identities that need to be part of this 'levels' input.
`xlab`	String which sets the x/var-axis label. Set to `NULL` to remove.
`x.labels.rotate`	Logical which sets whether the var-labels should be rotated.
`groupings.drop.unused`	Logical. `TRUE` by default. If `group.by`-data is a factor, factor levels are retained for ordering purposes, but some level(s) can end up with zero cells left after `cells.use` subsetting. By default, we remove them, but you can set this input to `FALSE` to keep them.
`theme`	A ggplot theme which will be applied before dittoSeq adjustments. Default = `theme_classic()`. See https://ggplot2.tidyverse.org/reference/ggtheme.html for other options and ideas.
`legend.show`	Logical. Whether the legend should be displayed. Default = `TRUE`.
`legend.color.breaks`	Numeric vector which sets the discrete values to label in the color-scale legend for continuous data.
`legend.color.breaks.labels`	String vector, with the same length as `legend.color.breaks`, which sets the labels for the tick marks of the color-scale.
`legend.color.title`, `legend.size.title`	String or `NULL`, sets the title displayed above legend keys.
`data.out`	Logical. When set to `TRUE`, changes the output, from the plot alone, to a list containing the plot (`p`) and data (`data`).
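The `y.reorder` method described above (index 1 is the bottom-most grouping, and the indices are listed in the desired new order) amounts to reordering factor levels. This base-R sketch assumes that dittoSeq applies `y.reorder` by indexing the current levels, which matches the described method but is not copied from the package source; the grouping data are hypothetical. It also shows the longer-term alternative of storing the data as a factor with explicit levels.

```r
# Hypothetical grouping data; default levels are sorted alphabetically.
groups <- factor(c("B", "A", "C", "A", "B"))
levels(groups)  # "A" "B" "C"

# y.reorder-style indices: the new order should be C, A, B,
# i.e. original levels 3, then 1, then 2.
y_reorder <- c(3, 1, 2)
reordered <- factor(groups, levels = levels(groups)[y_reorder])
levels(reordered)  # "C" "A" "B"

# Long-term alternative: make the data a factor with the levels spelled out.
explicit <- factor(groups, levels = c("C", "A", "B"))
identical(levels(explicit), levels(reordered))  # TRUE

# Reordering changes only the level order; each cell keeps its group.
identical(as.character(reordered), as.character(groups))  # TRUE
```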

Details

This function will output a compact summary of expression of multiple genes, or of values of multiple numeric metadata, across cell/sample groups (clusters, sample identity, conditions, etc.), where dot-size and dot-color are used to reflect distinct features of the data. Typically, and by default, size will reflect the percent of non-zero values, and color will reflect the mean of non-zero values for each var and group pairing.

Internally, the data for each element of vars is obtained. When elements are genes/features, assay and slot are utilized to determine which expression data to use, and adjustment determines if and how the expression data might be adjusted. (Note that 'adjustment' would be applied before cells/samples subsetting, and across all groups of cells/samples.)
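Applying 'adjustment' before subsetting matters because both options depend on which cells are included. This base-R sketch compares adjusting one feature's values before versus after a hypothetical subset. It assumes "z-score" means `scale()` applied to a single feature's values and "relative.to.max" means division by that feature's maximum; the argument description points this way, but the per-feature detail is an assumption, and the data are made up.

```r
# Hypothetical expression of one gene across 6 cells; only cells 1-3 are kept.
x <- c(1, 2, 3, 4, 5, 6)
keep <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# Adjust across ALL cells first, then subset (the order described above).
z_all   <- as.numeric(scale(x))[keep]  # all negative: kept cells are below the overall mean
rel_all <- (x / max(x))[keep]          # 1/6, 2/6, 3/6

# For comparison: subset first, then adjust.
z_sub   <- as.numeric(scale(x[keep]))  # -1, 0, 1
rel_sub <- x[keep] / max(x[keep])      # 1/3, 2/3, 1

rbind(z_all, z_sub, rel_all, rel_sub)
```

Adjusting first keeps the values comparable to the full dataset, so a subset of low-expressing cells still shows up as low.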

Groupings are determined using group.by, and then data for each variable is summarized based on summary.fxn.color & summary.fxn.size.
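The grouping-then-summarizing step can be sketched in base R with `tapply()`, producing one row per var/group pair, which is the layout a dot plot draws. This is an illustration of the described behavior using the default summary functions, not dittoSeq's internal code; the expression matrix and groupings are hypothetical.

```r
# Hypothetical data: 2 vars measured in 8 cells that fall into 2 groups.
cell_groups <- rep(c("g1", "g2"), each = 4)
expr <- rbind(
  gene1 = c(0, 1, 2, 3, 0, 0, 0, 5),
  gene2 = c(4, 4, 0, 0, 1, 1, 1, 1)
)

# Default summary functions from the Usage block.
summary_color <- function(x) mean(x[x != 0])
summary_size  <- function(x) mean(x != 0)

# One row per var/group pair: color and size summaries for each dot.
summaries <- do.call(rbind, lapply(rownames(expr), function(v) {
  data.frame(
    var   = v,
    group = sort(unique(cell_groups)),  # tapply() also returns groups in sorted order
    color = as.numeric(tapply(expr[v, ], cell_groups, summary_color)),
    size  = as.numeric(tapply(expr[v, ], cell_groups, summary_size))
  )
}))
summaries
```

For example, gene1 in g2 has a single non-zero value (5), so it would be drawn as a small dot (size 0.25) with a high color value (5).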

If scale = TRUE (default setting), the color summary values are centered and scaled. Doing so 1) puts values for all vars in a similar range, and 2) emphasizes relative differences between groups.
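Both stated effects of scale = TRUE follow if the color summaries are centered and scaled within each var, across groups. This base-R sketch assumes that per-var behavior (an inference from the description above, not confirmed from the package source) and uses hypothetical summary values for a weakly and a strongly expressed gene.

```r
# Hypothetical color summaries for 2 vars x 3 groups.
color <- data.frame(
  var   = rep(c("geneA", "geneB"), each = 3),
  group = rep(c("g1", "g2", "g3"), times = 2),
  value = c(1, 2, 3,        # geneA: low overall expression
            100, 200, 300)  # geneB: 100x higher overall expression
)

# Center and scale within each var, across its groups.
color$scaled <- ave(color$value, color$var,
                    FUN = function(v) as.numeric(scale(v)))
color
```

After scaling, both genes span the same range (-1, 0, 1 here), so the colors highlight which groups are relatively high or low for each gene rather than which gene is expressed more overall.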

Finally, data is plotted as dots of differing colors and sizes, with vars along the vars.dir-axis and groupings along the other. Labels along the x-axis can be rotated 45 degrees with x.labels.rotate = TRUE, which is on by default when vars.dir == "x".

Value

a ggplot object where dots of different colors and sizes summarize continuous data for multiple features per multiple groups.

Alternatively when data.out = TRUE, a list containing the plot ("p") and the underlying data as a dataframe ("data").

Alternatively when do.hover = TRUE, a plotly converted version of the plot where additional data will be displayed when the cursor is hovered over the dots.

Many characteristics of the plot can be adjusted using discrete inputs:

Size of the dots can be changed with size.
Subsetting to utilize only certain cells/samples can be achieved with cells.use.
Markers can be grouped into categories by providing them to the vars input as a list, where list element names represent category names, and list element contents are the feature names which each category should contain.
Colors (2-color scale) can be adjusted with min.color and max.color.
Coloring can also be switched to a 3-color scale by using the mid.color parameter. For details, see that parameter's description above.
Displayed value ranges can be adjusted with min and max for color, or min.percent and max.percent for size.
Titles and axes labels can be adjusted with main, sub, xlab, ylab, legend.color.title, and legend.size.title arguments.
The legend can be hidden by setting legend.show = FALSE.
The color legend tick marks and associated labels can be adjusted with legend.color.breaks and legend.color.breaks.labels, respectively.
The grouping labels and order can be changed using y.labels and y.reorder.
Rotation of x-axis labels can be turned off with x.labels.rotate = FALSE.

Author(s)

Daniel Bunis

Examples

example(importDittoBulk, echo = FALSE)
myRNA

# These random data don't mimic dropout, so we'll add some zeros.
logcounts(myRNA)[
    matrix(
        sample(c(TRUE,FALSE), ncol(myRNA)*10, prob = c(.2,.8), replace = TRUE),
        ncol=10
    )] <- 0

dittoDotPlot(
    myRNA, c("gene1", "gene2", "gene3", "gene4"),
    group.by = "clustering")
    
# 'size' adjusts the dot-size associated with the highest percent expression
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    size = 12)

# 'scale' input can be used to control / turn off scaling of avg exp values.
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    scale = FALSE)
    
# x-axis label rotation can be controlled with 'x.labels.rotate'
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    x.labels.rotate = FALSE)

# The axis that vars get shown on can be swapped with the 'vars.dir' input.
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    vars.dir = "y")

# Titles are adjustable via various discrete inputs:
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    main = "Title",
    sub = "Subtitle",
    ylab = "y-axis label",
    xlab = "x-axis label",
    legend.color.title = "Colors title",
    legend.size.title = "Dot size title")

# You can also bin vars into groups by providing them in a named list:
dittoDotPlot(myRNA, group.by = "clustering",
    vars = list(
        'Naive' = c("gene1", "gene2"),
        'Stimulated' = c("gene3", "gene4")
    )
)
# The 'categories.split.adjust' and 'categories.theme.adjust' arguments then
#   control whether 'split.adjust' and 'theme' input contents, respectively,
#   will be added to in ways that make these categories actually appear, and
#   work, like categories.
# They both default to TRUE, and the axis they affect follows 'vars.dir'.
dittoDotPlot(myRNA, group.by = "clustering",
    vars = list(Naive = c("gene1", "gene2"), Stimulated = c("gene3"))
)
dittoDotPlot(myRNA, group.by = "clustering",
    vars = list(Naive = c("gene1", "gene2"), Stimulated = c("gene3")),
    split.by = "conditions"
)
dittoDotPlot(myRNA, group.by = "clustering",
    vars = list(Naive = c("gene1", "gene2"), Stimulated = c("gene3")),
    categories.split.adjust = FALSE,
    categories.theme.adjust = FALSE
)
# Now with 'vars.dir' changed to 'y'...
dittoDotPlot(myRNA, group.by = "clustering",
    vars = list(Naive = c("gene1", "gene2"), Stimulated = c("gene3")),
    vars.dir = "y"
)
dittoDotPlot(myRNA, group.by = "clustering",
    vars = list(Naive = c("gene1", "gene2"), Stimulated = c("gene3")),
    split.by = "conditions",
    vars.dir = "y"
)

# Coloring can be swapped from the default 2-color scale to a 3-color scale
#   by using the 'mid.color' input:
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    mid.color = "white"
)
# Setting it to "ryb", "rgb", or "rwb" quickly updates this input as well as
#   'min.color' and 'max.color', making the effect of these next two calls
#   equivalent:
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    mid.color = "rgb"
)
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    min.color = "#2166AC", # (blue)
    mid.color = "gray97",  # (gray)
    max.color = "#B2182B"  # (red)
)

# For certain specialized applications, it may be helpful to adjust the
#   functions used for summarizing the data as well. Inputs are:
#   summary.fxn.color & summary.fxn.size
#     Requirement for each: Any function that takes in a numeric vector &
#     returns, as output, a single numeric value.
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    summary.fxn.color = mean,
    legend.color.title = "mean\nexpression\nincluding 0s",
    x.labels.rotate = FALSE,
    scale = FALSE)
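Custom summary functions only need to meet the requirement stated in the comments above: take a numeric vector and return a single numeric value. This base-R sketch defines two hypothetical alternatives (the names `frac_above_1` and `median_nonzero` are illustrative, not part of dittoSeq) and a small hypothetical helper, `is_valid_summary`, that checks the requirement on sample data.

```r
# Hypothetical size summary: fraction of values above 1, rather than above 0.
frac_above_1 <- function(x) mean(x > 1)

# Hypothetical color summary: median of the non-zero values, returning 0
# (instead of NA) when a group has no non-zero values.
median_nonzero <- function(x) {
  nz <- x[x != 0]
  if (length(nz) == 0) 0 else median(nz)
}

# Hypothetical helper: numeric vector in, single numeric value out?
is_valid_summary <- function(f, x) {
  out <- f(x)
  is.numeric(out) && length(out) == 1
}

vals <- c(0, 0.5, 2, 3, 0)
frac_above_1(vals)                      # 2 of 5 values exceed 1 = 0.4
median_nonzero(vals)                    # median of 0.5, 2, 3 = 2
is_valid_summary(frac_above_1, vals)    # TRUE
is_valid_summary(median_nonzero, vals)  # TRUE
is_valid_summary(range, vals)           # FALSE: range() returns two values
```

Either function could then be passed as, e.g., `summary.fxn.color = median_nonzero`. If the size function no longer measures percent non-zero, `legend.size.title` should be updated to match.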

dtm2451/DittoSeq documentation built on May 4, 2024, 7:31 a.m.