dittoDotPlot    R Documentation

Compact plotting of per group summaries for expression of multiple features

Description

Compact plotting of per group summaries for expression of multiple features

Usage

dittoDotPlot(
  object,
  vars,
  group.by,
  scale = TRUE,
  split.by = NULL,
  cells.use = NULL,
  size = 6,
  vars.dir = c("x", "y"),
  categories.split.adjust = TRUE,
  categories.theme.adjust = TRUE,
  split.nrow = NULL,
  split.ncol = NULL,
  split.adjust = list(),
  min.color = "grey90",
  max.color = "#C51B7D",
  min = "make",
  max = NA,
  mid.color = NULL,
  mid = "make",
  summary.fxn.color = function(x) {
     mean(x[x != 0])
 },
  summary.fxn.size = function(x) {
     mean(x != 0)
 },
  min.percent = 0.01,
  max.percent = NA,
  assay = .default_assay(object),
  slot = .default_slot(object),
  adjustment = NULL,
  swap.rownames = NULL,
  do.hover = FALSE,
  main = NULL,
  sub = NULL,
  ylab = group.by,
  y.labels = NULL,
  y.reorder = NULL,
  xlab = NULL,
  x.labels.rotate = vars.dir == "x",
  groupings.drop.unused = TRUE,
  theme = theme_classic(),
  legend.show = TRUE,
  legend.color.breaks = waiver(),
  legend.color.breaks.labels = waiver(),
  legend.color.title = "make",
  legend.size.title = "percent\nexpression",
  data.out = FALSE
)

Arguments

object

A Seurat, SingleCellExperiment, or SummarizedExperiment object.

vars

String vector of gene or metadata names which selects the features to summarize and show. Example: c("gene1","gene2","gene3")

Alternatively, a named list of string vectors where names represent category labels, such as associated cell types, and values are the gene or metadata names that you wish to have grouped together. Example: vars = list('Epithelial Cells' = c("gene1","gene2"), Neuron = c("gene3"))

group.by

String representing the name of a metadata to use for separating the cells/samples into discrete groups.

scale

Logical which sets whether the values shown with color (default: mean non-zero expression) should be centered and scaled.

split.by

1 or 2 strings naming discrete metadata to use for splitting the cells/samples into multiple plots with ggplot faceting.

  • When 2 metadata are named, c(row,col), the first is used as rows and the second is used for columns of the resulting grid.

  • When 1 metadata is named, shape control can be achieved with split.nrow and split.ncol.

  • Note: When vars are provided in list format, to group its contents into categories, that grouping is carried out via faceting and takes up one of the split.by slots.

cells.use

String vector of cells'/samples' names OR an integer vector specifying the indices of cells/samples which should be included.

Alternatively, a Logical vector, the same length as the number of cells in the object, which sets which cells to include.
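The three accepted forms can be illustrated with base R subsetting. This is only a sketch of what each form refers to, not dittoSeq's internal handling; the cell names below are hypothetical stand-ins for the column names of an object.

```r
# Hypothetical cell names, standing in for colnames(object)
cell.names <- c("cellA", "cellB", "cellC", "cellD")

# Three ways of describing the same selection
by.name    <- c("cellA", "cellC")            # string vector of names
by.index   <- c(1L, 3L)                      # integer vector of indices
by.logical <- c(TRUE, FALSE, TRUE, FALSE)    # one value per cell

# Each form picks out the same cells
cell.names[cell.names %in% by.name]
cell.names[by.index]
cell.names[by.logical]
```

Any one of these vectors could then be passed as cells.use.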

size

Number which sets the visual dot size associated with the highest value shown by dot size (default: percent non-zero expression).

vars.dir

"x" or "y", sets the axis where vars will be displayed.

categories.split.adjust

Boolean. When TRUE (default), and vars-categories have been provided, improves category display by:

  • adding list(switch = "y", scales = "free_y", space = "free_y") to the default for split.adjust (or 'x' counterparts depending on vars.dir)

  • enforcing that facet_grid will be used for faceting because facet_wrap cannot receive the 'space' argument.

categories.theme.adjust

Boolean. When TRUE (default), and vars-categories have been provided, improves category display by adding theme(strip.placement = "outside", strip.background.y = element_blank()) to the given theme (or 'x' counterpart depending on vars.dir)

split.nrow, split.ncol

Integers which set the dimensions of faceting/splitting when a single metadata is given to split.by.

split.adjust

A named list which allows extra parameters to be pushed through to the faceting function call. List elements should be valid inputs to the faceting functions, e.g. 'list(scales = "free")'.

For options, when giving 1 metadata to split.by, see facet_wrap, OR when giving 2 metadatas to split.by, see facet_grid.

min.color, max.color

colors to use for minimum and maximum color values. Default = light grey and purple. Ignored if mid.color given as "ryb", "rwb", or "rgb" which will update these to be "blue" and "red", respectively.

min, max

Numbers which set the values associated with the minimum and maximum colors.

mid.color

NULL (default), "ryb", "rwb", "rgb", or a color to use for the midpoint of a three-color color scale. This parameter acts as a switch between using a 2-color scale or a 3-color scale:

  • When left NULL, the 2-color scale runs from min.color to max.color, using scale_fill_gradient.

  • When given a color, the 3-color scale runs from min.color to mid.color to max.color, using scale_fill_gradient2.

  • When given "ryb", "rwb", or "rgb", it serves as a single-point, quick switch to a "standard" 3-color scale by also updating the min.color and max.color. Doing so sets:

    • max.color to a red,

    • min.color to a blue,

    • and mid.color to either a yellow ("ryb"), "white" ("rwb"), or "gray97" ("rgb", gray not green).

    • Actual colors used are inspired by ColorBrewer "RdYlBu" and "RdBu" palettes.

    Thus, the 3-color scale runs from a blue to one of a yellow, "white", or "gray97" to a red, using scale_fill_gradient2.

mid

Number or "make" (default) which sets the value associated with the mid.color of the three-color scale. Ignored when mid.color is left as NULL. When "make", defaults to midway between what dittoSeq expects to be the minimum and maximum values shown in the legend. (Maps to the 'midpoint' parameter of scale_fill_gradient2.)

summary.fxn.color, summary.fxn.size

Functions which set how the data for each variable will be summarized, per group, into the values shown by dot color and dot size, respectively. Any function can be used as long as it takes in a numeric vector and returns a single numeric value.
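As a small illustration, the defaults shown in the Usage section can be run on a plain numeric vector. The values in expr are hypothetical, standing in for one gene's expression within one group.

```r
# Defaults, as given in the Usage section
summary.fxn.color <- function(x) mean(x[x != 0])  # mean of the non-zero values
summary.fxn.size  <- function(x) mean(x != 0)     # fraction of values that are non-zero

# Hypothetical expression values for one gene within one group
expr <- c(0, 0, 2, 4, 0, 6)

summary.fxn.color(expr)  # 4
summary.fxn.size(expr)   # 0.5

# An alternative color summary that includes zeros
mean(expr)               # 2
```

Passing summary.fxn.color = mean, as in the final call of the Examples section, gives the zero-inclusive mean instead.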

min.percent, max.percent

Numbers between 0 and 1 which set the minimum and maximum percent expression to show. When set to NA, the minimum/maximum of the data are used.

assay, slot

single strings or integers (SCEs and SEs) or an optionally named vector of such values that set which expression data to use. See GeneTargeting for specifics and examples – Seurat and SingleCellExperiment objects deal with these differently, and functionality additions in dittoSeq have led to some minimal divergence from the native methodologies.

adjustment

NULL (default), "z-score", or "relative.to.max". Sets whether expression data should be used directly (NULL) or adjusted to be:

  • "z-score": scaled with the scale() function to produce a relative-to-mean z-score representation

  • "relative.to.max": divided by the maximum expression value to give fraction-of-max values in the range [0,1]
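The two adjustments can be sketched in base R on a single vector. The values are hypothetical, and this shows only the arithmetic described above, not how dittoSeq applies it across an object.

```r
# Hypothetical expression values for one gene across five cells
expr <- c(1, 2, 3, 4, 5)

# "z-score": center and scale with base R's scale()
z <- as.numeric(scale(expr))

# "relative.to.max": divide by the maximum value
rel <- expr / max(expr)

round(z, 3)  # -1.265 -0.632  0.000  0.632  1.265
rel          # 0.2 0.4 0.6 0.8 1.0
```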

swap.rownames

optionally named string or string vector. For SummarizedExperiment or SingleCellExperiment objects, its value(s) specifies the column name of rowData(object) to be used to identify features instead of rownames(object). When targeting multiple modalities (alternative experiments), names can be used to specify which level / alternative experiment (use 'main' for the top-level) individual values should be used for. See GeneTargeting for more specifics and examples.

do.hover

Logical. Default = FALSE. If set to TRUE the object will be converted to an interactive plotly object in which underlying data for individual dots will be displayed when you hover your cursor over them.

main

String which sets the plot title.

sub

String which sets the plot subtitle.

ylab

String which sets the y/grouping-axis label. Default is group.by so it defaults to the name of the grouping information. Set to NULL to remove.

y.labels

String vector, c("label1","label2","label3",...) which overrides the names of the samples/groups.

y.reorder

Integer vector. A sequence of numbers, from 1 to the number of groupings, for rearranging the order of groupings.

Method: Make a first plot without this input. Then, treating the bottom-most grouping as index 1, and the top-most as index n, values of y.reorder should be these indices, but in the order that you would like them rearranged to be.

Recommendation for advanced users: If you find yourself coming back to this input too many times, an alternative solution that can be easier long-term is to make the target data into a factor, and to put its levels in the desired order: factor(data, levels = c("level1", "level2", ...)). metaLevels can be used to quickly get the identities that need to be part of this 'levels' input.
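The factor-based alternative can be sketched in base R. The clustering vector is hypothetical; in practice the releveled factor would be stored back into the object's metadata, and how to do that depends on the object class.

```r
# Hypothetical grouping metadata, stored as a character vector
clustering <- c("B", "A", "C", "A", "B")

# Without explicit levels, factor() sorts the levels
levels(factor(clustering))   # "A" "B" "C"

# Setting levels explicitly fixes the order
clustering <- factor(clustering, levels = c("C", "A", "B"))
levels(clustering)           # "C" "A" "B"
```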

xlab

String which sets the x/var-axis label. Set to NULL to remove.

x.labels.rotate

Logical which sets whether the var-labels should be rotated.

groupings.drop.unused

Logical. TRUE by default. If group.by-data is a factor, factor levels are retained for ordering purposes, but some level(s) can end up with zero cells left after cells.use subsetting. By default, we remove them, but you can set this input to FALSE to keep them.

theme

A ggplot theme which will be applied before dittoSeq adjustments. Default = theme_classic(). See https://ggplot2.tidyverse.org/reference/ggtheme.html for other options and ideas.

legend.show

Logical. Whether the legend should be displayed. Default = TRUE.

legend.color.breaks

Numeric vector which sets the discrete values to label in the color-scale legend for continuous data.

legend.color.breaks.labels

String vector, with the same length as legend.color.breaks, which sets the labels for the tick marks of the color-scale.

legend.color.title, legend.size.title

String or NULL, sets the title displayed above legend keys.

data.out

Logical. When set to TRUE, changes the output, from the plot alone, to a list containing the plot (p) and data (data).
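A minimal sketch of retrieving both outputs, assuming dittoSeq is installed and using the same myRNA example object that the Examples section creates:

```r
library(dittoSeq)
example(importDittoBulk, echo = FALSE)  # creates 'myRNA', as in the Examples section

out <- dittoDotPlot(
    myRNA, c("gene1", "gene2"),
    group.by = "clustering",
    data.out = TRUE)

out$p           # the plot
head(out$data)  # the summarized data underlying the plot
```

Inspecting out$data can be a quick way to check which summary values ended up mapped to color and size.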

Details

This function will output a compact summary of expression of multiple genes, or of values of multiple numeric metadata, across cell/sample groups (clusters, sample identity, conditions, etc.), where dot-size and dot-color are used to reflect distinct features of the data. Typically, and by default, size will reflect the percent of non-zero values, and color will reflect the mean of non-zero values for each var and group pairing.

Internally, the data for each element of vars is obtained. When elements are genes/features, assay and slot are utilized to determine which expression data to use, and adjustment determines if and how the expression data might be adjusted. (Note that 'adjustment' would be applied before cells/samples subsetting, and across all groups of cells/samples.)

Groupings are determined using group.by, and then data for each variable is summarized based on summary.fxn.color & summary.fxn.size.

If scale = TRUE (default setting), the color summary values are centered and scaled. Doing so 1) puts values for all vars in a similar range, and 2) emphasizes relative differences between groups.
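The effect of centering and scaling can be sketched with base R's scale(). This assumes scaling is applied per var across groups, which the text above implies but does not state outright; the summary values are hypothetical.

```r
# Hypothetical per-group color summaries: rows = groups, columns = vars
summaries <- matrix(
    c(1, 2, 3,
      10, 20, 60),
    nrow = 3,
    dimnames = list(c("group1", "group2", "group3"), c("geneA", "geneB"))
)

# scale() centers each column to mean 0 and scales it to standard deviation 1
scaled <- scale(summaries)

round(colMeans(scaled), 10)  # both columns centered at 0
apply(scaled, 2, sd)         # both columns have sd 1
```

After scaling, geneA and geneB share a similar range even though their raw summaries differ by an order of magnitude.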

Finally, data is plotted as dots of differing colors and sizes, with vars along the vars.dir-axis and groupings along the other. Labels along the x-axis can be rotated 45 degrees with x.labels.rotate = TRUE, which is on by default when vars.dir == "x".

Value

a ggplot object where dots of different colors and sizes summarize continuous data for multiple features per multiple groups.

Alternatively when data.out = TRUE, a list containing the plot ("p") and the underlying data as a dataframe ("data").

Alternatively when do.hover = TRUE, a plotly converted version of the plot where additional data will be displayed when the cursor is hovered over the dots.

Many characteristics of the plot can be adjusted using discrete inputs:

  • Size of the dots can be changed with size.

  • Subsetting to utilize only certain cells/samples can be achieved with cells.use.

  • Markers can be grouped into categories by providing them to the vars input as a list, where list element names represent category names, and list element contents are the feature names which each category should contain.

  • Colors (2-color scale) can be adjusted with min.color and max.color.

  • Coloring can also be switched to a 3-color scale by using the mid.color parameter. For details, see that parameter's description above.

  • Displayed value ranges can be adjusted with min and max for color, or min.percent and max.percent for size.

  • Titles and axes labels can be adjusted with main, sub, xlab, ylab, legend.color.title, and legend.size.title arguments.

  • The legend can be hidden by setting legend.show = FALSE.

  • The color legend tick marks and associated labels can be adjusted with legend.color.breaks and legend.color.breaks.labels, respectively.

  • The groupings' labels and order can be changed using y.labels and y.reorder.

  • Rotation of x-axis labels can be turned off with x.labels.rotate = FALSE.

Author(s)

Daniel Bunis

See Also

dittoPlotVarsAcrossGroups for a different method of summarizing expression of multiple features across distinct groups that can be better (and more compact) when the mapping of values to individual genes among the requested set is unimportant.

dittoPlot and multi_dittoPlot for plotting of expression and metadata vars, each as separate plots, on a per cell/sample basis.

Examples

example(importDittoBulk, echo = FALSE)
myRNA

# These random data don't mimic dropout, so we'll add some zeros.
logcounts(myRNA)[
    matrix(
        sample(c(TRUE,FALSE), ncol(myRNA)*10, prob=c(.2,.8), replace = TRUE),
        ncol=10
    )] <- 0

dittoDotPlot(
    myRNA, c("gene1", "gene2", "gene3", "gene4"),
    group.by = "clustering")
    
# 'size' adjusts the dot-size associated with the highest percent expression
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    size = 12)

# 'scale' input can be used to control / turn off scaling of avg exp values.
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    scale = FALSE)
    
# x-axis label rotation can be controlled with 'x.labels.rotate'
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    x.labels.rotate = FALSE)

# The axis that vars get shown on can be swapped with the 'vars.dir' input.
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    vars.dir = "y")

# Titles are adjustable via various discrete inputs:
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    main = "Title",
    sub = "Subtitle",
    ylab = "y-axis label",
    xlab = "x-axis label",
    legend.color.title = "Colors title",
    legend.size.title = "Dot size title")

# You can also bin vars into groups by providing them in a named list:
dittoDotPlot(myRNA, group.by = "clustering",
    vars = list(
        'Naive' = c("gene1", "gene2"),
        'Stimulated' = c("gene3", "gene4")
    )
)
# The 'categories.split.adjust' and 'categories.theme.adjust' arguments then
#   control whether 'split.adjust' and 'theme' input contents, respectively,
#   will be added to in ways that make these categories actually appear, and
#   work, like categories.
# They both default to TRUE, and the axis they affect follows 'vars.dir'.
dittoDotPlot(myRNA, group.by = "clustering",
    vars = list(Naive = c("gene1", "gene2"), Stimulated = c("gene3"))
)
dittoDotPlot(myRNA, group.by = "clustering",
    vars = list(Naive = c("gene1", "gene2"), Stimulated = c("gene3")),
    split.by = "conditions"
)
dittoDotPlot(myRNA, group.by = "clustering",
    vars = list(Naive = c("gene1", "gene2"), Stimulated = c("gene3")),
    categories.split.adjust = FALSE,
    categories.theme.adjust = FALSE
)
# Now with 'vars.dir' changed to 'y'...
dittoDotPlot(myRNA, group.by = "clustering",
    vars = list(Naive = c("gene1", "gene2"), Stimulated = c("gene3")),
    vars.dir = "y"
)
dittoDotPlot(myRNA, group.by = "clustering",
    vars = list(Naive = c("gene1", "gene2"), Stimulated = c("gene3")),
    split.by = "conditions",
    vars.dir = "y"
)

# Coloring can be swapped from the default 2-color scale to a 3-color scale
#   by using the 'mid.color' input:
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    mid.color = "white"
)
# Setting it to "ryb", "rgb", or "rwb" quickly updates this input as well as
#   'min.color' and 'max.color', making the effect of these next two calls
#   equivalent:
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    mid.color = "rgb"
)
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    min.color = "#2166AC", # (blue)
    mid.color = "gray97",  # (gray)
    max.color = "#B2182B"  # (red)
)

# For certain specialized applications, it may be helpful to adjust the
#   functions used for summarizing the data as well. Inputs are:
#   summary.fxn.color & summary.fxn.size
#     Requirement for each: Any function that takes in a numeric vector &
#     returns, as output, a single numeric value.
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    summary.fxn.color = mean,
    legend.color.title = "mean\nexpression\nincluding 0s",
    x.labels.rotate = FALSE,
    scale = FALSE)


dtm2451/DittoSeq documentation built on May 4, 2024, 7:31 a.m.