plot_frequency: Plot frequency of cell groups present in object

View source: R/calc-frequency.R

plot_frequency    R Documentation

Description

Plot the frequency of each cell label present in the provided meta.data column. This is useful for comparing the proportion of cells belonging to different samples, cell types, clonotypes, isotypes, etc.

Usage

plot_frequency(
  input,
  data_col,
  cluster_col = NULL,
  group_col = NULL,
  method = "bar",
  stack = NULL,
  units = "percent",
  top = NULL,
  other_label = "other",
  plot_colors = NULL,
  plot_lvls = NULL,
  na_color = "grey80",
  trans = "identity",
  show_points = TRUE,
  show_zeros = TRUE,
  n_label = NULL,
  p_label = c(value = 0.05),
  p_method = NULL,
  p_file = NULL,
  label_params = list(),
  ...,
  per_chain = FALSE,
  chain = NULL,
  chain_col = global$chain_col,
  sep = global$sep
)

Arguments

input

Single cell object or data.frame containing V(D)J data. If a data.frame is provided, the cell barcodes should be stored as row names.

data_col

meta.data column containing cell labels to use for calculating frequency, e.g. cell types, isotypes, etc.

cluster_col

meta.data column containing cluster IDs (or patients, treatment conditions, etc.) to use when calculating frequency. Calculations will be performed separately for each cluster.

group_col

meta.data column to use for grouping cluster IDs present in cluster_col. This is useful when there are multiple replicates or patients for each treatment condition.

method

Method to use for plotting when group_col is provided. Possible values are 'bar' or 'boxplot'.

stack

If TRUE, stacked bar graphs will be generated; otherwise, grouped bar graphs will be generated.

units

Units to plot on the y-axis, either 'frequency' or 'percent'
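
As a rough illustration of the two units, the base R sketch below tabulates a toy meta.data table by cluster and rescales the counts to percentages within each cluster. The data frame, its column names, and its values are made up for illustration, and this is not necessarily how djvdj computes the values internally.

```r
# Toy meta.data; column names and values are hypothetical
meta <- data.frame(
  cluster = c("c1", "c1", "c1", "c1", "c2", "c2"),
  isotype = c("IGHM", "IGHM", "IGHM", "IGHG1", "IGHM", "IGHG1")
)

# units = "frequency": number of cells per label within each cluster
counts <- table(meta$cluster, meta$isotype)

# units = "percent": counts rescaled so each cluster (row) sums to 100
pct <- prop.table(counts, margin = 1) * 100
```

Because the percentages are computed within each cluster, clusters of very different sizes can still be compared on the same axis.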

top

To show only the top cell groups present in data_col, provide one of the following. All other cells will be labeled using the value provided to the other_label argument. If NULL, this will be set automatically.

  • Integer specifying the number of top groups to show

  • Vector specifying the names of cell groups to show
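
The effect of an integer top value can be sketched in base R as follows. The labels are toy values, the variable names are hypothetical, and the sketch only mimics the idea of lumping less frequent groups under other_label; it is not taken from the djvdj source.

```r
# Toy cell labels; values are hypothetical
labels <- c("IGHM", "IGHM", "IGHM", "IGHG1", "IGHG1", "IGHA1", "IGHD")

# Rank labels by the number of cells and keep the two most frequent
top_n    <- 2
top_lvls <- names(sort(table(labels), decreasing = TRUE))[seq_len(top_n)]

# Relabel every other cell with the other_label value
other_label <- "other"
lumped      <- ifelse(labels %in% top_lvls, labels, other_label)

table(lumped)
```

When a vector of group names is supplied instead of an integer, it would take the place of top_lvls in this sketch.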

other_label

Label to use for 'other' cells when top is specified. If NULL, all cell groups present in data_col will be displayed on the plot.

plot_colors

Character vector containing colors for plotting

plot_lvls

Levels to use for ordering clusters or groups

na_color

Color to use for missing values

trans

Transformation to use for plotting data, e.g. 'log10'. By default, values are not transformed. Refer to ggplot2::continuous_scale() for more options. Values can only be transformed when stack is FALSE.

show_points

If TRUE, data points will be shown on boxplots. The point size can be adjusted using the point.size parameter.

show_zeros

If TRUE cell labels that are missing from a cluster will still be shown on the plot

n_label

Location on the plot where the n label should be added. This can be any combination of the following:

  • 'corner', display the total number of cells plotted in the top right corner. The position of the label can be modified by passing x and y specifications with the label_params argument

  • 'axis', display the number of cells plotted for each group shown on the x-axis

  • 'legend', display the number of cells plotted for each group shown in the plot legend

  • 'none', do not display the number of cells plotted

p_label

Specification indicating how p-values should be labeled on the plot. This can be one of the following:

  • 'none', do not display p-values

  • 'all', show p-values for all groups

  • A named vector providing p-value cutoffs and labels to display, e.g. c('*' = 0.05, '**' = 0.01, '***' = 0.001). The keyword 'value' can be used to display the p-value for those less than a certain cutoff, e.g. c(value = 0.05, ns = 1.1) will show significant p-values, all others will be labeled 'ns'.
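
To make the named-vector form concrete, the sketch below maps p-values to labels with a small helper. The function label_p() is hypothetical and not part of djvdj. It assumes that the strictest cutoff a p-value falls below determines the label, and that p-values meeting no cutoff receive no label.

```r
# Hypothetical helper, not part of djvdj: map p-values to labels
# using a named vector of cutoffs like the one passed to p_label
label_p <- function(p, cutoffs) {
  cutoffs <- sort(cutoffs)  # check the strictest cutoff first

  vapply(p, function(x) {
    hit <- which(x < cutoffs)

    # Assumption: a p-value that meets no cutoff gets no label
    if (length(hit) == 0) return(NA_character_)

    nm <- names(cutoffs)[hit[1]]

    # The 'value' keyword prints the p-value itself
    if (nm == "value") format(x, digits = 2) else nm
  }, character(1))
}

# Star labels for increasingly strict cutoffs
label_p(c(0.0004, 0.03), c("*" = 0.05, "**" = 0.01, "***" = 0.001))

# Show significant p-values, label all others 'ns'
label_p(c(0.03, 0.2), c(value = 0.05, ns = 1.1))
```

Setting the ns cutoff above 1 guarantees that every remaining p-value falls below it and is labeled 'ns'.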

p_method

Method to use for calculating p-values. By default, a t-test will be used when comparing two groups and the Kruskal-Wallis test will be used when comparing more than two groups. With the exception of the edgeR method, p-values are adjusted for multiple testing using Bonferroni correction. Possible methods include:

  • 't', two sample t-test performed with stats::t.test()

  • 'wilcox', Wilcoxon rank sum test performed with stats::wilcox.test()

  • 'kruskal', Kruskal-Wallis test performed with stats::kruskal.test()

  • 'edgeR', differential abundance calculated with the edgeR package
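
The stats functions named above can be called directly, as in the sketch below. The percentages, group labels, and raw p-values are toy values chosen for illustration, and the sketch does not reproduce how djvdj organizes its comparisons internally.

```r
# Toy per-replicate percentages for one cell label; values are made up
pct <- c(12, 15, 14, 30, 28, 33)
grp <- factor(c("ctrl", "ctrl", "ctrl", "treat", "treat", "treat"))

# Two groups: two sample t-test
p_t <- stats::t.test(pct ~ grp)$p.value

# The Kruskal-Wallis test is intended for more than two groups;
# it is run on the same data here only to illustrate the call
p_kw <- stats::kruskal.test(pct ~ grp)$p.value

# Bonferroni adjustment across several comparisons
# (these raw p-values are arbitrary)
p_adj <- stats::p.adjust(c(0.01, 0.04, 0.2), method = "bonferroni")
```

Bonferroni correction multiplies each p-value by the number of comparisons and caps the result at 1, so it becomes more conservative as the number of cell labels tested grows.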

p_file

File path to save table containing p-values for each comparison.

label_params

Named list providing additional parameters to modify n label aesthetics, e.g. list(size = 4, color = "red")

...

Additional arguments to pass to ggplot2, e.g. color, fill, size, linetype, etc.

per_chain

If TRUE the frequency of each per-chain value will be calculated. If FALSE per-chain data will not be parsed and the values present in data_col will be used as is.

chain

Chain(s) to use for calculating frequency. Set to NULL to include all chains.

chain_col

meta.data column(s) containing chains for each cell

sep

Separator for storing per-chain data
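
The difference between per_chain = FALSE and per_chain = TRUE, and the role of chain and sep, can be sketched in base R. The sketch assumes ";" as the separator and uses made-up values; the vectors stand in for a data_col and a chain_col and are not taken from a real object.

```r
# Toy per-chain values for three cells; ";" is an assumed separator
c_gene <- c("IGHM;IGKC", "IGHG1;IGLC2", "IGHM;IGKC;IGKC")
chains <- c("IGH;IGK", "IGH;IGL", "IGH;IGK;IGK")

# per_chain = FALSE: each cell's string is counted as is
table(c_gene)

# per_chain = TRUE: split on the separator and count each chain's value
gene_vals <- unlist(strsplit(c_gene, ";", fixed = TRUE))
table(gene_vals)

# chain = "IGH": keep only values that belong to the requested chain
chain_vals <- unlist(strsplit(chains, ";", fixed = TRUE))
heavy_vals <- gene_vals[chain_vals == "IGH"]
table(heavy_vals)
```

Filtering by chain relies on the data column and the chain column storing their per-chain values in the same order for each cell.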

Value

ggplot object

See Also

calc_frequency(), plot_clone_frequency()

Examples

# Plot frequency of different isotypes
plot_frequency(
  vdj_sce,
  data_col = "isotype"
)

# Plot frequency separately for cell clusters
plot_frequency(
  vdj_sce,
  data_col    = "isotype",
  cluster_col = "orig.ident"
)

# Create grouped bar graphs
plot_frequency(
  vdj_sce,
  data_col    = "isotype",
  cluster_col = "orig.ident",
  stack       = FALSE
)

# Plot number of cells on the y-axis
plot_frequency(
  vdj_sce,
  data_col    = "seurat_clusters",
  cluster_col = "orig.ident",
  units       = "frequency"
)


rnabioco/djvdj documentation built on Oct. 24, 2023, 7:33 p.m.