summarize_vdj: Summarize V(D)J data for each cell

View source: R/mutate-vdj.R

summarize_vdjR Documentation

Summarize V(D)J data for each cell

Description

Summarize per-chain values for each cell using a function or purrr-style lambda. This is useful for plotting or filtering cells based on the V(D)J meta.data.

Usage

summarize_vdj(
  input,
  data_cols,
  fn = NULL,
  ...,
  chain = NULL,
  chain_col = global$chain_col,
  col_names = "{.col}",
  return_df = FALSE,
  sep = global$sep
)

Arguments

input

Single cell object or data.frame containing V(D)J data. If a data.frame is provided, the cell barcodes should be stored as row names.

data_cols

meta.data column(s) containing V(D)J data to summarize for each cell

fn

Function to apply to each selected column, possible values can be either a function, e.g. mean, or a purrr-style lambda, e.g. ~ mean(.x, na.rm = TRUE). If NULL, the mean will be calculated for numeric values, non-numeric columns will be combined into a single string.

...

Additional arguments to pass to fn

chain

Chain to use for summarizing V(D)J data

chain_col

meta.data column(s) containing chains for each cell

col_names

A glue specification that describes how to name the output columns, use {.col} to refer to the original column name. If col_names is NULL, the original column names will be used.

return_df

Return results as a data.frame. If FALSE, results will be added to the input object.

sep

Separator used for storing per cell V(D)J data

Value

Object containing V(D)J data summarized for each cell

Examples

# Summarize numeric columns
# by default the mean will be calculated for numeric columns
res <- summarize_vdj(
  vdj_sce,
  data_cols = c("all_del", "all_ins")
)

head(slot(res, "colData"), 3)

# Specifying a different summary function
# this calculates the median number of insertions and deletions for each
# cell
res <- summarize_vdj(
  vdj_sce,
  data_cols = c("all_del", "all_ins"),
  fn = stats::median
)

head(slot(res, "colData"), 3)

# Summarize values for a specific chain
res <- summarize_vdj(
  vdj_sce,
  data_cols = c("all_del", "all_ins"),
  chain = "IGK"
)

head(slot(res, "colData"), 3)

# Specifying new names for summarized columns
# use {.col} to refer to the original column name
res <- summarize_vdj(
  vdj_sce,
  data_cols = c("all_del", "all_ins"),
  fn = stats::median,
  col_names = "median_{.col}"
)

head(slot(res, "colData"), 1)

# Return a data.frame instead of adding the results to the input object
res <- summarize_vdj(
  vdj_sce,
  data_cols = c("all_del", "all_ins"),
  return_df = TRUE
)

head(res, 1)

# Using a lambda function to summarize values
# use '.x' to refer to values in the column
# this creates a new column showing the unique chains for each cell
res <- summarize_vdj(
  vdj_sce,
  data_cols = "chains",
  fn = ~ paste0(unique(.x), collapse = "_"),
  col_names = "unique_chains"
)

head(slot(res, "colData"), 3)

# Creating an index column to use for filtering/plotting
# this creates a column indicating which cells have no insertions
# the V(D)J data can be filtered based on this new column
res <- summarize_vdj(
  vdj_sce,
  data_cols = "all_ins",
  fn = ~ all(.x == 0),
  col_names = "no_insertions"
)

res <- filter_vdj(
  res,
  filt = no_insertions
)

head(slot(res, "colData"), 3)


rnabioco/djvdj documentation built on Oct. 24, 2023, 7:33 p.m.