| airr_stats | R Documentation |
A family of functions that extract core descriptive statistics from an ImmunData object.
Supported methods are the following.
airr_stats_chains — count V(D)J chains per repertoire
(optionally split by locus). Quickly gauges capture depth per repertoire
and, when split by locus, reveals TRA/TRB/IGH balance. Use it for QC,
library-size checks, and to spot locus-specific dropouts or
over-representation.
airr_stats_lengths — count the number of sequence lengths
per repertoire. Summarizes the CDR3 length distribution, a sensitive QC
fingerprint of repertoire prep and selection. Helpful for detecting
primer/UMI biases, comparing cohorts, and deriving length-based features for
models.
airr_stats_genes - count V(D)J gene segments per repertoire,
optionally split by locus and using either receptor counts or barcode/UMI
counts as the measure. Profiles V/D/J gene usage to characterize repertoire
composition and germline biases, with optional locus split. Useful for
cohort comparisons, flagging clonal expansions, and producing ML-ready
features for repertoire-level ML tasks.
airr_stats_chains(
idata,
locus_col = NA,
autojoin = getOption("immundata.autojoin", TRUE),
format = c("long", "wide")
)
airr_stats_lengths(
idata,
seq_col = "cdr3_aa",
autojoin = getOption("immundata.autojoin", TRUE),
format = c("long", "wide")
)
airr_stats_genes(
idata,
gene_col = "v_call",
level = c("receptor", "barcode"),
by = c(NA, "locus"),
autojoin = getOption("immundata.autojoin", TRUE),
format = c("long", "wide")
)
idata |
An |
locus_col |
Column in |
autojoin |
Logical. If TRUE, join repertoire metadata by the schema repertoire id.
Change the default behaviour by calling |
format |
String. One of |
seq_col |
Character vector with names of the columns containing sequences. |
gene_col |
A single column name in |
level |
One of |
by |
Either |
airr_stats_chains Returns a tibble with columns:repertoire_id – repertoire identifier
locus – TRA, TRB, IGH, ... (present only if locus_col is supplied)
n_chains – number of chains
airr_stats_lengths Returns a tibble with columns:repertoire_id – repertoire identifier
seq_len – lengths of sequences
n – number of receptors
airr_stats_genes A tibble with columns:repertoire_id - repertoire identifier
(optional) locus - TRA, TRB, IGH, ... (present only when by = "locus"
and the locus column exists)
<gene_col> - the gene segment value (e.g., V gene)
n - the measure:
if level = "receptor": number of receptors carrying the gene segment
if level = "barcode": sum of counts across receptors for the segment
immundata::ImmunData
# Limit the number of threads used by the underlying DB for this session.
# Change this only if you know what you're doing (e.g., multi-user machines, shared CI/servers).
db_exec("SET threads TO 2")
# Load data
## Not run:
immdata <- get_test_idata() |> agg_repertoires("Therapy")
## End(Not run)
#
# airr_stats_chains
#
## Not run:
airr_stats_chains(immdata)
## End(Not run)
#
# airr_stats_lengths
#
## Not run:
airr_stats_lengths(immdata)
## End(Not run)
#
# airr_stats_genes
#
## Not run:
# V gene usage by receptor count
airr_stats_genes(immdata, gene_col = "v_call", level = "receptor")
# V gene usage by summed cell/UMI counts (if a count column is present)
airr_stats_genes(immdata, gene_col = "v_call", level = "barcode")
# Split by locus (TRA/TRB/... if locus column exists)
airr_stats_genes(immdata, gene_col = "v_call", level = "receptor", by = "locus")
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.