airr_stats: Compute key immune repertoire statistics
In immunarch: Multi-Modal Immune Repertoire Analytics for Immunotherapy and Vaccine Design in R

airr_stats

R Documentation

Compute key immune repertoire statistics

Description

A family of functions that extract core descriptive statistics from an ImmunData object.

Available functions

Supported methods are the following.

airr_stats_chains — count V(D)J chains per repertoire (optionally split by locus). Quickly gauges capture depth per repertoire and, when split by locus, reveals TRA/TRB/IGH balance. Use it for QC, library-size checks, and to spot locus-specific dropouts or over-representation.

airr_stats_lengths — count the number of sequence lengths per repertoire. Summarizes the CDR3 length distribution, a sensitive QC fingerprint of repertoire prep and selection. Helpful for detecting primer/UMI biases, comparing cohorts, and deriving length-based features for models.

airr_stats_genes - count V(D)J gene segments per repertoire, optionally split by locus and using either receptor counts or barcode/UMI counts as the measure. Profiles V/D/J gene usage to characterize repertoire composition and germline biases, with optional locus split. Useful for cohort comparisons, flagging clonal expansions, and producing ML-ready features for repertoire-level ML tasks.

Usage

airr_stats_chains(
  idata,
  locus_col = NA,
  autojoin = getOption("immundata.autojoin", TRUE),
  format = c("long", "wide")
)

airr_stats_lengths(
  idata,
  seq_col = "cdr3_aa",
  autojoin = getOption("immundata.autojoin", TRUE),
  format = c("long", "wide")
)

airr_stats_genes(
  idata,
  gene_col = "v_call",
  level = c("receptor", "barcode"),
  by = c(NA, "locus"),
  autojoin = getOption("immundata.autojoin", TRUE),
  format = c("long", "wide")
)

Arguments

`idata`	An `ImmunData` object.
`locus_col`	Column in `idata$annotations` that stores the locus (e.g. `"locus"`). If `NULL` or missing, the result is not split by locus.
`autojoin`	Logical. If TRUE, join repertoire metadata by the schema repertoire id. Change the default behaviour by calling `options(immunarch.autojoin = FALSE)`.
`format`	String. One of `"long"` ("long" tibble with `imd_repertoire_id`, facet columns, and `value`; useful for visualizations) or `"wide"` (wide/unmelted table of features, with each row corresponding to a specific repertoire / pair of repertoires; useful for Machine Learning).
`seq_col`	Character vector with names of the columns containing sequences.
`gene_col`	A single column name in `idata$annotations` with gene segment calls (e.g., `"v_call"`, `"d_call"`, `"j_call"`, `"c_call"`). Default is `"v_call"`.
`level`	One of `"receptor"` or `"barcode"`. If `"receptor"` (default), the function counts unique receptors (one per receptor ID) that carry a given gene segment. If `"barcode"`, the function sums counts (e.g., cells/UMIs) per gene segment using the column defined by `immundata::imd_schema("count")`.
`by`	Either `NULL` (no split) or `"locus"`. When `"locus"`, the result is further split by the locus column if present (as given by `immundata::imd_schema("locus")`); otherwise a warning is emitted and the split is ignored.

Value

`airr_stats_chains` Returns a tibble with columns:

repertoire_id – repertoire identifier
locus – TRA, TRB, IGH, ... (present only if locus_col is supplied)
n_chains – number of chains

`airr_stats_lengths` Returns a tibble with columns:

repertoire_id – repertoire identifier
seq_len – lengths of sequences
n – number of receptors

`airr_stats_genes` A tibble with columns:

repertoire_id - repertoire identifier
(optional) locus - TRA, TRB, IGH, ... (present only when by = "locus" and the locus column exists)
⁠<gene_col>⁠ - the gene segment value (e.g., V gene)
n - the measure:
- if level = "receptor": number of receptors carrying the gene segment
- if level = "barcode": sum of counts across receptors for the segment

Examples

# Limit the number of threads used by the underlying DB for this session.
# Change this only if you know what you're doing (e.g., multi-user machines, shared CI/servers).
db_exec("SET threads TO 2")

# Load data
## Not run: 
immdata <- get_test_idata() |> agg_repertoires("Therapy")

## End(Not run)

#
# airr_stats_chains
#

## Not run: 
airr_stats_chains(immdata)

## End(Not run)

#
# airr_stats_lengths
#

## Not run: 
airr_stats_lengths(immdata)

## End(Not run)

#
# airr_stats_genes
#

## Not run: 
# V gene usage by receptor count
airr_stats_genes(immdata, gene_col = "v_call", level = "receptor")

# V gene usage by summed cell/UMI counts (if a count column is present)
airr_stats_genes(immdata, gene_col = "v_call", level = "barcode")

# Split by locus (TRA/TRB/... if locus column exists)
airr_stats_genes(immdata, gene_col = "v_call", level = "receptor", by = "locus")

## End(Not run)

immunarch documentation built on Nov. 5, 2025, 7:21 p.m.

immunarch index

README.md Immunarch overview Limit the number of threads used by immunarch

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

immunarch
Multi-Modal Immune Repertoire Analytics for Immunotherapy and Vaccine Design in R

airr_stats: Compute key immune repertoire statistics
In immunarch: Multi-Modal Immune Repertoire Analytics for Immunotherapy and Vaccine Design in R

Compute key immune repertoire statistics

Description

Available functions

Usage

Arguments

Value

`airr_stats_chains` Returns a tibble with columns:

`airr_stats_lengths` Returns a tibble with columns:

`airr_stats_genes` A tibble with columns:

See Also

Examples

Related to airr_stats in immunarch...

R Package Documentation

Browse R Packages

We want your feedback!

immunarch Multi-Modal Immune Repertoire Analytics for Immunotherapy and Vaccine Design in R

airr_stats: Compute key immune repertoire statistics In immunarch: Multi-Modal Immune Repertoire Analytics for Immunotherapy and Vaccine Design in R

Compute key immune repertoire statistics

Description

Available functions

Usage

Arguments

Value

airr_stats_chains Returns a tibble with columns:

airr_stats_lengths Returns a tibble with columns:

airr_stats_genes A tibble with columns:

See Also

Examples

Related to airr_stats in immunarch...

R Package Documentation

Browse R Packages

We want your feedback!

immunarch
Multi-Modal Immune Repertoire Analytics for Immunotherapy and Vaccine Design in R

airr_stats: Compute key immune repertoire statistics
In immunarch: Multi-Modal Immune Repertoire Analytics for Immunotherapy and Vaccine Design in R

`airr_stats_chains` Returns a tibble with columns:

`airr_stats_lengths` Returns a tibble with columns:

`airr_stats_genes` A tibble with columns: