View source: R/operations_agg_receptors.R
| agg_receptors | R Documentation |
Processes a table of immune receptor sequences (chains or clonotypes) to
identify unique receptors based on a specified schema. It assigns a unique
identifier (imd_receptor_id) to each distinct receptor signature and
returns an annotated table linking the original sequence data to these
receptor IDs.
This function is a core component used within read_repertoires() and handles
different input data structures:
Simple tables (no counts, no cell IDs).
Bulk sequencing data (using a count column).
Single-cell data (using a barcode/cell ID column). For single-cell data, it can perform chain pairing if the schema specifies multiple chains (e.g., TRA and TRB).
agg_receptors(
  dataset,
  schema,
  barcode_col = NULL,
  count_col = NULL,
  locus_col = NULL,
  umi_col = NULL
)
dataset |
A |
schema |
Defines how a unique receptor is identified. Can be:
|
barcode_col |
Character(1). The name of the column containing cell
identifiers (barcodes). Required for single-cell processing and chain pairing.
Default: NULL. |
count_col |
Character(1). The name of the column containing counts
(e.g., UMI counts for bulk, clonotype frequency). Used for bulk data
processing. Default: NULL. |
locus_col |
Character(1). The name of the column specifying the chain locus
(e.g., "TRA", "TRB"). Required if |
umi_col |
Character(1). The name of the column containing UMI counts.
Required for single-cell data ( |
The function performs the following main steps:
Validation: Checks inputs, schema validity, and existence of required columns.
Schema Parsing: Determines receptor features and target chains from schema.
Locus Filtering: If schema$chains is provided, filters the dataset
to include only rows matching the specified locus/loci.
Processing Logic (based on barcode_col and count_col):
Simple Table/Bulk (No Barcodes): Assigns unique internal barcode/chain IDs.
Identifies unique receptors based on schema$features. Calculates
imd_chain_count (1 for simple table, from count_col for bulk).
Single-Cell (Barcodes Provided): Uses barcode_col for imd_barcode_id.
Single Chain: (length(schema$chains) <= 1). Identifies unique
receptors based on schema$features. Uses umi_col to keep one
chain per barcode when needed. imd_chain_count is 1.
Paired Chain: (length(schema$chains) == 2). Requires locus_col
and umi_col. Filters chains within each cell/locus group based
on max umi_col. Creates paired receptors by joining the two
specified loci for each cell based on schema$features from both.
Assigns a unique imd_receptor_id to each pair.
imd_chain_count is 1 (representing the chain record).
Output: Returns an annotated data frame containing original columns plus
internal identifiers (imd_receptor_id, imd_barcode_id, imd_chain_id)
and counts (imd_chain_count).
Internal column names are typically managed by immundata:::imd_schema().
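The locus-filtering and bulk steps above can be sketched in base R. This is a minimal illustration of the described logic, not the package's implementation. The input column names (cdr3_aa, v_call, locus, reads) are hypothetical, the schema is written as a plain list with features and chains elements rather than built with make_receptor_schema(), and the imd_* names are typed literally instead of being taken from immundata:::imd_schema().

```r
# Hypothetical bulk table; the input column names are illustrative only.
bulk <- data.frame(
  cdr3_aa = c("CASSLGQETQYF", "CASSLGQETQYF", "CAVRDSNYQLIW", "CASSPTGELFF"),
  v_call  = c("TRBV7-9", "TRBV7-9", "TRAV3", "TRBV5-1"),
  locus   = c("TRA", "TRB", "TRA", "TRB"),
  reads   = c(10L, 5L, 7L, 2L),
  stringsAsFactors = FALSE
)
bulk$locus[1] <- "TRB"  # rows 1 and 2 are the same TRB receptor seen twice

# Assumed stand-in for a receptor schema: features define the signature,
# chains lists the loci to keep.
schema <- list(features = c("cdr3_aa", "v_call"), chains = "TRB")

# Locus filtering: keep only rows whose locus is named in schema$chains.
kept <- bulk[bulk$locus %in% schema$chains, , drop = FALSE]

# No barcodes in bulk data, so every row gets its own chain and barcode ID.
kept$imd_chain_id   <- seq_len(nrow(kept))
kept$imd_barcode_id <- seq_len(nrow(kept))

# A receptor signature is the combination of the schema features;
# rows with the same signature share one receptor ID.
signature <- do.call(paste, c(kept[schema$features], sep = "\r"))
kept$imd_receptor_id <- match(signature, unique(signature))

# For bulk data the chain count comes from the count column.
kept$imd_chain_count <- kept$reads

# Total count per receptor, summed over its chain records.
totals <- tapply(kept$imd_chain_count, kept$imd_receptor_id, sum)
```

In this toy table the two identical TRB rows collapse into one receptor, while each row keeps its own chain ID and count. For a simple table with no count column, the same sketch would set imd_chain_count to 1 for every row.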
A duckplyr_df (or data frame) representing the annotated sequences.
This table links each original sequence record (chain) to a defined receptor
and includes standardized columns:
imd_receptor_id: Integer ID unique to each distinct receptor signature.
imd_barcode_id: Integer ID unique to each cell/barcode (or row if no barcode).
imd_chain_id: Integer ID unique to each input row (chain).
imd_chain_count: Integer count associated with the chain (1 for single-cell
data and simple tables, from count_col for bulk).
This output is typically assigned to the $annotations field of an ImmunData object.
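To make these columns concrete for the paired-chain single-cell case, here is a hedged base R sketch of how they could be derived. It is not the package's code. The input column names (barcode, locus, cdr3_aa, umis) are hypothetical, and the schema is again a plain list. The sketch also assumes that UMI ties are broken by row order and that cells missing one of the two loci are dropped by an inner join; how agg_receptors() treats those cases is not stated here.

```r
# Hypothetical single-cell table; the input column names are illustrative only.
sc <- data.frame(
  barcode = c("AAAC", "AAAC", "AAAC", "TTTG", "TTTG", "GGGA"),
  locus   = c("TRA", "TRA", "TRB", "TRA", "TRB", "TRB"),
  cdr3_aa = c("CAVRDSNYQLIW", "CAVSGGYNKLIF", "CASSLGQETQYF",
              "CAVRDSNYQLIW", "CASSLGQETQYF", "CASSPTGELFF"),
  umis    = c(8L, 3L, 12L, 5L, 9L, 4L),
  stringsAsFactors = FALSE
)
schema <- list(features = "cdr3_aa", chains = c("TRA", "TRB"))

# Keep the chain with the highest UMI count within each barcode/locus group.
sc  <- sc[order(sc$barcode, sc$locus, -sc$umis), ]
top <- sc[!duplicated(sc[c("barcode", "locus")]), ]

# Join the two loci per cell on the barcode (inner join: an assumption).
first  <- top[top$locus == schema$chains[1], c("barcode", schema$features)]
second <- top[top$locus == schema$chains[2], c("barcode", schema$features)]
pairs  <- merge(first, second, by = "barcode", suffixes = c("_1", "_2"))

# The paired signature combines the features of both chains.
feature_cols <- setdiff(names(pairs), "barcode")
signature <- do.call(paste, c(pairs[feature_cols], sep = "\r"))
pairs$imd_receptor_id <- match(signature, unique(signature))

# Link each kept chain record back to its paired receptor.
annotated <- merge(top, pairs[c("barcode", "imd_receptor_id")], by = "barcode")
annotated$imd_barcode_id  <- match(annotated$barcode, unique(annotated$barcode))
annotated$imd_chain_id    <- seq_len(nrow(annotated))
annotated$imd_chain_count <- 1L
```

In this toy input, cells AAAC and TTTG carry the same TRA/TRB pair and therefore share one imd_receptor_id, while each keeps its own imd_barcode_id and one row per chain.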
read_repertoires(), make_receptor_schema(), ImmunData