View source: R/operations_filter.R
| filter_immundata | R Documentation |
Provides flexible filtering options for an ImmunData object.
filter() is the main function, allowing filtering based on receptor features
(e.g., CDR3 sequence) using various matching methods (exact, regex, fuzzy) and/or
standard dplyr-style filtering on annotation columns.
filter_barcodes() is a convenience function to filter by specific cell barcodes.
filter_receptors() is a convenience function to filter by specific receptor identifiers.
filter_immundata(idata, ..., seq_options = NULL, keep_repertoires = TRUE)
## S3 method for class 'ImmunData'
filter(
  .data,
  ...,
  .by = NULL,
  .preserve = FALSE,
  seq_options = NULL,
  keep_repertoires = TRUE
)
filter_barcodes(idata, barcodes, keep_repertoires = TRUE)
filter_receptors(idata, receptors, keep_repertoires = TRUE)
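idata, .data | An ImmunData object. |
... | For filter: dplyr-style filtering expressions applied to the annotation columns. |
seq_options | For filter: options for sequence-based filtering, such as those created by make_seq_options(). |
keep_repertoires | Logical scalar. If TRUE and repertoire data exists in the input, the repertoire-level summaries are recalculated from the filtered annotations; otherwise the $repertoires table in the output is NULL. |
.by | Not used. |
.preserve | Not used. |
barcodes | For filter_barcodes: the cell barcodes to filter by. |
receptors | For filter_receptors: the receptor identifiers to filter by. |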
For filter:
User-provided dplyr-style filters (...) are applied before any sequence-based
filtering defined in seq_options.
Sequence filtering compares values in the query_col of the annotations table
against the provided patterns.
Supported sequence matching methods are:
"exact": Keeps rows where query_col exactly matches any of the patterns.
"regex": Keeps rows where query_col matches any of the regular expressions
in patterns.
"lev" (Levenshtein distance): Keeps rows where the edit distance between
query_col and any pattern is less than or equal to max_dist.
"hamm" (Hamming distance): Keeps rows where the Hamming distance between
query_col and any pattern is less than or equal to max_dist. Hamming
distance is defined only for strings of equal length.
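The four matching rules can be illustrated with a minimal base R sketch. This is not the package's implementation: the vectors cdr3, patterns and regexes are made-up stand-ins for the query_col values and the patterns, and treating strings of unequal length as non-matches under "hamm" is an assumption made here for illustration.

```r
# Hypothetical stand-ins for the query_col values and the patterns.
cdr3     <- c("CARGLGLVFYGMDVW", "CARGLGLVFYGMDVF", "CARDNRG", "CASSL")
patterns <- c("CARGLGLVFYGMDVW")
regexes  <- c("^CARG", "SSL$")
max_dist <- 1

# "exact": the value equals any of the patterns.
keep_exact <- cdr3 %in% patterns

# "regex": the value matches any of the regular expressions.
keep_regex <- Reduce(`|`, lapply(regexes, grepl, x = cdr3))

# "lev": edit distance to any pattern is at most max_dist.
# utils::adist() returns a length(cdr3) x length(patterns) distance matrix.
lev_dist <- utils::adist(cdr3, patterns)
keep_lev <- rowSums(lev_dist <= max_dist) > 0

# "hamm": number of mismatching positions is at most max_dist.
# Assumption: strings of unequal length never match.
hamming_ok <- function(value, pattern, max_dist) {
  if (nchar(value) != nchar(pattern)) return(FALSE)
  sum(strsplit(value, "")[[1]] != strsplit(pattern, "")[[1]]) <= max_dist
}
keep_hamm <- vapply(
  cdr3,
  function(v) any(vapply(patterns, hamming_ok, logical(1),
                         value = v, max_dist = max_dist)),
  logical(1),
  USE.NAMES = FALSE
)
```

In this sketch, "lev" and "hamm" agree because the only near match differs by a single substitution; they would diverge for insertions or deletions, which change the string length.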
The filtering operations act on the $annotations table. A new ImmunData
object is created containing only the rows (and corresponding receptors)
that pass the filter(s).
If keep_repertoires = TRUE (and repertoire data exists in the input),
the repertoire-level summaries ($repertoires table) are recalculated based
on the filtered annotations. Otherwise, the $repertoires table in the
output will be NULL.
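The effect of keep_repertoires can be pictured with a small base R sketch. Everything here is hypothetical: the column names, the summarise_repertoires() helper and the plain-list result are illustrative stand-ins, not the ImmunData structure or the agg_repertoires() logic.

```r
# Hypothetical annotations table; column names are illustrative only.
annotations <- data.frame(
  repertoire = c("S1", "S1", "S1", "S2", "S2"),
  chain      = c("IGH", "IGK", "IGH", "IGH", "IGK"),
  cell_id    = c("c1", "c1", "c2", "c3", "c4"),
  stringsAsFactors = FALSE
)

# Hypothetical summary: number of annotation rows per repertoire.
summarise_repertoires <- function(ann) {
  counts <- table(ann$repertoire)
  data.frame(
    repertoire = names(counts),
    n_rows     = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Filter first, then derive the summary from the filtered rows only.
filtered <- annotations[annotations$chain == "IGH", , drop = FALSE]

# keep_repertoires = TRUE: summaries reflect the filtered annotations.
with_rep <- list(
  annotations = filtered,
  repertoires = summarise_repertoires(filtered)
)

# keep_repertoires = FALSE: the summary slot is left as NULL.
without_rep <- list(annotations = filtered, repertoires = NULL)
```

The point of the sketch is only the ordering: the summary is computed after filtering, so it never reports counts for rows that were removed.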
For filter_barcodes and filter_receptors:
These functions provide a simpler interface for common filtering tasks based on
cell barcodes or receptor IDs, respectively. They use efficient semi_join
operations internally.
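As a rough illustration of why a semi-join suits this task, the base R sketch below contrasts it with an inner join. The data are made up, and the package's internal code may differ; with dplyr, the same semi-join could be written as dplyr::semi_join(annotations, data.frame(cell_id = barcodes), by = "cell_id").

```r
# Hypothetical annotations; cell_id and receptor_id follow the column
# names used in the examples on this page.
annotations <- data.frame(
  cell_id     = c("cell1_barcode", "cell2_barcode",
                  "cell5_barcode", "cell5_barcode"),
  receptor_id = c(101, 102, 205, 206),
  stringsAsFactors = FALSE
)

# Keys to keep; the duplicate is deliberate.
barcodes <- c("cell1_barcode", "cell5_barcode", "cell5_barcode")

# Semi-join semantics: keep rows whose key appears in the lookup set.
# Rows are never duplicated and no columns are added.
semi <- annotations[annotations$cell_id %in% barcodes, , drop = FALSE]

# Inner join for contrast: duplicated keys multiply the matching rows.
inner <- merge(annotations,
               data.frame(cell_id = barcodes, stringsAsFactors = FALSE),
               by = "cell_id")
```

Here semi has three rows while inner has five, because each of the two "cell5_barcode" annotations is matched twice in the inner join.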
A new ImmunData object containing only the filtered annotations
(and potentially recalculated repertoire summaries). The schema remains the same.
make_seq_options(), dplyr::filter(), agg_repertoires(), ImmunData
# Basic setup (assuming idata_test is a valid ImmunData object)
# print(idata_test)
# --- filter examples ---
## Not run:
# Example 1: dplyr-style filtering on annotations
filtered_heavy <- filter(idata_test, chain == "IGH")
print(filtered_heavy)
# Example 2: Exact sequence matching on CDR3 amino acid sequence
cdr3_patterns <- c("CARGLGLVFYGMDVW", "CARDNRGAVAGVFGEAFYW")
seq_opts_exact <- make_seq_options(query_col = "CDR3_aa", patterns = cdr3_patterns)
filtered_exact_cdr3 <- filter(idata_test, seq_options = seq_opts_exact)
print(filtered_exact_cdr3)
# Example 3: Combining dplyr-style and fuzzy sequence matching (Levenshtein)
seq_opts_lev <- make_seq_options(
  query_col = "CDR3_aa",
  patterns = "CARGLGLVFYGMDVW",
  method = "lev",
  max_dist = 1
)
filtered_combined <- filter(
  idata_test,
  chain == "IGH",
  C_gene == "IGHG1",
  seq_options = seq_opts_lev
)
print(filtered_combined)
# Example 4: Regex matching on V gene
v_gene_pattern <- "^IGHV[13]-" # Keep only IGHV1 or IGHV3 families
seq_opts_regex <- make_seq_options(
  query_col = "V_gene",
  patterns = v_gene_pattern,
  method = "regex"
)
filtered_regex_v <- filter(idata_test, seq_options = seq_opts_regex)
print(filtered_regex_v)
# Example 5: Filtering without recalculating repertoires
filtered_no_rep <- filter(idata_test, chain == "IGK", keep_repertoires = FALSE)
print(filtered_no_rep) # $repertoires should be NULL
## End(Not run)
# --- filter_barcodes example ---
## Not run:
# Assuming 'cell1_barcode' and 'cell5_barcode' exist in idata_test$annotations$cell_id
specific_barcodes <- c("cell1_barcode", "cell5_barcode")
filtered_cells <- filter_barcodes(idata_test, barcodes = specific_barcodes)
print(filtered_cells)
## End(Not run)
# --- filter_receptors example ---
## Not run:
# Assuming receptor IDs 101 and 205 exist in idata_test$annotations$receptor_id
specific_receptors <- c(101, 205) # Or character IDs if applicable
filtered_recs <- filter_receptors(idata_test, receptors = specific_receptors)
print(filtered_recs)
## End(Not run)