View source: R/operations_annotate.R
| annotate_immundata | R Documentation |
Joins additional annotation data to the annotations slot of an ImmunData object.
This function adds extra information to your repertoire data by joining a data frame of annotations on one or more specified columns.
annotate_immundata(
  idata,
  annotations,
  by,
  keep_repertoires = TRUE,
  remove_limit = FALSE
)

annotate(idata, annotations, by, keep_repertoires = TRUE, remove_limit = FALSE)

annotate_receptors(
  idata,
  annotations,
  annot_col = imd_schema("receptor"),
  keep_repertoires = TRUE,
  remove_limit = FALSE
)

annotate_barcodes(
  idata,
  annotations,
  annot_col = "<rownames>",
  keep_repertoires = TRUE,
  remove_limit = FALSE
)

annotate_chains(
  idata,
  annotations,
  annot_col = imd_schema("chain"),
  keep_repertoires = TRUE,
  remove_limit = FALSE
)
idata |
An ImmunData object. |
annotations |
A data frame containing the annotations to be joined. |
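by |
A named character vector specifying the columns to join by. The names of the
vector should be the column names in idata$annotations, and the values should be
the corresponding column names in the annotations data frame. |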
keep_repertoires |
Logical. If |
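remove_limit |
Logical. If FALSE (the default), joining an annotations data frame with 100 or
more columns triggers a warning; set to TRUE to join a wide data frame deliberately. |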
annot_col |
A character vector specifying the column with receptor, barcode or chain identifiers
used to annotate the corresponding receptors, barcodes or chains in idata. |
The function performs a left join operation, keeping all rows from
idata$annotations and adding matching columns from the annotations data frame.
If there are multiple matches in annotations for a row in idata$annotations,
all combinations will be returned, potentially increasing the number of rows
in the resulting annotations table.
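The row-multiplying behaviour described above can be illustrated with base R's merge(), used here only as a stand-in for the join that the package performs internally. The data frames and column names below are hypothetical and are not taken from the package.

```r
# Hypothetical stand-in for idata$annotations: one row per barcode.
existing <- data.frame(
  barcode = c("b1", "b2", "b3"),
  sample  = c("s1", "s1", "s2"),
  stringsAsFactors = FALSE
)

# Hypothetical annotations: two matches for "s1", none for "s2".
extra <- data.frame(
  sample_id = c("s1", "s1"),
  treatment = c("A", "B"),
  stringsAsFactors = FALSE
)

# Left join: all.x = TRUE keeps every row of `existing`.
joined <- merge(
  existing, extra,
  by.x = "sample", by.y = "sample_id",
  all.x = TRUE
)

nrow(existing)  # 3
nrow(joined)    # 5: b1 and b2 each match twice, b3 is kept with NA
```

If the row count after a join is larger than expected, duplicated key values in the annotations data frame are a likely cause.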
The function uses checkmate to validate the input types and structure.
A check is performed to ensure that the columns specified in by exist in both
idata$annotations and the annotations data frame.
The annotations data frame is converted to a duckdb tibble internally for
efficient joining, especially with large datasets.
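A similar column check can be run by hand before calling the function. The sketch below uses base R only; check_by_columns() is a hypothetical helper, not part of the package, and it does not reproduce the package's checkmate-based validation. The column names are made up for illustration.

```r
# Hypothetical helper (not part of immundata): report `by` columns that are
# missing from either side of the join.
check_by_columns <- function(left_names, right_names, by) {
  stopifnot(is.character(by), !is.null(names(by)))
  list(
    missing_left  = setdiff(names(by), left_names),   # names: existing table
    missing_right = setdiff(unname(by), right_names)  # values: new annotations
  )
}

by <- c("sample" = "sample_id", "barcode" = "sample_barcode")

result <- check_by_columns(
  left_names  = c("sample", "barcode", "cdr3"),  # hypothetical existing columns
  right_names = c("sample_id", "cell_type"),     # hypothetical annotation columns
  by = by
)

result$missing_left   # character(0)
result$missing_right  # "sample_barcode"
```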
A new ImmunData object with the annotations joined to the annotations slot.
By default (remove_limit = FALSE), joining an annotations data frame with 100 or
more columns will trigger a warning. This is a safeguard to prevent accidental
joining of very wide data (e.g., gene expression data) that could lead to
performance degradation or crashes. If you understand the risks and intend to join
a wide data frame, set remove_limit = TRUE.
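As an alternative to removing the limit, a wide table can be narrowed to the columns of interest before joining. This is a minimal base R sketch under assumed data: gene_expression and its gene columns are invented for illustration, and the threshold of 100 is the one stated above.

```r
# Hypothetical wide table: a barcode column plus 150 made-up gene columns.
gene_expression <- data.frame(barcode = c("b1", "b2"), stringsAsFactors = FALSE)
gene_expression[paste0("gene", 1:150)] <- 0

ncol(gene_expression) >= 100  # TRUE: this width would trigger the warning

# Keep only the join key and the genes of interest before annotating.
genes_of_interest <- c("gene1", "gene2")
narrow <- gene_expression[, c("barcode", genes_of_interest), drop = FALSE]

ncol(narrow)  # 3
```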
## Not run:
# Assuming 'my_immun_data' is an ImmunData object and 'sample_info' is a data frame
# with a column 'sample_id' matching 'sample' in my_immun_data$annotations
# and additional columns like 'treatment' and 'disease_status'.
sample_info <- data.frame(
  sample_id = c("sample1", "sample2", "sample3", "sample4"),
  treatment = c("Treatment A", "Treatment B", "Treatment A", "Treatment C"),
  disease_status = c("Healthy", "Disease", "Healthy", "Disease"),
  stringsAsFactors = FALSE # Important to keep characters as characters
)
# Join sample information using the 'sample' column
my_immun_data_annotated <- annotate(
  idata = my_immun_data,
  annotations = sample_info,
  by = c("sample" = "sample_id")
)
# Join data by multiple columns, e.g., 'sample' and 'barcode'
# Assuming 'cell_annotations' is a data frame with 'sample', 'sample_barcode'
# and 'cell_type' columns
my_immun_data_cell_annotated <- annotate(
  idata = my_immun_data,
  annotations = cell_annotations,
  by = c("sample" = "sample", "barcode" = "sample_barcode")
)
# Join a wide dataframe, suppressing the column limit warning
# Assuming 'gene_expression' is a data frame with 'barcode' and many gene columns
my_immun_data_gene_expression <- annotate(
  idata = my_immun_data,
  annotations = gene_expression,
  by = c("barcode" = "barcode"),
  remove_limit = TRUE
)
## End(Not run)