filterInputData | R Documentation |
Given a data frame with a column containing receptor sequences, filter data rows by sequence length and sequence content. Keep all data columns or choose which columns to keep.
filterInputData(
  data,
  seq_col,
  min_seq_length = NULL,
  drop_matches = NULL,
  subset_cols = NULL,
  count_col = NULL,
  verbose = FALSE
)
data |
A data frame. |
seq_col |
Specifies the column(s) of data containing the receptor sequences. |
min_seq_length |
Observations whose receptor sequences have fewer than min_seq_length characters are dropped. |
drop_matches |
Accepts a character string containing a regular expression
(see regex). Observations whose receptor sequences match the pattern are dropped. |
subset_cols |
Specifies which columns of the AIRR-Seq data are included in the output.
Accepts a character vector of column names
or a numeric vector of column indices.
The default NULL includes all columns. |
count_col |
Optional. Specifies the column of |
verbose |
Logical. If |
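Because drop_matches takes a regular expression, one pattern can cover several exclusion rules at once. The sketch below uses only base R to preview which sequences a pattern would flag. The sequences and the pattern are hypothetical, and the use of grepl() here is an assumption made for illustration, not a statement of how filterInputData() performs the match internally.

```r
# Hypothetical receptor sequences (not taken from the package)
seqs <- c("CASSLGQGAETQYF", "CASS*GQGAETQYF", "CASSLGGGGETQYF", "CAS_LGQGAETQYF")

# Flag sequences containing "*" or "_" (literal inside a character class)
# or the subsequence "GGGG"
pattern <- "[*_]|GGGG"

is_match <- grepl(pattern, seqs)

# Sequences that would survive a "drop on match" rule
kept <- seqs[!is_match]
print(kept)
```

Checking a pattern on a handful of sequences like this can help catch an overly broad expression before it is applied to a full data set.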
A data frame.
Brian Neal (Brian.Neal@ucsf.edu)
Hai Yang, Jason Cham, Brian Neal, Zenghua Fan, Tao He and Li Zhang. (2023). NAIR: Network Analysis of Immune Repertoire. Frontiers in Immunology, vol. 14. doi: 10.3389/fimmu.2023.1181825
set.seed(42)
raw_data <- simulateToyData()
# Remove sequences shorter than 13 characters,
# as well as sequences containing the subsequence "GGGG".
# Keep variables for clone sequence, clone frequency and sample ID
filterInputData(
  raw_data,
  seq_col = "CloneSeq",
  min_seq_length = 13,
  drop_matches = "GGGG",
  subset_cols = c("CloneSeq", "CloneFrequency", "SampleID"),
  verbose = TRUE
)
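For readers who want to see the filtering logic spelled out, the following base R sketch applies the same three steps described in the example comments: a minimum length, a pattern exclusion and a column subset. It is an illustration under stated assumptions, not the package's implementation: the toy data frame, its Extra column and the use of nchar() and grepl() are all hypothetical choices.

```r
# Hypothetical toy data; column names mirror those used in the example above
toy <- data.frame(
  CloneSeq = c("AAACCCGGGTTTA", "AAAGGGGTTTCCCA", "ACGT", "TTTTCCCCAAAATT"),
  CloneFrequency = c(0.4, 0.3, 0.2, 0.1),
  SampleID = c("S1", "S1", "S2", "S2"),
  Extra = 1:4,
  stringsAsFactors = FALSE
)

min_seq_length <- 13
drop_matches <- "GGGG"

# Rows whose sequence has at least min_seq_length characters
long_enough <- nchar(toy$CloneSeq) >= min_seq_length

# Rows whose sequence does not match the pattern
no_match <- !grepl(drop_matches, toy$CloneSeq)

# Keep rows passing both checks, and only the requested columns
filtered <- toy[
  long_enough & no_match,
  c("CloneSeq", "CloneFrequency", "SampleID"),
  drop = FALSE
]
print(filtered)
```

In this toy data, the second row is excluded for containing "GGGG" and the third for being shorter than 13 characters, so two rows remain.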