View source: R/prepare_join_df.R
prepare_join_df | R Documentation |
Merges previously extracted epitopes and proteins, and performs some sanity checks and filtering.
prepare_join_df(
  epitopes,
  proteins,
  save_folder = NULL,
  min_epit = 8,
  max_epit = 25,
  only_exact = FALSE,
  pos.mismatch.rm = c("all", "align"),
  set.positive = c("any", "mode", "all")
)
epitopes |
data frame of epitope data (returned by |
proteins |
data frame of protein data (returned by |
save_folder |
path to folder for saving the results. |
min_epit |
positive integer, shortest epitope to be considered |
max_epit |
positive integer, longest epitope to be considered |
only_exact |
logical, should only sequences labelled as "Exact Epitope"
in variable epit_struc_def (within |
pos.mismatch.rm |
should epitopes with position mismatches be removed? Use "all" (default) for removing any position mismatch or "align" if the routine should attempt to search the epitope sequence in the protein sequence. |
set.positive |
how to decide whether an observation should be of the "Positive" (+1) class? Use "any" to set a sequence as positive if n_positive > 0, "mode" to set it if n_positive >= n_negative, or "all" to set it if n_negative == 0. Defaults to "mode". |
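The three set.positive rules can be illustrated with a minimal base R sketch. The helper label_positive() below is hypothetical and not part of the package; it assumes each observation carries the counts n_positive and n_negative named above, and it codes the non-positive class as -1, which is an assumption (the description only states +1 for the positive class).

```r
# Hypothetical helper (NOT part of the package): illustrates the three
# decision rules described for `set.positive`.
label_positive <- function(n_positive, n_negative, rule) {
  rule <- match.arg(rule, c("any", "mode", "all"))
  is_pos <- switch(rule,
                   any  = n_positive > 0,            # at least one positive assay
                   mode = n_positive >= n_negative,  # positives tie or outnumber negatives
                   all  = n_negative == 0)           # no negative assay at all
  # Assumption: the non-positive class is coded as -1.
  ifelse(is_pos, 1L, -1L)
}

# Toy counts for four observations
n_pos <- c(3, 1, 0, 2)
n_neg <- c(0, 2, 4, 2)

lab_any  <- label_positive(n_pos, n_neg, "any")
lab_mode <- label_positive(n_pos, n_neg, "mode")
lab_all  <- label_positive(n_pos, n_neg, "all")
```

On these toy counts the second observation (one positive, two negative) is positive only under "any", and the fourth (a two-two tie) is positive under "any" and "mode" but not under "all".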
Entries in the epitopes input are removed if they:
Lack a valid protein ID (i.e., one that has a corresponding entry in proteins).
Lack a valid string in epit_seq.
Lack a valid definition in epit_struc_def.
Have sequences shorter than min_epit or longer than max_epit.
Have a mismatch between the sequence in epit_seq and the corresponding sequence between start_pos and end_pos on the protein sequence (see description of parameter pos.mismatch.rm).
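The removal criteria above can be sketched in base R on toy data. This is an illustration under stated assumptions, not the package's implementation: the column names protein_id and protein_seq are hypothetical (only epit_seq, epit_struc_def, start_pos and end_pos appear in this page), and the "align" branch assumes that searching for the epitope means an exact substring search.

```r
# Toy inputs. `protein_id` and `protein_seq` are assumed column names.
proteins <- data.frame(
  protein_id  = c("P1", "P2"),
  protein_seq = c("MKTAYIAKQRQISFVKSHFSRQ", "GSHMLEDPVAGAKLLQWERTYV"),
  stringsAsFactors = FALSE
)
epitopes <- data.frame(
  protein_id     = c("P1", "P1", "P9", "P2", "P2"),
  epit_seq       = c("TAYIAKQRQ", "AKQRQISFV", "AAAAAAAAA", "LEDP", "VAGAKLLQW"),
  epit_struc_def = "Exact Epitope",
  start_pos      = c(3, 1, 1, 5, 9),
  end_pos        = c(11, 9, 9, 8, 17),
  stringsAsFactors = FALSE
)
min_epit <- 8
max_epit <- 25

# 1. Inner join: drops epitopes whose protein ID has no entry in `proteins` (P9)
df <- merge(epitopes, proteins, by = "protein_id")

# 2. Drop missing or empty sequences and structure definitions
ok_str <- function(x) !is.na(x) & nzchar(x)
df <- df[ok_str(df$epit_seq) & ok_str(df$epit_struc_def), ]

# 3. Enforce the length limits (drops "LEDP", which has only 4 residues)
len <- nchar(df$epit_seq)
df <- df[len >= min_epit & len <= max_epit, ]

# 4. Compare epit_seq with the protein subsequence at the declared positions
declared <- substr(df$protein_seq, df$start_pos, df$end_pos)
mismatch <- declared != df$epit_seq

# pos.mismatch.rm = "all": remove every mismatching entry
kept_all <- df[!mismatch, ]

# pos.mismatch.rm = "align" (assumed behaviour): look for the epitope as an
# exact substring of the protein and, if found, correct the positions
found <- mapply(function(e, p) as.integer(regexpr(e, p, fixed = TRUE))[1],
                df$epit_seq, df$protein_seq, USE.NAMES = FALSE)
fix <- mismatch & found > 0
df$start_pos[fix] <- found[fix]
df$end_pos[fix]   <- found[fix] + nchar(df$epit_seq[fix]) - 1
kept_align <- df[!mismatch | fix, ]
```

In this toy example the epitope "AKQRQISFV" is declared at positions 1 to 9 but actually occurs at positions 7 to 15 of its protein, so it is dropped under "all" and kept with corrected positions under the assumed "align" behaviour.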
A data.table object containing the merged data frame.
Felipe Campelo (f.campelo@aston.ac.uk)