prepare_join_df: Merge and filter epitope and protein data

View source: R/prepare_join_df.R

prepare_join_df    R Documentation

Merge and filter epitope and protein data

Description

Merges previously extracted epitopes and proteins, and removes entries that fail the sanity checks listed under Details.

Usage

prepare_join_df(
  epitopes,
  proteins,
  save_folder = NULL,
  min_epit = 8,
  max_epit = 25,
  only_exact = FALSE,
  pos.mismatch.rm = c("all", "align"),
  set.positive = c("any", "mode", "all")
)

Arguments

epitopes

data frame of epitope data (returned by get_LBCE()).

proteins

data frame of protein data (returned by get_proteins()).

save_folder

path to folder for saving the results.

min_epit

positive integer, length of the shortest epitope to be considered.

max_epit

positive integer, length of the longest epitope to be considered.

only_exact

logical, should only sequences labelled as "Exact Epitope" in variable epit_struc_def (within epitopes) be considered?

pos.mismatch.rm

how should epitopes with position mismatches be handled? Use "all" (default) to remove every entry with a position mismatch, or "align" if the routine should attempt to search for the epitope sequence within the protein sequence.

set.positive

how to decide whether an observation is assigned to the "Positive" (+1) class. Use "any" to label a sequence as positive if n_positive > 0, "mode" to do so if n_positive >= n_negative, or "all" to do so if n_negative == 0. Defaults to "mode".
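The three set.positive rules can be illustrated with a minimal base-R sketch. The helper name is_positive_class() is hypothetical and is not part of the package; it only restates the comparisons given above and returns a logical flag rather than a class code.

```r
# Hypothetical helper (not part of the epitopes package): applies the three
# set.positive rules described above to vectors of observation counts.
is_positive_class <- function(n_positive, n_negative, rule) {
  rule <- match.arg(rule, c("any", "mode", "all"))
  switch(rule,
         any  = n_positive > 0,            # at least one positive observation
         mode = n_positive >= n_negative,  # positives at least as frequent
         all  = n_negative == 0)           # no negative observations at all
}

# Toy counts for four sequences (illustrative values only).
n_positive <- c(3, 2, 1, 0)
n_negative <- c(0, 1, 2, 4)

res_any  <- is_positive_class(n_positive, n_negative, "any")
res_mode <- is_positive_class(n_positive, n_negative, "mode")
res_all  <- is_positive_class(n_positive, n_negative, "all")
```

In this toy data the rules become progressively stricter: "any" flags the first three sequences, "mode" the first two, and "all" only the first.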

Details

Entries in the epitopes input are removed if they:

  • Lack a valid protein ID (i.e., one that has a corresponding entry in proteins).

  • Lack a valid string in epit_seq.

  • Lack a valid definition in epit_struc_def.

  • Have sequences shorter than min_epit or longer than max_epit.

  • Have a mismatch between the sequence in epit_seq and the stretch of the protein sequence between start_pos and end_pos (see the description of parameter pos.mismatch.rm).
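The criteria above can be sketched in base R on toy data. This is an illustration under stated assumptions, not the package's implementation: the column names protein_id and prot_seq are hypothetical, "valid" is assumed to mean a non-missing string (uppercase letters only for epit_seq), and the "align" branch is assumed to replace the annotated positions with the location where the epitope is found.

```r
# Toy inputs (illustrative values; protein_id and prot_seq are assumed names).
proteins <- data.frame(
  protein_id = c("P1", "P2"),
  prot_seq   = c("MKTAYIAKQRQISFVKSHFSRQ", "GLSDGEWQLVLNVWGKVEAD"),
  stringsAsFactors = FALSE
)
epitopes <- data.frame(
  protein_id     = c("P1", "P1", "P1", "P9", "P2", "P2"),
  epit_seq       = c("AYIAKQRQ", "ISFVKSHF", "AYIA", "KSHFSRQLEE", NA, "GLSDGEWQLV"),
  start_pos      = c(4, 10, 4, 1, 1, 1),
  end_pos        = c(11, 17, 7, 10, 10, 10),
  epit_struc_def = c(rep("Exact Epitope", 5), NA),
  stringsAsFactors = FALSE
)
min_epit <- 8
max_epit <- 25

# Criteria 1 to 4: protein ID, sequence string, definition, and length.
has_protein <- epitopes$protein_id %in% proteins$protein_id
has_seq     <- !is.na(epitopes$epit_seq) & grepl("^[A-Z]+$", epitopes$epit_seq)
has_def     <- !is.na(epitopes$epit_struc_def) & nzchar(epitopes$epit_struc_def)
len         <- nchar(epitopes$epit_seq)
len_ok      <- !is.na(len) & len >= min_epit & len <= max_epit
basic_ok    <- has_protein & has_seq & has_def & len_ok
kept        <- epitopes[basic_ok, ]

# Criterion 5: compare epit_seq with the protein stretch start_pos..end_pos.
prot_seq  <- proteins$prot_seq[match(kept$protein_id, proteins$protein_id)]
observed  <- substring(prot_seq, kept$start_pos, kept$end_pos)
pos_match <- observed == kept$epit_seq

# pos.mismatch.rm = "all": drop every entry whose positions do not match.
res_all <- kept[pos_match, ]

# pos.mismatch.rm = "align" (assumed behaviour): search for the epitope in
# the protein and, if found, use the position where it actually occurs.
found <- mapply(function(s, p) as.integer(regexpr(s, p, fixed = TRUE)),
                kept$epit_seq, prot_seq, USE.NAMES = FALSE)
res_align           <- kept[found > 0, ]
res_align$start_pos <- found[found > 0]
res_align$end_pos   <- res_align$start_pos + nchar(res_align$epit_seq) - 1
```

In the toy data, the second epitope is annotated two positions too early: it is dropped under "all" but recovered under the assumed "align" behaviour, with its positions shifted to where the sequence occurs in the protein.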

Value

A data.table object containing the merged data frame.

Author(s)

Felipe Campelo (f.campelo@aston.ac.uk)


fcampelo/epitopes documentation built on April 22, 2023, 12:23 a.m.