score_epitope: Computes the similarities between the epitope and the...
In philliplab/EpitopeMatcher: Epitope Matcher

Computes the similarities between the epitope and the sequences in the alignment

score_epitope(
  the_scoring_job,
  query_alignment,
  range_expansion = 0,
  substitutionMatrix = "BLOSUM50"
)

`the_scoring_job`	A scoring job as an object of type 'Scoring_Job'
`query_alignment`	The query alignment
`range_expansion`	After the epitope is found in the reference sequence, each query sequence is searched for the same epitope, with the search range expanded by this number of amino acids on each side
`substitutionMatrix`	Substitution matrix representing the fixed substitution scores for an alignment. It cannot be used in conjunction with the ‘patternQuality’ and ‘subjectQuality’ arguments.
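
To make the effect of `range_expansion` concrete, here is a minimal base R sketch of how a candidate substring could be cut from a query sequence. The helper `expand_candidate` and the toy sequence are hypothetical and are not part of EpitopeMatcher; the sketch only assumes that the coordinates found in the reference are widened by `range_expansion` on each side and clamped at the ends of the sequence.

```r
# Hypothetical helper (not part of EpitopeMatcher): widen the coordinates
# found in the reference by `range_expansion` residues on each side,
# clamp them to the bounds of the query sequence, and extract the substring.
expand_candidate <- function(query_seq, start_pos, end_pos, range_expansion = 0) {
  cand_start <- max(1, start_pos - range_expansion)
  cand_end <- min(nchar(query_seq), end_pos + range_expansion)
  substr(query_seq, cand_start, cand_end)
}

# Toy query sequence: the 20 standard amino acids in alphabetical order.
query_seq <- "ACDEFGHIKLMNPQRSTVWY"

no_expansion <- expand_candidate(query_seq, 5, 8)                     # positions 5..8
expanded <- expand_candidate(query_seq, 5, 8, range_expansion = 2)    # positions 3..10
at_start <- expand_candidate(query_seq, 1, 4, range_expansion = 2)    # clamped at position 1
at_end <- expand_candidate(query_seq, 18, 20, range_expansion = 2)    # clamped at position 20
```

A larger `range_expansion` gives the alignment more room to find an epitope that is shifted in the query by insertions or deletions, at the cost of a wider window in which spurious matches can occur.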

The output from this function is a list with two data.frames. The first is the results data.frame that contains these columns:

sequence_id - The sequence description from the FASTA file
score - The similarity score produced by the alignment
score_type - The type of similarity score as returned by pairwiseAlignment
eregion_in_refseq - The region of the reference sequence for which alignment to the query sequence was attempted, as returned by pairwiseAlignment
candidate_substr - The candidate substring that was obtained by expanding the coordinates found in the reference sequence by 'range_expansion' AAs on each side (unless at the end or beginning of the sequence)
matched_substr - The part of the candidate substring that was matched to the epitope as returned by pairwiseAlignment
comparison - A comparison between the epitope and the query sequence indicating where there were mismatches
pid - The percentage of amino acids that were identical (Percentage IDentity) between the epitope and query sequences
simple_distance - Computed as 100 minus pid
nmatch - The number of matches in the alignment
nmismatch - The number of mismatches in the alignment
leven.dist - The Levenshtein distance (or edit distance) between the two sequences
start_pos_in_ref - The starting position in the reference sequence of the matching substring that was found for the epitope
end_pos_in_ref - The end position in the reference sequence of the matching substring that was found for the epitope
start_pos_in_candidate - The starting position in the candidate subsequence of the query sequence that was obtained by expanding the range of the reference that matches the epitope by starting a number of amino acids earlier in the query sequence. The number of amino acids is controlled by the range_expansion parameter.
end_pos_in_candidate - The end position in the candidate subsequence of the query sequence that was obtained by expanding the range of the reference that matches the epitope by stopping a number of amino acids later in the query sequence. The number of amino acids is controlled by the range_expansion parameter.
range_expansion - The number of amino acids by which the range of the query sequence that is compared to the epitope is larger than the match found for the epitope in the reference sequence.
These four columns are usually added to the table by the score_sequence_epitopes function:
- epitope - The epitope from the LANL file that was searched for in the reference sequence
- hla_genotype - The name of the HLA genotype the epitope is associated with
- lanl_start_pos - The start position of the epitope according to the LANL file
- lanl_end_pos - The end position of the epitope according to the LANL file
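
As a rough illustration of how the identity-related columns relate to one another, the following base R sketch compares an epitope with an equal-length, ungapped matched substring. The helper `compare_peptides` is hypothetical, the dot notation in `comparison` is an assumption about the format, and `pid` is computed here with the epitope length as the denominator; the package relies on pairwiseAlignment and may differ on all three points.

```r
# Hypothetical helper (not part of EpitopeMatcher): position-by-position
# comparison of two equal-length, ungapped peptide strings.
compare_peptides <- function(epitope, matched) {
  stopifnot(nchar(epitope) == nchar(matched))
  e <- strsplit(epitope, "")[[1]]
  m <- strsplit(matched, "")[[1]]
  is_match <- e == m
  # Assumption: percentage identity uses the epitope length as denominator.
  pid <- 100 * sum(is_match) / length(e)
  data.frame(
    # Assumed notation: "." for an identical residue, the query residue otherwise.
    comparison = paste(ifelse(is_match, ".", m), collapse = ""),
    pid = pid,
    simple_distance = 100 - pid,
    nmatch = sum(is_match),
    nmismatch = sum(!is_match),
    # utils::adist computes the generalized Levenshtein (edit) distance.
    leven.dist = as.integer(utils::adist(epitope, matched)),
    stringsAsFactors = FALSE
  )
}

# Illustrative peptides that differ at the third position only.
res <- compare_peptides("SLYNTVATL", "SLFNTVATL")
print(res)
```

For ungapped strings of equal length with only substitutions, `nmismatch` and `leven.dist` agree; they can diverge once the alignment contains insertions or deletions.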

The second element of the list is the error log data.frame that contains these columns:

pattern - The epitope as aligned to the reference sequence when a less restrictive alignment algorithm is used than the one that failed when aligning to the reference sequence the first time
subject - The portion of the reference sequence to which the epitope was aligned when a less restrictive alignment algorithm is used than the one that failed when aligning to the reference sequence the first time
global_alignment_start - The starting position in the reference sequence of the subsequence of the reference sequence that the epitope was aligned to when a less restrictive alignment algorithm is used than the one that failed when aligning to the reference sequence the first time
global_alignment_end - The end position in the reference sequence of the subsequence of the reference sequence that the epitope was aligned to when a less restrictive alignment algorithm is used than the one that failed when aligning to the reference sequence the first time
These four columns are usually added to the table by the score_sequence_epitopes function:
- epitope - The epitope from the LANL file that was searched for in the reference sequence
- hla_genotype - The name of the HLA genotype the epitope is associated with
- lanl_start_pos - The start position of the epitope according to the LANL file
- lanl_end_pos - The end position of the epitope according to the LANL file
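
Because the return value is a list of two data.frames, a typical first step is to split it by position and inspect each part. The sketch below uses a hand-built stand-in (`mock_output`) rather than a real call to score_epitope; its values are made up purely to show the shape, only a few of the documented columns are included, and it assumes that an empty error log means no fallback alignment was needed.

```r
# Stand-in for the value returned by score_epitope(); the contents are
# invented for illustration and only a subset of the columns is shown.
mock_output <- list(
  data.frame(
    sequence_id = c("seq_1", "seq_2", "seq_3"),
    pid = c(100, 88.9, 100),
    nmismatch = c(0, 1, 0),
    stringsAsFactors = FALSE
  ),
  data.frame(
    pattern = character(0),
    subject = character(0),
    global_alignment_start = integer(0),
    global_alignment_end = integer(0),
    stringsAsFactors = FALSE
  )
)

# First element: results; second element: error log.
results <- mock_output[[1]]
error_log <- mock_output[[2]]

# Query sequences whose matched substring is not identical to the epitope.
imperfect <- results[results$pid < 100, ]

# Assumption: rows in the error log flag epitopes whose initial alignment
# to the reference failed, so they deserve a manual look.
needs_review <- nrow(error_log) > 0
```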