pairwise_alignment_sequence_identity: Calculate the percentage of pairwise sequence identity

View source: R/pairwise_alignment.R

pairwise_alignment_sequence_identity    R Documentation

Calculate the percentage of pairwise sequence identity

Description

Calculate the percent sequence identity between pairs of sequences, using the chosen pairwise alignment type and percent identity definition.

Usage

pairwise_alignment_sequence_identity(
  seqs,
  aln_type = "global",
  pid_type = "PID1"
)

Arguments

seqs

The sequences of interest, given either as a Biostrings::AAStringSet or as a named character vector that will be converted into one. If the sequences are not named, arbitrary names will be assigned.

aln_type

A character vector of length one giving the alignment type. Possible options are "global" (Needleman-Wunsch), "local" (Smith-Waterman) and "overlap".

pid_type

A character vector of length one giving the definition of percent sequence identity. Possible options are "PID1", "PID2", "PID3" and "PID4".

Value

A long DataFrame with the results.

Alignment types

  • global: align whole strings with end gap penalties (Needleman-Wunsch).

  • local: align string fragments (Smith-Waterman).

  • overlap: align whole strings without end gap penalties.
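
The difference between the three alignment types can be illustrated with a small dynamic-programming sketch in base R. This is not the HMMERutils or Biostrings implementation: align_score() is a hypothetical helper, and the linear gap penalty and the match/mismatch scores are assumptions chosen only to keep the example short. It returns the optimal alignment score rather than the alignment itself.

```r
# Hypothetical helper, not part of HMMERutils: optimal alignment score
# under the three alignment types, with an assumed linear gap penalty.
align_score <- function(a, b, aln_type = c("global", "local", "overlap"),
                        match_score = 1, mismatch_score = -1, gap = -2) {
  aln_type <- match.arg(aln_type)
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)

  # S[i + 1, j + 1] holds the best score for the first i residues of a
  # and the first j residues of b.
  S <- matrix(0, nrow = n + 1, ncol = m + 1)
  if (aln_type == "global") {
    # Only the global type penalizes leading end gaps.
    S[, 1] <- gap * (0:n)
    S[1, ] <- gap * (0:m)
  }

  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      step <- if (x[i] == y[j]) match_score else mismatch_score
      best <- max(S[i, j] + step, S[i, j + 1] + gap, S[i + 1, j] + gap)
      # A local alignment may restart anywhere, so scores never drop below 0.
      S[i + 1, j + 1] <- if (aln_type == "local") max(0, best) else best
    }
  }

  switch(aln_type,
    global  = S[n + 1, m + 1],               # must consume both strings
    local   = max(S),                        # best fragment anywhere
    overlap = max(S[n + 1, ], S[, m + 1])    # trailing end gaps are free
  )
}

types <- c("global", "local", "overlap")
# End gaps matter: "TTAC" sits inside "GATTACA".
contained <- sapply(types, function(t) align_score("GATTACA", "TTAC", t))
# Mismatching flanks: only the local type can drop them.
flanked <- sapply(types, function(t) align_score("GTTACG", "CTTACC", t))
print(contained)  # global -2, local 4, overlap 4
print(flanked)    # global  2, local 4, overlap 2
```

In the first pair the global score is lowered by the three penalized end gaps, while the overlap and local types ignore them. In the second pair the overlap type must still align through to the string ends, so it keeps the mismatching flanks that the local type discards.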

Percent sequence identity

  • PID1: 100 * (identical positions) / (aligned positions + internal gap positions).

  • PID2: 100 * (identical positions) / (aligned positions).

  • PID3: 100 * (identical positions) / (length of the shorter sequence).

  • PID4: 100 * (identical positions) / (average length of the two sequences).
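
The four definitions differ only in their denominator, which a worked example in base R makes concrete. percent_identity() below is a hypothetical helper, not a function from HMMERutils or Biostrings. It assumes the two inputs are gapped strings of equal length that cover only the aligned region, with "-" as the gap character, so that every gap counts as internal.

```r
# Hypothetical helper, not part of HMMERutils: the four PID definitions
# computed from two gapped strings of equal length ("-" marks a gap).
# Assumes the strings cover only the aligned region (all gaps internal).
percent_identity <- function(a, b,
                             pid_type = c("PID1", "PID2", "PID3", "PID4")) {
  pid_type <- match.arg(pid_type)
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  stopifnot(length(x) == length(y))

  is_gap <- x == "-" | y == "-"
  n_identical <- sum(!is_gap & x == y)  # identical positions
  n_aligned <- sum(!is_gap)             # residue-to-residue positions
  len_a <- sum(x != "-")                # ungapped sequence lengths
  len_b <- sum(y != "-")

  denominator <- switch(pid_type,
    PID1 = n_aligned + sum(is_gap),
    PID2 = n_aligned,
    PID3 = min(len_a, len_b),
    PID4 = (len_a + len_b) / 2
  )
  100 * n_identical / denominator
}

# 6 identical positions, 7 aligned positions, 3 internal gap positions;
# ungapped lengths are 9 and 8.
a <- "MKTAY-IAKQ"
b <- "MKSAYLI--Q"
pids <- sapply(c("PID1", "PID2", "PID3", "PID4"),
               function(type) percent_identity(a, b, type))
print(round(pids, 2))  # PID1 60, PID2 85.71, PID3 75, PID4 70.59
```

For the same alignment the reported identity ranges from 60 to about 85.7 depending on the definition, so values are only comparable when they share a pid_type.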

Examples

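library(HMMERutils)

# Load the example phmmer results used in this example.
data(phmmer_2abl)
# Overlap alignment of hits 6 to 10, scored with the PID2 definition.
pairwise_alignment_sequence_identity(
    seqs = phmmer_2abl$hits.fullfasta[6:10],
    aln_type = "overlap",
    pid_type = "PID2"
)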


currocam/HMMERutils documentation built on Feb. 15, 2023, 8:41 p.m.