pid: Percent Sequence Identity
In anandhupresannan/biostrings: Efficient manipulation of biological strings


Calculates the percent sequence identity for a pairwise sequence alignment.

pid(x, type="PID1")

`x`	a `PairwiseAlignments` object.
`type`	the type of percent sequence identity to compute. One of `"PID1"`, `"PID2"`, `"PID3"`, or `"PID4"`. See Details for more information.

Because there is no universal definition of percent sequence identity, the pid function supports the following four definitions, selected through the `type` argument:

"PID1": 100 * (identical positions) / (aligned positions + internal gap positions)
"PID2": 100 * (identical positions) / (aligned positions)
"PID3": 100 * (identical positions) / (length of the shorter sequence)
"PID4": 100 * (identical positions) / (average length of the two sequences)

A numeric vector containing the specified sequence identity measures.
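
To make the four definitions concrete, the following base R sketch computes them by hand from two gapped, equal-length alignment strings. It is an illustration only and not the package's implementation: `pid_sketch` is a hypothetical helper, it assumes `-` marks a gap, it assumes every gap column in the input is an internal gap (end gaps already trimmed), and it takes the sequence lengths from the non-gap characters unless `len_a` and `len_b` are supplied.

```r
# Hypothetical helper (not part of Biostrings): percent identity computed by
# hand from two equal-length gapped strings, where "-" marks a gap.
pid_sketch <- function(a, b, len_a = NULL, len_b = NULL) {
  x <- strsplit(a, "", fixed = TRUE)[[1]]
  y <- strsplit(b, "", fixed = TRUE)[[1]]
  stopifnot(length(x) == length(y))

  gap         <- x == "-" | y == "-"   # columns with a gap in either sequence
  n_identical <- sum(!gap & x == y)    # identical positions
  n_aligned   <- sum(!gap)             # residue-to-residue columns
  n_internal  <- sum(gap)              # assumed to be internal gaps only

  # Assumption: unaligned lengths default to the non-gap character counts.
  if (is.null(len_a)) len_a <- sum(x != "-")
  if (is.null(len_b)) len_b <- sum(y != "-")

  100 * n_identical / c(
    PID1 = n_aligned + n_internal,
    PID2 = n_aligned,
    PID3 = min(len_a, len_b),
    PID4 = (len_a + len_b) / 2
  )
}

# 5 identical positions, 6 aligned positions, 3 gap columns, lengths 7 and 8
res <- pid_sketch("AGT--ATAG", "AGTCCTT-G")
round(res, 2)
# PID1 = 55.56, PID2 = 83.33, PID3 = 71.43, PID4 = 66.67
```

The same alignment yields four noticeably different values, which is why the definition in use should always be stated alongside a reported percent identity.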

P. Aboyoun

A. May, Percent Sequence Identity: The Need to Be Explicit, Structure 2004, 12(5):737.

G. Raghava and G. Barton, Quantification of the variation in percentage identity for protein sequence alignments, BMC Bioinformatics 2006, 7:415.

pairwiseAlignment, PairwiseAlignments-class, match-utils

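  library(Biostrings)  # provides DNAString, pairwiseAlignment, and pid

  s1 <- DNAString("AGTATAGATGATAGAT")
  s2 <- DNAString("AGTAGATAGATGGATGATAGATA")

  # Align with the default scoring; pid() defaults to type = "PID1"
  palign1 <- pairwiseAlignment(s1, s2)
  palign1
  pid(palign1)

  # Align again with a custom nucleotide substitution matrix and report PID4
  palign2 <-
    pairwiseAlignment(s1, s2,
      substitutionMatrix =
        nucleotideSubstitutionMatrix(match = 2, mismatch = 10, baseOnly = TRUE))
  palign2
  pid(palign2, type = "PID4")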