pid: Percent Sequence Identity

Description Usage Arguments Details Value Author(s) References See Also Examples

Description

Calculates the percent sequence identity for a pairwise sequence alignment.

Usage

1
pid(x, type="PID1")

Arguments

x

a PairwiseAlignments object.

type

one of percent sequence identity. One of "PID1", "PID2", "PID3", and "PID4". See Details for more information.

Details

Since there is no universal definition of percent sequence identity, the pid function calculates this statistic in the following types:

"PID1":

100 * (identical positions) / (aligned positions + internal gap positions)

"PID2":

100 * (identical positions) / (aligned positions)

"PID3":

100 * (identical positions) / (length shorter sequence)

"PID4":

100 * (identical positions) / (average length of the two sequences)

Value

A numeric vector containing the specified sequence identity measures.

Author(s)

P. Aboyoun

References

A. May, Percent Sequence Identity: The Need to Be Explicit, Structure 2004, 12(5):737.

G. Raghava and G. Barton, Quantification of the variation in percentage identity for protein sequence alignments, BMC Bioinformatics 2006, 7:415.

See Also

pairwiseAlignment, PairwiseAlignments-class, match-utils

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
  s1 <- DNAString("AGTATAGATGATAGAT")
  s2 <- DNAString("AGTAGATAGATGGATGATAGATA")

  palign1 <- pairwiseAlignment(s1, s2)
  palign1
  pid(palign1)

  palign2 <-
    pairwiseAlignment(s1, s2,
      substitutionMatrix =
      nucleotideSubstitutionMatrix(match = 2, mismatch = 10, baseOnly = TRUE))
  palign2
  pid(palign2, type = "PID4")

Example output

Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit, which, which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: XVector
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] AGTATA------GATGATAGAT 
subject: [1] AGTAGATAGATGGATGATAGAT 
score: -24.17294 
[1] 68.18182
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] AGTA--TAGATG----ATAGAT 
subject: [1] AGTAGATAGATGGATGATAGAT 
score: -26 
[1] 82.05128

Biostrings documentation built on Nov. 8, 2020, 11:12 p.m.