stringDist: String Distance/Alignment Score Matrix

Description

Computes the Levenshtein edit distance or pairwise alignment score matrix for a set of strings.

Usage

stringDist(x, method = "levenshtein", ignoreCase = FALSE, diag = FALSE, upper = FALSE, ...)
## S4 method for signature 'XStringSet'
stringDist(x, method = "levenshtein", ignoreCase = FALSE, diag = FALSE,
                   upper = FALSE, type = "global", quality = PhredQuality(22L),
                   substitutionMatrix = NULL, fuzzyMatrix = NULL, gapOpening = 0,
                   gapExtension = 1)
## S4 method for signature 'QualityScaledXStringSet'
stringDist(x, method = "quality", ignoreCase = FALSE,
                   diag = FALSE, upper = FALSE, type = "global", substitutionMatrix = NULL,
                   fuzzyMatrix = NULL, gapOpening = 0, gapExtension = 1)

Arguments

x

a character vector or an XStringSet object.

method

calculation method. One of "levenshtein", "hamming", "quality", or "substitutionMatrix".

ignoreCase

logical value indicating whether to ignore case during scoring.

diag

logical value indicating whether the diagonal of the matrix should be printed by print.dist.

upper

logical value indicating whether the upper triangle of the matrix should be printed by print.dist.

type

(applicable when method = "quality" or method = "substitutionMatrix"). type of alignment. One of "global", "local", or "overlap": "global" aligns whole strings with end gap penalties, "local" aligns string fragments, and "overlap" aligns whole strings without end gap penalties.

quality

(applicable when method = "quality"). object of class XStringQuality representing the quality scores for x that are used in a quality-based method for generating a substitution matrix.

substitutionMatrix

(applicable when method = "substitutionMatrix"). symmetric matrix representing the fixed substitution scores in the alignment.

fuzzyMatrix

(applicable when method = "quality"). fuzzy match matrix for quality-based alignments. It takes values between 0 and 1, where 0 is an unambiguous mismatch, 1 is an unambiguous match, and values in between represent a fraction of "matchiness".

gapOpening

(applicable when method = "quality" or method = "substitutionMatrix"). penalty for opening a gap in the alignment.

gapExtension

(applicable when method = "quality" or method = "substitutionMatrix"). penalty for extending a gap in the alignment.

...

optional arguments to generic function to support additional methods.

Details

When method = "hamming", the distances are calculated with the underlying neditStartingAt code, where the Hamming distance is defined as the number of substitutions between two strings of equal length. For all other methods, the underlying pairwiseAlignment code computes the distance/alignment score matrix.
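
The difference between the two distance definitions can be illustrated with a small base-R sketch. It does not call Biostrings: hamming_pair is a hypothetical helper written only for this illustration, and utils::adist stands in for the Levenshtein computation, so it shows the definitions rather than the package's internal code paths.

```r
# Illustrative base-R sketch; hamming_pair is a hypothetical helper,
# not part of Biostrings.
hamming_pair <- function(a, b) {
  # Hamming distance is only defined for strings of equal length.
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

x <- c("lazy", "hazy", "lacy")

# Pairwise Hamming distances, collected into a "dist" object.
ham <- as.dist(outer(seq_along(x), seq_along(x),
                     Vectorize(function(i, j) hamming_pair(x[i], x[j]))))

# Levenshtein distances via utils::adist
# (counts insertions, deletions, and substitutions).
lev <- as.dist(adist(x))

# Strings of unequal length have a Levenshtein distance
# but no Hamming distance.
lev_unequal <- adist("lazy", "crazy")[1, 1]
```

For equal-length strings that differ only by substitutions, as in x above, the two measures agree; they diverge once insertions or deletions are needed.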

Value

Returns an object of class "dist".
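
As a rough sketch of how a "dist" result is typically handled, the base-R example below builds a comparable object with utils::adist (an assumed stand-in for stringDist, so that it runs without Biostrings) and then inspects it. The names a, b, and c are made up for the illustration.

```r
# Base-R sketch: adist() stands in for stringDist() to build a "dist" object.
x <- c(a = "lazy", b = "hazy", c = "crazy")
mat <- adist(x)
dimnames(mat) <- list(names(x), names(x))  # label rows and columns explicitly
d <- as.dist(mat)

# Expand to a full symmetric matrix to look up a single pair by name.
m <- as.matrix(d)
m["a", "c"]

# diag and upper only change how print.dist displays the object.
print(d, diag = TRUE, upper = TRUE)

# A "dist" object can be passed directly to clustering functions.
hc <- hclust(d, method = "single")
```

Only the lower triangle is stored in a "dist" object, which is why diag and upper affect printing rather than the values themselves.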

Author(s)

P. Aboyoun

See Also

dist, agrep, pairwiseAlignment, substitution.matrices

Examples

  stringDist(c("lazy", "HaZy", "crAzY"))
  stringDist(c("lazy", "HaZy", "crAzY"), ignoreCase = TRUE)

  data(phiX174Phage)
  plot(hclust(stringDist(phiX174Phage), method = "single"))

  data(srPhiX174)
  stringDist(srPhiX174[1:4])
  stringDist(srPhiX174[1:4], method = "quality",
             quality = SolexaQuality(quPhiX174[1:4]),
             gapOpening = 10, gapExtension = 4)

