stringDist: String Distance/Alignment Score Matrix
In Biostrings: Efficient manipulation of biological strings


Computes the Levenshtein edit distance or pairwise alignment score matrix for a set of strings.

stringDist(x, method = "levenshtein", ignoreCase = FALSE, diag = FALSE, upper = FALSE, ...)
## S4 method for signature 'XStringSet'
stringDist(x, method = "levenshtein", ignoreCase = FALSE, diag = FALSE,
                   upper = FALSE, type = "global", quality = PhredQuality(22L),
                   substitutionMatrix = NULL, fuzzyMatrix = NULL, gapOpening = 0,
                   gapExtension = 1)
## S4 method for signature 'QualityScaledXStringSet'
stringDist(x, method = "quality", ignoreCase = FALSE,
                   diag = FALSE, upper = FALSE, type = "global", substitutionMatrix = NULL,
                   fuzzyMatrix = NULL, gapOpening = 0, gapExtension = 1)

`x`	a character vector or an `XStringSet` object.
`method`	calculation method. One of `"levenshtein"`, `"hamming"`, `"quality"`, or `"substitutionMatrix"`.
`ignoreCase`	logical value indicating whether to ignore case during scoring.
`diag`	logical value indicating whether the diagonal of the matrix should be printed by `print.dist`.
`upper`	logical value indicating whether the upper triangle of the matrix should be printed by `print.dist`.
`type`	(applicable when `method = "quality"` or `method = "substitutionMatrix"`) type of alignment: one of `"global"` (align whole strings with end gap penalties), `"local"` (align string fragments), or `"overlap"` (align whole strings without end gap penalties).
`quality`	(applicable when `method = "quality"`) object of class `XStringQuality` holding the quality scores for `x`, from which the quality-based method generates a substitution matrix.
`substitutionMatrix`	(applicable when `method = "substitutionMatrix"`) symmetric matrix of the fixed substitution scores used in the alignment.
`fuzzyMatrix`	(applicable when `method = "quality"`) fuzzy match matrix for quality-based alignments. Its values lie between 0 and 1, where 0 is an unambiguous mismatch, 1 is an unambiguous match, and values in between represent a fraction of "matchiness".
`gapOpening`	(applicable when `method = "quality"` or `method = "substitutionMatrix"`) penalty for opening a gap in the alignment.
`gapExtension`	(applicable when `method = "quality"` or `method = "substitutionMatrix"`) penalty for extending a gap in the alignment.
`...`	optional arguments to the generic function, to support additional methods.
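The `type`, `substitutionMatrix`, `gapOpening` and `gapExtension` arguments all parameterize a pairwise alignment. As a rough illustration only, the base-R sketch below scores a global (Needleman-Wunsch) alignment under a simple match/mismatch scheme with a linear gap penalty, which corresponds to assuming `gapOpening = 0`. It is not the Biostrings implementation; the helper `global_score` and its `match`/`mismatch` parameters are hypothetical. With match 0, mismatch -1 and `gapExtension = 1`, the negated score equals the Levenshtein distance, which is one way to see why edit distances can be computed with alignment code.

```r
# Hypothetical helper (not part of Biostrings): global alignment score of two
# strings with a linear gap penalty, i.e. assuming gapOpening = 0 so that a
# gap of length k costs k * gapExtension.
global_score <- function(a, b, match = 0, mismatch = -1, gapExtension = 1) {
  a <- strsplit(a, "")[[1]]
  b <- strsplit(b, "")[[1]]
  n <- length(a)
  m <- length(b)
  # S[i + 1, j + 1] is the best score for aligning a[1..i] with b[1..j]
  S <- matrix(0, nrow = n + 1, ncol = m + 1)
  S[, 1] <- -gapExtension * (0:n)  # a[1..i] aligned against leading gaps
  S[1, ] <- -gapExtension * (0:m)  # b[1..j] aligned against leading gaps
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      sub <- if (a[i] == b[j]) match else mismatch
      S[i + 1, j + 1] <- max(S[i, j] + sub,               # pair a[i] with b[j]
                             S[i, j + 1] - gapExtension,  # a[i] against a gap
                             S[i + 1, j] - gapExtension)  # b[j] against a gap
    }
  }
  S[n + 1, m + 1]
}

# With match = 0, mismatch = -1 and gapExtension = 1, the negated global
# score equals the (case-sensitive) Levenshtein edit distance.
-global_score("lazy", "HaZy")   # 2
-global_score("lazy", "crAzY")  # 4
-global_score("HaZy", "crAzY")  # 5
```

A nonzero `gapOpening` (affine gaps) needs separate gap states, as in Gotoh's algorithm, and `"local"` or `"overlap"` alignment changes the boundary conditions and where the final score is read; the sketch covers neither.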

When `method = "hamming"`, `stringDist` uses the underlying `neditStartingAt` code to calculate the distances, where the Hamming distance is the number of substitutions between two strings of equal length. For all other methods, it uses the underlying `pairwiseAlignment` code to compute the distance/alignment score matrix.
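To illustrate the difference between the two distance definitions above, the base-R sketch below computes a Hamming distance with a hypothetical helper `hamming()` and Levenshtein distances with `utils::adist()`. It does not call `stringDist`; on the strings from the Examples, `adist()` gives the same values as the first two `stringDist` outputs.

```r
# Hypothetical helper (not part of Biostrings): Hamming distance, i.e. the
# number of positions at which two equal-length strings differ.
hamming <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

hamming("lazy", "HaZy")  # 2: l/H and z/Z differ
hamming("lazy", "hazy")  # 1: only l/h differs

# utils::adist() computes Levenshtein distances, which also allow insertions
# and deletions, so the strings need not have equal length.
x <- c("lazy", "HaZy", "crAzY")
as.dist(adist(x))                      # 2, 4, 5
as.dist(adist(x, ignore.case = TRUE))  # 1, 2, 2
```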

Returns an object of class "dist".
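Because the return value is an ordinary `"dist"` object from the stats package, the usual base-R tools apply to it. The sketch below builds a `"dist"` object by hand from the case-sensitive Levenshtein values shown in the first Examples output (so it runs without Biostrings), then shows how `diag` and `upper` change what `print` displays, how to recover a full matrix, and how to pass it to `hclust`.

```r
# A "dist" object stores only the lower triangle of a symmetric matrix.
# The values are the case-sensitive Levenshtein distances from Examples,
# typed in by hand so that the sketch runs without Biostrings.
labs <- c("lazy", "HaZy", "crAzY")
m <- matrix(c(0, 2, 4,
              2, 0, 5,
              4, 5, 0),
            nrow = 3, dimnames = list(labs, labs))
d <- as.dist(m)

print(d)                             # lower triangle only (the defaults)
print(d, diag = TRUE, upper = TRUE)  # full square layout
as.matrix(d)                         # back to a symmetric numeric matrix

# Single-linkage clustering, as in the phiX174Phage example
hc <- hclust(d, method = "single")
hc$height                            # merge heights: 2, then 4
```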

P. Aboyoun

dist, agrep, pairwiseAlignment, substitution.matrices

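  library(Biostrings)

  ## Levenshtein distances, case-sensitive and case-insensitive
  stringDist(c("lazy", "HaZy", "crAzY"))
  stringDist(c("lazy", "HaZy", "crAzY"), ignoreCase = TRUE)

  ## Single-linkage clustering of the phiX174 phage genomes
  data(phiX174Phage)
  plot(hclust(stringDist(phiX174Phage), method = "single"))

  ## Short reads: Levenshtein distances, then quality-based alignment scores
  ## (data(srPhiX174) also provides the quality scores quPhiX174)
  data(srPhiX174)
  stringDist(srPhiX174[1:4])
  stringDist(srPhiX174[1:4], method = "quality",
             quality = SolexaQuality(quPhiX174[1:4]),
             gapOpening = 10, gapExtension = 4)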

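Output of stringDist(c("lazy", "HaZy", "crAzY")):
  1 2
2 2  
3 4 5

Output with ignoreCase = TRUE:
  1 2
2 1  
3 2 2

Output of stringDist(srPhiX174[1:4]):
   1  2  3
2  8      
3 14 21   
4 22 21 18

Output of the quality-based call on srPhiX174[1:4]:
            1           2           3
2    9.719319                        
3  -20.266722  -60.226292            
4 -129.261856 -137.923645 -116.691521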