stringDist: Function to compute distances between strings


Description

Computes the Hamming or the Levenshtein (edit) distance between two strings.

Usage

stringDist(x, y, method = "levenshtein", mismatch = 1, gap = 1)

Arguments

x

character vector, first string

y

character vector, second string

method

character, name of the distance method. This must be "levenshtein" or "hamming". Default is the classical Levenshtein distance.

mismatch

numeric, distance value for a mismatch between symbols

gap

numeric, distance value for inserting a gap

Details

Depending on method, the function computes the Hamming or the Levenshtein (edit) distance between two given strings (sequences).

For the Hamming distance, the two strings must have the same length.
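The equal-length requirement follows directly from how the Hamming distance is defined: the strings are compared position by position. A minimal base R sketch, assuming that each differing position contributes the mismatch cost; hamming_sketch is a hypothetical helper for illustration and not part of MKmisc:

```r
# Hypothetical helper, not the MKmisc implementation.
# Assumption: every differing position adds 'mismatch' to the distance.
hamming_sketch <- function(x, y, mismatch = 1) {
  stopifnot(is.character(x), is.character(y),
            length(x) == 1L, length(y) == 1L)
  if (nchar(x) != nchar(y)) {
    stop("Hamming distance requires strings of equal length")
  }
  xs <- strsplit(x, "")[[1]]  # split into single characters
  ys <- strsplit(y, "")[[1]]
  mismatch * sum(xs != ys)    # count differing positions
}

# The two example strings differ at positions 3, 4, 6, 7, 8, 9 and 10
hamming_sketch("GACGGATTATG", "GATCGGAATAG")  # 7
```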

For the Levenshtein (edit) distance, a scoring matrix and a trace-back matrix are computed and stored as the attributes "ScoringMatrix" and "TraceBackMatrix" of the result. The entries of the trace-back matrix denote insertion of a gap in string y (d: deletion), match (m), mismatch (mm), and insertion of a gap in string x (i).
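The scoring matrix can be illustrated with the standard dynamic-programming recurrence for the edit distance. The following base R sketch is an assumption about the general technique, not the MKmisc source; levenshtein_sketch is a hypothetical name, and the trace-back matrix is omitted for brevity:

```r
# Hypothetical helper, not the MKmisc implementation.
# Rows correspond to prefixes of x, columns to prefixes of y;
# the first row and column stand for the empty prefix ("gap").
levenshtein_sketch <- function(x, y, mismatch = 1, gap = 1) {
  xs <- strsplit(x, "")[[1]]
  ys <- strsplit(y, "")[[1]]
  n <- length(xs)
  m <- length(ys)
  S <- matrix(0, nrow = n + 1, ncol = m + 1,
              dimnames = list(c("gap", xs), c("gap", ys)))
  S[, 1] <- (0:n) * gap  # x prefix aligned against gaps only
  S[1, ] <- (0:m) * gap  # y prefix aligned against gaps only
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      sub_cost <- if (xs[i] == ys[j]) 0 else mismatch
      S[i + 1, j + 1] <- min(S[i, j] + sub_cost,  # match (m) or mismatch (mm)
                             S[i, j + 1] + gap,   # gap in y (d)
                             S[i + 1, j] + gap)   # gap in x (i)
    }
  }
  S
}

S <- levenshtein_sketch("GACGGATTATG", "GATCGGAATAG")
S[nrow(S), ncol(S)]  # bottom-right entry is the edit distance: 3
```

The bottom-right entry of the matrix is the distance itself. Recording which of the three candidates attains the minimum in each cell would yield a trace-back matrix of the kind described above.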

Value

stringDist returns an object of S3 class "stringDist" inherited from class "dist"; cf. dist.

Note

The function is intended mainly for teaching purposes.

For distances between strings and string alignments see also Bioconductor package Biostrings.

Author(s)

Matthias Kohl Matthias.Kohl@stamats.de

References

R. Merkl and S. Waack (2009). Bioinformatik Interaktiv. Wiley.

See Also

dist, stringSim

Examples

x <- "GACGGATTATG"
y <- "GATCGGAATAG"
## Levenshtein distance
d <- stringDist(x, y)
d
attr(d, "ScoringMatrix")
attr(d, "TraceBackMatrix")

## Hamming distance (method must be set explicitly; the default is "levenshtein")
stringDist(x, y, method = "hamming")

Example output

            GACGGATTATG
GATCGGAATAG           3
    gap  G A T C G G A A T  A  G
gap   0  1 2 3 4 5 6 7 8 9 10 11
G     1  0 1 2 3 4 5 6 7 8  9 10
A     2  1 0 1 2 3 4 5 6 7  8  9
C     3  2 1 1 1 2 3 4 5 6  7  8
G     4  3 2 2 2 1 2 3 4 5  6  7
G     5  4 3 3 3 2 1 2 3 4  5  6
A     6  5 4 4 4 3 2 1 2 3  4  5
T     7  6 5 4 5 4 3 2 2 2  3  4
T     8  7 6 5 5 5 4 3 3 2  3  4
A     9  8 7 6 6 6 5 4 3 3  2  3
T    10  9 8 7 7 7 6 5 4 3  3  3
G    11 10 9 8 8 7 7 6 5 4  4  3
    gap     G     A     T      C        G      G     A     A      T   A     
gap "start" "i"   "i"   "i"    "i"      "i"    "i"   "i"   "i"    "i" "i"   
G   "d"     "m"   "i"   "i"    "i"      "m/i"  "m/i" "i"   "i"    "i" "i"   
A   "d"     "d"   "m"   "i"    "i"      "i"    "i"   "m/i" "m/i"  "i" "m/i" 
C   "d"     "d"   "d"   "mm"   "m"      "i"    "i"   "i"   "i"    "i" "i"   
G   "d"     "d/m" "d"   "d/mm" "d/mm"   "m"    "m/i" "i"   "i"    "i" "i"   
G   "d"     "d/m" "d"   "d/mm" "d/mm"   "d/m"  "m"   "i"   "i"    "i" "i"   
A   "d"     "d"   "d/m" "d/mm" "d/mm"   "d"    "d"   "m"   "m/i"  "i" "m/i" 
T   "d"     "d"   "d"   "m"    "d/mm/i" "d"    "d"   "d"   "mm"   "m" "i"   
T   "d"     "d"   "d"   "d/m"  "mm"     "d"    "d"   "d"   "d/mm" "m" "mm/i"
A   "d"     "d"   "d/m" "d"    "d/mm"   "d/mm" "d"   "d/m" "m"    "d" "m"   
T   "d"     "d"   "d"   "d/m"  "d/mm"   "d/mm" "d"   "d"   "d"    "m" "d"   
G   "d"     "d/m" "d"   "d"    "d/mm"   "m"    "d/m" "d"   "d"    "d" "d/mm"
    G     
gap "i"   
G   "m/i" 
A   "i"   
C   "i"   
G   "m/i" 
G   "m/i" 
A   "i"   
T   "i"   
T   "mm/i"
A   "i"   
T   "mm"  
G   "m"   
            GACGGATTATG
GATCGGAATAG           7

MKmisc documentation built on Aug. 8, 2021, 5:06 p.m.