SequenceMatcher: Character string sequence matching

Description Usage Details Methods Methods References Examples

Description

Character string sequence matching

Character string sequence matching

Usage

1
# init <- SequenceMatcher$new(string1 = NULL, string2 = NULL)

Details

the ratio method returns a measure of the sequences' similarity as a float in the range [0, 1]. Where T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T. Note that this is 1.0 if the sequences are identical, and 0.0 if they have nothing in common. This is expensive to compute if getMatchingBlocks() or getOpcodes() hasn’t already been called, in which case you may want to try quickRatio() or realQuickRatio() first to get an upper bound.

the quick_ratio method returns an upper bound on ratio() relatively quickly.

the real_quick_ratio method returns an upper bound on ratio() very quickly.

the get_matching_blocks method returns a list of triples describing matching subsequences. Each triple is of the form [i, j, n], and means that a[i:i+n] == b[j:j+n]. The triples are monotonically increasing in i and j. The last triple is a dummy, and has the value [a.length, b.length, 0]. It is the only triple with n == 0. If [i, j, n] and [i', j', n'] are adjacent triples in the list, and the second is not the last triple in the list, then i+n != i' or j+n != j'; in other words, adjacent triples always describe non-adjacent equal blocks.

The get_opcodes method returns a list of 5-tuples describing how to turn a into b. Each tuple is of the form [tag, i1, i2, j1, j2]. The first tuple has i1 == j1 == 0, and remaining tuples have i1 equal to the i2 from the preceding tuple, and, likewise, j1 equal to the previous j2. The tag values are strings, with these meanings: 'replace' a[i1:i2] should be replaced by b[j1:j2]. 'delete' a[i1:i2] should be deleted. Note that j1 == j2 in this case. 'insert' b[j1:j2] should be inserted at a[i1:i1]. Note that i1 == i2 in this case. 'equal' a[i1:i2] == b[j1:j2] (the sub-sequences are equal).

Methods

SequenceMatcher$new(string1 = NULL, string2 = NULL)
--------------
ratio()
--------------
quick_ratio()
--------------
real_quick_ratio()
--------------
get_matching_blocks()
--------------
get_opcodes()

Methods

Public methods


Method new()

Usage
SequenceMatcher$new(string1 = NULL, string2 = NULL)
Arguments
string1

a character string.

string2

a character string.


Method ratio()

Usage
SequenceMatcher$ratio()

Method quick_ratio()

Usage
SequenceMatcher$quick_ratio()

Method real_quick_ratio()

Usage
SequenceMatcher$real_quick_ratio()

Method get_matching_blocks()

Usage
SequenceMatcher$get_matching_blocks()

Method get_opcodes()

Usage
SequenceMatcher$get_opcodes()

Method clone()

The objects of this class are cloneable with this method.

Usage
SequenceMatcher$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

References

https://www.npmjs.com/package/difflib, http://stackoverflow.com/questions/10383044/fuzzy-string-comparison

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
try({
  if (reticulate::py_available(initialize = FALSE)) {

    if (check_availability()) {

      library(fuzzywuzzyR)

      s1 = ' It was a dark and stormy night. I was all alone sitting on a red chair.'

      s2 = ' It was a murky and stormy night. I was all alone sitting on a crimson chair.'

      init = SequenceMatcher$new(string1 = s1, string2 = s2)

      init$ratio()

      init$quick_ratio()

      init$real_quick_ratio()

      init$get_matching_blocks()

      init$get_opcodes()

    }
  }
}, silent=TRUE)

fuzzywuzzyR documentation built on Sept. 11, 2021, 5:06 p.m.