SequenceMatcher: Character string sequence matching

Description Usage Arguments Format Details Methods References Examples

Description

Character string sequence matching

Usage

1
# init <- SequenceMatcher$new(string1 = NULL, string2 = NULL)

Arguments

string1

a character string.

string2

a character string.

Format

An object of class R6ClassGenerator of length 24.

Details

the ratio method returns a measure of the sequences' similarity as a float in the range [0, 1]. Where T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T. Note that this is 1.0 if the sequences are identical, and 0.0 if they have nothing in common. This is expensive to compute if getMatchingBlocks() or getOpcodes() hasn’t already been called, in which case you may want to try quickRatio() or realQuickRatio() first to get an upper bound.

the quick_ratio method returns an upper bound on ratio() relatively quickly.

the real_quick_ratio method returns an upper bound on ratio() very quickly.

the get_matching_blocks method returns a list of triples describing matching subsequences. Each triple is of the form [i, j, n], and means that a[i:i+n] == b[j:j+n]. The triples are monotonically increasing in i and j. The last triple is a dummy, and has the value [a.length, b.length, 0]. It is the only triple with n == 0. If [i, j, n] and [i', j', n'] are adjacent triples in the list, and the second is not the last triple in the list, then i+n != i' or j+n != j'; in other words, adjacent triples always describe non-adjacent equal blocks.

The get_opcodes method returns a list of 5-tuples describing how to turn a into b. Each tuple is of the form [tag, i1, i2, j1, j2]. The first tuple has i1 == j1 == 0, and remaining tuples have i1 equal to the i2 from the preceding tuple, and, likewise, j1 equal to the previous j2. The tag values are strings, with these meanings: 'replace' a[i1:i2] should be replaced by b[j1:j2]. 'delete' a[i1:i2] should be deleted. Note that j1 == j2 in this case. 'insert' b[j1:j2] should be inserted at a[i1:i1]. Note that i1 == i2 in this case. 'equal' a[i1:i2] == b[j1:j2] (the sub-sequences are equal).

Methods

SequenceMatcher$new(string1 = NULL, string2 = NULL)
--------------
ratio()
--------------
quick_ratio()
--------------
real_quick_ratio()
--------------
get_matching_blocks()
--------------
get_opcodes()

References

https://www.npmjs.com/package/difflib, http://stackoverflow.com/questions/10383044/fuzzy-string-comparison

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
if (check_availability()) {


  library(fuzzywuzzyR)

  s1 = ' It was a dark and stormy night. I was all alone sitting on a red chair.'

  s2 = ' It was a murky and stormy night. I was all alone sitting on a crimson chair.'

  init = SequenceMatcher$new(string1 = s1, string2 = s2)

  init$ratio()

  init$quick_ratio()

  init$real_quick_ratio()

  init$get_matching_blocks()

  init$get_opcodes()


}

fuzzywuzzyR documentation built on May 2, 2019, 8:53 a.m.