
Fuzzy character string matching (ratios)

```
# init <- FuzzMatcher$new(decoding = NULL)
```

| Argument | Description |
|---|---|
| `decoding` | Either NULL or a character string. See the notes on the *decoding* parameter below for the effect of a non-NULL value. |
| `string1` | A character string. |
| `string2` | A character string. |
| `force_ascii` | Allow only ASCII characters (force conversion to ASCII). |
| `full_process` | Either TRUE or FALSE. If TRUE, the string is processed by: 1. removing all but letters and numbers, 2. trimming whitespace, 3. converting to lower case. |

An object of class `R6ClassGenerator` of length 24.

The *decoding* parameter is useful for non-ASCII character strings. If this parameter is not NULL, the *force_ascii* parameter (if applicable) is internally set to FALSE. Decoding applies only to Python 2 configurations, because in Python 3 character strings are decoded to Unicode by default.

The *Partial_token_set_ratio* method works as follows: 1. find all alphanumeric tokens in each string, 2. treat them as a set, 3. construct two strings of the form `<sorted_intersection><sorted_remainder>`, 4. take the ratios of those two strings. The method controls for unordered partial matches (here the partial match is TRUE).

The *Partial_token_sort_ratio* method returns the ratio of the most similar substring as a number between 0 and 100, sorting the tokens before comparing.

The *Ratio* method returns a ratio as an integer value, based on a SequenceMatcher-like class that is built on top of the Levenshtein package (https://github.com/miohtama/python-Levenshtein).

The *QRATIO* method performs a quick ratio comparison between two strings. It runs full_process from utils on both strings and short circuits if either string is empty after processing.
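The preprocessing and short circuit described for *QRATIO* can be sketched in base R. This is only an approximation under stated assumptions, not the package's implementation: `full_process_sketch`, `quick_ratio_sketch` and `exact_scorer` are hypothetical names, non-alphanumeric characters are assumed to be replaced by a space, non-ASCII characters are assumed to be dropped when `force_ascii` is TRUE, and the short circuit is assumed to return 0.

```r
# Hypothetical helper: approximates the three full_process steps in base R.
full_process_sketch <- function(s, force_ascii = TRUE) {
  if (force_ascii) {
    # Assumption: characters that cannot be represented in ASCII are dropped
    s <- iconv(s, from = "", to = "ASCII", sub = "")
  }
  # 1. Keep only letters and numbers (assumption: other runs become one space)
  s <- gsub("[^[:alnum:]]+", " ", s)
  # 2. Trim whitespace, 3. force to lower case
  tolower(trimws(s))
}

# Hypothetical wrapper: preprocess both strings, short circuit on empty input,
# otherwise delegate to a scorer function supplied by the caller.
quick_ratio_sketch <- function(s1, s2, scorer, force_ascii = TRUE) {
  p1 <- full_process_sketch(s1, force_ascii)
  p2 <- full_process_sketch(s2, force_ascii)
  if (!nzchar(p1) || !nzchar(p2)) {
    return(0L)  # assumption: an empty processed string scores 0
  }
  scorer(p1, p2)
}

# Toy scorer for illustration only: 100 for identical strings, otherwise 0
exact_scorer <- function(a, b) if (identical(a, b)) 100L else 0L

quick_ratio_sketch("Atlanta Falcons", "atlanta falcons!", exact_scorer)
quick_ratio_sketch("???", "New York Jets", exact_scorer)
```

Passing the scorer as an argument keeps the sketch independent of any particular ratio function.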

The *WRATIO* method returns a measure of the sequences' similarity between 0 and 100, using different algorithms. The steps, in the order they occur:

1. Run full_process from utils on both strings.
2. Short circuit if this makes either string empty.
3. Take the ratio of the two processed strings (fuzz.ratio).
4. Compare the lengths of the strings. If one string is more than 1.5 times as long as the other, use partial_ratio comparisons and scale the partial results by 0.9, which makes sure that only full results can return 100. If one string is more than 8 times as long as the other, scale by 0.6 instead.
5. Run the other ratio functions. When using partial ratio functions, call partial_ratio, partial_token_sort_ratio and partial_token_set_ratio, and scale all of these by the ratio based on length. Otherwise call token_sort_ratio and token_set_ratio. All token-based comparisons are scaled by 0.95, on top of any partial scalars.
6. Take the highest value from these results, round it, and return it as an integer.
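The scaling logic in steps 4 to 6 can be sketched as a small base-R function that takes already computed scores and string lengths. The function name `wratio_combine_sketch` and its argument names are hypothetical, both lengths are assumed to be positive (the short circuit has already handled empty strings), and the exact handling of the 1.5 and 8 boundaries is an assumption drawn from the wording above.

```r
# Hypothetical helper: combines precomputed scores (each between 0 and 100)
# according to the length-based scaling described for WRATIO.
wratio_combine_sketch <- function(len1, len2, base,
                                  partial, partial_token_sort, partial_token_set,
                                  token_sort, token_set) {
  token_scale <- 0.95                      # applied to all token-based scores
  len_ratio <- max(len1, len2) / min(len1, len2)

  if (len_ratio > 1.5) {
    # One string is much longer: use partial scores, scaled down so that
    # only a full match can reach 100
    partial_scale <- if (len_ratio > 8) 0.6 else 0.9
    candidates <- c(base,
                    partial * partial_scale,
                    partial_token_sort * token_scale * partial_scale,
                    partial_token_set * token_scale * partial_scale)
  } else {
    # Similar lengths: use the non-partial token-based scores
    candidates <- c(base,
                    token_sort * token_scale,
                    token_set * token_scale)
  }

  # Take the highest value, round it, and return it as an integer
  as.integer(round(max(candidates)))
}

# Similar lengths: the best candidate is the token_set score scaled by 0.95
wratio_combine_sketch(15, 13, base = 40,
                      partial = 100, partial_token_sort = 100, partial_token_set = 100,
                      token_sort = 60, token_set = 80)
```

Taking scores as inputs keeps the sketch self-contained; in practice they would come from the corresponding ratio methods.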

The *UWRATIO* method returns a measure of the sequences' similarity between 0 and 100, using different algorithms. It is the same as *WRATIO* but preserves Unicode.

The *UQRATIO* method returns a Unicode quick ratio. It calls *QRATIO* with force_ascii set to FALSE.

The *Token_sort_ratio* method returns a measure of the sequences' similarity between 0 and 100, sorting the tokens before comparing.
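The token sorting step can be illustrated in base R. This is a minimal sketch rather than the package's code: `token_sort_sketch` is a hypothetical name, and splitting on non-alphanumeric characters and lower-casing are assumptions about the tokenization.

```r
# Hypothetical helper: split into alphanumeric tokens, sort them, and rejoin,
# so that word order no longer affects a later string comparison.
token_sort_sketch <- function(s) {
  tokens <- strsplit(tolower(trimws(s)), "[^[:alnum:]]+")[[1]]
  tokens <- tokens[nzchar(tokens)]          # drop empty tokens
  # method = "radix" sorts in the C locale, so the result is reproducible
  paste(sort(tokens, method = "radix"), collapse = " ")
}

token_sort_sketch("New York Jets")
token_sort_sketch("Falcons Atlanta") == token_sort_sketch("atlanta falcons")
```

Two strings that contain the same words in a different order produce the same sorted string, which is why a ratio taken afterwards ignores word order.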

The *Partial_ratio* method returns the ratio of the most similar substring as a number between 0 and 100.
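The idea of scoring the most similar substring can be illustrated with a brute-force base-R sketch. It is not the algorithm the package uses: it slides a window the length of the shorter string over the longer one and turns the smallest edit distance from `utils::adist` into a 0 to 100 score. The name `partial_similarity_sketch` and the scoring formula are assumptions made for illustration.

```r
# Hypothetical helper: best match of the shorter string against every
# equally long window of the longer string, scored from 0 to 100.
partial_similarity_sketch <- function(s1, s2) {
  shorter <- if (nchar(s1) <= nchar(s2)) s1 else s2
  longer  <- if (nchar(s1) <= nchar(s2)) s2 else s1
  n <- nchar(shorter)
  if (n == 0) {
    return(0L)                              # nothing to compare
  }
  # All windows of length n in the longer string
  starts <- seq_len(nchar(longer) - n + 1)
  windows <- substring(longer, starts, starts + n - 1)
  # Edit distance between the shorter string and each window
  distances <- utils::adist(shorter, windows)
  # Assumption: similarity = share of characters that need no edit
  as.integer(round(100 * (1 - min(distances) / n)))
}

partial_similarity_sketch("york", "new york jets")
partial_similarity_sketch("yurk", "new york jets")
```

An exact substring gives the maximum score, and each required edit lowers it in proportion to the length of the shorter string.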

The *Token_set_ratio* method works as follows: 1. find all alphanumeric tokens in each string, 2. treat them as a set, 3. construct two strings of the form `<sorted_intersection><sorted_remainder>`, 4. take the ratios of those two strings. The method controls for unordered partial matches (here the partial match is FALSE).
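Steps 1 to 3 of the token set approach can be sketched in base R. The function below only builds the strings to be compared and leaves the ratio itself out. `token_set_strings_sketch` is a hypothetical name, and lower-casing and joining tokens with a single space are assumptions.

```r
# Hypothetical helper: builds the <sorted_intersection><sorted_remainder>
# strings for two inputs.
token_set_strings_sketch <- function(s1, s2) {
  # Steps 1 and 2: alphanumeric tokens, treated as a set (duplicates removed)
  tokenize <- function(s) {
    tokens <- strsplit(tolower(s), "[^[:alnum:]]+")[[1]]
    unique(tokens[nzchar(tokens)])
  }
  # Sort in the C locale and join with single spaces
  join <- function(x) paste(sort(x, method = "radix"), collapse = " ")

  t1 <- tokenize(s1)
  t2 <- tokenize(s2)
  common <- join(intersect(t1, t2))         # tokens shared by both strings
  rest1  <- join(setdiff(t1, t2))           # tokens only in the first string
  rest2  <- join(setdiff(t2, t1))           # tokens only in the second string

  # Step 3: sorted intersection followed by each sorted remainder
  list(intersection = common,
       combined1 = trimws(paste(common, rest1)),
       combined2 = trimws(paste(common, rest2)))
}

token_set_strings_sketch("Atlanta Falcons vs New York Jets",
                         "New York Jets at Atlanta")
```

Because both combined strings start with the same sorted intersection, shared words dominate a later comparison regardless of their original order.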

`FuzzMatcher$new(decoding = NULL)`

`--------------`

`Partial_token_set_ratio(string1 = NULL, string2 = NULL, force_ascii = TRUE, full_process = TRUE)`

`--------------`

`Partial_token_sort_ratio(string1 = NULL, string2 = NULL, force_ascii = TRUE, full_process = TRUE)`

`--------------`

`Ratio(string1 = NULL, string2 = NULL)`

`--------------`

`QRATIO(string1 = NULL, string2 = NULL, force_ascii = TRUE)`

`--------------`

`WRATIO(string1 = NULL, string2 = NULL, force_ascii = TRUE)`

`--------------`

`UWRATIO(string1 = NULL, string2 = NULL)`

`--------------`

`UQRATIO(string1 = NULL, string2 = NULL)`

`--------------`

`Token_sort_ratio(string1 = NULL, string2 = NULL, force_ascii = TRUE, full_process = TRUE)`

`--------------`

`Partial_ratio(string1 = NULL, string2 = NULL)`

`--------------`

`Token_set_ratio(string1 = NULL, string2 = NULL, force_ascii = TRUE, full_process = TRUE)`

https://github.com/seatgeek/fuzzywuzzy/blob/master/fuzzywuzzy/fuzz.py, https://docs.python.org/3/library/codecs.html#standard-encodings

```
# Attach the package first so that check_availability() can be found
library(fuzzywuzzyR)

# Run only if the required Python modules are available
if (check_availability()) {

  s1 <- "Atlanta Falcons"
  s2 <- "New York Jets"

  # Initialize the matcher (no decoding)
  init <- FuzzMatcher$new()

  # Token-based partial ratios
  init$Partial_token_set_ratio(string1 = s1, string2 = s2, force_ascii = TRUE, full_process = TRUE)
  init$Partial_token_sort_ratio(string1 = s1, string2 = s2, force_ascii = TRUE, full_process = TRUE)

  # Simple, quick, and weighted ratios
  init$Ratio(string1 = s1, string2 = s2)
  init$QRATIO(string1 = s1, string2 = s2, force_ascii = TRUE)
  init$WRATIO(string1 = s1, string2 = s2, force_ascii = TRUE)

  # Unicode-preserving variants
  init$UWRATIO(string1 = s1, string2 = s2)
  init$UQRATIO(string1 = s1, string2 = s2)

  # Token-based and substring ratios
  init$Token_sort_ratio(string1 = s1, string2 = s2, force_ascii = TRUE, full_process = TRUE)
  init$Partial_ratio(string1 = s1, string2 = s2)
  init$Token_set_ratio(string1 = s1, string2 = s2, force_ascii = TRUE, full_process = TRUE)
}
```
