Fuzzy character string matching (ratios)
# init <- FuzzMatcher$new(decoding = NULL)
The decoding parameter is useful for non-ASCII character strings. If it is not NULL, the force_ascii parameter (where applicable) is internally set to FALSE. Decoding applies only to Python 2 configurations, because in Python 3 character strings are decoded to Unicode by default.
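As a rough illustration of what force_ascii = TRUE does to non-ASCII input, the following base-R sketch keeps only ASCII code points. The helper drop_non_ascii() is hypothetical (not part of fuzzywuzzyR), and the Python library's own conversion may treat some characters differently. When decoding is set, force_ascii is switched off, so accented characters like these are kept rather than stripped.

```r
# Hypothetical helper (not part of fuzzywuzzyR): a simplified picture of
# force_ascii = TRUE that drops every non-ASCII code point.
drop_non_ascii <- function(s) {
  stopifnot(is.character(s), length(s) == 1L)
  cp <- utf8ToInt(enc2utf8(s))   # Unicode code points of the string
  intToUtf8(cp[cp < 128L])       # keep only ASCII (code points 0-127)
}

drop_non_ascii("caf\u00e9 cr\u00e8me")  # "caf crme"
```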
The Partial_token_set_ratio method works as follows: 1. find all alphanumeric tokens in each string, 2. treat them as a set, 3. construct two strings of the form <sorted_intersection><sorted_remainder>, 4. take the ratios of those two strings, 5. control for unordered partial matches (here, partial matching is TRUE).
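Steps 1 to 3 can be sketched in base R as below. The helper token_set_strings() is hypothetical, not part of fuzzywuzzyR; it assumes the inputs were already processed (lower case, punctuation removed), so that splitting on whitespace yields the alphanumeric tokens. Partial_token_set_ratio would then compare these strings with a partial ratio.

```r
# Hypothetical helper (not part of fuzzywuzzyR): builds the strings compared by
# the token-set ratios. Assumes already-processed, whitespace-separated input.
token_set_strings <- function(s1, s2) {
  # Steps 1-2: tokens of each string, duplicates removed (treated as a set)
  t1 <- unique(strsplit(trimws(s1), "\\s+")[[1]])
  t2 <- unique(strsplit(trimws(s2), "\\s+")[[1]])
  # Step 3: sorted intersection, followed by each string's sorted remainder.
  # Note: sort() uses the locale's collation, which can differ from Python's
  # code-point order for mixed-case or non-ASCII tokens.
  sect  <- paste(sort(intersect(t1, t2)), collapse = " ")
  rest1 <- paste(sort(setdiff(t1, t2)), collapse = " ")
  rest2 <- paste(sort(setdiff(t2, t1)), collapse = " ")
  list(sorted_sect   = sect,
       combined_1to2 = trimws(paste(sect, rest1)),
       combined_2to1 = trimws(paste(sect, rest2)))
}

r <- token_set_strings("new york mets", "new york yankees")
r$sorted_sect     # "new york"
r$combined_2to1   # "new york yankees"
```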
The Partial_token_sort_ratio method returns the ratio of the most similar substring as a number between 0 and 100, sorting the tokens before comparing.
The Ratio method returns a ratio as an integer value, computed with a SequenceMatcher-like class built on top of the Levenshtein package (https://github.com/miohtama/python-Levenshtein).
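A minimal sketch of a Levenshtein-based ratio in base R, assuming the python-Levenshtein convention ratio = (lensum - dist) / lensum, where lensum is the combined length of both strings and dist counts a substitution as 2 (a deletion plus an insertion). lev_ratio() is a hypothetical helper built on utils::adist(); it is meant to show the arithmetic, not to reproduce the package's results exactly.

```r
# Hypothetical helper (not part of fuzzywuzzyR): Levenshtein-style similarity
# on a 0-100 scale, with substitutions costing 2 as assumed above.
lev_ratio <- function(a, b) {
  lensum <- nchar(a) + nchar(b)
  if (lensum == 0) return(100L)  # two empty strings are treated as identical
  dist <- utils::adist(a, b, costs = list(ins = 1, del = 1, sub = 2))[1, 1]
  as.integer(round(100 * (lensum - dist) / lensum))
}

lev_ratio("abc", "abd")        # 67
lev_ratio("kitten", "sitting") # 62
```

For example, "abc" and "abd" have lensum 6 and dist 2, giving round(100 * 4 / 6) = 67.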
The QRATIO method performs a quick ratio comparison between two strings. It runs full_process from utils on both strings and short-circuits if either string is empty after processing.
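The quick ratio's control flow (process both strings, short-circuit on empty input, then score) can be sketched as follows. process_string() and lev_ratio() are hypothetical stand-ins for full_process and the Levenshtein ratio, not fuzzywuzzyR functions, and returning 0 on empty input is an assumption of this sketch.

```r
# Hypothetical stand-in for full_process: non-alphanumerics become spaces,
# then lower case and trim.
process_string <- function(s) {
  trimws(tolower(gsub("[^[:alnum:]]", " ", s)))
}

# Hypothetical Levenshtein ratio (substitution cost 2), scaled to 0-100.
lev_ratio <- function(a, b) {
  lensum <- nchar(a) + nchar(b)
  dist <- utils::adist(a, b, costs = list(ins = 1, del = 1, sub = 2))[1, 1]
  as.integer(round(100 * (lensum - dist) / lensum))
}

quick_ratio <- function(s1, s2) {
  p1 <- process_string(s1)
  p2 <- process_string(s2)
  if (!nzchar(p1) || !nzchar(p2)) return(0L)  # short circuit on empty input
  lev_ratio(p1, p2)
}

quick_ratio("!!!", "Atlanta")        # 0, "!!!" is empty after processing
quick_ratio("NEW YORK", "new york")  # 100
```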
The WRATIO method returns a measure of the sequences' similarity between 0 and 100, using different algorithms. The steps, in the order they occur, are:
1. Run full_process from utils on both strings.
2. Short-circuit if this makes either string empty.
3. Take the ratio of the two processed strings (fuzz.ratio).
4. Compare the lengths of the strings. If one string is more than 1.5 times as long as the other, use partial_ratio comparisons and scale partial results by 0.9 (this ensures that only full matches can return 100). If one string is more than 8 times as long as the other, scale by 0.6 instead.
5. Run the other ratio functions. If using partial ratio functions, call partial_ratio, partial_token_sort_ratio and partial_token_set_ratio and scale all of these by the length-based factor; otherwise call token_sort_ratio and token_set_ratio. All token-based comparisons are scaled by 0.95, on top of any partial scaling factor.
6. Take the highest of these results, round it and return it as an integer.
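Steps 4 to 6 reduce to a small amount of arithmetic once the component ratios are known. The sketch below takes the processed string lengths and precomputed ratios as inputs; wratio_combine() and its arguments are hypothetical, and treating a length ratio of exactly 1.5 as "partial" is an assumption about the boundary.

```r
# Hypothetical sketch (not fuzzywuzzyR code) of WRATIO steps 4-6. Component
# ratios are passed in already computed; unused ones may be left as NA.
wratio_combine <- function(len1, len2, base, partial = NA, partial_token_sort = NA,
                           partial_token_set = NA, token_sort = NA, token_set = NA) {
  if (len1 == 0 || len2 == 0) return(0L)              # step 2: short circuit
  len_ratio   <- max(len1, len2) / min(len1, len2)
  token_scale <- 0.95                                  # every token-based ratio
  if (len_ratio >= 1.5) {                              # step 4: use partial ratios
    partial_scale <- if (len_ratio > 8) 0.6 else 0.9
    candidates <- c(base,
                    partial * partial_scale,
                    partial_token_sort * token_scale * partial_scale,
                    partial_token_set  * token_scale * partial_scale)
  } else {                                             # similar lengths: full ratios
    candidates <- c(base, token_sort * token_scale, token_set * token_scale)
  }
  as.integer(round(max(candidates, na.rm = TRUE)))     # step 6: best score, rounded
}

wratio_combine(10, 12, base = 80, token_sort = 100, token_set = 90)  # 95
```

For two strings of similar length with base = 80, token_sort = 100 and token_set = 90, the candidates are 80, 95 and 85.5, so the result is 95.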
The UWRATIO method returns a measure of the sequences' similarity between 0 and 100, using different algorithms. It is the same as WRATIO but preserves Unicode (force_ascii set to FALSE).
The UQRATIO method returns a Unicode quick ratio. It calls QRATIO with force_ascii set to FALSE.
The Token_sort_ratio method returns a measure of the sequences' similarity between 0 and 100, sorting the tokens before comparing.
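Sorting the tokens first makes the comparison insensitive to word order. A minimal base-R sketch, where sort_tokens(), lev_ratio() and token_sort_ratio_sketch() are hypothetical helpers and tolower() stands in for the full processing step:

```r
# Hypothetical helper: lower-case, split on whitespace, sort, and rejoin.
sort_tokens <- function(s) {
  tokens <- strsplit(trimws(tolower(s)), "\\s+")[[1]]  # tolower stands in for full_process
  paste(sort(tokens), collapse = " ")
}

# Hypothetical Levenshtein ratio (substitution cost 2), scaled to 0-100.
lev_ratio <- function(a, b) {
  lensum <- nchar(a) + nchar(b)
  if (lensum == 0) return(100L)
  dist <- utils::adist(a, b, costs = list(ins = 1, del = 1, sub = 2))[1, 1]
  as.integer(round(100 * (lensum - dist) / lensum))
}

token_sort_ratio_sketch <- function(s1, s2) {
  lev_ratio(sort_tokens(s1), sort_tokens(s2))
}

token_sort_ratio_sketch("New York Mets", "Mets New York")  # 100
```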
The Partial_ratio method returns the ratio of the most similar substring as a number between 0 and 100.
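The idea of scoring "the most similar substring" can be approximated by sliding the shorter string over the longer one and keeping the best full ratio among equal-length windows. This is a simplified, swapped-in technique: fuzzywuzzy's own partial_ratio is understood to score only windows aligned with its matching blocks, so results can differ. partial_ratio_sketch() and lev_ratio() are hypothetical helpers.

```r
# Hypothetical Levenshtein ratio (substitution cost 2), scaled to 0-100.
lev_ratio <- function(a, b) {
  lensum <- nchar(a) + nchar(b)
  if (lensum == 0) return(100L)
  dist <- utils::adist(a, b, costs = list(ins = 1, del = 1, sub = 2))[1, 1]
  as.integer(round(100 * (lensum - dist) / lensum))
}

# Hypothetical approximation of a partial ratio via an exhaustive sliding window.
partial_ratio_sketch <- function(s1, s2) {
  if (nchar(s1) <= nchar(s2)) { short <- s1; long <- s2 } else { short <- s2; long <- s1 }
  n <- nchar(short)
  if (n == 0) return(0L)
  starts <- seq_len(nchar(long) - n + 1L)            # every window of length n
  scores <- vapply(starts, function(i) {
    lev_ratio(short, substr(long, i, i + n - 1L))
  }, integer(1))
  max(scores)                                        # best-matching window
}

partial_ratio_sketch("York", "New York Mets")  # 100
```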
The Token_set_ratio method works as follows: 1. find all alphanumeric tokens in each string, 2. treat them as a set, 3. construct two strings of the form <sorted_intersection><sorted_remainder>, 4. take the ratios of those two strings, 5. control for unordered partial matches (here, partial matching is FALSE).
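A base-R sketch of the full (non-partial) token-set score is shown below. It assumes, following the Python fuzzywuzzy source, that the result is the highest of three pairwise ratios: the sorted intersection against each combined string, and the two combined strings against each other. token_set_ratio_sketch() and lev_ratio() are hypothetical helpers, and tolower() stands in for full processing.

```r
# Hypothetical Levenshtein ratio (substitution cost 2), scaled to 0-100.
lev_ratio <- function(a, b) {
  lensum <- nchar(a) + nchar(b)
  if (lensum == 0) return(100L)
  dist <- utils::adist(a, b, costs = list(ins = 1, del = 1, sub = 2))[1, 1]
  as.integer(round(100 * (lensum - dist) / lensum))
}

token_set_ratio_sketch <- function(s1, s2) {
  t1 <- unique(strsplit(trimws(tolower(s1)), "\\s+")[[1]])  # token sets
  t2 <- unique(strsplit(trimws(tolower(s2)), "\\s+")[[1]])
  sect <- paste(sort(intersect(t1, t2)), collapse = " ")
  c12  <- trimws(paste(sect, paste(sort(setdiff(t1, t2)), collapse = " ")))
  c21  <- trimws(paste(sect, paste(sort(setdiff(t2, t1)), collapse = " ")))
  # Best of the three pairwise full ratios
  max(lev_ratio(sect, c12), lev_ratio(sect, c21), lev_ratio(c12, c21))
}

token_set_ratio_sketch("mets new york", "new york mets yankees")  # 100
```

Because one token set is contained in the other, the intersection equals one of the combined strings, which is why the score reaches 100 despite the extra token.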
FuzzMatcher$new(decoding = NULL)
--------------
Partial_token_set_ratio(string1 = NULL, string2 = NULL, force_ascii = TRUE, full_process = TRUE)
--------------
Partial_token_sort_ratio(string1 = NULL, string2 = NULL, force_ascii = TRUE, full_process = TRUE)
--------------
Ratio(string1 = NULL, string2 = NULL)
--------------
QRATIO(string1 = NULL, string2 = NULL, force_ascii = TRUE)
--------------
WRATIO(string1 = NULL, string2 = NULL, force_ascii = TRUE)
--------------
UWRATIO(string1 = NULL, string2 = NULL)
--------------
UQRATIO(string1 = NULL, string2 = NULL)
--------------
Token_sort_ratio(string1 = NULL, string2 = NULL, force_ascii = TRUE, full_process = TRUE)
--------------
Partial_ratio(string1 = NULL, string2 = NULL)
--------------
Token_set_ratio(string1 = NULL, string2 = NULL, force_ascii = TRUE, full_process = TRUE)
new()
FuzzMatcher$new(decoding = NULL)
decoding
either NULL or a character string. If not NULL, the decoding parameter should be one of the standard Python encodings (such as 'utf-8'). See the Details and References sections for more information.
Partial_token_set_ratio()
FuzzMatcher$Partial_token_set_ratio( string1 = NULL, string2 = NULL, force_ascii = TRUE, full_process = TRUE )
string1
a character string.
string2
a character string.
force_ascii
if TRUE, allow only ASCII characters (force-convert to ASCII)
full_process
either TRUE or FALSE. If TRUE, the string is processed by: 1. removing all but letters and numbers, 2. trimming whitespace, 3. forcing to lower case
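The three processing steps can be sketched in base R as follows. full_process_sketch() is a hypothetical helper, not the package's implementation; it replaces each non-alphanumeric character with a space (rather than deleting it) so that neighbouring words stay separate, which is an assumption about the Python utilities. Interior runs of spaces are not collapsed.

```r
# Hypothetical stand-in for full_process = TRUE (not fuzzywuzzyR code).
full_process_sketch <- function(s) {
  s <- gsub("[^[:alnum:]]", " ", s)  # 1. keep only letters and numbers (others become spaces)
  s <- trimws(s)                     # 2. trim leading and trailing whitespace
  tolower(s)                         # 3. force to lower case
}

full_process_sketch("  Atlanta Falcons!! ")  # "atlanta falcons"
```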
Partial_token_sort_ratio()
FuzzMatcher$Partial_token_sort_ratio( string1 = NULL, string2 = NULL, force_ascii = TRUE, full_process = TRUE )
string1
a character string.
string2
a character string.
force_ascii
if TRUE, allow only ASCII characters (force-convert to ASCII)
full_process
either TRUE or FALSE. If TRUE, the string is processed by: 1. removing all but letters and numbers, 2. trimming whitespace, 3. forcing to lower case
Ratio()
FuzzMatcher$Ratio(string1 = NULL, string2 = NULL)
string1
a character string.
string2
a character string.
QRATIO()
FuzzMatcher$QRATIO(string1 = NULL, string2 = NULL, force_ascii = TRUE)
string1
a character string.
string2
a character string.
force_ascii
if TRUE, allow only ASCII characters (force-convert to ASCII)
WRATIO()
FuzzMatcher$WRATIO(string1 = NULL, string2 = NULL, force_ascii = TRUE)
string1
a character string.
string2
a character string.
force_ascii
if TRUE, allow only ASCII characters (force-convert to ASCII)
UWRATIO()
FuzzMatcher$UWRATIO(string1 = NULL, string2 = NULL)
string1
a character string.
string2
a character string.
UQRATIO()
FuzzMatcher$UQRATIO(string1 = NULL, string2 = NULL)
string1
a character string.
string2
a character string.
Token_sort_ratio()
FuzzMatcher$Token_sort_ratio( string1 = NULL, string2 = NULL, force_ascii = TRUE, full_process = TRUE )
string1
a character string.
string2
a character string.
force_ascii
if TRUE, allow only ASCII characters (force-convert to ASCII)
full_process
either TRUE or FALSE. If TRUE, the string is processed by: 1. removing all but letters and numbers, 2. trimming whitespace, 3. forcing to lower case
Partial_ratio()
FuzzMatcher$Partial_ratio(string1 = NULL, string2 = NULL)
string1
a character string.
string2
a character string.
Token_set_ratio()
FuzzMatcher$Token_set_ratio( string1 = NULL, string2 = NULL, force_ascii = TRUE, full_process = TRUE )
string1
a character string.
string2
a character string.
force_ascii
if TRUE, allow only ASCII characters (force-convert to ASCII)
full_process
either TRUE or FALSE. If TRUE, the string is processed by: 1. removing all but letters and numbers, 2. trimming whitespace, 3. forcing to lower case
clone()
The objects of this class are cloneable with this method.
FuzzMatcher$clone(deep = FALSE)
deep
Whether to make a deep clone.
https://github.com/seatgeek/fuzzywuzzy/blob/master/fuzzywuzzy/fuzz.py
https://docs.python.org/3/library/codecs.html#standard-encodings
try({
  # Run only if Python is already initialised and the required modules are available
  if (reticulate::py_available(initialize = FALSE)) {
    if (fuzzywuzzyR::check_availability()) {
      library(fuzzywuzzyR)
      s1 <- "Atlanta Falcons"
      s2 <- "New York Jets"
      init <- FuzzMatcher$new()
      # Partial token-based ratios (print() so every result is shown, not only the last)
      print(init$Partial_token_set_ratio(string1 = s1, string2 = s2,
                                         force_ascii = TRUE, full_process = TRUE))
      print(init$Partial_token_sort_ratio(string1 = s1, string2 = s2,
                                          force_ascii = TRUE, full_process = TRUE))
      # Simple, quick and weighted ratios
      print(init$Ratio(string1 = s1, string2 = s2))
      print(init$QRATIO(string1 = s1, string2 = s2, force_ascii = TRUE))
      print(init$WRATIO(string1 = s1, string2 = s2, force_ascii = TRUE))
      # Unicode-preserving variants
      print(init$UWRATIO(string1 = s1, string2 = s2))
      print(init$UQRATIO(string1 = s1, string2 = s2))
      # Full token-based and partial ratios
      print(init$Token_sort_ratio(string1 = s1, string2 = s2,
                                  force_ascii = TRUE, full_process = TRUE))
      print(init$Partial_ratio(string1 = s1, string2 = s2))
      print(init$Token_set_ratio(string1 = s1, string2 = s2,
                                 force_ascii = TRUE, full_process = TRUE))
    }
  }
}, silent = TRUE)