Description Usage Arguments Details Value See Also Examples
Joins two tables whose key values do not match exactly.
df.x | Left table to be joined.
df.y | Right table to be joined.
by.x | A character vector of variables to join the left table by.
by.y | A character vector of variables to join the right table by.
method | Method used to calculate string distance. See the help for stringdist::stringdist. Default: 'jw'
cutoff | Maximum string distance allowed for a match; 0 requires exact matches.
join_type | Type of join to perform. Accepts left, right, inner, full, semi, and anti. Default: 'left'
unique | If TRUE, only unique values are matched. Default: FALSE
match_vals | If TRUE, creates a column that displays the string distance. Default: TRUE
sort | Sorts the table by string distance. Accepts "desc", "asc" and NULL. Default: NULL
useBytes | If
p | Penalty factor for Jaro-Winkler distance. The valid range for p is 0 <= p <= 0.25. If p=0 (default), the Jaro distance is returned. Applies only to method='jw'. Default: 0
weight | For method='osa' or 'dl', the penalty for deletion, insertion, substitution and transposition, in that order. When method='lv', the penalty for transposition is ignored. When method='jw', the weights associated with characters of a, characters from b and the transposition weight, in that order. Weights must be positive and not exceed 1. weight is ignored completely when method='hamming', 'qgram', 'cosine', 'jaccard', 'lcs', or 'soundex'. Default: c(d = 1, i = 1, s = 1, t = 1)
q | Size of the q-gram; must be nonnegative. Only applies to method='qgram', 'jaccard' or 'cosine'. Default: 1
bt | Winkler's boost threshold. Winkler's penalty factor is only applied when the Jaro distance is larger than bt. Applies only to method='jw' and p>0. Default: 0
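To make the method, p, q and cutoff arguments more concrete, the sketch below calls stringdist::stringdist directly on a few strings. It assumes the stringdist package is installed. The example strings and the cutoff value are made up for illustration, and the `<=` comparison against the cutoff is an assumption about how fuzzy_match applies it, based only on the description above.

```r
library(stringdist)

a <- "martha"
b <- "marhta"

# method = 'jw' with p = 0 gives the plain Jaro distance (a score between 0 and 1)
d_jaro <- stringdist(a, b, method = "jw", p = 0)

# A positive p rewards a shared prefix, so the distance shrinks
d_jw <- stringdist(a, b, method = "jw", p = 0.1)

# Edit-based methods count edits instead: 'lv' is Levenshtein distance
d_lv <- stringdist("kitten", "sitting", method = "lv")  # 3 edits

# q-gram based methods compare sets of substrings of length q
d_jac <- stringdist(a, b, method = "jaccard", q = 2)

# Assumed cutoff rule: a pair counts as a match when its distance is <= cutoff
cutoff <- 0.1
within_cutoff <- d_jaro <= cutoff
```

Because 'jw' returns a score between 0 and 1 while 'lv', 'osa' and 'dl' return edit counts, a cutoff such as 0.1 is only meaningful for the normalized methods.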
Joins tables where the columns to join by do not match exactly. Run the clean function on the join columns before calling fuzzy_match.
Returns a data.frame containing the two input data.frames joined.
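The general idea of a fuzzy left join with a cutoff can be sketched in base R. This is not the implementation of fuzzy_match. It swaps in utils::adist (generalized Levenshtein distance), scaled by the longer string's length, in place of stringdist so that it runs without extra packages. The tables, column names, and the string_dist column are hypothetical.

```r
# Hypothetical toy tables; all names and values are made up for illustration
left <- data.frame(name = c("john smith", "jane doe", "bob stone"),
                   party = c("D", "R", "I"),
                   stringsAsFactors = FALSE)
right <- data.frame(full_name = c("jon smith", "jane doe", "alice wong"),
                    tweets = c(12, 7, 3),
                    stringsAsFactors = FALSE)

# Pairwise edit distance, scaled to 0-1 by the length of the longer string
raw <- utils::adist(left$name, right$full_name)
longest <- outer(nchar(left$name), nchar(right$full_name), pmax)
dist_mat <- raw / longest

# For each left row, keep the closest right row if it is within the cutoff
cutoff <- 0.2
best <- apply(dist_mat, 1, which.min)
best_dist <- dist_mat[cbind(seq_len(nrow(left)), best)]
best[best_dist > cutoff] <- NA
best_dist[is.na(best)] <- NA

# Left join: unmatched left rows are kept and padded with NA
joined <- cbind(left, right[best, , drop = FALSE], string_dist = best_dist)
rownames(joined) <- NULL
```

Here "john smith" pairs with "jon smith" and "jane doe" matches exactly, while "bob stone" has no candidate within the cutoff and so keeps NA in the right-hand columns, as a left join would. Setting cutoff to 0 would keep only the exact match.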
unfactor
stringdist
join, arrange
## Not run:
# Clean the join columns in both tables before matching
congress <- clean(congress, name, selected = ",", prefixes = TRUE, suffixes = TRUE)
politwoops <- clean(politwoops, full_name, selected = ",", prefixes = TRUE, suffixes = TRUE)
# Inner join on name and full_name, keeping pairs with string distance up to 0.1
fuzzy_match(congress, politwoops, name, full_name, join_type = "inner", cutoff = 0.1)
## End(Not run)