fuzzy_rbind | R Documentation |
fuzzy_rbind() binds dataframes based on columns with slightly different names.
fuzzy_rbind(
  df1,
  df2,
  threshold,
  method = "jw",
  q = 1,
  p = 0,
  bt = 0,
  useBytes = FALSE,
  weight = c(d = 1, i = 1, t = 1)
)
df1 |
The first dataframe to be bound. |
df2 |
The second dataframe to be bound. |
threshold |
The maximum string distance allowed between column names. If the distance between two column names is greater than this threshold, the columns will not be bound. |
method |
The type of string distance calculation to use. Possible methods are: osa, lv, dl, hamming, lcs, qgram, cosine, jaccard, jw, and soundex. See package stringdist for more information. Default: 'jw' |
q |
Size of the q-gram used in string distance calculation. Default: 1 |
p |
Only used with method "jw": the Jaro-Winkler penalty size. Default: 0 |
bt |
Only used with method "jw" with p > 0, Winkler's boost threshold. Default: 0 |
useBytes |
Whether or not to perform byte-wise comparison. Default: FALSE |
weight |
Only used with methods "osa" or "dl", a vector representing the penalty for deletion, insertion, substitution, and transposition, in that order. Default: c(d = 1, i = 1, t = 1) |
Column names often differ slightly between datasets. fuzzy_rbind() binds dataframes by fuzzy matching their column names.
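To illustrate what fuzzy matching of column names can look like, here is a minimal sketch built directly on the stringdist package. It is not the implementation of fuzzy_rbind(): the helper name match_columns() and its matching rule (pick the closest name, keep it only if the distance is within the threshold) are assumptions made for illustration.

```r
library(stringdist)

# Hypothetical helper (not part of the package): for each name in names2,
# return the closest name in names1, or NA if the closest name is farther
# away than `threshold`.
match_columns <- function(names1, names2, threshold, method = "jw") {
  # Distance matrix with one row per names2 entry and one column per names1 entry
  d <- stringdistmatrix(names2, names1, method = method)
  # Index of the closest names1 entry for each names2 entry
  best <- apply(d, 1, which.min)
  best_dist <- d[cbind(seq_along(names2), best)]
  ifelse(best_dist <= threshold, names1[best], NA_character_)
}

messy <- paste0(colnames(mtcars)[1:5], "_17")
match_columns(colnames(mtcars), messy, threshold = 0.5)
```

A lower threshold makes the match stricter: names whose closest candidate is still farther away than the threshold are left unmatched (NA) in this sketch.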
fuzzy_rbind() returns a dataframe that binds the two input dataframes based on the closest matching columns. Column names from the first dataframe (df1) are preserved.
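if (interactive()) {
  # Copy mtcars and append suffixes so its column names no longer match exactly
  mtcars_colnames_messy <- mtcars
  colnames(mtcars_colnames_messy)[1:5] <- paste0(colnames(mtcars)[1:5], "_17")
  colnames(mtcars_colnames_messy)[6:11] <- paste0(colnames(mtcars)[6:11], "_2017")

  # Bind with a looser (0.5) and then a stricter (0.2) distance threshold
  x <- fuzzy_rbind(mtcars, mtcars_colnames_messy, .5)
  x <- fuzzy_rbind(mtcars, mtcars_colnames_messy, .2)
}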