join_fuzzily: Join two 'data.frame's using fuzzy string matching.

View source: R/join.R

Description

Join two data.frames using fuzzy string matching.

Usage

join_fuzzily(x, y, mode = "inner", max_dist = 0, ...,
  by = intersect(names(x), names(y)), copy = FALSE,
  suffix = c("_x", "_y"))

Arguments

x

tbls to join

y

tbls to join

mode

From fuzzyjoin::stringdist_join() documentation: One of "inner", "left", "right", "full", "semi", or "anti".

max_dist

From fuzzyjoin::stringdist_join() documentation: Maximum distance to use for joining. Note that the default here is 0, as shown in Usage.

...

From fuzzyjoin::stringdist_join() documentation: Arguments passed on to stringdist.

by

a character vector of variables to join by. The default, intersect(names(x), names(y)), gives a natural join using all variables with common names across the two tables. List the variables explicitly to control which columns are matched.

To join by different variables on x and y use a named vector. For example, by = c("a" = "b") will match x.a to y.b.

copy

If x and y are not from the same data source, and copy is TRUE, then y will be copied into the same src as x. This allows you to join tables across srcs, but it is a potentially expensive operation so you must opt into it.

suffix

If there are non-joined duplicate variables in x and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.
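To make max_dist, the named form of by, and suffix concrete, here is a minimal base-R sketch of the matching idea. It is not this package's implementation: it uses utils::adist() (Levenshtein distance) as a stand-in, whereas stringdist's default method is "osa", so distances can differ when characters are transposed. The data frames and the helper fuzzy_inner_sketch() are hypothetical.

```r
# Hypothetical example data: the key columns have different names (a vs. b)
x <- data.frame(a = c("apple", "banana", "cherry"), n = 1:3,
                stringsAsFactors = FALSE)
y <- data.frame(b = c("appel", "banana", "grape"), n = 4:6,
                stringsAsFactors = FALSE)

# Hypothetical helper: inner join keeping pairs within max_dist edits.
# `by` is a named vector of length 1, e.g. c("a" = "b") matches x$a to y$b.
fuzzy_inner_sketch <- function(x, y, by, max_dist = 0,
                               suffix = c("_x", "_y")) {
  x_col <- names(by)
  y_col <- unname(by)
  # Pairwise Levenshtein distances: rows index x, columns index y
  d <- utils::adist(x[[x_col]], y[[y_col]])
  hits <- which(d <= max_dist, arr.ind = TRUE)
  left <- x[hits[, "row"], , drop = FALSE]
  right <- y[hits[, "col"], , drop = FALSE]
  # Disambiguate non-joined columns that share a name
  dup <- intersect(names(left), names(right))
  is_dup_left <- names(left) %in% dup
  is_dup_right <- names(right) %in% dup
  names(left)[is_dup_left] <- paste0(names(left)[is_dup_left], suffix[1])
  names(right)[is_dup_right] <- paste0(names(right)[is_dup_right], suffix[2])
  out <- cbind(left, right)
  rownames(out) <- NULL
  out
}

# max_dist = 0 keeps only identical strings
exact <- fuzzy_inner_sketch(x, y, by = c("a" = "b"), max_dist = 0)

# max_dist = 2 also admits "apple" vs. "appel" (Levenshtein distance 2)
loose <- fuzzy_inner_sketch(x, y, by = c("a" = "b"), max_dist = 2)
```

In this sketch the shared non-key column n comes back as n_x and n_y, which mirrors what the suffix argument describes.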

Details

This function is effectively a customized version of the stringdist_join functions in the {fuzzyjoin} package. It is primarily intended to be used with mode = "full" before tetidy::summarise_join_stats().
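A hedged usage sketch, based only on the signature shown in Usage: the data frames and the column name `name` are hypothetical, and the call is guarded because tetidy is installed from GitHub and may not be available. The follow-up call to tetidy::summarise_join_stats() is omitted since its arguments are not documented here.

```r
# Hypothetical inputs sharing a key column called `name`
x <- data.frame(name = c("apple", "banana"), qty = c(1, 2),
                stringsAsFactors = FALSE)
y <- data.frame(name = c("appel", "banana"), price = c(0.5, 0.25),
                stringsAsFactors = FALSE)

if (requireNamespace("tetidy", quietly = TRUE)) {
  # mode = "full" is the mode the Details section recommends;
  # a full join keeps unmatched rows from both tables.
  joined <- tetidy::join_fuzzily(x, y, mode = "full", max_dist = 1,
                                 by = "name")
  print(joined)
}
```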

Value

A tibble.

See Also

dplyr::inner_join()


tonyelhabr/tetidy documentation built on May 29, 2019, 3:18 p.m.