This allows joining based on combinations of longitudes and latitudes. If
you are using a distance metric that is *not* based on latitude and
longitude, use distance_join instead. Distances are calculated based on
the distHaversine, distGeo, distCosine, etc. methods in the geosphere
package.
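To make concrete what kind of distance the default method works with, here is a minimal base-R sketch of the haversine great-circle formula. The helper name `haversine_m` is hypothetical and is not part of fuzzyjoin or geosphere; the radius of 6378137 metres is the default used by `geosphere::distHaversine`.

```r
# Hypothetical helper (not part of fuzzyjoin or geosphere):
# haversine great-circle distance in metres between paired points.
# Inputs are decimal degrees; r is the sphere radius in metres.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# One degree of latitude along a meridian
d_m <- haversine_m(0, 0, 0, 1)
d_m / 1609.344  # the same distance in miles
d_m / 1000      # the same distance in kilometres
```

A join threshold such as `max_dist = 200` is then compared against distances of this kind, expressed in the chosen unit.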
geo_join(
  x,
  y,
  by = NULL,
  max_dist,
  method = c("haversine", "geo", "cosine", "meeus", "vincentysphere",
    "vincentyellipsoid"),
  unit = c("miles", "km"),
  mode = "inner",
  distance_col = NULL,
  ...
)

geo_inner_join(
  x,
  y,
  by = NULL,
  method = "haversine",
  max_dist = 1,
  distance_col = NULL,
  ...
)

geo_left_join(
  x,
  y,
  by = NULL,
  method = "haversine",
  max_dist = 1,
  distance_col = NULL,
  ...
)

geo_right_join(
  x,
  y,
  by = NULL,
  method = "haversine",
  max_dist = 1,
  distance_col = NULL,
  ...
)

geo_full_join(
  x,
  y,
  by = NULL,
  method = "haversine",
  max_dist = 1,
  distance_col = NULL,
  ...
)

geo_semi_join(
  x,
  y,
  by = NULL,
  method = "haversine",
  max_dist = 1,
  distance_col = NULL,
  ...
)

geo_anti_join(
  x,
  y,
  by = NULL,
  method = "haversine",
  max_dist = 1,
  distance_col = NULL,
  ...
)
x | A tbl
y | A tbl
by | Columns by which to join the two tables
max_dist | Maximum distance to use for joining
method | Method to use for computing distance: one of "haversine" (default), "geo", "cosine", "meeus", "vincentysphere", "vincentyellipsoid"
unit | Unit of distance for threshold (default "miles")
mode | One of "inner", "left", "right", "full", "semi", or "anti"
distance_col | If given, will add a column with this name containing the geographical distance between the two
... | Extra arguments passed on to the distance method
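As an illustration of how these arguments combine, the following sketch calls geo_join directly with a non-default unit and mode. The `cities` and `airports` tables are hypothetical data with approximate coordinates, invented for this example; the join columns are given in longitude, latitude order.

```r
library(fuzzyjoin)
library(dplyr)

# Hypothetical example data with approximate coordinates
cities <- tibble(
  city = c("Boston", "New York", "Chicago"),
  longitude = c(-71.06, -74.01, -87.63),
  latitude = c(42.36, 40.71, 41.88)
)
airports <- tibble(
  airport = c("BOS", "JFK"),
  longitude = c(-71.01, -73.78),
  latitude = c(42.36, 40.64)
)

# Left join: keep every city, attach any airport within 50 km,
# and record the distance in a new column
nearby <- geo_join(
  cities, airports,
  by = c("longitude", "latitude"),
  max_dist = 50,
  unit = "km",
  mode = "left",
  distance_col = "dist_km"
)
nearby
```

Because the mode is "left", a city with no airport inside the threshold should still appear, with missing values in the airport columns.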
"Haversine" was chosen as the default because in some tests it was approximately the fastest method. By far the slowest method is vincentyellipsoid; in fuzzy joins it should be used only when there are very few pairs and accuracy is imperative.
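The speed difference can be checked informally by timing the underlying geosphere functions on random points. This is only a rough sketch, not a rigorous benchmark; the sample size and seed are arbitrary choices, and timings will vary by machine.

```r
library(geosphere)

set.seed(1)
n <- 5000
# Random (longitude, latitude) pairs, in the column order geosphere expects
p1 <- cbind(runif(n, -180, 180), runif(n, -60, 60))
p2 <- cbind(runif(n, -180, 180), runif(n, -60, 60))

t_hav <- system.time(d_hav <- distHaversine(p1, p2))
t_vin <- system.time(d_vin <- distVincentyEllipsoid(p1, p2))

# Elapsed seconds for each method
c(haversine = t_hav[["elapsed"]], vincentyellipsoid = t_vin[["elapsed"]])

# Relative disagreement between the spherical and ellipsoidal results
summary(abs(d_hav - d_vin) / d_vin)
```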
If you need to use a custom geo method, you may want to write it directly
with the multi_by and multi_match_fun arguments to fuzzy_join.
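A minimal sketch of that approach follows, using an equirectangular approximation as the custom metric. The helpers `equirect_km` and `within_km` and the example tables are hypothetical. The sketch assumes that fuzzy_join passes multi_match_fun two row-aligned matrices holding the multi_by columns (here longitude, then latitude) for each candidate pair, and that a logical vector is an acceptable return value.

```r
library(fuzzyjoin)
library(dplyr)

# Hypothetical example data with approximate coordinates
cities <- tibble(
  city = c("Boston", "New York", "Chicago"),
  longitude = c(-71.06, -74.01, -87.63),
  latitude = c(42.36, 40.71, 41.88)
)
airports <- tibble(
  airport = c("BOS", "JFK"),
  longitude = c(-71.01, -73.78),
  latitude = c(42.36, 40.64)
)

# Hypothetical custom metric: equirectangular approximation in km.
# Column 1 is assumed to be longitude and column 2 latitude.
equirect_km <- function(v1, v2, r = 6371) {
  to_rad <- pi / 180
  mean_lat <- (v1[, 2] + v2[, 2]) / 2 * to_rad
  dx <- (v2[, 1] - v1[, 1]) * to_rad * cos(mean_lat)
  dy <- (v2[, 2] - v1[, 2]) * to_rad
  r * sqrt(dx^2 + dy^2)
}

# Build a match function for a given threshold
within_km <- function(max_km) {
  function(v1, v2) equirect_km(v1, v2) <= max_km
}

custom <- fuzzy_join(
  cities, airports,
  multi_by = c("longitude", "latitude"),
  multi_match_fun = within_km(50),
  mode = "inner"
)
custom
```

Wrapping the threshold in a closure keeps the match function's signature to the two inputs that fuzzy_join supplies.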
library(fuzzyjoin)
library(dplyr)
data("state")

# find pairs of US states whose centers are within
# 200 miles of each other
# (tibble() replaces the deprecated data_frame())
states <- tibble(state = state.name,
                 longitude = state.center$x,
                 latitude = state.center$y)

s1 <- rename(states, state1 = state)
s2 <- rename(states, state2 = state)

pairs <- s1 %>%
  geo_inner_join(s2, max_dist = 200) %>%
  filter(state1 != state2)

pairs

# plot them
library(ggplot2)
ggplot(pairs, aes(x = longitude.x, y = latitude.x,
                  xend = longitude.y, yend = latitude.y)) +
  geom_segment(color = "red") +
  borders("state") +
  theme_void()

# also get distances
s1 %>%
  geo_inner_join(s2, max_dist = 200, distance_col = "distance")