The purpose of dyadicdist is to provide quick and easy calculation of dyadic distances between geo-referenced points. The main contribution of dyadicdist is that the output is stored as a long, dyadic tibble as opposed to a wide matrix.
This is still a development version. Please don’t hesitate to let me know of any errors and/or deficiencies you might come across.
A simple example illustrates the purpose of dyadicdist. The package has four main functions, ddist(), ddist_sf(), ddist_xy(), and ddist_xy_sf(); here is ddist() applied to three Scandinavian capitals:
library(tidyverse)
library(dyadicdist)
df <- tibble::tribble(
~city_name, ~idvar, ~latitude, ~longitude,
"copenhagen", 5, 55.68, 12.58,
"stockholm", 2, 59.33, 18.07,
"oslo", 51, 59.91, 10.75
)
ddist(data = df,
id = "idvar")
#> # A tibble: 9 × 7
#> distance distance_units city_name_1 idvar_1 city_name_2 idvar_2 match_id
#> <dbl> <chr> <chr> <dbl> <chr> <dbl> <chr>
#> 1 0 m copenhagen 5 copenhagen 5 5_5
#> 2 521455. m copenhagen 5 stockholm 2 5_2
#> 3 482648. m copenhagen 5 oslo 51 5_51
#> 4 521455. m stockholm 2 copenhagen 5 2_5
#> 5 0 m stockholm 2 stockholm 2 2_2
#> 6 416439. m stockholm 2 oslo 51 2_51
#> 7 482648. m oslo 51 copenhagen 5 51_5
#> 8 416439. m oslo 51 stockholm 2 51_2
#> 9 0 m oslo 51 oslo 51 51_51
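The long layout above is essentially a reshaped distance matrix. As an illustration only (a base-R sketch, not how dyadicdist builds its output internally), the three distances printed above can be arranged as a wide 3 × 3 matrix and reshaped into one row per ordered dyad, with a match_id formed by pasting the two ids together as in the output:

```r
# Wide, symmetric distance matrix (metres) holding the values printed above;
# rows and columns are labelled by idvar
ids <- c(5, 2, 51)
wide <- matrix(c(     0, 521455, 482648,
                 521455,      0, 416439,
                 482648, 416439,      0),
               nrow = 3, byrow = TRUE,
               dimnames = list(as.character(ids), as.character(ids)))

# One row per ordered dyad (i, j); expand.grid() varies its first argument
# fastest, so idvar_1 changes slowest, matching the row order printed above
long <- expand.grid(idvar_2 = ids, idvar_1 = ids)[, c("idvar_1", "idvar_2")]

# Look up each dyad's distance by row and column name
long$distance <- wide[cbind(as.character(long$idvar_1),
                            as.character(long$idvar_2))]

# Dyad identifier of the form "i_j", e.g. "5_2"
long$match_id <- paste(long$idvar_1, long$idvar_2, sep = "_")

print(long)
```

In the long form every dyad is an ordinary row, so it can be filtered or joined to other dyad-level data like any other data frame.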
At the moment, dyadicdist is under review at CRAN and is thus not yet available there. You can install the development version from GitHub with:
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("jvieroe/dyadicdist")
Below, I describe some of the key features of dyadicdist. As a working example, let’s use some data on 100 US cities:
library(dyadicdist)
library(tidyverse)
library(magrittr)
cities <- dyadicdist::cities
ddist()
ddist() takes as input a data.frame or a tibble and returns a tibble with dyadic distances for any combination of points i and j (see more below). Beyond the data argument, it needs latitude and longitude variables as well as a unique id variable (the latter can be numeric, integer, factor, or character).
ddist(cities,
id = "id") %>%
head(5)
#> # A tibble: 5 × 11
#> distance distance_units city_1 state_1 country_1 id_1 city_2 state_2
#> <dbl> <chr> <chr> <chr> <chr> <int> <chr> <chr>
#> 1 0 m Schenectady NY USA 275 Schenecta… NY
#> 2 31869. m Schenectady NY USA 275 Saratoga … NY
#> 3 204716. m Schenectady NY USA 275 Rye NY
#> 4 133700. m Schenectady NY USA 275 Rome NY
#> 5 24559. m Schenectady NY USA 275 Rensselaer NY
#> # … with 3 more variables: country_2 <chr>, id_2 <int>, match_id <chr>
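Since the id identifies each point in the output (id_1, id_2, and match_id), it needs to be unique. A hypothetical pre-flight check in base R is sketched below; check_dyad_id() is not part of dyadicdist, and the accepted types simply mirror the list above:

```r
# Hypothetical helper, not part of dyadicdist: verify that an id column
# exists, has an accepted type, and holds unique, non-missing values
check_dyad_id <- function(data, id) {
  x <- data[[id]]
  if (is.null(x)) stop("column '", id, "' not found")
  # is.numeric() is TRUE for both double and integer vectors
  if (!(is.numeric(x) || is.factor(x) || is.character(x))) {
    stop("id must be numeric, integer, factor, or character")
  }
  if (anyNA(x)) stop("id contains missing values")
  if (anyDuplicated(x) > 0) stop("id is not unique")
  invisible(TRUE)
}

df <- data.frame(city_name = c("copenhagen", "stockholm", "oslo"),
                 idvar = c(5, 2, 51))
check_dyad_id(df, "idvar")            # passes silently

df_dup <- df
df_dup$idvar[3] <- 5                  # oslo now shares copenhagen's id
try(check_dyad_id(df_dup, "idvar"))   # reports: id is not unique
```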
By default, the latitude and longitude arguments are set to "latitude" and "longitude", respectively, so columns with these names need no manual input. If your columns are named differently, specify their names in the ddist() call:
cities %>%
rename(lat = latitude,
lon = longitude) %>%
ddist(.,
id = "id",
latitude = "lat",
longitude = "lon") %>%
head(5)
#> # A tibble: 5 × 11
#> distance distance_units city_1 state_1 country_1 id_1 city_2 state_2
#> <dbl> <chr> <chr> <chr> <chr> <int> <chr> <chr>
#> 1 0 m Schenectady NY USA 275 Schenecta… NY
#> 2 31869. m Schenectady NY USA 275 Saratoga … NY
#> 3 204716. m Schenectady NY USA 275 Rye NY
#> 4 133700. m Schenectady NY USA 275 Rome NY
#> 5 24559. m Schenectady NY USA 275 Rensselaer NY
#> # … with 3 more variables: country_2 <chr>, id_2 <int>, match_id <chr>
ddist_sf(): spatial input data
To measure dyadic distances with an object of class sf, use ddist_sf():
library(sf)
cities %>%
st_as_sf(.,
coords = c("longitude", "latitude"),
crs = 4326) %>%
ddist_sf(.,
id = "id") %>%
head(5)
#> # A tibble: 5 × 11
#> distance distance_units city_1 state_1 country_1 id_1 city_2 state_2
#> <dbl> <chr> <chr> <chr> <chr> <int> <chr> <chr>
#> 1 0 m Schenectady NY USA 275 Schenecta… NY
#> 2 31869. m Schenectady NY USA 275 Saratoga … NY
#> 3 204716. m Schenectady NY USA 275 Rye NY
#> 4 133700. m Schenectady NY USA 275 Rome NY
#> 5 24559. m Schenectady NY USA 275 Rensselaer NY
#> # … with 3 more variables: country_2 <chr>, id_2 <int>, match_id <chr>
With the exception of crs, longitude, and latitude (all of which are inherently provided in an object of class sf), ddist_sf() takes the same optional arguments as ddist().
ddist() and ddist_sf()
By default, ddist() and ddist_sf() return the full list of dyadic distances between any points i and j. In total, this amounts to nrow(data) * nrow(data) dyads and includes both the diagonal (the distance from each point to itself, j = i) and duplicated dyads (both i to j and j to i, as with 5_2 and 2_5 in the first example). Both of these inclusions are optional, however.
To drop the diagonal, set diagonal = FALSE; to drop duplicated dyads, set duplicates = FALSE; to drop both, set diagonal = FALSE and duplicates = FALSE.
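The row counts behind these options follow from the definitions: n points give n * n ordered dyads, n * (n - 1) without the diagonal, n * (n + 1) / 2 without duplicates, and n * (n - 1) / 2 without both. The base-R sketch below reproduces these counts for the three ids from the first example. It only mimics the filtering logic; which of two mirrored rows dyadicdist keeps when duplicates = FALSE is not stated here, and this sketch simply keeps the first one it encounters.

```r
# All ordered dyads (i, j) for n = 3 ids: n * n rows
ids <- c(5, 2, 51)
dyads <- expand.grid(id_2 = ids, id_1 = ids)[, c("id_1", "id_2")]

# Like diagonal = FALSE: drop rows where i == j
no_diag <- dyads[dyads$id_1 != dyads$id_2, ]

# Like duplicates = FALSE: (i, j) and (j, i) share an order-independent key
# built from the smaller and larger id; keep the first row per key
key <- paste(pmin(dyads$id_1, dyads$id_2),
             pmax(dyads$id_1, dyads$id_2), sep = "_")
no_dup <- dyads[!duplicated(key), ]

# Both options together
both <- no_dup[no_dup$id_1 != no_dup$id_2, ]

c(all = nrow(dyads), no_diag = nrow(no_diag),
  no_dup = nrow(no_dup), both = nrow(both))
# all = 9, no_diag = 6, no_dup = 6, both = 3
```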
ddist_xy() and ddist_xy_sf(): dual data inputs
ddist() and ddist_sf() take as input a single data.frame or tibble and return dyads and dyadic distances between each pair of observations. The ddist_xy*() functions perform the same underlying task but take two data inputs, x and y. For each input you need to specify an id variable (passed together via the ids argument, first for x and then for y) as well as longitude/latitude variables (defaulting to "longitude" and "latitude"):
fl <- cities %>%
filter(state == "FL")
ca <- cities %>%
filter(state == "CA") %>%
rename(id_var = id)
ddist_xy(x = fl,
y = ca,
ids = c("id", "id_var")) %>%
head(5)
#> # A tibble: 5 × 11
#> distance distance_units city_1 state_1 country_1 id city_2 state_2
#> <dbl> <chr> <chr> <chr> <chr> <int> <chr> <chr>
#> 1 3639194. m Madeira Beach FL USA 224 South L… CA
#> 2 3552612. m Madeira Beach FL USA 224 Carpint… CA
#> 3 3522633. m Madeira Beach FL USA 224 Port Hu… CA
#> 4 3338749. m Madeira Beach FL USA 224 Vista CA
#> 5 3823367. m Madeira Beach FL USA 224 San Mat… CA
#> # … with 3 more variables: country_2 <chr>, id_var <int>, match_id <chr>
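Judging from the output above, each Florida city in x is paired with California cities in y, so the dual-input case yields nrow(x) * nrow(y) dyads rather than nrow(data) * nrow(data). A base-R sketch of that pairing, using small hypothetical ids in place of the city data:

```r
# Hypothetical inputs standing in for x and y: 2 and 3 points respectively
x <- data.frame(id = 1:2)
y <- data.frame(id_var = 101:103)

# Pair every point in x with every point in y (the x id varies slowest)
pairs <- expand.grid(id_var = y$id_var, id = x$id)[, c("id", "id_var")]

nrow(pairs) == nrow(x) * nrow(y)  # TRUE: 2 * 3 = 6 dyads
```

Because the two id columns come from different inputs, they keep their own names here (id and id_var), much as they do in the ddist_xy() output above.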
As with ddist_sf(), spatial objects of class sf can be used as dual inputs too, via ddist_xy_sf():
fl <- cities %>%
filter(state == "FL") %>%
st_as_sf(coords = c("longitude", "latitude"),
crs = 4326)
ca <- cities %>%
filter(state == "CA") %>%
rename(id_var = id) %>%
st_as_sf(coords = c("longitude", "latitude"),
crs = 4326)
ddist_xy_sf(x = fl,
y = ca,
ids = c("id", "id_var")) %>%
head(5)
#> # A tibble: 5 × 11
#> distance distance_units city_1 state_1 country_1 id city_2 state_2
#> <dbl> <chr> <chr> <chr> <chr> <int> <chr> <chr>
#> 1 3639194. m Madeira Beach FL USA 224 South L… CA
#> 2 3552612. m Madeira Beach FL USA 224 Carpint… CA
#> 3 3522633. m Madeira Beach FL USA 224 Port Hu… CA
#> 4 3338749. m Madeira Beach FL USA 224 Vista CA
#> 5 3823367. m Madeira Beach FL USA 224 San Mat… CA
#> # … with 3 more variables: country_2 <chr>, id_var <int>, match_id <chr>
By default, ddist() and ddist_xy() assume unprojected coordinates in basic latitude/longitude format (EPSG code 4326) when converting the raw input data to spatial features. This is the CRS commonly specified when converting latitude/longitude data to spatial features with the sf package (see sf::st_as_sf() and the crs = 4326 examples above). You can apply a different CRS by providing a valid numeric EPSG code via the crs argument.
All ddist*() functions allow you to transform the CRS before calculating dyadic distances, using the crs_transform and new_crs arguments:
ddist(cities,
id = "id",
crs_transform = TRUE,
new_crs = 3359)
#> # A tibble: 10,000 × 11
#> distance distance_units city_1 state_1 country_1 id_1 city_2 state_2
#> <dbl> <chr> <chr> <chr> <chr> <int> <chr> <chr>
#> 1 0 US_survey_foot Schenectady NY USA 275 Schenect… NY
#> 2 105468. US_survey_foot Schenectady NY USA 275 Saratoga… NY
#> 3 675517. US_survey_foot Schenectady NY USA 275 Rye NY
#> 4 443781. US_survey_foot Schenectady NY USA 275 Rome NY
#> 5 81318. US_survey_foot Schenectady NY USA 275 Renssela… NY
#> 6 706757. US_survey_foot Schenectady NY USA 275 Plattsbu… NY
#> 7 558267. US_survey_foot Schenectady NY USA 275 Peekskill NY
#> 8 478389. US_survey_foot Schenectady NY USA 275 Oneida NY
#> 9 694798. US_survey_foot Schenectady NY USA 275 New Roch… NY
#> 10 696411. US_survey_foot Schenectady NY USA 275 Mount Ve… NY
#> # … with 9,990 more rows, and 3 more variables: country_2 <chr>, id_2 <int>,
#> # match_id <chr>
For a list of supported CRS transformations, see rgdal::make_EPSG().
Note that the choice of CRS may impact your results considerably.
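To get a sense of how much the CRS can matter, compare the second row of the two Schenectady outputs above: 31869 m with the default EPSG 4326 and 105468 US survey feet after transforming to EPSG 3359. A US survey foot is defined as exactly 1200/3937 m, so a quick base-R conversion puts both on the same scale (the printed values are rounded, so treat the result as approximate):

```r
# One US survey foot is exactly 1200/3937 metres
us_ft_to_m <- function(ft) ft * 1200 / 3937

# Row 2 of both outputs above (Schenectady -> "Saratoga …")
projected_m <- us_ft_to_m(105468)  # EPSG 3359 result, converted to metres
geodesic_m  <- 31869               # default EPSG 4326 result

projected_m - geodesic_m                  # about 278 m
(projected_m - geodesic_m) / geodesic_m   # about 0.0087, just under 1%
```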
If you use dyadicdist for a publication, feel free to cite the package accordingly:
Vierø, Jeppe (2022). dyadicdist: Compute Dyadic Distances. R package version 0.3.1.
The BibTeX entry for the current version of the package is:
@Manual{dyadicdist,
title = {dyadicdist: Compute Dyadic Distances},
author = {Jeppe Vierø},
year = {2022},
note = {R package version 0.3.1},
url = {https://github.com/jvieroe/dyadicdist},
}
dyadicdist builds on the sf package. sf has greatly reduced barriers to entry for anyone working with spatial data in R, and for those who wish to do so.