In ropensci/stplanr: Sustainable Transport Planning

This article was written following new functionality added to the {stplanr} package by Josiah Parry, in this pull request (#540))](https://github.com/ropensci/stplanr/pull/540). As described issue [#539`, the original implementation was slower than needed.

This short article describes what the function does, benchmarks alternative implementation, and concludes with some thoughts about the {stplanr} package. It was written straight after the new functionality was implemented, so we'll start by installing that version of the package:

remotes::install_github(repo = "ropensci/stplanr", ref = "39858b6")

Loading the package allows us to run the function.

library(stplanr)

mats2line() is a function that takes 2 sets of coordinates and converts them into linestrings. A simple example is shown function's online documentation hosted by community science support organsiation rOpenSci:

m1 <- matrix(c(1, 2, 1, 2), ncol = 2)
m2 <- matrix(c(9, 9, 9, 1), ncol = 2)
l <- mats2line(m1, m2)
class(l)
l
lsf <- sf::st_sf(l, crs = 4326)
class(lsf)
plot(lsf)

The previous version of mats2line() as as follows, which shows how it works (see the new implementation in source code):

mats2line_old <- function(mat1, mat2, crs = NA) {
  l <- lapply(1:nrow(mat1), function(i) {
    mat_combined <- rbind(mat1[i, ], mat2[i, ])
    sf::st_linestring(mat_combined)
  })
  if(is.na(crs)) {
    sf::st_sfc(l)
  } else {
    sf::st_sfc(l, crs = crs)
  }
}

The new function is as follows:

mats2line

We can check the 2 results are similar as follows:

l_old <- mats2line_old(m1, m2)
waldo::compare(l_old, l)

As shown, the 2 results are identical.

Let's do a quick benchmark:

bench::mark(
  mats2line_old(m1, m2),
  mats2line(m1, m2)
)

From that you may think there's no benefit to speeding things up. But, when you look at the benchmark results for larger matrices, you can see the difference (as shown in the initial issue #539:

m1 <- matrix(rnorm(20000), ncol = 2)
m2 <- matrix(runif(20000), ncol = 2)
bench::mark(
  mats2line_old(m1, m2),
  mats2line(m1, m2)
)

The results show that the new implementation is more than 10x faster. But that's not the end of the story. There is another implementation, in the package {od}, which works as follows:

odc <- cbind(m1[1:3, ], m2[1:3, ])
desire_lines <- od::odc_to_sfc(odc)
waldo::compare(desire_lines, mats2line(m1[1:3, ], m2[1:3, ]))

bench::mark(
  mats2line_old(m1, m2),
  mats2line(m1, m2),
  od::odc_to_sfc(cbind(m1, m2))
)

Given the advantages of modularity, and that the purpose of the {od} package is to work with origin-destination data, it makes sense to use the {od} implementation.

The {od} implementation uses the following:

od_coordinates_ids = function(odc) {
  res = data.frame(id = rep(1:nrow(odc), each = 2), x = NA, y = NA)
  ids_odd = seq(1, nrow(res), by = 2)
  ids_even = ids_odd + 1
  res[ids_odd, c("x", "y")] = odc[, 1:2]
  res[ids_even, c("x", "y")] = odc[, 3:4]
  res
}
od_coordinates_ids(odc)

Let's see if we can speed that up:

od_coordinates_ids2 = function(odc) {
  res = vctrs::vec_interleave(odc[, 1:2], odc[, 3:4])
  res = data.frame(id = rep(1:nrow(odc), each = 2), x = res[, 1], y = res[, 2])
  res
}
waldo::compare(
  od_coordinates_ids(odc),
  od_coordinates_ids2(odc)
)
odc = cbind(m1, m2)
bench::mark(
  od = od_coordinates_ids(odc),
  vctrs = od_coordinates_ids2(odc)
)