knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
Sys.setenv(LANG = "en_US.UTF-8")
options(rmarkdown.html_vignette.check_title = FALSE)

Introduction

sf::st_as_sf(data.table::rbindllist(<list_of_sf>)) is a fast alternative to do.call(rbind, <list_of_sf>) for binding lists of simple features objects (sf) to single sf. However, there are some pitfalls when rbindlist() is applied to lists of sf. These are explored, and some workarounds are presented.

Packages required

library(dplyr)
library(data.table)
library(sf)

Compare rbindlist() and do.call(rbind, )

Inherited classes and bbox

We create a list of sf objects:

nc <- st_read(system.file("gpkg/nc.gpkg", package = "sf"), quiet = TRUE)

list_of_sf <- lapply(seq_len(nrow(nc)), function(x) nc[x, ])

Binding such a list of sf with do.call(rbind, ) to a single sf object works easily (as long as the list is rather small):

nc_rbind <- do.call(rbind, list_of_sf)

And will return the expected object of the class sf:

all.equal(nc_rbind, nc)

What's different when rbindlist() is used instead of do.call(rbind, )?

First of all, rbindlist() on its own does not bind a list of sf to an object of the class sf:

rbindlist(list_of_sf) %>% class()

Thus, subsequently the return of rbindlist() needs to be converted into an object of the the class sf:

nc_rbindlist <- rbindlist(list_of_sf) %>% st_as_sf()

Is the return of the pipe rbindlist() %>% st_as_sf() equivalent to the one of do.call(rbind, )?

all.equal(nc_rbindlist, nc)

No, not really! The pipe-output has inherited the class data.table:

class(nc)
class(nc_rbindlist)

Will omitting the class data.table eliminate the difference?

class(nc_rbindlist) <- c("sf", "data.frame")
all.equal(nc_rbindlist, nc)

There's also an issue with bbox of the geometry column:

st_bbox(nc_rbindlist) == st_bbox(nc)

Instead of returning the correct bbox for the whole geometry column, data.table::rbindlist() simply copies the bbox of the first listed sf object:

st_bbox(nc_rbindlist) == st_bbox(list_of_sf[[1]])

A simple trick solves this issue inherent to sf-objects compiled with data.table::rbindlist():

sf_former_dt <- sf_former_dt[1:nrow(sf_former_dt), ]
# or, preferable in a programming context:
sf_former_dt <- sf_former_dt[seq_len(nrow(sf_former_dt)), ]

We apply the trick ...

nc_rbindlist <- nc_rbindlist[seq_len(nrow(nc_rbindlist)), ]

... check if it worked:

all.equal(nc_rbindlist, nc)

CRS

lists containing sf-objects with different CRS are treated quite differently by do.call(rbind, ) and rbindlist() %>% st_as_sf(). To demonstrate that, we create such a list:

nc <- st_read(system.file("gpkg/nc.gpkg", package = "sf"), quiet = TRUE)

nc_3857 <- st_transform(nc, 3857)

list_of_sf_with_unequal_crs <- list(nc, nc_3857)

rbind() does not bind sf-objects with unequal CRS and throws a corresponding error-message:

do.call(rbind, list_of_sf_with_unequal_crs)

Whereas rbindlist() doesn't seem to be affected:

sf_dt <- list_of_sf_with_unequal_crs %>% rbindlist() %>% st_as_sf()

The returned object of the classes sf and data.table has a CRS which is the same as the one of first listed sf-object:

st_crs(sf_dt)$epsg
st_crs(sf_dt) == st_crs(nc)

If we reverse the sequence of elements of the list of sf-objects with unequal CRS the newly first placed sf-object dictates the CRS of the by rbindlist() %>% st_as_sf bound single sf-object:

sf_dt <- list_of_sf_with_unequal_crs %>% rev() %>% rbindlist() %>% st_as_sf()
st_crs(sf_dt)$epsg
st_crs(sf_dt) == st_crs(nc_3857)

The maloperation of rbindlist() regarding lists of sf-objects with different CRS can't be handled after the fact. Thus, feeding such a list to rbindlist() must be avoided. Hence prior checking if a list of sf contains more than 1 CRS is crucial:

list(nc, nc_3857) %>% lapply(st_crs) %>% n_distinct()

Different geometry_types

A list with sf-objects having geometry columns of different geometry_types:

l_diff_geom_type <- list(nc[c(17, 56), ], st_cast(nc[4, ], "POLYGON", warn = FALSE))

sapply(l_diff_geom_type, st_geometry_type, by_geometry = FALSE) %>% unique()

rbind() can easily stack sf-objects with geometry columns of different geometry_types and the returned geometry_type is GEOMETRY:

sf_do.call <- do.call(rbind, l_diff_geom_type)

st_geometry_type(sf_do.call, by_geometry = FALSE)

But rbindlist() can't handle a list containing different geometry_types:

rbindlist(l_diff_geom_type)

In order to make such a list workable for rbindlist(), the geometry_types have to be homogenized:

l_homogenized_geom_type <- lapply(l_diff_geom_type, st_cast, to = "GEOMETRY", warn = FALSE)

sapply(l_homogenized_geom_type, st_geometry_type, by_geometry = FALSE) %>% unique()

Once the listed sf-objects have the same geometry_type, rbindlist() can stack them:

sf_rbindlist <- rbindlist(l_homogenized_geom_type) %>% st_as_sf()

st_geometry_type(sf_rbindlist, by_geometry = FALSE)

Note that the above demonstrated way of homogenizing geometry_types isn't very time efficient for large lists of sf-objects. Luckily there's a faster alternative:

l <- l_diff_geom_type # for the sake of code readability, copy list as short-named object

for (i in seq_along(l)) {
  class(l[[i]][[attr(l[[i]], "sf_column")]])[[1]] <- "sfc_GEOMETRY"
}

We check if the two homogenizing methods are equivalent:

all.equal(l_homogenized_geom_type, l)
all.equal(
  sf_rbindlist, 
  st_as_sf(rbindlist(l))
  )
sf_rbindlist            <- sf_rbindlist[1:nrow(sf_rbindlist), ]
class(sf_rbindlist)     <- c("sf", "data.frame")
row.names(sf_rbindlist) <- row.names(sf_do.call)
all.equal(sf_rbindlist, sf_do.call)

Conclusions

The use of data.table::rbindlist() as a fast alternative to do.call(rbind, ) for binding lists of sf-objects to a single sf-object is justified in situations when do.call(rbind, ) is too slow. It does, however, require appropriate pre- and post-treatment:

  1. Check if a list of sf-objects contains more than 1 distinct CRS:
list_of_sf %>% lapply(st_crs) %>% n_distinct()

If so, CRS-unification needs to be done before moving on.

?st_transform
?st_crs
  1. Check if the listed sf-objects have different geometry_types:
sapply(list_of_sf, st_geometry_type, by_geometry = FALSE) %>% unique()

If so, homogenize them:

list_of_sf <- lapply(list_of_sf, st_cast, to = "GEOMETRY", warn = FALSE)
# or more advisable because faster (for large lists):
for (i in seq_along(list_of_sf)) {
  class(list_of_sf[[i]][[attr(list_of_sf[[i]], "sf_column")]])[[1]] <- "sfc_GEOMETRY"
}
  1. Bind list of sf-objects to a single sf-object (which is also a data.table):
list_of_sf %>% rbindlist() %>% st_as_sf() -> sf_object
  1. Fix bounding box / bbox:
sf_object <- sf_object[1:nrow(sf_object), ]
# or, preferable in a programming context:
sf_object <- sf_object[seq_len(nrow(sf_object)), ]
  1. If required, omit class data.table:
class(sf_object) <- c("sf", "data.frame")

Note that those five steps are integrated in the function sfhelpers::st_rbindlist() enabling a painless and fast conversion of a list of sf objects to a single sf object. Moreover st_rbindlist() has argument options capable of handling lists of sf objects that differ by the names and positions of their geometry and other attribute columns.

?sfhelpers::st_rbindlist

Links



a-benini/sfhelpers documentation built on Aug. 28, 2024, 3:30 a.m.