knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) Sys.setenv(LANG = "en_US.UTF-8") options(rmarkdown.html_vignette.check_title = FALSE)
sf::st_as_sf(data.table::rbindllist(<list_of_sf>)) is a
fast alternative
to do.call(rbind, <list_of_sf>) for binding lists of simple features objects
(sf) to single sf. However, there are some pitfalls when rbindlist() is
applied to lists of sf. These are explored, and some workarounds are
presented.
library(dplyr) library(data.table) library(sf)
rbindlist() and do.call(rbind, )bboxWe create a list of sf objects:
nc <- st_read(system.file("gpkg/nc.gpkg", package = "sf"), quiet = TRUE) list_of_sf <- lapply(seq_len(nrow(nc)), function(x) nc[x, ])
Binding such a list of sf with do.call(rbind, ) to a single sf object
works easily (as long as the list is rather small):
nc_rbind <- do.call(rbind, list_of_sf)
And will return the expected object of the class sf:
all.equal(nc_rbind, nc)
What's different when rbindlist() is used instead of do.call(rbind, )?
First of all, rbindlist() on its own does not bind a list of sf to an
object of the class sf:
rbindlist(list_of_sf) %>% class()
Thus, subsequently the return of rbindlist() needs to be converted into an
object of the the class sf:
nc_rbindlist <- rbindlist(list_of_sf) %>% st_as_sf()
Is the return of the pipe rbindlist() %>% st_as_sf() equivalent to the one of
do.call(rbind, )?
all.equal(nc_rbindlist, nc)
No, not really! The pipe-output has inherited the class data.table:
class(nc) class(nc_rbindlist)
Will omitting the class data.table eliminate the difference?
class(nc_rbindlist) <- c("sf", "data.frame") all.equal(nc_rbindlist, nc)
There's also an issue with bbox of the geometry column:
st_bbox(nc_rbindlist) == st_bbox(nc)
Instead of returning the correct bbox for the whole geometry column,
data.table::rbindlist() simply copies the bbox of the first listed
sf object:
st_bbox(nc_rbindlist) == st_bbox(list_of_sf[[1]])
A simple trick solves
this issue inherent to sf-objects compiled with data.table::rbindlist():
sf_former_dt <- sf_former_dt[1:nrow(sf_former_dt), ] # or, preferable in a programming context: sf_former_dt <- sf_former_dt[seq_len(nrow(sf_former_dt)), ]
We apply the trick ...
nc_rbindlist <- nc_rbindlist[seq_len(nrow(nc_rbindlist)), ]
... check if it worked:
all.equal(nc_rbindlist, nc)
lists containing sf-objects with different CRS are treated quite differently
by do.call(rbind, ) and rbindlist() %>% st_as_sf(). To demonstrate that, we
create such a list:
nc <- st_read(system.file("gpkg/nc.gpkg", package = "sf"), quiet = TRUE) nc_3857 <- st_transform(nc, 3857) list_of_sf_with_unequal_crs <- list(nc, nc_3857)
rbind() does not bind sf-objects with unequal CRS and throws a corresponding
error-message:
do.call(rbind, list_of_sf_with_unequal_crs)
Whereas rbindlist() doesn't seem to be affected:
sf_dt <- list_of_sf_with_unequal_crs %>% rbindlist() %>% st_as_sf()
The returned object of the classes sf and data.table has a CRS which is the
same as the one of first listed sf-object:
st_crs(sf_dt)$epsg st_crs(sf_dt) == st_crs(nc)
If we reverse the sequence of elements of the list of sf-objects with
unequal CRS the newly first placed sf-object dictates the CRS of the by
rbindlist() %>% st_as_sf bound single sf-object:
sf_dt <- list_of_sf_with_unequal_crs %>% rev() %>% rbindlist() %>% st_as_sf() st_crs(sf_dt)$epsg st_crs(sf_dt) == st_crs(nc_3857)
The maloperation of rbindlist() regarding lists of sf-objects with
different CRS can't be handled after the fact. Thus, feeding such a list to
rbindlist() must be avoided. Hence prior checking if a list of sf contains
more than 1 CRS is crucial:
list(nc, nc_3857) %>% lapply(st_crs) %>% n_distinct()
geometry_typesA list with sf-objects having geometry columns of different geometry_types:
l_diff_geom_type <- list(nc[c(17, 56), ], st_cast(nc[4, ], "POLYGON", warn = FALSE)) sapply(l_diff_geom_type, st_geometry_type, by_geometry = FALSE) %>% unique()
rbind() can easily stack sf-objects with geometry columns of different
geometry_types and the returned geometry_type is GEOMETRY:
sf_do.call <- do.call(rbind, l_diff_geom_type) st_geometry_type(sf_do.call, by_geometry = FALSE)
But rbindlist() can't handle a list containing different geometry_types:
rbindlist(l_diff_geom_type)
In order to make such a list workable for rbindlist(), the geometry_types
have to be homogenized:
l_homogenized_geom_type <- lapply(l_diff_geom_type, st_cast, to = "GEOMETRY", warn = FALSE) sapply(l_homogenized_geom_type, st_geometry_type, by_geometry = FALSE) %>% unique()
Once the listed sf-objects have the same geometry_type, rbindlist() can
stack them:
sf_rbindlist <- rbindlist(l_homogenized_geom_type) %>% st_as_sf() st_geometry_type(sf_rbindlist, by_geometry = FALSE)
Note that the above demonstrated way of homogenizing
geometry_types isn't very time efficient for large lists of sf-objects.
Luckily there's a faster alternative:
l <- l_diff_geom_type # for the sake of code readability, copy list as short-named object for (i in seq_along(l)) { class(l[[i]][[attr(l[[i]], "sf_column")]])[[1]] <- "sfc_GEOMETRY" }
We check if the two homogenizing methods are equivalent:
all.equal(l_homogenized_geom_type, l)
all.equal( sf_rbindlist, st_as_sf(rbindlist(l)) )
sf_rbindlist <- sf_rbindlist[1:nrow(sf_rbindlist), ] class(sf_rbindlist) <- c("sf", "data.frame") row.names(sf_rbindlist) <- row.names(sf_do.call) all.equal(sf_rbindlist, sf_do.call)
The use of data.table::rbindlist() as a fast alternative to do.call(rbind, )
for binding lists of sf-objects to a single sf-object is justified in
situations when do.call(rbind, ) is too slow. It does, however, require
appropriate pre- and post-treatment:
list of sf-objects contains more than 1 distinct CRS:list_of_sf %>% lapply(st_crs) %>% n_distinct()
If so, CRS-unification needs to be done before moving on.
?st_transform ?st_crs
listed sf-objects have different geometry_types:sapply(list_of_sf, st_geometry_type, by_geometry = FALSE) %>% unique()
If so, homogenize them:
list_of_sf <- lapply(list_of_sf, st_cast, to = "GEOMETRY", warn = FALSE) # or more advisable because faster (for large lists): for (i in seq_along(list_of_sf)) { class(list_of_sf[[i]][[attr(list_of_sf[[i]], "sf_column")]])[[1]] <- "sfc_GEOMETRY" }
list of sf-objects to a single sf-object (which is also a
data.table):list_of_sf %>% rbindlist() %>% st_as_sf() -> sf_object
bbox:sf_object <- sf_object[1:nrow(sf_object), ] # or, preferable in a programming context: sf_object <- sf_object[seq_len(nrow(sf_object)), ]
data.table:class(sf_object) <- c("sf", "data.frame")
Note that those five steps are integrated in the function
sfhelpers::st_rbindlist() enabling a painless and fast conversion of a list
of sf objects to a single sf object. Moreover st_rbindlist() has argument
options capable of handling lists of sf objects that differ by the names and
positions of their geometry and other attribute columns.
?sfhelpers::st_rbindlist
bind_rows_sf for combining lists of sf objects to single dataframe
rbindlist gets wrong bbox when binding a list of sf data frames
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.