README.md

sfcc

The goal of sfcc is to provide fast and flexible creation of sf simple features geometries.

Originally for sfc vectors of POINT, directly motivated by the set-based idioms of silicate, and the need to bypass the thorough checking done by the sf constructors.

The design for the code is this:

For example

For MULTIPOLYGON

What might also work is to have a single POLYGON constructor, that optionally applies the sfg level classing. Then MULTIPOLYGON wrapper can call that for each of its sub objects, with "no class please" and then apply its own class at the right level. But, these things are pretty simple so a bit of duplication is also ok.

It's not finalize what the best way to specify grouping is, but it's driven from silicate. (There's hopefully a flexible set of different ways to do this, including aes(x, y, group = ..., subobject = ...) and other basics like earcut.hpp, silicate objects, other types).

For now we don't consider GEOMETRY (sfc vectors with mixed topology).

Installation

I wouldn't use this as-is, yet. :)

Example

This example was motivated by this issue, trying to speed up sf creation for POINTs.

The improvement is fairly modest, so there might be a better Rcpp way to do this?

N <- 1e5
x <- data.frame(a = sample(letters, N, replace = TRUE),
                 lng = runif(N, -120, -100),
                 lat = runif(N, 30, 48))
## we avoid st_sf so we can compare apples (just the sfc creation)
sf_mk_points <- function(x, coords) {
        classdim = sf:::getClassDim(rep(0, length(coords)), length(coords), dim, "POINT")
        structure( lapply(split(as.vector(t(as.matrix(x[, coords]))),
                rep(seq_len(nrow(x)), each = length(coords))),
                ## it does help to create this as a template and update in this loop
                ## but not as much as list-creation in C++
                function(vec) structure(vec, class = classdim)),
            n_empty = 0L, precision = 0, crs = NA_crs_,
            bbox = structure(
                c(xmin = min(x[[coords[1]]], na.rm = TRUE),
                ymin = min(x[[coords[2]]], na.rm = TRUE),
                xmax = max(x[[coords[1]]], na.rm = TRUE),
                ymax = max(x[[coords[2]]], na.rm = TRUE)), class = "bbox"),
            class =  c("sfc_POINT", "sfc" ))
}

m <- cbind(x$lng, x$lat)
library(rbenchmark)

library(sf)
#> Linking to GEOS 3.6.2, GDAL 2.2.4, proj.4 4.9.3
benchmark(
  sfcc_pt =  sfcc::mk_sfc_POINT(m, crs = 4326),
  sf_pt = sf_mk_points(x, coords = c("lng", "lat"))
  ,replications = 10
)
#>      test replications elapsed relative user.self sys.self user.child
#> 2   sf_pt           10  16.844   17.787    16.815    0.016          0
#> 1 sfcc_pt           10   0.947    1.000     0.935    0.012          0
#>   sys.child
#> 2         0
#> 1         0

A polygon example (motivated by spex::polygonize)

qm <- quadmesh::quadmesh(raster::raster(volcano), z = NULL)
## a dummy structure to copy
template <- structure(list(cbind(1:5, 0)),
                      class = c("XY", "POLYGON", "sfg"))

nquads <- ncol(qm$ib)

idx <- rep(seq(0, nquads), each = 5L) + c(1L, 2L, 3L, 4L, 1L)
quadgroups <- rep(seq_len(nquads), each = 5)
idx <- rbind(qm$ib, qm$ib[1,])
xylist <- split(c(t(qm$vb[1:2, idx])), rep(quadgroups, 2L))


## slower code

system.time({
 l <- lapply(seq_along(xylist), function(ii) {
   template[[1L]][] <- xylist[[ii]]
   template
 })
})
#>    user  system elapsed 
#>   0.051   0.000   0.051


## Rcpp code is a bit faster

library(sfcc)
## we were working with a flat vector
xylistm <- lapply(xylist, function(v) matrix(v, ncol = 2))
system.time(p <-  polygons_rcpp(xylistm))
#>    user  system elapsed 
#>   0.012   0.000   0.012
identical(l, p)
#> [1] TRUE


mdsumner/sfcc documentation built on May 12, 2019, 6:23 p.m.