knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
library(dplyr)
library(sf)

lazysf

CRAN status

The goal of lazysf is to provide interactive delayed read of GDAL vector data sources.

Note: very soon a release of gdalraster will supersede what lazysf can do and we recommend using that instead.

Vector data sources, drawings (a.k.a. "shapefiles") are files or web services or databases that provide tables of data fields. These fields may include spatial geometry data such as points, lines, polygons, and other planar types composed of paths of coordinates.

lazysf uses the dplyr/dbplyr 'tbl_lazy' mechanism by providing a GDAL DBI-backend like many database packages in R. The convenience function lazysf() provides a single-argument wrapper around the database-like workflows.

See it in action!

library(lazysf)
library(sf)
library(dplyr)

url <- "https://github.com/Nowosad/spData/raw/master/inst/shapes/NY8_bna_utm18.gpkg"
(x <- lazysf(url))
x %>% distinct(AREANAME) %>% arrange(AREANAME) 

plot(st_as_sf(x %>% 
                dplyr::filter(!(AREANAME %LIKE% "Ca%" | AREANAME %LIKE% "Bi%")) %>% 
                dplyr::select(AREANAME, geom)))

Limitations

This is very largely format dependent, and by "format" we mean the driver as provided by GDAL.

We make no claims about performance or convenience, it will be affected by your system and your sf installation - lazysf just takes you closer the GDAL capabilities.

Performance can be excellent, and may be very competitive compared to reading an entire data source layer into memory. Really good drivers include ESRI Shapefile, Geopackage, PostgreSQL/PostGIS, MapInfo File, ESRI FileGDB, but there are dozens to choose from.

A query on a CSV, GeoJSON, or KML file (local or remote) is entirely subject to the performance of the matching GDAL driver.

Real DBs don't have these special OGRSQL features, but they do have their own special syntax which for the most part can be sent straight through.

When using dplyr verbs (filter(), select(), mutate(), transmute(), arrange(), left_join(), ...) we are also subject to the rules of SQL translation. There are no specific ones provided by lazysf but that might change.

Wrappers around lazysf could provide more specific tools for particular formats.

Can't we just do this with sf's query argument?

Yes (actually that is what lazysf uses) but with sf alone you get a fully materialized sf data frame, so you better get that query right first time!

With lazysf you get some control over intermediate steps, potentially expensive queries will only be run for a preview of the data until you are ready to fetch it.

Installation

You can install the dev version of lazysf from GitHub with:

# Enable this universe
options(repos = c(
    hypertidy = 'https://hypertidy.r-universe.dev',
    CRAN = 'https://cloud.r-project.org'))

# Install some packages
install.packages('lazysf')

Example

This is a basic example.

```{R basic} library(lazysf) f <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)

specify only the data source

lazysf(f)

specify the data source and a query to run

lazysf(f, query = "SELECT AREA, FIPS, geom FROM \"nc.gpkg\" WHERE AREA < 0.1")

specify the data source and the table/layer to access

lazysf(f, layer = "nc.gpkg") %>% dplyr::select(AREA, FIPS, geom) %>% dplyr::filter(AREA < 0.1)

above was a real database (Geopackage), now with an actual shapefile

shp <- lazysf(system.file("shape/nc.shp", package = "sf", mustWork = TRUE)) library(dplyr) shp %>% filter(NAME %LIKE% 'A%') %>% mutate(abc = 1.3) %>% select(abc, NAME, _ogr_geometry_) %>% arrange(desc(NAME)) #%>% show_query()

Online sources can also work if your build of sf supports. 

```r
# online sources can work
geojson <- file.path("https://raw.githubusercontent.com/SymbolixAU",
                      "geojsonsf/master/inst/examples/geo_melbourne.geojson")
lazysf(geojson)

Also works on PostgreSQL and many others as per GDAL vector driver support.

To create a connection string for GDAL for PostgreSQL use something like

DSN <- glue::glue("PG:host='{host}' dbname='{dbname}' user='{user}' password='{password}'")

dbConnect(SFSQL(), DSN)

but the same can be done with generic DBI and (for example) the Rpostgres package. With SFSQL() we just know that it's executed by GDAL (via sf).

Some drivers are related

Note that GDAL drivers can be confusing, and it can be important to see the behaviours GDAL will provide by default. Here see that we can read from a Geopackage file but not as it was intended. We have used the driver-prefix to make GDAL choose its SQLite driver rather than the Geopackage driver.

library(lazysf)
gpkgfile <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)
lazysf(glue::glue("SQLite:{gpkgfile}"))

That's not very spatial, but we can dig in to find out what else is there.


Code of Conduct

Please note that the lazysf project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.



hypertidy/lazysf documentation built on April 17, 2025, 1:56 p.m.