knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
library(dplyr)
library(sf)
The goal of lazysf is to provide interactive, delayed (lazy) reading of GDAL vector data sources.
Note: very soon a release of gdalraster will supersede what lazysf can do and we recommend using that instead.
Vector data sources (drawings, a.k.a. "shapefiles") are files, web services, or databases that provide tables of data fields. These fields may include spatial geometry data such as points, lines, polygons, and other planar types composed of paths of coordinates.
lazysf uses the dplyr/dbplyr 'tbl_lazy' mechanism by providing a GDAL DBI-backend like many database packages in R. The convenience function lazysf() provides a single-argument wrapper around the database-like workflows.
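As a rough sketch of what the database-like workflow looks like when written out by hand, the following connects through the SFSQL() backend and then asks dplyr for a lazy table. It assumes the GeoPackage shipped with sf and its layer name "nc.gpkg" (both used elsewhere in this README); treating dplyr::tbl() on the connection as equivalent to lazysf() is our reading, not a documented guarantee.

```r
library(lazysf)
library(dplyr)

# Example data source shipped with sf (also used in the basic example)
f <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)

# Long form: open a DBI connection backed by GDAL (via sf) ...
con <- DBI::dbConnect(SFSQL(), f)

# ... then create a lazy table for one layer; nothing is read in full yet
nc_tbl <- dplyr::tbl(con, "nc.gpkg")

# Short form: the convenience wrapper does the connection step for us
nc_lazy <- lazysf(f, layer = "nc.gpkg")
```

Either object can then be piped into dplyr verbs in the usual way.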
See it in action!
library(lazysf)
library(sf)
library(dplyr)
url <- "https://github.com/Nowosad/spData/raw/master/inst/shapes/NY8_bna_utm18.gpkg"
(x <- lazysf(url))
x %>% distinct(AREANAME) %>% arrange(AREANAME)
plot(st_as_sf(x %>%
  dplyr::filter(!(AREANAME %LIKE% "Ca%" | AREANAME %LIKE% "Bi%")) %>%
  dplyr::select(AREANAME, geom)))
What lazysf can do is very largely format dependent, and by "format" we mean the driver as provided by GDAL.
We make no claims about performance or convenience: both will be affected by your system and your sf installation. lazysf just takes you closer to the GDAL capabilities.
Performance can be excellent, and may be very competitive compared to reading an entire data source layer into memory. Really good drivers include ESRI Shapefile, Geopackage, PostgreSQL/PostGIS, MapInfo File, ESRI FileGDB, but there are dozens to choose from.
A query on a CSV, GeoJSON, or KML file (local or remote) is entirely subject to the performance of the matching GDAL driver.
The name of the geometry column depends on the driver: for an ESRI Shapefile it is _ogr_geometry_, while other non-DB formats like ESRI's geodatabase have other names like SHAPE.
Real DBs don't have these special OGRSQL features, but they do have their own special syntax which for the most part can be sent straight through.
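As one illustration of an OGRSQL-specific feature (our choice of example, not one named above), the OGR SQL dialect exposes special fields such as FID, the feature id, which is not a column stored in the file. The sketch below sends such a query through sf directly, assuming the shapefile shipped with sf and its layer name "nc".

```r
library(sf)

# Shapefile shipped with sf; non-database formats are queried with OGR SQL
shp <- system.file("shape/nc.shp", package = "sf", mustWork = TRUE)

# FID is an OGR SQL special field, so it can be selected and filtered on
# even though the attribute table has no such column
first_few <- read_sf(shp, query = "SELECT NAME, FID FROM nc WHERE FID < 5")
first_few
```

The same statement sent to a real database would need that database's own way of referring to a row id.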
When using dplyr verbs (filter(), select(), mutate(), transmute(), arrange(), left_join(), ...) we are also subject to the rules of SQL translation. There are no specific ones provided by lazysf but that might change.
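To see what the translation produces for a given pipeline, the generated SQL can be inspected before anything is fetched. This is a minimal sketch using the GeoPackage shipped with sf; the exact SQL text depends on the installed dbplyr version, so no output is shown.

```r
library(lazysf)
library(dplyr)

f <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)

# Build a lazy query with dplyr verbs; no data is fetched at this point
q <- lazysf(f, layer = "nc.gpkg") %>%
  filter(AREA < 0.1) %>%
  select(AREA, FIPS, geom)

# Print the SQL that dbplyr generated for the pipeline
show_query(q)

# The same SQL as a character-like object, for programmatic checks
sql_text <- dbplyr::sql_render(q)
```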
Wrappers around lazysf could provide more specific tools for particular formats.
Yes, sf alone can run these queries (actually that is what lazysf uses), but with sf alone you get a fully materialized sf data frame, so you had better get that query right the first time!
With lazysf you get some control over intermediate steps: potentially expensive queries are only run for a preview of the data until you are ready to fetch it.
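The difference can be sketched side by side. The example assumes the GeoPackage shipped with sf; the object names eager, lazy, and fetched are ours.

```r
library(sf)
library(lazysf)
library(dplyr)

f <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)

# sf alone: the query runs immediately and the whole result is materialized
eager <- read_sf(f, query = "SELECT AREA, FIPS, geom FROM \"nc.gpkg\" WHERE AREA < 0.1")

# lazysf: the query is only described; printing `lazy` shows a preview
lazy <- lazysf(f, layer = "nc.gpkg") %>%
  filter(AREA < 0.1) %>%
  select(AREA, FIPS, geom)

# Fetch the full result as an sf data frame once the query looks right
fetched <- st_as_sf(lazy)
```

Refining `lazy` with further verbs costs only another preview, whereas each change to the sf query re-reads the full result.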
You can install the dev version of lazysf from GitHub with:
# Enable this universe
options(repos = c(
  hypertidy = 'https://hypertidy.r-universe.dev',
  CRAN = 'https://cloud.r-project.org'))
# Install some packages
install.packages('lazysf')
This is a basic example.
```{R basic}
library(lazysf)
library(dplyr)
f <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)

## specify only the data source
lazysf(f)

## specify a full SQL query
lazysf(f, query = "SELECT AREA, FIPS, geom FROM \"nc.gpkg\" WHERE AREA < 0.1")

## specify the layer by name, then build the query with dplyr verbs
lazysf(f, layer = "nc.gpkg") %>%
  dplyr::select(AREA, FIPS, geom) %>%
  dplyr::filter(AREA < 0.1)

## a shapefile: the geometry column `_ogr_geometry_` needs backticks in R
shp <- lazysf(system.file("shape/nc.shp", package = "sf", mustWork = TRUE))
shp %>%
  filter(NAME %LIKE% 'A%') %>%
  mutate(abc = 1.3) %>%
  select(abc, NAME, `_ogr_geometry_`) %>%
  arrange(desc(NAME)) #%>% show_query()
```

Online sources can also work if your build of sf supports them.

```r
# online sources can work
geojson <- file.path("https://raw.githubusercontent.com/SymbolixAU",
                     "geojsonsf/master/inst/examples/geo_melbourne.geojson")
lazysf(geojson)
```
Also works on PostgreSQL and many others as per GDAL vector driver support.
To create a connection string for GDAL for PostgreSQL use something like
DSN <- glue::glue("PG:host='{host}' dbname='{dbname}' user='{user}' password='{password}'")
dbConnect(SFSQL(), DSN)
but the same can be done with generic DBI and (for example) the RPostgres package. With SFSQL() we just know that it's executed by GDAL (via sf).
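For completeness, here is a sketch of assembling that connection string from its parts. All connection details below (host, database name, user, and the use of the PGPASSWORD environment variable) are hypothetical placeholders, and the connection itself is left commented out because it needs a reachable server and a GDAL build with PostgreSQL support.

```r
library(glue)

# Hypothetical connection details; replace with your own
host <- "localhost"
dbname <- "gisdb"
user <- "gis_reader"
password <- Sys.getenv("PGPASSWORD")  # assumption: password kept in the environment

# GDAL's PostgreSQL driver is selected by the "PG:" prefix
DSN <- glue("PG:host='{host}' dbname='{dbname}' user='{user}' password='{password}'")

# Not run here: requires a live database
# con <- DBI::dbConnect(lazysf::SFSQL(), DSN)
```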
Note that GDAL drivers can be confusing, and it can be important to see the behaviours GDAL will provide by default. Here we see that we can read from a GeoPackage file, but not as it was intended: we have used the driver prefix to make GDAL choose its SQLite driver rather than the GeoPackage driver.
library(lazysf)
gpkgfile <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)
lazysf(glue::glue("SQLite:{gpkgfile}"))
That's not very spatial, but we can dig in to find out what else is there.
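One way to dig in, sketched here with the RSQLite package rather than GDAL (our substitution, not necessarily the route intended above), relies on the fact that a GeoPackage is an SQLite database with a standard set of metadata tables such as gpkg_contents.

```r
library(DBI)

gpkgfile <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)

# Open the GeoPackage read-only as a plain SQLite database
con <- dbConnect(RSQLite::SQLite(), gpkgfile, flags = RSQLite::SQLITE_RO)

# All tables in the file, including the GeoPackage metadata tables
tables <- dbListTables(con)

# gpkg_contents lists the user data tables (layers) and their types
contents <- dbGetQuery(con, "SELECT table_name, data_type FROM gpkg_contents")

dbDisconnect(con)

tables
contents
```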
Please note that the lazysf project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.