Vapour - lightweight GDAL

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "100%"
)

The vapour package provides access to some GDAL functionality with minimal overhead. This includes read geometry and data ('attributes') There's a function vapour_read_fields that returns the fields (attributes) as list of vectors.

pfile <- system.file("extdata", "point.shp", package = "vapour")
library(vapour)
vapour_read_fields(pfile)
mvfile <- system.file("extdata", "tab", "list_locality_postcode_meander_valley.tab", package="vapour")

dat <- as.data.frame(vapour_read_fields(mvfile),  stringsAsFactors = FALSE)

dim(dat)

head(dat)

A low-level function will return a character vector of JSON, GML, KML or WKT.

vapour_read_geometry(pfile)[5:6]  ## format = "WKB"

vapour_read_geometry_text(pfile)[5:6]  ## format = "json"

vapour_read_geometry_text(pfile, textformat = "gml")[2]

## don't do this with a non-longlat data set like cfile
vapour_read_geometry_text(pfile, textformat = "kml")[1:2]

cfile <- system.file("extdata/sst_c.gpkg", package = "vapour")
str(vapour_read_geometry_text(cfile, textformat = "wkt")[1:2])

Combine these together to get a custom data set.

dat <- as.data.frame(vapour_read_fields(cfile),  stringsAsFactors = FALSE)
dat$wkt <- vapour_read_geometry_text(cfile, textformat = "wkt")
head(dat)

There is a function vapour_read_extent to return a straightforward bounding box vector for every feature, so that we can flexibly build an index of a data set for later use.

mvfile <- system.file("extdata", "tab", "list_locality_postcode_meander_valley.tab", package="vapour")
str(head(vapour_read_extent(mvfile)))

This makes for a very lightweight summary data set that will scale to hundreds of large inputs.

There is a vapour_geom_summary() function to read only the information about each geometry.

vapour_geom_summary(mvfile)

Each function that relates to geometry includes arguments skip_n and limit_n to first specify the number of features to ignore, and second to set a maximum number of features visited. These interact, and so can be used to scan through a source. Both are applied after the sql argument.

vapour_geom_summary(mvfile, limit_n = 4)$FID

vapour_geom_summary(mvfile, skip_n = 2, limit_n = 6)$FID

vapour_geom_summary(mvfile, skip_n = 6)$FID

Each geometry function also includes an extent argument, which takes a simple vector of four values xmin, xmax, ymin, ymax or sp bbox, sf bbox, or raster extent. This is only applied if the sql argument is non empty, and corresponds to the SpatialFilter argument of ExecuteSQL.

Raster data

Find raster info.

f <- system.file("extdata", "sst.tif", package = "vapour")
vapour_raster_info(f)

Read raster data (requires explicit setting of window argument, and is not useful without being used in the context of the raster dimensions).

vapour_read_raster(f, window = c(0, 0, 6, 5))

## the final two arguments specify up- or down-sampling
## controlled by resample argument
vapour_read_raster(f, window = c(0, 0, 6, 5, 8, 9))

## if window is not included, and native TRUE then we get the entire window
str(vapour_read_raster(f, native = TRUE))

## notice this is the length of the dimXY above
prod(vapour_raster_info(f)$dimXY)

By chaining together what we know about how raster data works we can get exactly what we want from GDAL.

(Note that vapour_read_raster() returns a list of one band, new behaviour since version 0.4.0).

mm <- matrix(vapour_read_raster(f, native = TRUE)[[1]], 
      vapour_raster_info(f)$dimXY)
mm[mm < -1e6] <- NA
image(mm[,ncol(mm):1], asp = 2)

An example of using this facility interactively is in lazyraster.

OGRSQL

SQL is available for general GDAL vector data.

Note that each lower-level function accepts a sql argument, which sends a query to the GDAL library to be executed against the data source, this can create custom layers and so is independent of and ignores the layer argument. Note that the same sql statement can be passed to the geometry readers, so we get the matching sets of information. vapour_read_geometry will return NULL for each missing geometry if the statement doesn't include geometry explicitly or implicitly, but vapour_read_geometry, vapour_read_geometry_text and vapour_read_extent all explicitly modify the statement "SELECT *". (We are also assuming the data source hasn't changed between accesses ... let me know if this causes you problems!).

## note, this code assumes OGRSQL dialect
current_dialect <- Sys.getenv("vapour.sql.dialect")
Sys.unsetenv("vapour.sql.dialect")  ## ensure that default dialect is used 
## (in this case it's 'OGRSQL' but we only want to record the state and reset after)
## later on, when done do Sys.setenv(vapour.sql.dialect = current_dialect)


vapour_read_fields(mvfile, sql = "SELECT NAME, PLAN_REF FROM list_locality_postcode_meander_valley WHERE POSTCODE < 7291")
vapour_read_fields(mvfile, sql = "SELECT NAME, PLAN_REF, FID FROM list_locality_postcode_meander_valley WHERE POSTCODE = 7306")

Also note that FID is a special row number value, to be used a as general facility for selecting by structural row. This FID is driver-dependent, it can be 0- or 1-based, or completely arbitrary.

Variously, drivers (GDAL's formats) are 0- or 1- based with the FID. Others (such as OSM) are arbitrary, and have non-sequential (and presumably persistent) FID values.

library(vapour)
file0 <- "list_locality_postcode_meander_valley.tab"
mvfile <- system.file("extdata/tab", file0, package="vapour")
layer <- gsub(".tab$", "", basename(mvfile))
## get the number of features by FID (DISTINCT should be redundant here)
vapour_read_fields(mvfile, sql = sprintf("SELECT COUNT(DISTINCT FID) AS nfeatures FROM %s", layer))

## note how TAB is 1-based 
vapour_read_fields(mvfile, sql = sprintf("SELECT COUNT(*) AS n FROM %s WHERE FID < 2", layer))

## but SHP is 0-based
shp <- system.file("extdata/point.shp", package="vapour")
vapour_read_fields(shp, sql = sprintf("SELECT COUNT(*) AS n FROM %s WHERE FID < 2", "point"))

See https://gdal.org/user/ogr_sql_dialect.html

Now we are done with our SQL, so in case something else was using current_dialect, restore it.

Sys.setenv(vapour.sql.dialect = current_dialect)

There are many useful higher level operations that can be used with this. The simplest is the ability to use GDAL as a database-like connection to attribute tables.

SQL Dialect

In earlier versions (pre 0.9.1) we couldn't control the SQL dialect in use, so we variously had available 'OGRSQL' or 'SQLITE', or whatever native dialect the format defaults to. We couldn't use 'OGRSQL' with a Geopackage, and neither could we use 'SQLITE' with a shapefile, MapInfo TAB, or ESRI file Geodatabase (these last three aren't "real" databases so don't have their own native SQL, and GDAL fills the gap with OGRSQL. Geopackage defaults to SQLITE, because that's what it is built upon.

So for example.

## good citizenry
current_dialect <- Sys.getenv("vapour.sql.dialect")
Sys.setenv(vapour.sql.dialect = "SQLITE")

## now we can use SQLITE dialect with TAB
vapour_read_fields(mvfile, sql = sprintf("SELECT st_area(GEOMETRY) AS area FROM %s LIMIT 1 ", layer))

## but with OGRSQL we need
Sys.setenv(vapour.sql.dialect = "OGRSQL")
vapour_read_fields(mvfile, sql = sprintf("SELECT OGR_GEOM_AREA AS area FROM %s LIMIT 1 ", layer))

GDAL information

Find the GDAL version and drivers available.

vapour_gdal_version()

str(vapour_all_drivers())

Find the driver that will be used for a given data source.

vapour_driver(mvfile)


Try the vapour package in your browser

Any scripts or data that you put into this service are public.

vapour documentation built on April 11, 2023, 5:59 p.m.