knitr::opts_chunk$set( collapse = TRUE, comment = "#>", out.width = "100%" )
The vapour package provides access to some GDAL functionality with minimal overhead. This includes read
geometry and data ('attributes')
There's a function vapour_read_fields
that returns the fields (attributes) as list of vectors.
pfile <- system.file("extdata", "point.shp", package = "vapour") library(vapour) vapour_read_fields(pfile)
mvfile <- system.file("extdata", "tab", "list_locality_postcode_meander_valley.tab", package="vapour") dat <- as.data.frame(vapour_read_fields(mvfile), stringsAsFactors = FALSE) dim(dat) head(dat)
A low-level function will return a character vector of JSON, GML, KML or WKT.
vapour_read_geometry(pfile)[5:6] ## format = "WKB" vapour_read_geometry_text(pfile)[5:6] ## format = "json" vapour_read_geometry_text(pfile, textformat = "gml")[2] ## don't do this with a non-longlat data set like cfile vapour_read_geometry_text(pfile, textformat = "kml")[1:2] cfile <- system.file("extdata/sst_c.fgb", package = "vapour") str(vapour_read_geometry_text(cfile, textformat = "wkt")[1:2])
Combine these together to get a custom data set.
dat <- as.data.frame(vapour_read_fields(cfile), stringsAsFactors = FALSE) dat$wkt <- vapour_read_geometry_text(cfile, textformat = "wkt") head(dat)
There is a function vapour_read_extent
to return a straightforward bounding box vector for every feature, so that
we can flexibly build an index of a data set for later use.
mvfile <- system.file("extdata", "tab", "list_locality_postcode_meander_valley.tab", package="vapour") str(head(vapour_read_extent(mvfile)))
This makes for a very lightweight summary data set that will scale to hundreds of large inputs.
There is a vapour_geom_summary()
function to read only the information about each geometry.
vapour_geom_summary(mvfile)
Each function that relates to geometry includes arguments skip_n
and limit_n
to first specify the number of features to ignore, and second to set a maximum number of features visited. These interact, and so can be used to scan through a source. Both are applied after the sql
argument.
vapour_geom_summary(mvfile, limit_n = 4)$FID vapour_geom_summary(mvfile, skip_n = 2, limit_n = 6)$FID vapour_geom_summary(mvfile, skip_n = 6)$FID
Each geometry function also includes an extent
argument, which takes a simple vector of four values xmin, xmax, ymin, ymax
or sp bbox, sf bbox, or raster extent. This is only applied if the sql argument is non empty, and corresponds to the SpatialFilter argument of ExecuteSQL.
Find raster info.
f <- system.file("extdata", "sst.tif", package = "vapour") vapour_raster_info(f)
Read raster data (requires explicit setting of window
argument, and is not useful
without being used in the context of the raster dimensions).
vapour_read_raster(f, window = c(0, 0, 6, 5)) ## the final two arguments specify up- or down-sampling ## controlled by resample argument vapour_read_raster(f, window = c(0, 0, 6, 5, 8, 9)) ## if window is not included, and native TRUE then we get the entire window str(vapour_read_raster(f, native = TRUE)) ## notice this is the length of the dimXY above prod(vapour_raster_info(f)$dimXY)
By chaining together what we know about how raster data works we can get exactly what we want from GDAL.
(Note that vapour_read_raster()
returns a list of one band, new behaviour since version 0.4.0).
mm <- matrix(vapour_read_raster(f, native = TRUE)[[1]], vapour_raster_info(f)$dimXY) mm[mm < -1e6] <- NA image(mm[,ncol(mm):1], asp = 2)
An example of using this facility interactively is in lazyraster.
SQL is available for general GDAL vector data.
Note that each lower-level function accepts a sql
argument,
which sends a query to the GDAL library to be executed against
the data source, this can create custom layers and so is independent of and ignores the layer
argument. Note that the same sql statement can be passed to the geometry readers, so we get the matching sets of information. vapour_read_geometry
will return NULL for each missing geometry if the statement doesn't include geometry explicitly or implicitly, but vapour_read_geometry
, vapour_read_geometry_text
and vapour_read_extent
all explicitly modify the statement "SELECT *". (We are also assuming the data source hasn't changed between accesses ... let me know if this causes you problems!).
## note, this code assumes OGRSQL dialect current_dialect <- Sys.getenv("vapour.sql.dialect") Sys.unsetenv("vapour.sql.dialect") ## ensure that default dialect is used ## (in this case it's 'OGRSQL' but we only want to record the state and reset after) ## later on, when done do Sys.setenv(vapour.sql.dialect = current_dialect) vapour_read_fields(mvfile, sql = "SELECT NAME, PLAN_REF FROM list_locality_postcode_meander_valley WHERE POSTCODE < 7291") vapour_read_fields(mvfile, sql = "SELECT NAME, PLAN_REF, FID FROM list_locality_postcode_meander_valley WHERE POSTCODE = 7306")
Also note that FID is a special row number value, to be used a as general facility for selecting by structural row. This FID is driver-dependent, it can be 0- or 1-based, or completely arbitrary.
Variously, drivers (GDAL's formats) are 0- or 1- based with the FID. Others (such as OSM) are arbitrary, and have non-sequential (and presumably persistent) FID values.
library(vapour) file0 <- "list_locality_postcode_meander_valley.tab" mvfile <- system.file("extdata/tab", file0, package="vapour") layer <- gsub(".tab$", "", basename(mvfile)) ## get the number of features by FID (DISTINCT should be redundant here) vapour_read_fields(mvfile, sql = sprintf("SELECT COUNT(DISTINCT FID) AS nfeatures FROM %s", layer)) ## note how TAB is 1-based vapour_read_fields(mvfile, sql = sprintf("SELECT COUNT(*) AS n FROM %s WHERE FID < 2", layer)) ## but SHP is 0-based shp <- system.file("extdata/point.shp", package="vapour") vapour_read_fields(shp, sql = sprintf("SELECT COUNT(*) AS n FROM %s WHERE FID < 2", "point"))
See https://gdal.org/user/ogr_sql_dialect.html
Now we are done with our SQL, so in case something else was using current_dialect
, restore it.
Sys.setenv(vapour.sql.dialect = current_dialect)
There are many useful higher level operations that can be used with this. The simplest is the ability to use GDAL as a database-like connection to attribute tables.
In earlier versions (pre 0.9.1) we couldn't control the SQL dialect in use, so we variously had available 'OGRSQL' or 'SQLITE', or whatever native dialect the format defaults to. We couldn't use 'OGRSQL' with a Geopackage, and neither could we use 'SQLITE' with a shapefile, MapInfo TAB, or ESRI file Geodatabase (these last three aren't "real" databases so don't have their own native SQL, and GDAL fills the gap with OGRSQL. Geopackage defaults to SQLITE, because that's what it is built upon.
So for example.
## good citizenry current_dialect <- Sys.getenv("vapour.sql.dialect") Sys.setenv(vapour.sql.dialect = "SQLITE") ## now we can use SQLITE dialect with TAB vapour_read_fields(mvfile, sql = sprintf("SELECT st_area(GEOMETRY) AS area FROM %s LIMIT 1 ", layer)) ## but with OGRSQL we need Sys.setenv(vapour.sql.dialect = "OGRSQL") vapour_read_fields(mvfile, sql = sprintf("SELECT OGR_GEOM_AREA AS area FROM %s LIMIT 1 ", layer))
Find the GDAL version and drivers available.
vapour_gdal_version() str(vapour_all_drivers())
Find the driver that will be used for a given data source.
vapour_driver(mvfile)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.