knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = requireNamespace("terra", quietly = TRUE) && requireNamespace("dbplyr", quietly = TRUE) )
gpkg
provides high-level wrapper functions to build Open Geospatial Consortium (OGC) 'GeoPackage' files. GDAL utilities for read and write of spatial data (vector and gridded) are provided via the {terra} package. Additional 'GeoPackage' and 'SQLite' specific functions manipulate attributes and tabular data via the {RSQLite} package.
GeoPackage is an open, standards-based, platform-independent, portable, self-describing, compact format for transferring geospatial information. The GeoPackage Encoding Standard) describes a set of conventions for storing the following within an SQLite database:
vector features
tile matrix sets of imagery and raster maps at various scales
attributes (non-spatial data)
extensions
gpkg_write()
can handle a variety of different input types. Here we start by adding two DEM (GeoTIFF) files.
library(gpkg) library(terra) dem <- system.file("extdata", "dem.tif", package = "gpkg") stopifnot(nchar(dem) > 0) gpkg_tmp <- tempfile(fileext = ".gpkg") if (file.exists(gpkg_tmp)) file.remove(gpkg_tmp) # write a gpkg with two DEMs in it gpkg_write( dem, destfile = gpkg_tmp, RASTER_TABLE = "DEM1", FIELD_NAME = "Elevation" ) gpkg_write( dem, destfile = gpkg_tmp, append = TRUE, RASTER_TABLE = "DEM2", FIELD_NAME = "Elevation", NoData = -9999 )
We can also write vector data to GeoPackage. Here we use gpkg_write()
to add a bounding box polygon layer derived from extent of "DEM1"
.
# add bounding polygon vector layer via named list r <- gpkg_tables(gpkg_tmp)[['DEM1']] v <- terra::as.polygons(r, ext = TRUE) gpkg_write(list(bbox = v), destfile = gpkg_tmp)
Similarly, data.frame
-like objects (non-spatial "attributes") can be written to GeoPackage.
z <- data.frame(a = 1:10, b = LETTERS[1:10]) gpkg_write(list(myattr = z), destfile = gpkg_tmp)
geopackage()
is a constructor that can create a simple container for working with geopackages from several types of inputs. Often you will have a character file path to a GeoPackage (.gpkg) file.
g <- geopackage(gpkg_tmp, connect = TRUE) g class(g)
Other times you may have a list of tables and layers you want to be in a GeoPackage that does not exist yet.
g2 <- geopackage(list(dem = r, bbox = v)) g2 class(g2)
Note that a temporary GeoPackage ({r, eval=exists(g2)} gpkg_source(g2)
) is automatically created when using the geopackage(<list>)
constructor.
You also may have a DBIConnection to a GeoPackage database already opened that you want to use. In any case (character, list, SQLiteConnection) there is an S3 method to facilitate creating the basic geopackage class provided by {gpkg}. All other methods are designed to be able to work smoothly with geopackage class input.
We can list the table names in a GeoPackage with gpkg_list_tables()
and fetch pointers (SpatRaster, SpatVectorProxy, and lazy data.frame) to the data in them with gpkg_table()
. We can check the status of the internal geopackage
class SQLiteConnection
with gpkg_is_connected()
and disconnect it with gpkg_disconnect()
.
# enumerate tables gpkg_list_tables(g) # inspect tables gpkg_tables(g) # inspect a specific table gpkg_table(g, "myattr", collect = TRUE)
Note that the collect = TRUE
forces data be loaded into R memory for vector and attribute data; this is the difference in result object class of SpatVectorProxy/SpatVector and tbl_SQLiteConnection/data.frame for vector and attribute data, respectively.
gpkg_collect()
is a helper method to call gpkg_table(..., collect = TRUE)
for in-memory loading of specific tables.
gpkg_collect(g, "DEM1")
Note that with grid data returned from gpkg_collect()
you get a table result with the tile contents in a blob column of a data.frame instead of SpatRaster object.
The inverse function of gpkg_collect()
is gpkg_tbl()
which always returns a tbl_SQLiteConnection.
tb <- gpkg_tbl(g, "gpkg_contents") tb
Note that with this lazy reference to table, the internal SQLiteConnection in g
is the same as the source in tb
:
gpkg_connection(g)@ptr gpkg_connection(tb)@ptr
This means that you can handle all of your connections via the connect
argument when creating a geopackage object, along with the gpkg_connect()
and gpkg_disconnect()
commands.
There are also a variety of convenience functions for the standard required tables in a geopackage.
For instance, gpkg_contents()
collects the "gpkg_contents"
table.
gpkg_contents(g)
There are several other methods that can be used for working with tabular data in a GeoPackage in a "lazy" fashion.
gpkg_table_pragma()
gpkg_table_pragma()
is a low-frills data.frame
result containing important table information, but not values. The PRAGMA table_info()
is stored as a nested data.frame table_info
. This representation has no dependencies beyond {RSQLite} and is efficient for inspection of table structure and attributes, though it is less useful for data analysis.
head(gpkg_table_pragma(g))
gpkg_vect()
and gpkg_query()
gpkg_vect()
is a wrapper around terra::vect()
you can use to create 'terra' SpatVector
objects from the tables found in a GeoPackage.
gpkg_vect(g, 'bbox')
The table of interest need not have a geometry column, but this method does not work on GeoPackage that contain only gridded data, and some layer in the GeoPackage must have some geometry.
gpkg_vect(g, 'gpkg_ogr_contents')
The SpatVectorProxy is used for "lazy" references to of vector and attribute contents of a GeoPackage; this object for vector data is analogous to the SpatRaster for gridded data. The 'terra' package provides "GDAL plumbing" for filter and query utilities.
gpkg_query()
by default uses the 'RSQLite' driver, but the richer capabilities of OGR data sources can be harnessed with SQLite SQL dialect. These additional features can be utilized with the ogr=TRUE
argument to gpkg_query()
, or gpkg_ogr_query()
for short. This assumes that GDAL is built with support for SQLite (and ideally also with support for Spatialite).
For example, we use built-in functions such as ST_MinX()
to calculate summaries for "bbox"
table, geometry column "geom"
. In this case we expect the calculated quantities to match the coordinates/boundaries of the bounding box:
res <- gpkg_ogr_query(g, "SELECT ST_MinX(geom) AS xmin, ST_MinY(geom) AS ymin, ST_MaxX(geom) AS xmax, ST_MaxY(geom) AS ymax FROM bbox") as.data.frame(res)
gpkg_rast()
Using gpkg_rast()
you can quickly access references to all tile/gridded datasets in a GeoPackage.
For example:
gpkg_rast(g)
gpkg_table()
With the gpkg_table()
method you access a specific table (by name) and get a "lazy" tibble
object referencing that table.
This is achieved via {dplyr} and the {dbplyr} database connection to the GeoPackage via the {RSQLite} driver. The resulting object's data can be used in more complex analyses by using other {dbplyr}/{tidyverse} functions.
For example, we inspect the contents of the gpkg_contents
table that contains critical information on the data contained in a GeoPackage.
gpkg_table(g, "gpkg_contents")
As a more complicated example we access the gpkg_2d_gridded_tile_ancillary
table, and perform some data processing.
We dplyr::select()
mean
and std_dev
columns from the dplyr::filter()
ed rows where tpudt_name == "DEM2"
. Finally we materialize a tibble
with dplyr::collect()
:
library(dplyr, warn.conflicts = FALSE) gpkg_table(g, "gpkg_2d_gridded_tile_ancillary") %>% filter(tpudt_name == "DEM2") %>% select(mean, std_dev) %>% collect()
Several helper methods are available for checking GeoPackage SQLiteConnection
status, as well as connecting and disconnecting an existing geopackage
object (g
).
# still connected gpkg_is_connected(g) # disconnect gpkg_disconnect(g) # reconnect gpkg_connect(g) # disconnect gpkg_disconnect(g)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.