knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(gdalraster) gdal_3_6_0_min <- (gdal_version_num() >= gdal_compute_version(3, 6, 0)) gdal_3_7_0_min <- (gdal_version_num() >= gdal_compute_version(3, 7, 0))
Beginning at version 2.0, gdalraster includes bindings to a substantial subset of the GDAL Vector API. The vector bindings provide an R implementation of the GDAL Vector Data Model. Readers are encouraged to review that document for background on the underlying object model and how it relates to usage of the API in gdalraster.
The prefix OGR is used in GDAL with class names, command-line utilities, etc., that refer to vector-specific components of the library. In gdalraster, "ogr" will be used throughout the documentation and as a prefix on function names, consistent with the style of the source library. There is also a certain amount of unification of the raster and vector APIs in GDAL. In gdalraster, we will also refer to the "GDAL Vector API" and use GDALVector
as a class name. To support the vector API, gdalraster 2.0 builds on its already existing bindings to portions of GDAL's Spatial Reference System and Geometry APIs.
As an implementation of GDAL's Raster and Vector Data Models, gdalraster is not intended as a direct alternative to existing R packages such as sf and terra that primarily implement higher-level analytical data models. gdalraster is primarily focused on API-level bindings, attempting to expose most of the underlying C++ library in R. More direct access to GDAL's I/O capabilities may benefit workflows that deal with source data provided on a continuum ranging from local storage and in-memory formats to database servers and cloud systems, often involving various compression formats, SQL dialects, etc. Modularized, low-level access to GDAL/OGR functionality should be of interest to R developers creating higher-level interfaces, where the analytical data models may not have been designed with their use cases in mind. API bindings facilitate code translation and reuse which can benefit hybrid environments where GDAL is used from multiple languages. Long-term stability of bindings at the library level is desirable in production scenarios, and is a major goal for the package. It is intended that gdalraster should complement and interoperate within R's extensive and evolving spatial ecosystem.
This section describes the organization of the vector components in gdalraster, linking to package documentation for extended text on specific topics.
Core functionality for vector is provided by the exposed C++ class GDALVector
, along with a set of stand-along functions with ogr_
prefix.
GDALVector
GDALVector
is a C++ class exposed in R via Rcpp modules. It encapsulates a single OGRLayer and the GDALDataset that contains it. An object of class GDALVector
persists an open connection to the dataset, and exposes methods to:
n
features at a timeA GDALVector
object is typically generated with a call to new()
but is also returned from certain ogr_*()
functions. The documentation for ?GDALVector
gives a full list of available methods in the Usage section, along with their descriptions under Details. Methods of the class are accessed using the $
operator.
Several of the stand-alone ogr_*()
functions are grouped under the documentation topics ?ogr_manage
and ?ogr_define
. The ogr_manage
functions can be used to:
ogr_define
provides documentation and helper functions for defining feature classes. An OGR feature class definition (a.k.a. layer definition) is modeled in R as a named list containing zero or more attribute field definitions, along with one or more geometry field definitions. Specifications of the the list structures for these definitions are given in ?ogr_define
. The associated helper functions make it easy to create new layer definitions from scratch or modify an existing definition. A layer definition is convenient but not required for creating a new vector dataset, or a new layer within an existing dataset, using ogr_ds_create()
/ ogr_layer_create()
.
OGR facilities for vector geoprocessing are available in ogr_proc()
. This function can perform the following GIS overlay operations: Intersection, Union, SymDifference, Identity, Update, Clip and Erase (https://en.wikipedia.org/wiki/Vector_overlay). ogr_proc()
is basically an R port of the command-line utility ogr_layer_algebra
included in the GDAL Python bindings. In both cases, these are interfaces to library functions in the OGR C++ API.
ogrinfo()
and ogr2ogr()
, provide R wrappers of the GDAL command-line utilities ogrinfo
and ogr2ogr
. These functions support all of the command-line arguments described in the GDAL documentation, providing a powerful set of capabilities for obtaining information about an OGR-supported data source, converting vector data between file formats, and potentially editing data with SQL statements. ogr_reproject()
is a convenience wrapper around ogr2ogr()
for reprojecting vector layers with a user friendly interface.
Bindings to a subset of GDAL's Spatial Reference System API are provided by a set of stand-alone srs_*()
functions. These support conversion of spatial reference definitions in various formats to OGC Well Known Text (WKT) representation (e.g., from EPSG codes, well known names such as NAD27, NAD83, WGS84, etc., PROJ.4 definitions, PROJJSON, and others, as well as between different versions of OGC WKT, see ?srs_convert
). Functions under the ?srs_query
topic provide various information about a given SRS definition, and support testing definitions for equality.
Bindings to a significant subset of GDAL's Geometry API are implemented in a set of functions prefixed g_*()
. Many of these functions are built on the GEOS library (i.e., use GEOS via GDAL headers).
The geometry functions offer flexibility by allowing input geometries to be given as either a single WKT string, a character vector of WKT strings, a single raw
vector containing WKB, or a list of WKB raw
vectors. Output geometries are in WKB format by default, but WKT can also be requested. The convenience function g_wk2wk()
will convert its input from one to the other accordingly.
The factory functions g_create()
and g_add_geom()
support creating basic geometry types from point data (vertices), and building container geometry types from sub-geometries. Other capabilities of the geometry bindings include:
?g_query
)?g_binary_pred
)?g_binary_op
)g_buffer()
, g_simplify()
)?g_measures
)g_coords()
, g_make_valid()
, g_swap_xy()
, g_transform()
)Convenience functions for working with bounding boxes are also provided (e.g., convert to/from WKT, intersection/union, and bbox_transform()
).
Bindings to the GDAL's Virtual Systems Interface (VSI) implement standard file system operations abstracted for URLs, cloud storage services, Zip/GZip/7z/RAR, in-memory files, as well as "regular" local file systems. This provides a single interface for operating on file system objects, that works the same for any storage backend. The vsi_*()
functions have general utility not limited to operating on spatial data sources.
Class VSIFile
implements bindings to the GDAL VSIVirtualHandle API, providing analogs of Standard C file I/O functions that operate on VSI file systems (seek()
, tell()
, read()
, write()
, etc.)
Existing data management functions that operate on both raster and vector data sources include addFilesInZip()
(supporting create/append to a potentially Seek-Optimized ZIP file), deleteDataset()
, identifyDriver()
and inspectDataset()
.
Creation of new vector datasets, and schema modification in existing datasets, are performed with the ogr_*()
stand-alone functions. These are "one-off" operations that attempt to open the dataset with update access, perform modifications, and then close the dataset.
Objects of class GDALVector
are used for obtaining layer information and reading/writing feature data. A GDALVector
object provides a persistent connection to a layer and the vector dataset that contains it. That is, once instantiated, a GDALVector
object represents a live connection to the dataset until its $close()
method is called. The connection may be read-only (by default), or may be with update access allowing insert of new features, modifying exiting features and deleting features. Currently, class GDALVector
does not provide methods for modifying the layer schema.
It is not recommended to instantiate a new GDALVector
object on a layer that is already open for update access (i.e., with read_only = FALSE
). Likewise, potential modifications to a layer schema using the ogr_*()
management functions should be done prior to instantiating a GDALVector
object on the layer.
Relational database management systems (RDBMSs, e.g., GPKG / SQLite, PostgreSQL / PostGIS) generally support multiple connections including concurrent reads (e.g., multiple instances of GDALVector
reading from different layers). It is also possible to have one or more GDALVector
objects instantiated on RDBMS-based layers for read access, while another GDALVector
object performs write operations on a different layer in the same database. Database locking mechanisms for write operations are specific to the driver and underlying RDBMS (see, e.g., SQLite Configuration Options and Performance Hints).
A GDALVector
object supports traditional cursor-based traversal over the features (rows) in a layer, and optionally, column-oriented retrieval via Apache Arrow C stream interface (with GDAL >= 3.6). Data retrieval may be performed against a vector layer in full, a layer with attribute and/or spatial filters applied, or a layer defined as the result set of an SQL statement executed on the underlying data source.
OGR methods that retrieve a single feature (i.e., $getFeature()
, $getNextFeature()
) return data in a named list of fields and their values. GDALVector
also provides the $fetch()
method to retrieve a batch of n
features from the current cursor position and return them in a data frame. The $fetch()
method is an analog of DBI::dbFetch()
with essentially identical calling semantics. Specifying n = -1
or n = Inf
will retrieve all features from the beginning (honoring any spatial or attribute filters that may be in effect).
Currently, gdalraster implements only minimal S3 class interfaces for R objects containing the returned feature data. A single feature as returned by $getFeature()
/$getNextFeature()
is a list object carrying the "OGRFeature"
class attribute. A data frame returned by $fetch()
carries the "OGRFeatureSet"
class attribute. S3 methods are provided for the print()
and plot()
generics as a convenience for examining output. Otherwise, there are currently no S3 classes for geometries or geometry columns, no concept of "sticky" geometry as implemented in package sf, and no automatic propagation of coordinate reference systems. S3 interfaces may expand in future versions, but gdalraster leans toward minimalism and the use of simple, lightweight objects for holding raw data.
The $fetch()
method of a GDALVector
object returns geometries in a list column of WKB raw vectors by default. The Geometry API functions (g_*()
) also operate by default on lists of WKB raw vectors. Representation of geometries as WKB is compact and performant, and seamlessly integrates with the parsing, conversion, manipulation and plotting functions in package wk (a gdalraster dependency for vector plotting, and currently used for g_coords()
).
The $getArrowStream()
method of a GDALVector
object allows retrieving data in a column-oriented memory format. It exposes an ArrowArrayStream on a layer as a nanoarrow_array_stream
object. The nanoarrow package provides functionality to consume the array stream and import to R data structures (which are generally column oriented). nanoarrow provides helpers to facilitate zero-copy data transfer among R bindings to libraries implementing the Arrow C data interface. It is possible to pass nanoarrow objects to many functions in the arrow package. nanoarrow objects also integrate with the extension types implemented in the geoarrow package.
A data frame returned by the $fetch()
method of a GDALVector
object (i.e., an "OGRFeatureSet"
) can be converted to an sf data frame with sf::st_sf()
. A value for the crs
argument can be obtained from method $getSpatialRef()
of the GDALVector
object from which the data were read (assuming no subsequent transformation of geometries has been performed).
OGR field types are returned as the following R data types. R currently lacks a native 64-bit integer type. Support for 64-bit integer values in R is provided by the bit64 package (represented as numeric
values carrying the "integer64"
class attribute). OGR NULL values are returned as type-specific NA
(i.e., NA
, NA_integer_
, NA_integer64_
, NA_real_
, NA_character_
). When retrieving a batch of features as a data frame ("OGRFeatureSet"
), some field types will be contained in a data frame list column as indicated:
integer
valuelogical
valueinteger
(list column)numeric
value carrying the "integer64"
class attributelogical
valuebit64::integer64
(list column)numeric
valuenumeric
(list column)character
stringcharacter
strings (list column)numeric
value of class "Date"
numeric
value of class "POSIXct"
(millisecond accuracy)character
string ("HH:MM:SS"
)raw
vector (list column, NULL
entries for OGR NULL values)By default, geometries are returned as WKB raw
vectors in a data frame list column (with NULL
entries for OGR NULL geometries). The per-object setting $returnGeomAs
can also be set to one of "WKB_ISO"
, "WKT"
, "WKT_ISO"
, or "NONE"
. Omitting the geometries (e.g., by setting lyr$returnGeomAs <- "NONE"
) may be beneficial for performance and memory usage when access only to feature attributes is required.
For more information on 64-bit integer support using the bit64 package, see ?bit64::integer64
or the web version at https://bit64.r-lib.org/reference/bit64-package.html.
Parentheses around statements are used throughout the examples only as a shortcut to display the assigned values. The system.file()
command is only used to obtain the path to the package sample files; it would not be needed in typical usage.
Data sources can be files, relational database management systems, directories of many files, or even remote web services depending on the format driver being used. However, the data source name (DSN) is always a single string which might be the file path, database connection string, URL, etc.
The following examples use a GeoPackage file included in gdalraster. The file ynp_features.gpkg is compressed using SOZip and will be read directly from the .zip archive without decompressing first. A prefix is added to the file path (/vsizip/ in this case) which specifies a GDAL Virtual File System handler. A file system handler provides access to less standard file types such as in-memory, compressed (.zip, .gz, .tar, .tar.gz), encrypted, standard input and output (STDIN, STDOUT), files stored on network (publicly accessible, or in private buckets of commercial cloud services), etc.
library(gdalraster) # get path to the Yellowstone National Park sample dataset f <- system.file("extdata/ynp_features.zip", package = "gdalraster") # add the VSI prefix (zf <- file.path("/vsizip", f)) # list files in the zip archive vsi_read_dir(zf) # VSI path to the GPKG file (zf_gpkg <- file.path(zf, "ynp_features.gpkg"))
It is possible to chain multiple file system handlers. The /vsicurl/ prefix specifies a file system handler that allows on-the-fly random reading of files available through web protocols, without prior download of the entire file. For example, a zip file residing on a web server instead of the local file system could be accessed with a path like:
/vsizip//vsicurl/https://www.example.com/path/to/file.zip
Support for SOZip is available with GDAL >= 3.7. The function vsi_get_file_metadata()
can be used to validate an SOZip file and obtain compression information. Otherwise, SOZip is fully backward compatible and works as a regular .zip file.
if (gdal_version_num() >= gdal_compute_version(3, 7, 0)) { cat("SOZip metadata for ynp_features.gpkg:\n") vsi_get_file_metadata(zf_gpkg, domain = "ZIP") |> print() } else { cat("SOZip support requires GDAL >= 3.7\n") }
inspectDataset()
returns information about the format and content of a data source that may contain raster and/or vector data.
inspectDataset(zf_gpkg)
OGR functions can be also be used to inspect a vector data source.
# test for existence of a vector data source with at least read access ogr_ds_exists(zf_gpkg) # note that update of an existing zipped .gpkg file is not supported ogr_ds_exists(zf_gpkg, with_update = TRUE) # list the vector layers ogr_ds_layer_names(zf_gpkg)
ogrinfo()
requires GDAL >= 3.7. It accepts an optional character vector containing any of the arguments supported by the ogrinfo
command-line utility included with GDAL.
# list the layers in a data source ogrinfo(zf_gpkg) # detailed information about a specific layer ogrinfo(zf_gpkg, "ynp_bnd", cl_arg = c("-geom=SUMMARY", "-wkt_format", "WKT2"))
Here we copy the .gpkg file from the zip archive to an in-memory file that is writable, and make several modifications. The /vsimem/ file handler allows a block of memory to be treated as files. It is useful for temporary storage and is generally fast to access. The code below also demonstrates testing certain defined capabilities of the dataset or layer before attempting to perform modifications. This will not be needed in all cases but might be used in code that is required to handle input datasets in a general way.
# copy ynp_features.gpkg from the zip file to an in-memory file mem_gpkg <- "/vsimem/tmp/ynp_features.gpkg" ogr2ogr(zf_gpkg, mem_gpkg) vsi_read_dir("/vsimem/tmp") # confirm it's writable ogr_ds_exists(mem_gpkg, with_update = TRUE) rd_layer <- "roads" # rename a layer (requires GDAL >= 3.5) if (gdal_version_num() < gdal_compute_version(3, 5, 0)) { message("ogr_layer_rename() requires GDAL >= 3.5") } else if (ogr_layer_test_cap(mem_gpkg, rd_layer)$Rename) { ogr_layer_rename(mem_gpkg, rd_layer, "roads2") rd_layer <- "roads2" } else { message("layer does not have 'Rename' capability") } ogr_ds_layer_names(mem_gpkg) # delete a layer if (ogr_ds_test_cap(mem_gpkg)$DeleteLayer) { ogr_layer_delete(mem_gpkg, rd_layer) } else { message("dataset does not have 'DeleteLayer' capability") } ogr_ds_layer_names(mem_gpkg) # manage fields on a layer ogr_layer_field_names(mem_gpkg, "points_of_interest") # delete a field if (ogr_layer_test_cap(mem_gpkg, "points_of_interest")$DeleteField) { ogr_field_delete(mem_gpkg, "points_of_interest", "createdate") } else { message("layer does not have 'DeleteField' capability") } # rename fields if (ogr_layer_test_cap(mem_gpkg, "points_of_interest")$AlterFieldDefn) { ogr_field_rename(mem_gpkg, "points_of_interest", "poiname", "poi_name") ogr_field_rename(mem_gpkg, "points_of_interest", "poitype", "poi_type") } else { message("layer does not have 'AlterFieldDefn' capability") }
As a simple example for illustration, we populate a new field to flag geothermal features.
# create a new field if (ogr_layer_test_cap(mem_gpkg, "points_of_interest")$CreateField) { ogr_field_create(mem_gpkg, "points_of_interest", fld_name = "is_geothermal", fld_type = "OFTInteger", fld_subtype = "OFSTBoolean") } else { message("layer does not have 'CreateField' capability") } # edit data with SQL sql <- "UPDATE points_of_interest SET is_geothermal = CASE WHEN poi_type IN ('Basin', 'Geyser') THEN TRUE ELSE FALSE END" ogr_execute_sql(mem_gpkg, sql) ogr_layer_field_names(mem_gpkg, "points_of_interest")
To read back from the modified data source we need to instantiate objects of class GDALVector
. More detail on working with GDALVector
objects is given in the examples further below for data retrieval. For now, note that the function ogr_execute_sql()
returns an object of class GDALVector
for SQL statements that return a result set.
# read and display the geothermal features sql <- "SELECT poi_name, geom FROM points_of_interest WHERE is_geothermal = TRUE" (lyr <- ogr_execute_sql(mem_gpkg, sql)) lyr$getFeatureCount() # retrieve all features in the result set (cf. DBI::dbFetch()) feat_set <- lyr$fetch(-1) head(feat_set) # plot the park boundary # the layer contains a single polygon feature which is piped directly to plot() GDALVector$new(mem_gpkg, "ynp_bnd")$getNextFeature() |> plot(col = "wheat", xlab = "longitude", ylab = "latitude", main = "YNP Geothermal Features") plot(feat_set, pch = 19, col = "steelblue1", add = TRUE)
For clean up, the $close()
method should be called on the layer object opened with ogr_execute_sql()
. We also delete the temporary in-memory file.
lyr$close() # delete a data source vsi_unlink(mem_gpkg) # delete a single file # or, deleteDataset(mem_gpkg)
These examples will use the original sample dataset for YNP features. The DSN is formed with the /viszip/ prefix, and full path to the file ynp_features.gpkg contained in the zip archive. An object of class GDALVector
is generated with a call to new()
. From the examples above, we know that the GeoPackage contains three vector layers, so we also pass a layer name argument to the class constructor.
Note that if the layer argument is omitted, the constructor will attempt to open the first layer by index. The abbreviated form of the constructor would generally be used for convenience only with single-layer file formats (e.g., ESRI Shapefile, FlatGeoBuf), or with a GeoPackage file known to contain only a single layer.
A SQL SELECT statement that returns a result set can also be used for the layer
argument in the constructor.
Here we instantiate a GDALVector
object for the park boundary layer and retrieve information about it.
f <- system.file("extdata/ynp_features.zip", package = "gdalraster") ynp_dsn <- file.path("/vsizip", f, "ynp_features.gpkg") # the park boundary layer containing a single feature (bnd <- new(GDALVector, ynp_dsn, "ynp_bnd")) bnd$getFeatureCount() # spatial reference definition as WKT string bnd$getSpatialRef() # xmin, ymin, xmax, ymax bnd$bbox() # structure of the layer definition (a.k.a. feature class definition) bnd$getLayerDefn() |> str()
Traditional row-level access to feature data with OGR is cursor-based, reading one feature at a time. When starting with a new layer object, the cursor is positioned at the beginning and we can iterate through features with calls to $getNextFeature()
. This method returns an individual feature as a named list of fields and their values, and moves the cursor forward one row. It will return NULL
when no features are left.
The park boundary layer contains a single polygon feature. The list object returned for an individual feature carries the "OGRFeature"
class attribute.
bnd_feat <- bnd$getNextFeature() str(bnd_feat) # no more features bnd$getNextFeature() bnd$close()
The cursor can be reset to the beginning of the layer at any time with a call to $resetReading()
.
Class GDALVector
also provides the $fetch()
method to retrieve the next n
features from the layer and return them as a data frame (analog of DBI::dbFetch()
). $fetch()
can be used to get the entire feature set for a layer, one batch of features at a time, or the remaining features from the current cursor position. Fetching zero features is also possible to retrieve the structure of the layer as a 0-row data frame with fully typed columns. As with $getNextFeature()
, only forward paging is supported.
Passing n = -1
or n = Inf
will retrieve all features from the beginning (with an automatic call to $resetReading()
). Passing n = NA
is supported and will retrieve the remaining features from the current cursor position. Otherwise, features can be fetched progressively by passing a whole number (integer
or numeric
) as the n
argument. If more features than available are requested, the result is returned in full without warning. If fewer features than requested are returned, further fetches will return a data frame with zero rows.
Note that it is possible to set an attribute filter and/or spatial filter prior to retrieving data to restrict the returned feature set. It is also possible to set specific fields as ignored (or to specify the list of desired fields). This includes potentially omitting the geometry column when access only to feature attributes is needed. Limiting the number of fields returned may save some processing time and/or bandwidth.
The layer
argument in the constructor for GDALVector
can also be given as a SQL SELECT statement. Here we read from the roads layer selecting only the publicly accessible roads. We call $fetch()
with n = -1
to read from the beginning and retrieve all available features. The data frame returned by $fetch()
carries the "OGRFeatureSet"
class attribute providing S3 methods for print()
and plot()
.
# SQL layer for public roads sql <- "SELECT rdname, opentopubl, geom FROM roads WHERE opentopubl = 'Yes'" (roads <- new(GDALVector, ynp_dsn, sql)) roads$getFeatureCount() roads$getFieldNames() roads_featset <- roads$fetch(-1) nrow(roads_featset) head(roads_featset) plot(bnd_feat, col = "wheat", xlab = "longitude", ylab = "latitude", main = "YNP Public Roads") plot(roads_featset, col = "slategray", lwd = 2, add = TRUE) roads$close()
Fetching in batches may be advantageous when dealing with large datasets, especially when reading over a network. Also, note that $getFeatureCount()
is called internally when fetching the full feature set but not for a batch of n
features. Some vector drivers will actually scan the entire layer once to count features. The FastFeatureCount
element in the list returned by the $testCapability()
method can be checked if this might be a concern.
poi <- new(GDALVector, ynp_dsn, "points_of_interest") poi$getFeatureCount() # read progressively in batches batch_size <- 500 batch <- 0 while (TRUE) { poi_featset <- poi$fetch(batch_size) if (nrow(poi_featset) == 0) break cat("batch", batch <- batch + 1, ":", nrow(poi_featset), "features\n") # process batch # ... } poi$close()
It is also possible to retrieve features in a column-oriented memory layout using the Arrow C Stream interface in OGR (requires GDAL >= 3.6). Performance advantages can be substantial with vector format drivers that provide specialized implementations (e.g., Parquet, FlatGeoBuf, GeoPackage). Drivers with specialized implementations advertise the FastGetArrowStream
layer capability.
The $getArrowStream()
method of class GDALVector
exposes an ArrowArrayStream on the layer as a nanoarrow_array_stream
object. Note that the nanoarrow_array_stream
object has a $get_next()
method which allows accessing features in the stream by batch (returning a nanoarrow_array
object). The default batch size for an OGR array stream is 65,536 features, but can be configured on a per-object basis by assigning a value to the GDALVector
writable field $arrowStreamOptions
(as a character vector of "NAME=VALUE" pairs). nanoarrow provides S3 methods for as.data.frame()
to import a nanoarrow_array
(one batch at a time), or the nanoarrow_array_stream
itself (pulling all batches in the stream).
# Expose an ArrowArrayStream (requires GDAL >= 3.6) # re-open the roads layer with the required argument for type of access roads$open(read_only = TRUE) roads$resetReading() # does the layer have a specialized implementation roads$testCapability()$FastGetArrowStream # optionally set ArrowStream options as character vector of "NAME=VALUE", e.g., roads$arrowStreamOptions <- "INCLUDE_FID=NO" # the default batch size of 65,536 could also be configured with # MAX_FEATURES_IN_BATCH=num (stream <- roads$getArrowStream()) # get the array schema if needed schema <- stream$get_schema() # optionally read by batch, NULL is returned when no more features are available # batch <- stream$get_next() # if (!is.null(batch)) # d_batch <- as.data.frame(batch) # or, pull all the batches into a data frame d <- as.data.frame(stream) nrow(d) head(d) # release the stream when finished stream$release() # 'geom' is a list column of WKB raw vectors which can be passed to the # Geometry API functions geom_utm <- g_transform(d$geom, srs_from = roads$getSpatialRef(), srs_to = "EPSG:26912") # add a column with road lengths in meters d$rdlength <- g_length(geom_utm) head(d) roads$close()
The following example introduces another sample dataset which contains fire perimeters for Yellowstone National Park from the Monitoring Trends in Burn Severity (MTBS) project. Here we set up a temporary, writable copy of the dataset ynp_fires_1984_2022.gpkg (in memory), and use it as a data source to perform further processing.
The MTBS layer uses a projected coordinate system, while layers of the "YNP features" dataset used in the examples above are in geographic coordinates. The ogr_reproject()
function is used to project layers in geographic coordinates from ynp_features.gpkg to match the coordinate system of the MTBS layer.
# MTBS fire perimeters in Yellowstone National Park 1984-2022 f <- system.file("extdata/ynp_fires_1984_2022.gpkg", package = "gdalraster") # copy to a temporary writable file mtbs_dsn <- "/vsimem/tmp/ynp_fires_1984_2022.gpkg" ogr2ogr(f, mtbs_dsn) (fires <- new(GDALVector, mtbs_dsn, "mtbs_perims")) srs_mtsp <- fires$getSpatialRef() # Montana state plane metric definition # reproject the boundary in ynp_features.gpkg to match the MTBS layer, # returning a GDALVector object on the output layer by default (bnd <- ogr_reproject(src_dsn = ynp_dsn, src_layer = "ynp_bnd", out_dsn = mtbs_dsn, out_srs = srs_mtsp)) (bnd_feat <- bnd$getNextFeature()) bnd$close() # reproject points_of_interest poi <- ogr_reproject(ynp_dsn, "points_of_interest", mtbs_dsn, srs_mtsp) # set an attribute filter to select the Maple Fire fires$setAttributeFilter("incid_name = 'MAPLE'") fires$getFeatureCount() maple_fire <- fires$getNextFeature() # use the fire polygon as a spatial filter for points_of_interest # setSpatialFilter() expects WKT input, so convert the WKB geometry g_wk2wk(maple_fire$geom) |> poi$setSpatialFilter() poi$getFeatureCount() poi$setSelectedFields(c("poiname", "poitype", "geom")) (maple_pois <- poi$fetch(-1)) plot(bnd_feat, col = "wheat") plot(maple_fire, col = "orangered", border = NA, add = TRUE) plot(maple_pois, pch = 15, col = "royalblue", add = TRUE) fires$close() poi$close()
The $createFeature()
method of GDALVector
object creates and writes a new feature within the layer. The feature
argument is a named list of fields and their values (which could be be one row of a data frame). The passed feature is written to the layer as a new feature, rather than overwriting an existing one.
Here we add the centroid of the YNP boundary as a point of interest.
# create a feature object for the YNP centroid as a point of interest (bnd_centroid_xy <- g_centroid(bnd_feat$geom)) feat <- list() feat$poiname <- "YNP centroid" feat$poitype <- "Information" feat$createdate <- Sys.Date() feat$editdate <- Sys.Date() feat$geom <- g_create("POINT", bnd_centroid_xy) # re-open the "points_of_interest" layer with update access poi$open(read_only = FALSE) poi$testCapability()$SequentialWrite # create and write the new feature on the layer poi$createFeature(feat) # be sure pending writes are flushed poi$syncToDisk() # read back fid <- poi$getLastWriteFID() (ynp_centroid <- poi$getFeature(fid)) plot(bnd_feat, col = "wheat") plot(ynp_centroid, pch = 10, cex = 1.5, add = TRUE)
The $setFeature()
method of a GDALVector
object writes a feature based on the feature id (FID) specified in the input. As with $createFeature()
, the feature
argument is a named list of fields and their values, and must include a $FID
element referencing the existing feature to rewrite. Note that if any fields are omitted in the passed feature
the write behavior is driver-dependent:
# rewrite a feature in the "point_of_interest" layer updating the feature name # verify the layer has random write capability poi$testCapability()$RandomWrite # FID 3809 is missing the trailhead name (feat <- poi$getFeature(3809)) # update the name field and the date of the edit feat$poiname <- "Ice Lake Trailhead" feat$editdate <- Sys.Date() # rewrite the feature poi$setFeature(feat) poi$syncToDisk() (feat <- poi$getFeature(3809))
The $deleteFeature()
method of a GDALVector
object deletes the feature with the indicated feature ID if supported by the format driver. The value of fid
must be a single numeric value, optionally carrying the bit64::integer64
class attribute.
# delete the "YNP centroid" feature that was created above # verify the layer has delete feature capability poi$testCapability()$DeleteFeature # the feature ID was obtained above as: fid <- poi$getLastWriteFID() poi$getFeature(fid) poi$deleteFeature(fid) poi$syncToDisk() poi$getFeature(fid) poi$close()
The $batchCreateFeature()
method of a GDALVector
object is batch version of $createFeature()
. It creates and writes a batch of new features within the layer from input given as a data frame. Column names in the data frame must match field names of the layer and have compatible data types.
Operations that write batches of features should use transactions for better performance when writing into RDBMS drivers that have native transaction support (e.g., PostgreSQL / PostGIS, GPKG, SQLite). Grouping many features per transaction (e.g., 100,000 or more) can improve performance substantially. Using transactions also provides a mechanism to rollback a group of data modifications that fails to complete or generates errors.
The example below compares the relative performance of writing a batch of features to a GeoPackage layer with and without a transaction. We create a new POINT layer in the existing GPKG (mtbs_dsn
), and write batches of features at random point locations within the YNP bounding box.
By default, the GeoPackage driver automatically creates a column named fid
to use for the OGR feature ID (primary key in the SQLite database). This can be configured with the FID=value
layer creation option. Note that the FID is a special property of a feature and not treated as an attribute of a feature, so it is not specified in the layer definition.
# create a layer definition for random_points # the spatial ref was obtained above as: srs_mtsp <- fires$getSpatialRef() defn <- ogr_def_layer("POINT", srs = srs_mtsp) defn$pt_desc <- ogr_def_field("OFTString") defn$create_time <- ogr_def_field("OFTDateTime") ogr_layer_create(mtbs_dsn, "random_points", layer_defn = defn) lyr <- new(GDALVector, mtbs_dsn, "random_points", read_only = FALSE) bb <- g_wk2wk(bnd_feat$geom) |> bbox_from_wkt()
Next we create the first batch of features and write them without grouping in a transaction, measuring time elapsed.
batch_size <- as.integer(1e5) # create a batch of features rndX <- sample((bb[1] + 1):(bb[3] - 1), batch_size, replace = TRUE) rndY <- sample((bb[2] + 1):(bb[4] - 1), batch_size, replace = TRUE) pts <- cbind(rndX, rndY) pts_geom <- g_create("POINT", pts) d <- data.frame(pt_desc = rep("random points batch 1", batch_size), create_time = rep(Sys.time(), batch_size)) d$geom <- pts_geom # write the batch (no transaction) system.time(res <- lyr$batchCreateFeature(d)) (all(res)) lyr$syncToDisk()
A second batch of features is grouped in a transaction for writing, and the layer is checked for the expected output.
rndX <- sample((bb[1] + 1):(bb[3] - 1), batch_size, replace = TRUE) rndY <- sample((bb[2] + 1):(bb[4] - 1), batch_size, replace = TRUE) pts <- cbind(rndX, rndY) pts_geom <- g_create("POINT", pts) d <- data.frame(pt_desc = rep("random points batch 2", batch_size), create_time = rep(Sys.time(), batch_size)) d$geom <- pts_geom # write the batch using a transaction system.time({ lyr$startTransaction() res2 <- lyr$batchCreateFeature(d) if (all(res2)) lyr$commitTransaction() else lyr$rollbackTransaction() }) (all(res2)) # check the output data d_out <- lyr$fetch(-1) (nrow(d_out) == batch_size * 2) head(d_out) tail(d_out) lyr$close()
Single-layer vector file formats (e.g., ESRI shapefile, FlatGeoBuf, GeoJSON) can be created from scratch with a call to ogr_ds_create()
. This function provides optional arguments to add a single attribute field on the layer when it is created. More attribute fields could be added after layer creation with calls to ogr_field_create()
(for vector formats that have the CreateField
layer capability). Alternatively, a layer definition (as list object) can be passed to ogr_ds_create()
to specify multiple attribute fields and their properties (see ?ogr_def_layer
).
GeoJSON does not support schema definition prior to creating features. Only the Feature object has a member with name properties. The specification does not require all Feature objects in a collection to have the same schema of properties, nor does it require all Feature objects in a collection to have geometry of the same type (https://geojson.org/). Note that we set return_obj = TRUE
in the call to ogr_ds_create()
. This returns a live dataset object open for write access on the layer (i.e., an object of class GDALVector
). A valid GeoJSON file will be generated once one or more features have been written to the layer.
Here we generate a GeoJSON string containing a single polygon feature that defines a rectangular area of interest around the Maple Fire boundary. This GeoJSON format is supported for input by some data distribution systems to define a download box (e.g., NLCD and LANDFIRE). For this case we need coordinates in Web Mercator projection (EPSG 3857).
# write the Maple Fire AOI bounding box as GeoJSON in EPSG 3857 json_file <- file.path(tempdir(), "maple_fire_aoi.geojson") lyr <- ogr_ds_create("GeoJSON", json_file, layer = "maple_fire_aoi", geom_type = "POLYGON", srs = "EPSG:3857", fld_name = "id", fld_type = "OFTString", overwrite = TRUE, return_obj = TRUE) # The Maple Fire feature object and spatial reference were obtained above in # the section on "Reproject vector layers". # Here we extend the minimum bounding box by 500 m in each direction. feat <- list() feat$id <- "dataDownloadBox" feat$geom <- g_transform(maple_fire$geom, srs_from = srs_mtsp, srs_to = "EPSG:3857", as_wkb = FALSE) |> bbox_from_wkt(extend_x = 500, extend_y = 500) |> bbox_to_wkt() lyr$createFeature(feat) lyr$close() readLines(json_file) |> writeLines()
ogr_proc()
performs GIS overlay operations on vector layers. It provides an interface to GDAL API methods for the following operations: Intersection, Union, SymDifference, Identity, Update, Clip and Erase.
Inputs are given as objects of class GDALVector
, which may have spatial and/or attribute filters applied. The output layer will be created if it does not exist, but output can also be appended to an existing layer, or written to an existing empty layer that has a custom schema defined.
Here we generate a layer of areas within the 1988 North Fork fire perimeter that have subsequently re-burned (as of 2022). The output layer contains features whose geometries represent areas that are common between features in the input layer and in the method layer. The features in the output layer have attributes from both the input and method layers.
# layer filtered to fires after 1988 lyr1 <- new(GDALVector, mtbs_dsn, "mtbs_perims") lyr1$setAttributeFilter("ig_year > 1988") lyr1$getFeatureCount() # second layer for the 1988 North Fork fire perimeter sql <- "SELECT incid_name, ig_year, geom FROM mtbs_perims WHERE incid_name = 'NORTH FORK'" lyr2 <- new(GDALVector, mtbs_dsn, sql) lyr2$getFeatureCount() north_fork_feat <- lyr2$getNextFeature() # set mode options for the intersection opt <- c("INPUT_PREFIX=layer1_", "METHOD_PREFIX=layer2_", "PROMOTE_TO_MULTI=YES") # intersect to obtain areas re-burned since 2000 lyr_out <- ogr_proc(mode = "Intersection", input_lyr = lyr1, method_lyr = lyr2, out_dsn = mtbs_dsn, out_lyr_name = "north_fork_reburned", out_geom_type = "MULTIPOLYGON", mode_opt = opt) # the output layer has attributes of both the input and method layers (reburn_feat_set <- lyr_out$fetch(-1)) plot(north_fork_feat) plot(reburn_feat_set, col = "orangered", border = NA, add = TRUE, main = "1988 North Fork fire perimeter showing re-burned areas in red") # clean up lyr1$close() lyr2$close() lyr_out$close()
# clean up unlink(json_file) vsi_unlink(mtbs_dsn) vsi_rmdir("/vsimem/tmp/")
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.