Vector API Overview
In gdalraster: Bindings to 'GDAL'

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(gdalraster)
gdal_3_6_0_min <- (gdal_version_num() >= gdal_compute_version(3, 6, 0))
gdal_3_7_0_min <- (gdal_version_num() >= gdal_compute_version(3, 7, 0))

Background

Beginning at version 2.0, gdalraster includes bindings to a substantial subset of the GDAL Vector API. The vector bindings provide an R implementation of the GDAL Vector Data Model. Readers are encouraged to review that document for background on the underlying object model and how it relates to usage of the API in gdalraster.

The prefix OGR is used in GDAL with class names, command-line utilities, etc., that refer to vector-specific components of the library. In gdalraster, "ogr" will be used throughout the documentation and as a prefix on function names, consistent with the style of the source library. There is also a certain amount of unification of the raster and vector APIs in GDAL. In gdalraster, we will also refer to the "GDAL Vector API" and use GDALVector as a class name. To support the vector API, gdalraster 2.0 builds on its already existing bindings to portions of GDAL's Spatial Reference System and Geometry APIs.

Rationale

As an implementation of GDAL's Raster and Vector Data Models, gdalraster is not intended as a direct alternative to existing R packages such as sf and terra that primarily implement higher-level analytical data models. gdalraster is primarily focused on API-level bindings, attempting to expose most of the underlying C++ library in R. More direct access to GDAL's I/O capabilities may benefit workflows that deal with source data provided on a continuum ranging from local storage and in-memory formats to database servers and cloud systems, often involving various compression formats, SQL dialects, etc. Modularized, low-level access to GDAL/OGR functionality should be of interest to R developers creating higher-level interfaces, where the analytical data models may not have been designed with their use cases in mind. API bindings facilitate code translation and reuse which can benefit hybrid environments where GDAL is used from multiple languages. Long-term stability of bindings at the library level is desirable in production scenarios, and is a major goal for the package. It is intended that gdalraster should complement and interoperate within R's extensive and evolving spatial ecosystem.

Vector API components

This section describes the organization of the vector components in gdalraster, linking to package documentation for extended text on specific topics.

Core bindings

Core functionality for vector is provided by the exposed C++ class GDALVector, along with a set of stand-along functions with ogr_ prefix.

Class `GDALVector`

GDALVector is a C++ class exposed in R via Rcpp modules. It encapsulates a single OGRLayer and the GDALDataset that contains it. An object of class GDALVector persists an open connection to the dataset, and exposes methods to:

retrieve layer information
set attribute and spatial filters
set ignored fields/selected fields
traverse and read feature data by traditional row-based cursor
fetch the full feature set and return as a data frame
fetch progressively in batches of n features at a time
read via column-oriented ArrowArrayStream
write new features in a layer
upsert
edit/overwrite existing features
delete
perform operations within transactions

A GDALVector object is typically generated with a call to new() but is also returned from certain ogr_*() functions. The documentation for ?GDALVector gives a full list of available methods in the Usage section, along with their descriptions under Details. Methods of the class are accessed using the $ operator.

OGR stand-alone functions

Several of the stand-alone ogr_*() functions are grouped under the documentation topics ?ogr_manage and ?ogr_define. The ogr_manage functions can be used to:

create new vector datasets from scratch
test for existence of datasets, layers and fields
test dataset and layer capabilities
create new layers in an existing dataset
delete layers
create new attribute and geometry fields on an existing layer
rename and delete fields
edit data with SQL statements

ogr_define provides documentation and helper functions for defining feature classes. An OGR feature class definition (a.k.a. layer definition) is modeled in R as a named list containing zero or more attribute field definitions, along with one or more geometry field definitions. Specifications of the the list structures for these definitions are given in ?ogr_define. The associated helper functions make it easy to create new layer definitions from scratch or modify an existing definition. A layer definition is convenient but not required for creating a new vector dataset, or a new layer within an existing dataset, using ogr_ds_create() / ogr_layer_create().

OGR facilities for vector geoprocessing are available in ogr_proc(). This function can perform the following GIS overlay operations: Intersection, Union, SymDifference, Identity, Update, Clip and Erase (https://en.wikipedia.org/wiki/Vector_overlay). ogr_proc() is basically an R port of the command-line utility ogr_layer_algebra included in the GDAL Python bindings. In both cases, these are interfaces to library functions in the OGR C++ API.

ogrinfo() and ogr2ogr(), provide R wrappers of the GDAL command-line utilities ogrinfo and ogr2ogr. These functions support all of the command-line arguments described in the GDAL documentation, providing a powerful set of capabilities for obtaining information about an OGR-supported data source, converting vector data between file formats, and potentially editing data with SQL statements. ogr_reproject() is a convenience wrapper around ogr2ogr() for reprojecting vector layers with a user friendly interface.

Supporting APIs

Spatial Reference Systems

Bindings to a subset of GDAL's Spatial Reference System API are provided by a set of stand-alone srs_*() functions. These support conversion of spatial reference definitions in various formats to OGC Well Known Text (WKT) representation (e.g., from EPSG codes, well known names such as NAD27, NAD83, WGS84, etc., PROJ.4 definitions, PROJJSON, and others, as well as between different versions of OGC WKT, see ?srs_convert). Functions under the ?srs_query topic provide various information about a given SRS definition, and support testing definitions for equality.

Geometry

Bindings to a significant subset of GDAL's Geometry API are implemented in a set of functions prefixed g_*(). Many of these functions are built on the GEOS library (i.e., use GEOS via GDAL headers).

The geometry functions offer flexibility by allowing input geometries to be given as either a single WKT string, a character vector of WKT strings, a single raw vector containing WKB, or a list of WKB raw vectors. Output geometries are in WKB format by default, but WKT can also be requested. The convenience function g_wk2wk() will convert its input from one to the other accordingly.

The factory functions g_create() and g_add_geom() support creating basic geometry types from point data (vertices), and building container geometry types from sub-geometries. Other capabilities of the geometry bindings include:

inspection (see ?g_query)
binary predicates (see ?g_binary_pred)
binary operations (see ?g_binary_op)
unary operations (e.g., g_buffer(), g_simplify())
measurements (see ?g_measures)
miscellaneous utilities (e.g., g_coords(), g_make_valid(), g_swap_xy(), g_transform())

Convenience functions for working with bounding boxes are also provided (e.g., convert to/from WKT, intersection/union, and bbox_transform()).

Virtual Systems Interface

Bindings to the GDAL's Virtual Systems Interface (VSI) implement standard file system operations abstracted for URLs, cloud storage services, Zip/GZip/7z/RAR, in-memory files, as well as "regular" local file systems. This provides a single interface for operating on file system objects, that works the same for any storage backend. The vsi_*() functions have general utility not limited to operating on spatial data sources.

Class VSIFile implements bindings to the GDAL VSIVirtualHandle API, providing analogs of Standard C file I/O functions that operate on VSI file systems (seek(), tell(), read(), write(), etc.)

General data management

Existing data management functions that operate on both raster and vector data sources include addFilesInZip() (supporting create/append to a potentially Seek-Optimized ZIP file), deleteDataset(), identifyDriver() and inspectDataset().

Workflow considerations

Creation of new vector datasets, and schema modification in existing datasets, are performed with the ogr_*() stand-alone functions. These are "one-off" operations that attempt to open the dataset with update access, perform modifications, and then close the dataset.

Objects of class GDALVector are used for obtaining layer information and reading/writing feature data. A GDALVector object provides a persistent connection to a layer and the vector dataset that contains it. That is, once instantiated, a GDALVector object represents a live connection to the dataset until its $close() method is called. The connection may be read-only (by default), or may be with update access allowing insert of new features, modifying exiting features and deleting features. Currently, class GDALVector does not provide methods for modifying the layer schema.

It is not recommended to instantiate a new GDALVector object on a layer that is already open for update access (i.e., with read_only = FALSE). Likewise, potential modifications to a layer schema using the ogr_*() management functions should be done prior to instantiating a GDALVector object on the layer.

Relational database management systems (RDBMSs, e.g., GPKG / SQLite, PostgreSQL / PostGIS) generally support multiple connections including concurrent reads (e.g., multiple instances of GDALVector reading from different layers). It is also possible to have one or more GDALVector objects instantiated on RDBMS-based layers for read access, while another GDALVector object performs write operations on a different layer in the same database. Database locking mechanisms for write operations are specific to the driver and underlying RDBMS (see, e.g., SQLite Configuration Options and Performance Hints).

Data retrieval

A GDALVector object supports traditional cursor-based traversal over the features (rows) in a layer, and optionally, column-oriented retrieval via Apache Arrow C stream interface (with GDAL >= 3.6). Data retrieval may be performed against a vector layer in full, a layer with attribute and/or spatial filters applied, or a layer defined as the result set of an SQL statement executed on the underlying data source.

OGR methods that retrieve a single feature (i.e., $getFeature(), $getNextFeature()) return data in a named list of fields and their values. GDALVector also provides the $fetch() method to retrieve a batch of n features from the current cursor position and return them in a data frame. The $fetch() method is an analog of DBI::dbFetch() with essentially identical calling semantics. Specifying n = -1 or n = Inf will retrieve all features from the beginning (honoring any spatial or attribute filters that may be in effect).

Currently, gdalraster implements only minimal S3 class interfaces for R objects containing the returned feature data. A single feature as returned by $getFeature()/$getNextFeature() is a list object carrying the "OGRFeature" class attribute. A data frame returned by $fetch() carries the "OGRFeatureSet" class attribute. S3 methods are provided for the print() and plot() generics as a convenience for examining output. Otherwise, there are currently no S3 classes for geometries or geometry columns, no concept of "sticky" geometry as implemented in package sf, and no automatic propagation of coordinate reference systems. S3 interfaces may expand in future versions, but gdalraster leans toward minimalism and the use of simple, lightweight objects for holding raw data.

Interoperability

The $fetch() method of a GDALVector object returns geometries in a list column of WKB raw vectors by default. The Geometry API functions (g_*()) also operate by default on lists of WKB raw vectors. Representation of geometries as WKB is compact and performant, and seamlessly integrates with the parsing, conversion, manipulation and plotting functions in package wk (a gdalraster dependency for vector plotting, and currently used for g_coords()).

The $getArrowStream() method of a GDALVector object allows retrieving data in a column-oriented memory format. It exposes an ArrowArrayStream on a layer as a nanoarrow_array_stream object. The nanoarrow package provides functionality to consume the array stream and import to R data structures (which are generally column oriented). nanoarrow provides helpers to facilitate zero-copy data transfer among R bindings to libraries implementing the Arrow C data interface. It is possible to pass nanoarrow objects to many functions in the arrow package. nanoarrow objects also integrate with the extension types implemented in the geoarrow package.

A data frame returned by the $fetch() method of a GDALVector object (i.e., an "OGRFeatureSet") can be converted to an sf data frame with sf::st_sf(). A value for the crs argument can be obtained from method $getSpatialRef() of the GDALVector object from which the data were read (assuming no subsequent transformation of geometries has been performed).

Data type specifications

OGR field types are returned as the following R data types. R currently lacks a native 64-bit integer type. Support for 64-bit integer values in R is provided by the bit64 package (represented as numeric values carrying the "integer64" class attribute). OGR NULL values are returned as type-specific NA (i.e., NA, NA_integer_, NA_integer64_, NA_real_, NA_character_). When retrieving a batch of features as a data frame ("OGRFeatureSet"), some field types will be contained in a data frame list column as indicated:

OFTInteger: integer value
OFTInteger subtype OFSTBoolean: logical value
OFTIntegerList: vector of integer (list column)
OFTInteger64: numeric value carrying the "integer64" class attribute
OFTInteger64 subtype OFSTBoolean: logical value
OFTInteger64List: vector of bit64::integer64 (list column)
OFTReal: numeric value
OFTRealList: vector of numeric (list column)
OFTString: character string
OFTStringList: vector of character strings (list column)
OFTDate: numeric value of class "Date"
OFTDateTime: numeric value of class "POSIXct" (millisecond accuracy)
OFTTime: character string ("HH:MM:SS")
OFTBinary: raw vector (list column, NULL entries for OGR NULL values)

By default, geometries are returned as WKB raw vectors in a data frame list column (with NULL entries for OGR NULL geometries). The per-object setting $returnGeomAs can also be set to one of "WKB_ISO", "WKT", "WKT_ISO", or "NONE". Omitting the geometries (e.g., by setting lyr$returnGeomAs <- "NONE") may be beneficial for performance and memory usage when access only to feature attributes is required.

For more information on 64-bit integer support using the bit64 package, see ?bit64::integer64 or the web version at https://bit64.r-lib.org/reference/bit64-package.html.

Code examples

Parentheses around statements are used throughout the examples only as a shortcut to display the assigned values. The system.file() command is only used to obtain the path to the package sample files; it would not be needed in typical usage.

Inspect data sources

Data sources can be files, relational database management systems, directories of many files, or even remote web services depending on the format driver being used. However, the data source name (DSN) is always a single string which might be the file path, database connection string, URL, etc.

The following examples use a GeoPackage file included in gdalraster. The file ynp_features.gpkg is compressed using SOZip and will be read directly from the .zip archive without decompressing first. A prefix is added to the file path (/vsizip/ in this case) which specifies a GDAL Virtual File System handler. A file system handler provides access to less standard file types such as in-memory, compressed (.zip, .gz, .tar, .tar.gz), encrypted, standard input and output (STDIN, STDOUT), files stored on network (publicly accessible, or in private buckets of commercial cloud services), etc.

library(gdalraster)

# get path to the Yellowstone National Park sample dataset
f <- system.file("extdata/ynp_features.zip", package = "gdalraster")

# add the VSI prefix 
(zf <- file.path("/vsizip", f))

# list files in the zip archive
vsi_read_dir(zf)

# VSI path to the GPKG file
(zf_gpkg <- file.path(zf, "ynp_features.gpkg"))

It is possible to chain multiple file system handlers. The /vsicurl/ prefix specifies a file system handler that allows on-the-fly random reading of files available through web protocols, without prior download of the entire file. For example, a zip file residing on a web server instead of the local file system could be accessed with a path like:

/vsizip//vsicurl/https://www.example.com/path/to/file.zip

Support for SOZip is available with GDAL >= 3.7. The function vsi_get_file_metadata() can be used to validate an SOZip file and obtain compression information. Otherwise, SOZip is fully backward compatible and works as a regular .zip file.

if (gdal_version_num() >= gdal_compute_version(3, 7, 0)) {
  cat("SOZip metadata for ynp_features.gpkg:\n")
  vsi_get_file_metadata(zf_gpkg, domain = "ZIP") |> print()
} else {
  cat("SOZip support requires GDAL >= 3.7\n")
}

inspectDataset() returns information about the format and content of a data source that may contain raster and/or vector data.

inspectDataset(zf_gpkg)

OGR functions can be also be used to inspect a vector data source.

# test for existence of a vector data source with at least read access
ogr_ds_exists(zf_gpkg)
# note that update of an existing zipped .gpkg file is not supported
ogr_ds_exists(zf_gpkg, with_update = TRUE)

# list the vector layers
ogr_ds_layer_names(zf_gpkg)

ogrinfo() requires GDAL >= 3.7. It accepts an optional character vector containing any of the arguments supported by the ogrinfo command-line utility included with GDAL.

# list the layers in a data source
ogrinfo(zf_gpkg)

# detailed information about a specific layer
ogrinfo(zf_gpkg, "ynp_bnd", cl_arg = c("-geom=SUMMARY", "-wkt_format", "WKT2"))

Manage data sources and schemas

Here we copy the .gpkg file from the zip archive to an in-memory file that is writable, and make several modifications. The /vsimem/ file handler allows a block of memory to be treated as files. It is useful for temporary storage and is generally fast to access. The code below also demonstrates testing certain defined capabilities of the dataset or layer before attempting to perform modifications. This will not be needed in all cases but might be used in code that is required to handle input datasets in a general way.

# copy ynp_features.gpkg from the zip file to an in-memory file
mem_gpkg <- "/vsimem/tmp/ynp_features.gpkg"
ogr2ogr(zf_gpkg, mem_gpkg)

vsi_read_dir("/vsimem/tmp")

# confirm it's writable
ogr_ds_exists(mem_gpkg, with_update = TRUE)

rd_layer <- "roads"

# rename a layer (requires GDAL >= 3.5)
if (gdal_version_num() < gdal_compute_version(3, 5, 0)) {
  message("ogr_layer_rename() requires GDAL >= 3.5")
} else if (ogr_layer_test_cap(mem_gpkg, rd_layer)$Rename) {
  ogr_layer_rename(mem_gpkg, rd_layer, "roads2")
  rd_layer <- "roads2"
} else {
  message("layer does not have 'Rename' capability")
}

ogr_ds_layer_names(mem_gpkg)

# delete a layer
if (ogr_ds_test_cap(mem_gpkg)$DeleteLayer) {
  ogr_layer_delete(mem_gpkg, rd_layer)
} else {
  message("dataset does not have 'DeleteLayer' capability")
}

ogr_ds_layer_names(mem_gpkg)

# manage fields on a layer
ogr_layer_field_names(mem_gpkg, "points_of_interest")

# delete a field
if (ogr_layer_test_cap(mem_gpkg, "points_of_interest")$DeleteField) {
  ogr_field_delete(mem_gpkg, "points_of_interest", "createdate")
} else {
  message("layer does not have 'DeleteField' capability")
}

# rename fields
if (ogr_layer_test_cap(mem_gpkg, "points_of_interest")$AlterFieldDefn) {
  ogr_field_rename(mem_gpkg, "points_of_interest", "poiname", "poi_name")
  ogr_field_rename(mem_gpkg, "points_of_interest", "poitype", "poi_type")
} else {
  message("layer does not have 'AlterFieldDefn' capability")
}

As a simple example for illustration, we populate a new field to flag geothermal features.

# create a new field
if (ogr_layer_test_cap(mem_gpkg, "points_of_interest")$CreateField) {
  ogr_field_create(mem_gpkg, "points_of_interest",
                   fld_name = "is_geothermal",
                   fld_type = "OFTInteger",
                   fld_subtype = "OFSTBoolean")
} else {
  message("layer does not have 'CreateField' capability")
}

# edit data with SQL
sql <- "UPDATE points_of_interest SET is_geothermal = 
          CASE
            WHEN poi_type IN ('Basin', 'Geyser') THEN TRUE
            ELSE FALSE
          END"
ogr_execute_sql(mem_gpkg, sql)

ogr_layer_field_names(mem_gpkg, "points_of_interest")

To read back from the modified data source we need to instantiate objects of class GDALVector. More detail on working with GDALVector objects is given in the examples further below for data retrieval. For now, note that the function ogr_execute_sql() returns an object of class GDALVector for SQL statements that return a result set.

# read and display the geothermal features
sql <- "SELECT poi_name, geom
          FROM points_of_interest
            WHERE is_geothermal = TRUE"
(lyr <- ogr_execute_sql(mem_gpkg, sql))

lyr$getFeatureCount()

# retrieve all features in the result set (cf. DBI::dbFetch())
feat_set <- lyr$fetch(-1)
head(feat_set)

# plot the park boundary
# the layer contains a single polygon feature which is piped directly to plot()
GDALVector$new(mem_gpkg, "ynp_bnd")$getNextFeature() |>
  plot(col = "wheat", xlab = "longitude", ylab = "latitude",
       main = "YNP Geothermal Features")

plot(feat_set, pch = 19, col = "steelblue1", add = TRUE)

For clean up, the $close() method should be called on the layer object opened with ogr_execute_sql(). We also delete the temporary in-memory file.

lyr$close()

# delete a data source
vsi_unlink(mem_gpkg)  # delete a single file
# or, deleteDataset(mem_gpkg)

Traditional row-based data retrieval

These examples will use the original sample dataset for YNP features. The DSN is formed with the /viszip/ prefix, and full path to the file ynp_features.gpkg contained in the zip archive. An object of class GDALVector is generated with a call to new(). From the examples above, we know that the GeoPackage contains three vector layers, so we also pass a layer name argument to the class constructor.

Note that if the layer argument is omitted, the constructor will attempt to open the first layer by index. The abbreviated form of the constructor would generally be used for convenience only with single-layer file formats (e.g., ESRI Shapefile, FlatGeoBuf), or with a GeoPackage file known to contain only a single layer.

A SQL SELECT statement that returns a result set can also be used for the layer argument in the constructor.

Here we instantiate a GDALVector object for the park boundary layer and retrieve information about it.

f <- system.file("extdata/ynp_features.zip", package = "gdalraster")
ynp_dsn <- file.path("/vsizip", f, "ynp_features.gpkg")

# the park boundary layer containing a single feature
(bnd <- new(GDALVector, ynp_dsn, "ynp_bnd"))

bnd$getFeatureCount()

# spatial reference definition as WKT string
bnd$getSpatialRef()

# xmin, ymin, xmax, ymax
bnd$bbox()

# structure of the layer definition (a.k.a. feature class definition)
bnd$getLayerDefn() |> str()

Traditional row-level access to feature data with OGR is cursor-based, reading one feature at a time. When starting with a new layer object, the cursor is positioned at the beginning and we can iterate through features with calls to $getNextFeature(). This method returns an individual feature as a named list of fields and their values, and moves the cursor forward one row. It will return NULL when no features are left.

The park boundary layer contains a single polygon feature. The list object returned for an individual feature carries the "OGRFeature" class attribute.

bnd_feat <- bnd$getNextFeature()
str(bnd_feat)

# no more features
bnd$getNextFeature()

bnd$close()

The cursor can be reset to the beginning of the layer at any time with a call to $resetReading().

Fetch all features or progressively in batches

Class GDALVector also provides the $fetch() method to retrieve the next n features from the layer and return them as a data frame (analog of DBI::dbFetch()). $fetch() can be used to get the entire feature set for a layer, one batch of features at a time, or the remaining features from the current cursor position. Fetching zero features is also possible to retrieve the structure of the layer as a 0-row data frame with fully typed columns. As with $getNextFeature(), only forward paging is supported.

Passing n = -1 or n = Inf will retrieve all features from the beginning (with an automatic call to $resetReading()). Passing n = NA is supported and will retrieve the remaining features from the current cursor position. Otherwise, features can be fetched progressively by passing a whole number (integer or numeric) as the n argument. If more features than available are requested, the result is returned in full without warning. If fewer features than requested are returned, further fetches will return a data frame with zero rows.

Note that it is possible to set an attribute filter and/or spatial filter prior to retrieving data to restrict the returned feature set. It is also possible to set specific fields as ignored (or to specify the list of desired fields). This includes potentially omitting the geometry column when access only to feature attributes is needed. Limiting the number of fields returned may save some processing time and/or bandwidth.

The layer argument in the constructor for GDALVector can also be given as a SQL SELECT statement. Here we read from the roads layer selecting only the publicly accessible roads. We call $fetch() with n = -1 to read from the beginning and retrieve all available features. The data frame returned by $fetch() carries the "OGRFeatureSet" class attribute providing S3 methods for print() and plot().

# SQL layer for public roads
sql <- "SELECT rdname, opentopubl, geom FROM roads WHERE opentopubl = 'Yes'"
(roads <- new(GDALVector, ynp_dsn, sql))

roads$getFeatureCount()

roads$getFieldNames()

roads_featset <- roads$fetch(-1)
nrow(roads_featset)

head(roads_featset)

plot(bnd_feat, col = "wheat", xlab = "longitude", ylab = "latitude",
     main = "YNP Public Roads")

plot(roads_featset, col = "slategray", lwd = 2, add = TRUE)

roads$close()

Fetching in batches may be advantageous when dealing with large datasets, especially when reading over a network. Also, note that $getFeatureCount() is called internally when fetching the full feature set but not for a batch of n features. Some vector drivers will actually scan the entire layer once to count features. The FastFeatureCount element in the list returned by the $testCapability() method can be checked if this might be a concern.

poi <- new(GDALVector, ynp_dsn, "points_of_interest")

poi$getFeatureCount()

# read progressively in batches
batch_size <- 500
batch <- 0
while (TRUE) {
    poi_featset <- poi$fetch(batch_size)
    if (nrow(poi_featset) == 0) break
    cat("batch", batch <- batch + 1, ":", nrow(poi_featset), "features\n")
    # process batch
    # ...
}

poi$close()

Read using ArrowArrayStream

It is also possible to retrieve features in a column-oriented memory layout using the Arrow C Stream interface in OGR (requires GDAL >= 3.6). Performance advantages can be substantial with vector format drivers that provide specialized implementations (e.g., Parquet, FlatGeoBuf, GeoPackage). Drivers with specialized implementations advertise the FastGetArrowStream layer capability.

The $getArrowStream() method of class GDALVector exposes an ArrowArrayStream on the layer as a nanoarrow_array_stream object. Note that the nanoarrow_array_stream object has a $get_next() method which allows accessing features in the stream by batch (returning a nanoarrow_array object). The default batch size for an OGR array stream is 65,536 features, but can be configured on a per-object basis by assigning a value to the GDALVector writable field $arrowStreamOptions (as a character vector of "NAME=VALUE" pairs). nanoarrow provides S3 methods for as.data.frame() to import a nanoarrow_array (one batch at a time), or the nanoarrow_array_stream itself (pulling all batches in the stream).

# Expose an ArrowArrayStream (requires GDAL >= 3.6)

# re-open the roads layer with the required argument for type of access
roads$open(read_only = TRUE)
roads$resetReading()

# does the layer have a specialized implementation
roads$testCapability()$FastGetArrowStream

# optionally set ArrowStream options as character vector of "NAME=VALUE", e.g.,
roads$arrowStreamOptions <- "INCLUDE_FID=NO"
# the default batch size of 65,536 could also be configured with
# MAX_FEATURES_IN_BATCH=num

(stream <- roads$getArrowStream())

# get the array schema if needed
schema <- stream$get_schema()

# optionally read by batch, NULL is returned when no more features are available
# batch <- stream$get_next()
# if (!is.null(batch))
#   d_batch <- as.data.frame(batch)

# or, pull all the batches into a data frame
d <- as.data.frame(stream)
nrow(d)
head(d)

# release the stream when finished
stream$release()

# 'geom' is a list column of WKB raw vectors which can be passed to the
# Geometry API functions
geom_utm <- g_transform(d$geom,
                        srs_from = roads$getSpatialRef(),
                        srs_to = "EPSG:26912")

# add a column with road lengths in meters
d$rdlength <- g_length(geom_utm)
head(d)

roads$close()

Reproject vector layers

The following example introduces another sample dataset which contains fire perimeters for Yellowstone National Park from the Monitoring Trends in Burn Severity (MTBS) project. Here we set up a temporary, writable copy of the dataset ynp_fires_1984_2022.gpkg (in memory), and use it as a data source to perform further processing.

The MTBS layer uses a projected coordinate system, while layers of the "YNP features" dataset used in the examples above are in geographic coordinates. The ogr_reproject() function is used to project layers in geographic coordinates from ynp_features.gpkg to match the coordinate system of the MTBS layer.

# MTBS fire perimeters in Yellowstone National Park 1984-2022
f <- system.file("extdata/ynp_fires_1984_2022.gpkg", package = "gdalraster")

# copy to a temporary writable file
mtbs_dsn <- "/vsimem/tmp/ynp_fires_1984_2022.gpkg"
ogr2ogr(f, mtbs_dsn)

(fires <- new(GDALVector, mtbs_dsn, "mtbs_perims"))

srs_mtsp <- fires$getSpatialRef()  # Montana state plane metric definition

# reproject the boundary in ynp_features.gpkg to match the MTBS layer,
# returning a GDALVector object on the output layer by default
(bnd <- ogr_reproject(src_dsn = ynp_dsn, src_layer = "ynp_bnd",
                      out_dsn = mtbs_dsn, out_srs = srs_mtsp))

(bnd_feat <- bnd$getNextFeature())

bnd$close()

# reproject points_of_interest
poi <- ogr_reproject(ynp_dsn, "points_of_interest", mtbs_dsn, srs_mtsp)

# set an attribute filter to select the Maple Fire
fires$setAttributeFilter("incid_name = 'MAPLE'")
fires$getFeatureCount()

maple_fire <- fires$getNextFeature()

# use the fire polygon as a spatial filter for points_of_interest
# setSpatialFilter() expects WKT input, so convert the WKB geometry
g_wk2wk(maple_fire$geom) |> poi$setSpatialFilter()
poi$getFeatureCount()

poi$setSelectedFields(c("poiname", "poitype", "geom"))
(maple_pois <- poi$fetch(-1))

plot(bnd_feat, col = "wheat")
plot(maple_fire, col = "orangered", border = NA, add = TRUE)
plot(maple_pois, pch = 15, col = "royalblue", add = TRUE)

fires$close()
poi$close()

Write a new feature in an existing layer

The $createFeature() method of GDALVector object creates and writes a new feature within the layer. The feature argument is a named list of fields and their values (which could be be one row of a data frame). The passed feature is written to the layer as a new feature, rather than overwriting an existing one.

Here we add the centroid of the YNP boundary as a point of interest.

# create a feature object for the YNP centroid as a point of interest
(bnd_centroid_xy <- g_centroid(bnd_feat$geom))

feat <- list()
feat$poiname <- "YNP centroid"
feat$poitype <- "Information"
feat$createdate <- Sys.Date()
feat$editdate <- Sys.Date()
feat$geom <- g_create("POINT", bnd_centroid_xy)

# re-open the "points_of_interest" layer with update access
poi$open(read_only = FALSE)
poi$testCapability()$SequentialWrite

# create and write the new feature on the layer
poi$createFeature(feat)

# be sure pending writes are flushed
poi$syncToDisk()

# read back
fid <- poi$getLastWriteFID()
(ynp_centroid <- poi$getFeature(fid))

plot(bnd_feat, col = "wheat")
plot(ynp_centroid, pch = 10, cex = 1.5, add = TRUE)

Rewrite/replace an existing feature

The $setFeature() method of a GDALVector object writes a feature based on the feature id (FID) specified in the input. As with $createFeature(), the feature argument is a named list of fields and their values, and must include a $FID element referencing the existing feature to rewrite. Note that if any fields are omitted in the passed feature the write behavior is driver-dependent:

SQL-based drivers which implement set feature through SQL UPDATE will skip unset fields, and thus the content of the existing feature will be preserved.
The shapefile driver will write a NULL value in the DBF file.
The GeoJSON driver will take into account unset fields to remove the corresponding JSON member.

# rewrite a feature in the "point_of_interest" layer updating the feature name
# verify the layer has random write capability
poi$testCapability()$RandomWrite

# FID 3809 is missing the trailhead name
(feat <- poi$getFeature(3809))

# update the name field and the date of the edit
feat$poiname <- "Ice Lake Trailhead"
feat$editdate <- Sys.Date()

# rewrite the feature
poi$setFeature(feat)
poi$syncToDisk()

(feat <- poi$getFeature(3809))

Delete a feature

The $deleteFeature() method of a GDALVector object deletes the feature with the indicated feature ID if supported by the format driver. The value of fid must be a single numeric value, optionally carrying the bit64::integer64 class attribute.

# delete the "YNP centroid" feature that was created above
# verify the layer has delete feature capability
poi$testCapability()$DeleteFeature

# the feature ID was obtained above as: fid <- poi$getLastWriteFID()
poi$getFeature(fid)

poi$deleteFeature(fid)
poi$syncToDisk()

poi$getFeature(fid)

poi$close()

Batch write features from data frame input

The $batchCreateFeature()method of a GDALVector object is batch version of $createFeature(). It creates and writes a batch of new features within the layer from input given as a data frame. Column names in the data frame must match field names of the layer and have compatible data types.

Operations that write batches of features should use transactions for better performance when writing into RDBMS drivers that have native transaction support (e.g., PostgreSQL / PostGIS, GPKG, SQLite). Grouping many features per transaction (e.g., 100,000 or more) can improve performance substantially. Using transactions also provides a mechanism to rollback a group of data modifications that fails to complete or generates errors.

The example below compares the relative performance of writing a batch of features to a GeoPackage layer with and without a transaction. We create a new POINT layer in the existing GPKG (mtbs_dsn), and write batches of features at random point locations within the YNP bounding box.

By default, the GeoPackage driver automatically creates a column named fid to use for the OGR feature ID (primary key in the SQLite database). This can be configured with the FID=value layer creation option. Note that the FID is a special property of a feature and not treated as an attribute of a feature, so it is not specified in the layer definition.

# create a layer definition for random_points
# the spatial ref was obtained above as: srs_mtsp <- fires$getSpatialRef()
defn <- ogr_def_layer("POINT", srs = srs_mtsp)
defn$pt_desc <- ogr_def_field("OFTString")
defn$create_time <- ogr_def_field("OFTDateTime")

ogr_layer_create(mtbs_dsn, "random_points", layer_defn = defn)

lyr <- new(GDALVector, mtbs_dsn, "random_points", read_only = FALSE)

bb <- g_wk2wk(bnd_feat$geom) |> bbox_from_wkt()

Next we create the first batch of features and write them without grouping in a transaction, measuring time elapsed.

batch_size <- as.integer(1e5)

# create a batch of features
rndX <- sample((bb[1] + 1):(bb[3] - 1), batch_size, replace = TRUE)
rndY <- sample((bb[2] + 1):(bb[4] - 1), batch_size, replace = TRUE)
pts <- cbind(rndX, rndY)
pts_geom <- g_create("POINT", pts)
d <- data.frame(pt_desc = rep("random points batch 1", batch_size),
                create_time = rep(Sys.time(), batch_size))
d$geom <- pts_geom

# write the batch (no transaction)
system.time(res <- lyr$batchCreateFeature(d))

(all(res))

lyr$syncToDisk()

A second batch of features is grouped in a transaction for writing, and the layer is checked for the expected output.

rndX <- sample((bb[1] + 1):(bb[3] - 1), batch_size, replace = TRUE)
rndY <- sample((bb[2] + 1):(bb[4] - 1), batch_size, replace = TRUE)
pts <- cbind(rndX, rndY)
pts_geom <- g_create("POINT", pts)
d <- data.frame(pt_desc = rep("random points batch 2", batch_size),
                create_time = rep(Sys.time(), batch_size))
d$geom <- pts_geom

# write the batch using a transaction
system.time({
  lyr$startTransaction()
  res2 <- lyr$batchCreateFeature(d)
  if (all(res2))
    lyr$commitTransaction()
  else
    lyr$rollbackTransaction()
})

(all(res2))

# check the output data
d_out <- lyr$fetch(-1)
(nrow(d_out) == batch_size * 2)

head(d_out)

tail(d_out)

lyr$close()

Write to GeoJSON

Single-layer vector file formats (e.g., ESRI shapefile, FlatGeoBuf, GeoJSON) can be created from scratch with a call to ogr_ds_create(). This function provides optional arguments to add a single attribute field on the layer when it is created. More attribute fields could be added after layer creation with calls to ogr_field_create() (for vector formats that have the CreateField layer capability). Alternatively, a layer definition (as list object) can be passed to ogr_ds_create() to specify multiple attribute fields and their properties (see ?ogr_def_layer).

GeoJSON does not support schema definition prior to creating features. Only the Feature object has a member with name properties. The specification does not require all Feature objects in a collection to have the same schema of properties, nor does it require all Feature objects in a collection to have geometry of the same type (https://geojson.org/). Note that we set return_obj = TRUE in the call to ogr_ds_create(). This returns a live dataset object open for write access on the layer (i.e., an object of class GDALVector). A valid GeoJSON file will be generated once one or more features have been written to the layer.

Here we generate a GeoJSON string containing a single polygon feature that defines a rectangular area of interest around the Maple Fire boundary. This GeoJSON format is supported for input by some data distribution systems to define a download box (e.g., NLCD and LANDFIRE). For this case we need coordinates in Web Mercator projection (EPSG 3857).

# write the Maple Fire AOI bounding box as GeoJSON in EPSG 3857
json_file <- file.path(tempdir(), "maple_fire_aoi.geojson")

lyr <- ogr_ds_create("GeoJSON", json_file, layer = "maple_fire_aoi",
                     geom_type = "POLYGON", srs = "EPSG:3857",
                     fld_name = "id", fld_type = "OFTString",
                     overwrite = TRUE, return_obj = TRUE)

# The Maple Fire feature object and spatial reference were obtained above in
# the section on "Reproject vector layers".
# Here we extend the minimum bounding box by 500 m in each direction.
feat <- list()
feat$id <- "dataDownloadBox"
feat$geom <- g_transform(maple_fire$geom, srs_from = srs_mtsp,
                         srs_to = "EPSG:3857", as_wkb = FALSE) |>
               bbox_from_wkt(extend_x = 500, extend_y = 500) |>
               bbox_to_wkt()

lyr$createFeature(feat)

lyr$close()

readLines(json_file) |> writeLines()

OGR facilities for vector geoprocessing

ogr_proc() performs GIS overlay operations on vector layers. It provides an interface to GDAL API methods for the following operations: Intersection, Union, SymDifference, Identity, Update, Clip and Erase.

Inputs are given as objects of class GDALVector, which may have spatial and/or attribute filters applied. The output layer will be created if it does not exist, but output can also be appended to an existing layer, or written to an existing empty layer that has a custom schema defined.

Here we generate a layer of areas within the 1988 North Fork fire perimeter that have subsequently re-burned (as of 2022). The output layer contains features whose geometries represent areas that are common between features in the input layer and in the method layer. The features in the output layer have attributes from both the input and method layers.

# layer filtered to fires after 1988
lyr1 <- new(GDALVector, mtbs_dsn, "mtbs_perims")
lyr1$setAttributeFilter("ig_year > 1988")
lyr1$getFeatureCount()

# second layer for the 1988 North Fork fire perimeter
sql <- "SELECT incid_name, ig_year, geom FROM mtbs_perims
        WHERE incid_name = 'NORTH FORK'"
lyr2 <- new(GDALVector, mtbs_dsn, sql)
lyr2$getFeatureCount()

north_fork_feat <- lyr2$getNextFeature()

# set mode options for the intersection
opt <- c("INPUT_PREFIX=layer1_",
         "METHOD_PREFIX=layer2_",
         "PROMOTE_TO_MULTI=YES")

# intersect to obtain areas re-burned since 2000
lyr_out <- ogr_proc(mode = "Intersection",
                    input_lyr = lyr1,
                    method_lyr = lyr2,
                    out_dsn = mtbs_dsn,
                    out_lyr_name = "north_fork_reburned",
                    out_geom_type = "MULTIPOLYGON",
                    mode_opt = opt)

# the output layer has attributes of both the input and method layers
(reburn_feat_set <- lyr_out$fetch(-1))

plot(north_fork_feat)
plot(reburn_feat_set, col = "orangered", border = NA, add = TRUE,
     main = "1988 North Fork fire perimeter showing re-burned areas in red")

# clean up
lyr1$close()
lyr2$close()
lyr_out$close()

# clean up
unlink(json_file)
vsi_unlink(mtbs_dsn)
vsi_rmdir("/vsimem/tmp/")

Any scripts or data that you put into this service are public.

gdalraster documentation built on June 8, 2025, 12:37 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

gdalraster
Bindings to 'GDAL'

Vector API Overview
In gdalraster: Bindings to 'GDAL'

Background

Rationale

Vector API components

Core bindings

Class `GDALVector`

OGR stand-alone functions

Supporting APIs

Spatial Reference Systems

Geometry

Virtual Systems Interface

General data management

Workflow considerations

Data retrieval

Interoperability

Data type specifications

Code examples

Inspect data sources

Manage data sources and schemas

Traditional row-based data retrieval

Fetch all features or progressively in batches

Read using ArrowArrayStream

Reproject vector layers

Write a new feature in an existing layer

Rewrite/replace an existing feature

Delete a feature

Batch write features from data frame input

Write to GeoJSON

OGR facilities for vector geoprocessing

Try the gdalraster package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

gdalraster Bindings to 'GDAL'

Vector API Overview In gdalraster: Bindings to 'GDAL'

Background

Rationale

Vector API components

Core bindings

Class GDALVector

OGR stand-alone functions

Supporting APIs

Spatial Reference Systems

Geometry

Virtual Systems Interface

General data management

Workflow considerations

Data retrieval

Interoperability

Data type specifications

Code examples

Inspect data sources

Manage data sources and schemas

Traditional row-based data retrieval

Fetch all features or progressively in batches

Read using ArrowArrayStream

Reproject vector layers

Write a new feature in an existing layer

Rewrite/replace an existing feature

Delete a feature

Batch write features from data frame input

Write to GeoJSON

OGR facilities for vector geoprocessing

Try the gdalraster package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

gdalraster
Bindings to 'GDAL'

Vector API Overview
In gdalraster: Bindings to 'GDAL'

Class `GDALVector`