#' Translate a .osm.pbf file into .gpkg format
#'
#' This function is used to translate a `.osm.pbf` file into `.gpkg` format.
#' The conversion is performed using
#' [ogr2ogr](https://gdal.org/en/stable/programs/ogr2ogr.html) via the
#' `vectortranslate` utility in [sf::gdal_utils()]. It was created following
#' [the
#' suggestions](https://github.com/OSGeo/gdal/issues/2100#issuecomment-565707053)
#' of the maintainers of GDAL. See Details and Examples to understand the basic
#' usage, and check the introductory vignette for more complex use-cases.
#'
#' @details The new `.gpkg` file is created in the same directory as the input
#' `.osm.pbf` file. The translation process is performed using the
#' `vectortranslate` utility in [sf::gdal_utils()]. This operation can be
#' customized in several ways by modifying the parameters `layer`, `extra_tags`,
#' `osmconf_ini`, `vectortranslate_options`, `boundary` and `boundary_type`.
#'
#' The `.osm.pbf` files processed by GDAL are usually categorized into 5
#' layers, named `points`, `lines`, `multilinestrings`, `multipolygons` and
#' `other_relations`. Check the first paragraphs
#' [here](https://gdal.org/en/stable/drivers/vector/osm.html) for more details. This
#' function can convert only one layer at a time, and the parameter `layer` is
#' used to specify which layer of the `.osm.pbf` file should be converted.
#' Several layers with different names can be stored in the same `.gpkg` file.
#' By default, the function will convert the `lines` layer (which is the most
#' common one according to our experience).
#'
#' The arguments `osmconf_ini` and `extra_tags` are used to modify how GDAL
#' reads and processes a `.osm.pbf` file. More precisely, several operations
#' that GDAL performs on the input `.osm.pbf` file are governed by a `CONFIG`
#' file, which can be viewed at the following
#' [link](https://github.com/OSGeo/gdal/blob/master/ogr/ogrsf_frmts/osm/data/osmconf.ini).
#' The basic components of OSM data are called
#' [*elements*](https://wiki.openstreetmap.org/wiki/Elements) and they are
#' divided into *nodes*, *ways* or *relations*, so, for example, the code at
#' line 7 of that file is used to determine which *ways* are assumed to be
#' polygons (according to the simple-feature definition of polygon) if they
#' are closed. Moreover, OSM data is usually described using several
#' [*tags*](https://wiki.openstreetmap.org/wiki/Tags), i.e. pairs of two items:
#' a key and a value. The code at lines 33, 53, 85, 103, and 121 is used to
#' determine, for each layer, which tags should be explicitly reported as
#' fields (while all the other tags are stored in the `other_tags` column).
#' The parameter `extra_tags` is used to determine which extra tags (i.e.
#' key/value pairs) should be added to the `.gpkg` file (other than the
#' default ones).
#'
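#' As a rough sketch of that last step (object names are illustrative, and the
#' default fields are the ones listed for the `lines` layer), the extra tags
#' are merged with the default fields and written to the `attributes=` line of
#' a temporary copy of the `CONFIG` file:
#'
#' ```r
#' # Default fields reported for the lines layer
#' fields_default = c(
#'   "name", "highway", "waterway", "aerialway", "barrier", "man_made", "railway"
#' )
#' # "highway" is repeated on purpose to show how duplicates are handled
#' extra_tags = c("oneway", "maxspeed", "highway")
#' # unique() drops "highway" since it is already reported by default
#' attributes_line = paste0(
#'   "attributes=",
#'   paste(unique(c(fields_default, extra_tags)), collapse = ",")
#' )
#' attributes_line
#' ```
#'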
#' By default, the vectortranslate operations are skipped if the function
#' detects a file having the same path as the input file, `.gpkg` extension, a
#' layer with the same name as the parameter `layer` and all `extra_tags`. In
#' that case the function will simply return the path of the `.gpkg` file.
#' This behaviour can be overridden by setting `force_vectortranslate = TRUE`.
#' The vectortranslate operations are never skipped if any of the
#' `osmconf_ini`, `vectortranslate_options` or `boundary` arguments is not
#' `NULL`.
#'
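#' The path of the `.gpkg` file is derived from the input path. A minimal
#' sketch of that convention follows; `gpkg_path()` is a hypothetical helper
#' written for illustration only and the input paths are placeholders:
#'
#' ```r
#' gpkg_path = function(file_path) {
#'   # Drop the last extension (".pbf")
#'   stem = tools::file_path_sans_ext(file_path)
#'   # Drop ".osm" as well when the input is something.osm.pbf
#'   if (tools::file_ext(stem) == "osm") {
#'     stem = tools::file_path_sans_ext(stem)
#'   }
#'   paste0(stem, ".gpkg")
#' }
#' path_osm_pbf = gpkg_path("/tmp/test_its-example.osm.pbf")
#' path_pbf = gpkg_path("/tmp/test_its-example.pbf")
#' # Both calls point to the same file, "/tmp/test_its-example.gpkg"
#' identical(path_osm_pbf, path_pbf)
#' ```
#'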
#' The parameter `osmconf_ini` is used to pass your own `CONFIG` file in case
#' you need more control over the GDAL operations. Check the package
#' introductory vignette for an example. If `osmconf_ini` is equal to `NULL`
#' (the default value), then the function uses the standard `osmconf.ini` file
#' defined by GDAL (except for the extra tags).
#'
#' The parameter `vectortranslate_options` is used to control the options that
#' are passed to `ogr2ogr` via [sf::gdal_utils()] when converting between
#' `.osm.pbf` and `.gpkg` formats. `ogr2ogr` can perform various operations
#' during the conversion process, such as spatial filters or SQL queries.
#' These operations can be tuned using the `vectortranslate_options` argument.
#' If `NULL` (the default value), then `vectortranslate_options` is set equal
#' to
#'
#' `c("-f", "GPKG", "-overwrite", "-oo", paste0("CONFIG_FILE=", osmconf_ini),
#' "-lco", "GEOMETRY_NAME=geometry", layer)`.
#'
#' Explanation:
#' * `"-f", "GPKG"` says that the output format is `GPKG`;
#' * `"-overwrite"` is used to delete an existing layer and recreate
#' it empty;
#' * `"-oo", paste0("CONFIG_FILE=", osmconf_ini)` is used to set the
#' [Open Options](https://gdal.org/en/stable/drivers/vector/osm.html#open-options)
#' for the `.osm.pbf` file and change the `CONFIG` file (in case the user
#' asks for any extra tag or a totally different CONFIG file);
#' * `"-lco", "GEOMETRY_NAME=geometry"` is used to change the
#' [layer creation options](https://gdal.org/en/stable/drivers/vector/gpkg.html#layer-creation-options)
#' for the `.gpkg` file and modify the name of the geometry column;
#' * `layer` indicates which layer should be converted.
#'
#' If `vectortranslate_options` is not `NULL`, then the options `c("-f",
#' "GPKG", "-overwrite", "-oo", "CONFIG_FILE=", path-to-config-file, "-lco",
#' "GEOMETRY_NAME=geometry", layer)` are always appended unless the user
#' explicitly sets different default parameters for the arguments `-f`, `-oo`,
#' `-lco`, and `layer`.
#'
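#' The following sketch illustrates how user-defined options might be
#' completed with the defaults. `append_defaults()` is a hypothetical helper
#' (not exported by the package), the `CONFIG` path is a placeholder, and the
#' sketch leaves out the check on the output format and the `boundary`
#' processing:
#'
#' ```r
#' append_defaults = function(options, osmconf_ini, layer = "lines") {
#'   # Each default is appended only if the user did not set that option
#'   if (!any(options == "-f")) {
#'     options = c(options, "-f", "GPKG")
#'   }
#'   if (!any(options == "-overwrite")) {
#'     options = c(options, "-overwrite")
#'   }
#'   if (!any(options == "-oo")) {
#'     options = c(options, "-oo", paste0("CONFIG_FILE=", osmconf_ini))
#'   }
#'   if (!any(options == "-lco")) {
#'     options = c(options, "-lco", "GEOMETRY_NAME=geometry")
#'   }
#'   layers = c(
#'     "points", "lines", "multipolygons", "multilinestrings", "other_relations"
#'   )
#'   # The layer is appended only if no layer name was already specified
#'   if (!any(is.element(layers, options))) {
#'     options = c(options, layer)
#'   }
#'   options
#' }
#' final_options = append_defaults(
#'   c("-limit", "10"),
#'   osmconf_ini = "path/to/osmconf.ini"
#' )
#' final_options
#' ```
#'
#' In this sketch the user-defined options come first and the defaults are
#' appended after them, followed by the layer name.
#'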
#' The arguments `boundary` and `boundary_type` can be used to set up a
#' spatial filter during the vectortranslate operations (and speed up the
#' process) using an `sf` or `sfc` object (`POLYGON` or `MULTIPOLYGON`). The
#' default arguments create a rectangular spatial filter which selects all
#' features that intersect the area. Setting `boundary_type = "clipsrc"` clips
#' the geometries. In both cases, the appropriate options are automatically
#' added to the `vectortranslate_options` (unless a user explicitly sets
#' different default options). Check Examples in `oe_get()` and the
#' introductory vignette.
#'
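#' As an illustration of the two boundary types, the sketch below builds the
#' options that would correspond to a rectangular boundary. The coordinates
#' are arbitrary values in EPSG:4326 and the object names are chosen for this
#' example only:
#'
#' ```r
#' library(sf)
#' boundary = st_as_sfc(
#'   st_bbox(
#'     c(xmin = -1.56, ymin = 53.80, xmax = -1.55, ymax = 53.81),
#'     crs = st_crs(4326)
#'   )
#' )
#' # "spat": the flag followed by xmin, ymin, xmax and ymax of the bounding box
#' spat_options = c("-spat", st_bbox(boundary))
#' # "clipsrc": the flag followed by the WKT representation of the polygon
#' clipsrc_options = c("-clipsrc", st_as_text(boundary))
#' ```
#'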
#' See also the help page of [`sf::gdal_utils()`] and
#' [ogr2ogr](https://gdal.org/en/stable/programs/ogr2ogr.html) for more examples and
#' extensive documentation on all available options that can be tuned during
#' the vectortranslate process.
#'
#' @inheritParams oe_get
#' @param file_path Character string representing the path of the input
#' `.pbf` or `.osm.pbf` file.
#'
#' @return Character string representing the path of the `.gpkg` file.
#' @export
#'
#' @seealso [oe_get_keys()]
#'
#' @examples
#' # First we need to match an input zone with a .osm.pbf file
#' (its_match = oe_match("ITS Leeds"))
#'
#' # Copy ITS file to tempdir so that the examples do not require internet
#' # connection. You can skip the next 3 lines (and start directly with
#' # oe_download()) when running the examples locally.
#'
#' file.copy(
#' from = system.file("its-example.osm.pbf", package = "osmextract"),
#' to = file.path(tempdir(), "test_its-example.osm.pbf"),
#' overwrite = TRUE
#' )
#'
#' # Then we can download the .osm.pbf file (if it was not already downloaded)
#' its_pbf = oe_download(
#' file_url = its_match$url,
#' file_size = its_match$file_size,
#' download_directory = tempdir(),
#' provider = "test"
#' )
#'
#' # Check that the file was downloaded
#' list.files(tempdir(), pattern = "pbf|gpkg")
#'
#' # Convert to gpkg format
#' its_gpkg = oe_vectortranslate(its_pbf)
#'
#' # Now there is an extra .gpkg file
#' list.files(tempdir(), pattern = "pbf|gpkg")
#'
#' # Check the layers of the .gpkg file
#' sf::st_layers(its_gpkg, do_count = TRUE)
#'
#' # Add points layer
#' its_gpkg = oe_vectortranslate(its_pbf, layer = "points")
#' sf::st_layers(its_gpkg, do_count = TRUE)
#'
#' # Add extra tags to the lines layer
#' names(sf::st_read(its_gpkg, layer = "lines", quiet = TRUE))
#' its_gpkg = oe_vectortranslate(
#' its_pbf,
#' extra_tags = c("oneway", "maxspeed")
#' )
#' names(sf::st_read(its_gpkg, layer = "lines", quiet = TRUE))
#'
#' # Adjust vectortranslate options and convert only 10 features
#' # for the lines layer
#' oe_vectortranslate(
#' its_pbf,
#' vectortranslate_options = c("-limit", 10)
#' )
#' sf::st_layers(its_gpkg, do_count = TRUE)
#'
#' # Remove .pbf and .gpkg files in tempdir
#' oe_clean(tempdir())
oe_vectortranslate = function(
file_path,
layer = "lines",
vectortranslate_options = NULL,
osmconf_ini = NULL,
extra_tags = NULL,
force_vectortranslate = FALSE,
never_skip_vectortranslate = FALSE,
boundary = NULL,
boundary_type = c("spat", "clipsrc"),
quiet = FALSE
) {
# Check that the input file was specified using the format
# ".../something.pbf". This is important for creating the .gpkg file path.
if (! tools::file_ext(file_path) %in% c("pbf", "osm") || !file.exists(file_path)) {
oe_stop(
.subclass = "oe_vectortranslate_filePathMissingOrNotPbf",
message = "The parameter file_path must correspond to an existing .pbf file"
)
}
# Check that the layer param is not NA or NULL
if (
is.null(layer) ||
is.na(layer) ||
# I need the following condition to check that the function
# get_id_layer does not return NULL
tolower(layer) %!in% c(
"points", "lines", "multipolygons", "multilinestrings", "other_relations"
)
) {
oe_stop(
.subclass = "oe_vectortranslate-layerNotProperlySpecified",
message = paste0(
"You need to specify the layer parameter and it must be one of",
" points, lines, multipolygons, multilinestrings or other_relations."
)
)
}
# We need to build the file path of the .gpkg using the following convention:
# it is the same file path of the .pbf/.osm.pbf file but with .gpkg extension.
# I need to use the if clause to check if the input file is something.osm.pbf
# or something.pbf
if (tools::file_ext(tools::file_path_sans_ext(file_path)) == "osm") {
gpkg_file_path = paste0(
# I need the double file_path_sans_ext to cancel the .osm and the .pbf
tools::file_path_sans_ext(tools::file_path_sans_ext(file_path)),
".gpkg"
)
} else {
# Just change the extensions
gpkg_file_path = paste0(tools::file_path_sans_ext(file_path), ".gpkg")
}
# Check if the user passed its own osmconf.ini file or vectortranslate_options
# (or boundary object) since, in that case, we always need to perform the
# vectortranslate operations (since it's too difficult to determine if an
# existing .gpkg file was generated following a particular .ini file with some
# options)
if (!is.null(osmconf_ini) || !is.null(vectortranslate_options) || !is.null(boundary)) {
force_vectortranslate = TRUE
never_skip_vectortranslate = TRUE
}
# Check if an existing .gpkg file contains the selected layer
if (file.exists(gpkg_file_path) && isFALSE(force_vectortranslate)) {
if (layer %!in% sf::st_layers(gpkg_file_path)[["name"]]) {
# Try to add the new layer from the .osm.pbf file to the .gpkg file
oe_message(
"Adding a new layer to the .gpkg file.",
quiet = quiet,
.subclass = "oe_vectortranslate_addingNewLayer"
)
force_vectortranslate = TRUE
}
}
# Check if the user chose to add some extra tags
if (!is.null(extra_tags)) {
force_vectortranslate = TRUE
# Check if all extra keys are already present in an existing .gpkg file. I
# check is.null(osmconf_ini) since, if the user passes their own osmconf.ini
# file, then the vectortranslate operations must be performed in any case
if (
file.exists(gpkg_file_path) &&
is.null(osmconf_ini) &&
# The next condition is used to check that the function is not looking for
# old tags in a non-existing layer, otherwise the following code will fail
# with an error:
# its_gpkg = oe_vectortranslate(its_pbf)
# oe_vectortranslate(
# its_pbf,
# layer = "points",
# extra_tags = "oneway"
# )
layer %in% sf::st_layers(gpkg_file_path)[["name"]] &&
!never_skip_vectortranslate
) {
# Starting from sf 1.0.2, sf::st_read raises a warning message when both
# layer and query arguments are set (while it raises a warning in sf <
# 1.0.2 when there are multiple layers and the layer argument is not set).
# See also https://github.com/r-spatial/sf/issues/1444
if (utils::packageVersion("sf") <= "1.0.1") {
old_tags = names(sf::st_read(
gpkg_file_path,
layer = layer,
quiet = TRUE,
query = paste0("select * from \"", layer, "\" limit 0")
))
} else {
old_tags = names(sf::st_read(
gpkg_file_path,
quiet = TRUE,
query = paste0("select * from \"", layer, "\" limit 0")
))
}
# Convert the character ":" into "_" for the extra_tags argument (see also
# https://github.com/ropensci/osmextract/issues/260 for more details). I
# create a temp object since I don't need to actually change the argument.
# NB: The laundering of ":" into "_" by GDAL is actually controlled by the
# attribute_name_laundering tag in osmconf.ini. However, since we do not
# support the "extra_tags" and "osmconf_ini" arguments at the same time, we
# do not need to check whether attribute_name_laundering=no is uncommented in
# the .ini file. In fact, if osmconf_ini is not NULL, the
# vectortranslate operations are never skipped.
temp_extra_tags = gsub(":", "_", extra_tags)
if (all(temp_extra_tags %in% old_tags)) {
force_vectortranslate = FALSE
}
}
}
# If the gpkg file already exists and force_vectortranslate is FALSE then we
# raise a message and return the path of the .gpkg file.
if (file.exists(gpkg_file_path) && isFALSE(force_vectortranslate)) {
oe_message(
"The corresponding gpkg file was already detected. ",
"Skip vectortranslate operations.",
quiet = quiet,
.subclass = "oe_vectortranslate_skipOperations"
)
return(gpkg_file_path)
}
# Otherwise we are going to convert the input .osm.pbf file using the
# vectortranslate utils from sf::gdal_utils.
# The extra_tags argument is ignored if the user set its own osmconf_ini file
# (since we do not know how it was generated):
# See https://github.com/ropensci/osmextract/issues/117
if (!is.null(osmconf_ini) && !is.null(extra_tags)) {
warning(
"The argument extra_tags is ignored when osmconf_ini is not NULL.",
call. = FALSE
)
extra_tags = NULL
}
# First we need to set the values for the parameter osmconf_ini (if it is set
# to NULL, i.e. the default).
if (is.null(osmconf_ini)) {
# The file osmconf.ini stored in the package is the default osmconf.ini used
# by GDAL and stored at the following link:
# https://github.com/OSGeo/gdal/blob/master/data/osmconf.ini
# It was saved on the 9th of July 2020.
osmconf_ini = system.file("osmconf.ini", package = "osmextract")
}
# Add the extra tags to the default osmconf.ini. If the user set its own
# osmconf.ini file we need to skip this step.
if (
!is.null(extra_tags) &&
# The following condition checks whether the user set its own CONFIG file
osmconf_ini == system.file("osmconf.ini", package = "osmextract")
) {
temp_ini = readLines(osmconf_ini)
id_old = get_id_layer(layer)
fields_old = get_fields_default(layer)
temp_ini[[id_old]] = paste0(
"attributes=",
paste(unique(c(fields_old, extra_tags)), collapse = ",")
)
temp_ini_file = tempfile(fileext = ".ini")
writeLines(temp_ini, con = temp_ini_file)
osmconf_ini = temp_ini_file
}
# If vectortranslate options is NULL (i.e. the default value), then we adopt
# the following set of options:
if (is.null(vectortranslate_options)) {
vectortranslate_options = c(
"-f", "GPKG", # output file format
"-overwrite", # delete an existing layer and recreate it
"-oo", paste0("CONFIG_FILE=", osmconf_ini), # open options
"-lco", "GEOMETRY_NAME=geometry" # layer creation options
)
# Check if we need to add a spatial filter
vectortranslate_options = process_boundary(
vectortranslate_options,
boundary,
boundary_type
)
# Add the layer argument
vectortranslate_options = c(vectortranslate_options, layer)
} else {
# Otherwise we check the options set by the user and append other basic
# options:
# 1. Check if the user omitted the "-f" option (which is used to select the
# format_name)
if ("-f" %!in% vectortranslate_options) {
vectortranslate_options = c(vectortranslate_options, "-f", "GPKG")
} else {
which_f = which(vectortranslate_options == "-f")
if (
which_f == length(vectortranslate_options) ||
vectortranslate_options[which_f + 1] != "GPKG"
) {
oe_stop(
.subclass = "oe_vectortranslate_shouldTranslateToGPKGOnly",
message = "The oe_vectortranslate function should translate to GPKG format only"
)
}
}
# Check if the user omitted the "-overwrite" option
if ("-overwrite" %!in% vectortranslate_options) {
# Otherwise add the -overwrite option
vectortranslate_options = c(vectortranslate_options, "-overwrite")
}
# Check if the user set any open option
if ("-oo" %!in% vectortranslate_options) {
# Otherwise append the basic open options
vectortranslate_options = c(vectortranslate_options, "-oo", paste0("CONFIG_FILE=", osmconf_ini))
} else {
# Check if the user set its own CONFIG_FILE and osmconf_ini is not NULL.
# In that case, raise a warning message
if (any(grepl("CONFIG_FILE", vectortranslate_options)) && !is.null(osmconf_ini)) {
warning(
"The osmconf_ini argument is ignored since the CONFIG file ",
"was already specified in the vectortranslate options.",
call. = FALSE
)
}
}
# Check if the user set any layer creation option (lco)
if ("-lco" %!in% vectortranslate_options) {
# Otherwise append the basic layer creation options
vectortranslate_options = c(vectortranslate_options, "-lco", "GEOMETRY_NAME=geometry")
}
# Check if the user set the argument boundary
vectortranslate_options = process_boundary(vectortranslate_options, boundary, boundary_type)
# Check if the user added the layer argument
if (
!any(c("points", "lines", "multipolygons", "multilinestrings", "other_relations") %in% vectortranslate_options)
) {
# Otherwise append the layer
vectortranslate_options = c(vectortranslate_options, layer)
}
}
oe_message(
"Starting with the vectortranslate operations on the input file!",
quiet = quiet,
.subclass = "oe_vectortranslate_startVectortranslate"
)
# Now we can apply the vectortranslate operation from gdal_utils: See
# https://github.com/ropensci/osmextract/issues/150 for a discussion on
# normalizePath
sf::gdal_utils(
util = "vectortranslate",
source = normalizePath(file_path),
destination = normalizePath(gpkg_file_path, mustWork = FALSE),
options = vectortranslate_options,
quiet = quiet
)
oe_message(
"Finished the vectortranslate operations on the input file!",
quiet = quiet,
.subclass = "oe_vectortranslate_finishedVectortranslate"
)
# and return the path of the gpkg file
gpkg_file_path
}
get_id_layer = function(layer) {
default_id = list(
points = 38L,
lines = 58L,
multipolygons = 90L,
multilinestrings = 108L,
other_relations = 126L
)
default_id[[layer]]
}
get_fields_default = function(layer) {
def_layers = list(
points = c(
"name",
"barrier",
"highway",
"ref",
"address",
"is_in",
"place",
"man_made"
),
lines = c(
"name",
"highway",
"waterway",
"aerialway",
"barrier",
"man_made",
"railway"
),
multipolygons = c(
"name",
"type",
"aeroway",
"amenity",
"admin_level",
"barrier",
"boundary",
"building",
"craft",
"geological",
"historic",
"land_area",
"landuse",
"leisure",
"man_made",
"military",
"natural",
"office",
"place",
"shop",
"sport",
"tourism"
),
multilinestrings = c("name", "type"),
other_relations = c("name", "type")
)
def_layers[[layer]]
}
process_boundary = function(
vectortranslate_options,
boundary = NULL,
boundary_type = c("spat", "clipsrc")
) {
# Checks
if (is.null(boundary)) {
return(vectortranslate_options)
}
if (any(c("-spat", "-clipsrc") %in% vectortranslate_options)) {
warning(
"The boundary argument is ignored since the vectortranslate_options ",
"already defines a spatial filter",
call. = FALSE
)
return(vectortranslate_options)
}
# Match the boundary type
boundary_type = match.arg(boundary_type)
# Extract/convert the geometry (or just return the geometry if boundary is a
# sfc)
if (inherits(boundary, "bbox")) {
boundary = sf::st_as_sfc(boundary)
}
boundary = sf::st_geometry(boundary)
# Check the number of geometries
if (length(boundary) > 1L) {
warning(
"The boundary is composed of more than one feature. Selecting the first. ",
call. = FALSE
)
boundary = boundary[1L]
}
# Check that the object can be interpreted as a POLYGON
stopifnot(sf::st_is(boundary, "POLYGON") || sf::st_is(boundary, "MULTIPOLYGON"))
# Check the CRS of boundary
if (sf::st_crs(boundary) != sf::st_crs(4326)) {
boundary = sf::st_transform(boundary, 4326)
}
# Try to fix the boundary in case it's not valid
if (! sf::st_is_valid(boundary)) {
boundary = sf::st_make_valid(boundary)
}
# Add and return
switch(
boundary_type,
spat = process_spat(vectortranslate_options, boundary),
clipsrc = process_clipsrc(vectortranslate_options, boundary)
)
}
# Add "-spat" + (xmin, ymin, xmax, ymax)
process_spat = function(vectortranslate_options, boundary) {
c(vectortranslate_options, "-spat", sf::st_bbox(boundary))
}
# Add "-clipsrc" + WKT
process_clipsrc = function(vectortranslate_options, boundary) {
c(vectortranslate_options, "-clipsrc", sf::st_as_text(boundary))
}