In mapme-initiative/mapme.biodiversity: Efficient Monitoring of Global Biodiversity Portfolios

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(sf)
library(dplyr)
library(tidyr)
library(mapme.biodiversity)
mapme_options(verbose = FALSE)

Objectives

This tutorial gives you some information how to transform the output of the mapme-biodiversity package to a wide format for exchange with other (geospatial-)software, such as QGIS. This is necessary because the package uses the so-called nested-list format by default to represent indicators. However, this format is specific to R and to use the data in other software thus requires some additional steps to be taken. This vignette shows you how you can change the data layout of your portfolio that you can easily serialize to a spatial format of your choice and use with other software.

What are long vs. wide tables?

Tabular data can be structured in two different ways, which are usually referred to as long and wide format. Most people are more familiar with the wide format, because this is a format we as humans would naturally structure our data when we work with spreadsheets, e.g. in Excel. In wide-format, the identifier of a observation is included exactly once and does not repeat itself (see Table A). In the long format, the identifier as well as other qualifying variables, might be repeated several times to uniquely identify each observation in a single row (see Table B). The long format is often required when interacting with computers, e.g. to make plots with ggplot2. The content of the two is exactly the same either way, the one might be just more friendly to humans than to computers. If you are familiar with the R tidyverse, you might also have heard of the term tidy data. In terms of tabular data you can imagine tidy data to be referring to data in a long table which naturally fulfils the following requirements:

each variable has its own column
each observation has its own row
each value has its own cell

Fig. 1: Example of a wide and long table holding the same
data.

Table A, in that sense, is not tidy since the year variable is not found in its own column but instead it is scattered in two different columns. Table B is a long format with each variable being found in exactly one column. In that sense, each individual row represents exactly one observation, meaning the observation of a specific country in a specific year.

When we structure data in the long format the objects will usually have a larger memory footprint than compared to a wide format. For smaller objects or data types with small memory consumption, this might not pose a serious limitation to the workflow. However, geometry information, here indicated as a WKT string, might quickly accumulate a large proportion of the available memory, even more so if the portfolio consists of a high number of complex geometries that are then copied to fit the the long-format requirement. For that reason, this packages uses a nested-list format to hold tables for indicators as single columns within the portfolio. The remainder of this tutorial will show you in more detail how you can work with those R specific data format.

The simple case - single-row indicators

We start by reading a GeoPackage from disk. For the sake of the argument, we split the original single polygon into 9 distinct polygons to simulate a realistic portfolio consisting of multiple assets.

aoi <- read_sf(
  system.file("extdata", "sierra_de_neiba_478140_2.gpkg",
    package = "mapme.biodiversity"
  )
)
aoi <- st_as_sf(st_make_grid(aoi, n = 3))
print(aoi)

As a simple example, suppose we interested in extracting the average traveltime to cities between 20,000 and 50,000 inhabitants in our portfolio. As usual, we will have to make available the Nelson et al. resource as well as requesting the calculation of the respective indicator.

outdir <- file.path(tempdir(), "mapme-resources")
mapme.biodiversity:::.copy_resource_dir(outdir)

outdir <- file.path(tempdir(), "mapme-resources")
dir.create(outdir, showWarnings = FALSE)

mapme_options(
  outdir = outdir,
  verbose = FALSE
)

aoi <- get_resources(aoi, get_nelson_et_al(ranges = "100k_200k"))
aoi <- calc_indicators(aoi, calc_traveltime(stats = "mean"))
print(aoi)

As we can observe from the output, a new column was added to the sf object. It is called traveltime and is of type list indicating that it represents a nested-list column. What that means is that we are able to maintain the rectangular shape of our original data (e.g. one polygon per row), while supporting arbitrarily shaped outputs for indicators. Let's observe how the traveltime indicator looks like in this instance:

print(aoi$traveltime[[1]])

From the syntax above, you can see how you can access a single object within the nested list column (e.g. by using the list accessor [[). In this case, the shape of the traveltime indicator is a single-row and two-column tibble with the average minutes and the distance category as its value. We can now use either of two functions to transform the portfolio to long or wide formats:

portfolio_long(aoi)
portfolio_wide(aoi)

Both function will, by default, automatically detect nested-list columns and change the data layout. In this case, the result still has 9 rows just like the original data frame because the indicator traveltime consisted of just a single row per asset.

We could serialize the object to disk in either format by calling write_portfolio() with the respective format argument:

dsn_long <- tempfile(fileext = ".gpkg")
dsn_wide <- tempfile(fileext = ".gpkg")
write_portfolio(aoi, dsn_long, format = "long", quiet = TRUE)
write_portfolio(aoi, dsn_wide, format = "wide", quiet = TRUE)

The harder case - indicators with multi-row output

Let's continue to query an indicator which has a multi-row output, i.e. precipitation statistics from WorldClim.

aoi <- get_resources(aoi, get_worldclim_precipitation(years = 2018))
aoi <- calc_indicators(aoi, calc_precipitation_wc(stats = "mean"))
print(aoi)

We see that in addition to the traveltime indicator, we now obtained an additional nested-list column called precipitation_wc. Note, however, the differences in the shape of the indicator tibble when we take a closer look for a specific asset:

print(aoi$precipitation_wc[[1]])

For a single asset, we obtain a tibble with 12 rows (for each month in the queried year 2018). Now, let's have a look at what happens if we transform this table to long format, this time specifically requesting to only extract the precipitation_wc indicators:

portfolio_long(aoi, indicators = "precipitation_wc")

Instead of 9 rows, we get a tibble with 108 rows (9 assets * 12), with the metadata for each asset with the geometry column and other identifying values being repeated 12 times each. For very large portfolios, this data layout might be very memory intensive. In these cases it might be more favourable to transform the portfolio to a wide layout.

portfolio_wide(aoi, indicators = "precipitation_wc")

From the example output above we see that in this case, we obtain a resulting object with 9 rows only. The indicator data is now found in the respective columns which are named according to this schema:

<indicator-name>_<datetime>_<variable>_<unit>

with the values being found in the rows for each unique combination of this pattern.

Note, that the traveltime still is represented as a nested-list column. When serializing to disk, all present indicators are going to extracted in order to be able to serialize to spatial data formats. If it is not desired to include certain indicators you will have to subset the portfolio as indicated in the following code block:

dsn <- tempfile(fileext = ".gpkg")
write_portfolio(select(aoi, traveltime), dsn, quiet = TRUE)

mapme-initiative/mapme.biodiversity documentation built on April 5, 2025, 12:47 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

mapme-initiative/mapme.biodiversity
Efficient Monitoring of Global Biodiversity Portfolios

In mapme-initiative/mapme.biodiversity: Efficient Monitoring of Global Biodiversity Portfolios

Objectives

What are long vs. wide tables?

The simple case - single-row indicators

The harder case - indicators with multi-row output

R Package Documentation

Browse R Packages

We want your feedback!

mapme-initiative/mapme.biodiversity Efficient Monitoring of Global Biodiversity Portfolios

In mapme-initiative/mapme.biodiversity: Efficient Monitoring of Global Biodiversity Portfolios

Objectives

What are long vs. wide tables?

The simple case - single-row indicators

The harder case - indicators with multi-row output

R Package Documentation

Browse R Packages

We want your feedback!

mapme-initiative/mapme.biodiversity
Efficient Monitoring of Global Biodiversity Portfolios