knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(sf)
library(dplyr)
library(tidyr)
library(mapme.biodiversity)
mapme_options(verbose = FALSE)
This tutorial shows how to transform the output of the mapme.biodiversity package to a wide format for exchange with other (geospatial) software, such as QGIS. This is necessary because the package represents indicators in a so-called nested-list format by default. This format is specific to R, so using the data in other software requires some additional steps. This vignette shows how you can change the data layout of your portfolio so that you can easily serialize it to a spatial format of your choice and use it with other software.
Tabular data can be structured in two different ways, usually referred to as long and wide format. Most people are more familiar with the wide format, because this is how we as humans naturally structure data when working with spreadsheets, e.g. in Excel. In the wide format, the identifier of an observation is included exactly once and does not repeat itself (see Table A). In the long format, the identifier, as well as other qualifying variables, might be repeated several times to uniquely identify each observation in a single row (see Table B). The long format is often required when interacting with software, e.g. to make plots with ggplot2. The content of the two is exactly the same either way; the wide format is just friendlier to humans, the long format friendlier to computers. If you are familiar with the R tidyverse, you might also have heard of the term tidy data. For tabular data, you can think of tidy data as a long table which naturally fulfils the following requirements: each variable has its own column, each observation has its own row, and each value has its own cell.
Table A, in that sense, is not tidy, since the year variable is not found in its own column but is instead scattered across two different columns. Table B is in long format, with each variable found in exactly one column. Each individual row thus represents exactly one observation, namely the observation of a specific country in a specific year.
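The difference between the two layouts can be sketched with tidyr. The table below is a made-up stand-in in the spirit of Tables A and B (the country names and values are assumptions for illustration, not the data shown in the tables):

```r
library(tidyr)

# Hypothetical wide table: one row per country, one column per year.
wide <- data.frame(
  country = c("A", "B"),
  `2000` = c(10, 20),
  `2001` = c(11, 21),
  check.names = FALSE
)

# Wide to long: the year becomes a column of its own, so every row holds
# exactly one country-year observation.
long <- pivot_longer(
  wide,
  cols = c("2000", "2001"),
  names_to = "year",
  values_to = "value"
)
print(long)

# Long back to wide: same content, different layout.
wide_again <- pivot_wider(long, names_from = "year", values_from = "value")
print(wide_again)
```

Note that the country identifier appears once per row in the wide table but is repeated for every year in the long table.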
When we structure data in the long format, the objects will usually have a larger memory footprint than in the wide format. For smaller objects or data types with small memory consumption, this might not pose a serious limitation to the workflow. However, geometry information, here indicated as a WKT string, might quickly take up a large proportion of the available memory, even more so if the portfolio consists of a high number of complex geometries that are then copied to fit the long-format requirement. For that reason, this package uses a nested-list format to hold the indicator tables as single columns within the portfolio. The remainder of this tutorial shows in more detail how you can work with this R-specific data format.
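To make the idea of a nested-list column concrete, here is a minimal sketch using tidyr::nest() on a made-up long table. The column names (assetid, geom_wkt, month, value) are assumptions for illustration and not the package's actual schema:

```r
library(tibble)
library(tidyr)

# Hypothetical long table: the WKT string is repeated for every monthly value.
long <- tibble(
  assetid = rep(1:2, each = 3),
  geom_wkt = rep(
    c("POLYGON ((0 0, 1 0, 1 1, 0 0))", "POLYGON ((1 1, 2 1, 2 2, 1 1))"),
    each = 3
  ),
  month = rep(1:3, times = 2),
  value = c(5, 7, 9, 4, 6, 8)
)

# Collapse the indicator values into one list column: each asset, and thus
# each geometry, now occupies a single row.
nested <- nest(long, indicator = c(month, value))

nrow(long)    # one row per asset and month
nrow(nested)  # one row per asset
print(nested)
```

The nested table keeps one row per asset, while the indicator values for that asset live in a small table of their own inside the list column.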
We start by reading a GeoPackage from disk. For the sake of the argument, we split the original single polygon into 9 distinct polygons to simulate a realistic portfolio consisting of multiple assets.
aoi <- read_sf(
  system.file("extdata", "sierra_de_neiba_478140_2.gpkg",
    package = "mapme.biodiversity"
  )
)
aoi <- st_as_sf(st_make_grid(aoi, n = 3))
print(aoi)
As a simple example, suppose we are interested in extracting the average travel time to cities with between 100,000 and 200,000 inhabitants for our portfolio. As usual, we first have to make the Nelson et al. resource available and then request the calculation of the respective indicator.
outdir <- file.path(tempdir(), "mapme-resources")
mapme.biodiversity:::.copy_resource_dir(outdir)
outdir <- file.path(tempdir(), "mapme-resources")
dir.create(outdir, showWarnings = FALSE)
mapme_options(
  outdir = outdir,
  verbose = FALSE
)
aoi <- get_resources(aoi, get_nelson_et_al(ranges = "100k_200k"))
aoi <- calc_indicators(aoi, calc_traveltime(stats = "mean"))
print(aoi)
As we can observe from the output, a new column was added to the sf object.
It is called traveltime and is of type list, indicating that it represents a
nested-list column. This means that we are able to maintain the rectangular
shape of our original data (i.e. one polygon per row) while supporting
arbitrarily shaped outputs for indicators. Let's look at what the traveltime
indicator looks like in this instance:
print(aoi$traveltime[[1]])
From the syntax above, you can see how to access a single object within the
nested-list column, i.e. by using the list accessor [[. In this case, the
traveltime indicator is a single-row, two-column tibble with the average
minutes and the distance category as its values.
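The access pattern can be tried out without any downloads on a small stand-in object. This is only a sketch: the tibble below mimics a portfolio with a list column, and its column names (minutes_mean, distance) and values are hypothetical.

```r
library(tibble)

# Hypothetical portfolio with a nested-list column named traveltime.
portfolio <- tibble(
  assetid = 1:3,
  traveltime = list(
    tibble(minutes_mean = 10, distance = "100k_200k"),
    tibble(minutes_mean = 20, distance = "100k_200k"),
    tibble(minutes_mean = 30, distance = "100k_200k")
  )
)

# [[ returns the indicator table of a single asset ...
first <- portfolio$traveltime[[1]]
print(first)

# ... whereas [ returns a list of length one that still wraps the table.
wrapped <- portfolio$traveltime[1]
class(wrapped)

# Pull one value out of every asset's table.
means <- vapply(portfolio$traveltime, function(x) x$minutes_mean, numeric(1))
print(means)
```

Using vapply() with an explicit return type is a cautious choice here: it fails loudly if an asset's table does not contain exactly one numeric value.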
We can now use either of two functions to transform the portfolio to long
or wide formats:
portfolio_long(aoi)
portfolio_wide(aoi)
Both functions will, by default, automatically detect nested-list columns and
change the data layout. In this case, the result still has 9 rows, just like the
original data frame, because the traveltime indicator consists of just a single
row per asset.
We could serialize the object to disk in either format by calling
write_portfolio() with the respective format argument:
dsn_long <- tempfile(fileext = ".gpkg")
dsn_wide <- tempfile(fileext = ".gpkg")
write_portfolio(aoi, dsn_long, format = "long", quiet = TRUE)
write_portfolio(aoi, dsn_wide, format = "wide", quiet = TRUE)
Let's continue by querying an indicator that has a multi-row output, namely precipitation statistics from WorldClim.
aoi <- get_resources(aoi, get_worldclim_precipitation(years = 2018))
aoi <- calc_indicators(aoi, calc_precipitation_wc(stats = "mean"))
print(aoi)
We see that, in addition to the traveltime indicator, we have now obtained
another nested-list column called precipitation_wc.
Note, however, the difference in the shape of the indicator tibble when we
take a closer look at a specific asset:
print(aoi$precipitation_wc[[1]])
For a single asset, we obtain a tibble with 12 rows (one for each month of
the queried year 2018). Now, let's have a look at what happens if we transform
this table to long format, this time specifically requesting to extract only
the precipitation_wc indicator:
portfolio_long(aoi, indicators = "precipitation_wc")
Instead of 9 rows, we get a tibble with 108 rows (9 assets times 12 months), with the metadata of each asset, including the geometry column and other identifying values, repeated 12 times. For very large portfolios, this data layout can be very memory intensive. In these cases it may be preferable to transform the portfolio to a wide layout.
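The row arithmetic can be reproduced on a toy object. The following sketch uses tidyr::unnest() as a rough analogue of the long transformation; it is not the implementation of portfolio_long(), and the column names and values are made up:

```r
library(tibble)
library(tidyr)

# Hypothetical portfolio: 9 assets, each with a 12-row monthly indicator table.
portfolio <- tibble(
  assetid = 1:9,
  geom_wkt = paste0("POINT (", 1:9, " 0)"),
  precipitation = lapply(1:9, function(i) {
    tibble(month = 1:12, value = i * 10 + 1:12)
  })
)

# Unnesting repeats the asset-level columns once per indicator row.
long <- unnest(portfolio, precipitation)

dim(long)            # 9 * 12 = 108 rows, 4 columns
table(long$assetid)  # every asset identifier now appears 12 times
```

Each geometry string is carried along 12 times in the long table, which is the duplication the nested layout avoids.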
portfolio_wide(aoi, indicators = "precipitation_wc")
From the example output above we see that in this case, we obtain a resulting object with 9 rows only. The indicator data is now found in the respective columns which are named according to this schema:
<indicator-name>_<datetime>_<variable>_<unit>
Each of these columns holds the values for one unique combination of indicator name, datetime, variable, and unit.
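A naming pattern of this kind can be imitated with tidyr::pivot_wider() by combining several columns into the new column names. This is a sketch under assumptions: the input columns and their values are invented, and the names produced by portfolio_wide() itself may differ in detail.

```r
library(tibble)
library(tidyr)

# Hypothetical long table with the four components of the schema.
long <- tibble(
  assetid = rep(1:2, each = 2),
  indicator = "precipitation_wc",
  datetime = rep(c("2018-01-01", "2018-02-01"), times = 2),
  variable = "worldclim_precipitation_mean",
  unit = "mm",
  value = c(10, 12, 20, 22)
)

# Build column names as <indicator>_<datetime>_<variable>_<unit>.
wide <- pivot_wider(
  long,
  names_from = c(indicator, datetime, variable, unit),
  values_from = value,
  names_sep = "_"
)

names(wide)
print(wide)
```

The result has one row per asset and one column per month, so the asset-level information is stored only once.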
Note that the traveltime indicator is still represented as a nested-list
column. When serializing to disk, all indicators present are going to be
extracted so that the portfolio can be written to a spatial data format. If you
do not want to include certain indicators, you will have to subset the
portfolio as indicated in the following code block:
dsn <- tempfile(fileext = ".gpkg")
write_portfolio(select(aoi, traveltime), dsn, quiet = TRUE)