In ku-awdc/HexScape: Aggregation of Spatial Data into Discrete Patches using Simple Features

knitr::opts_chunk$set(echo = TRUE)
library("pbapply")
pboptions(type="txt")

Introduction

The hexscape package has a number of uses (some of which are not yet implemented):

Facilitate use of vectorised spatial data for EU/EEA countries, and/or regions/areas (as defined by NUTS1/2/3) of these countries
Facilitate use of additional vectorised spatial data on parishes, kommune and postcodes for Denmark
Subdivision of the above spatial data into `patches' based on either polygons of arbitrary size, or Voronoi tessellation around points of interest supplied by the user
Facilitate use of Corine land usage data for EU/EEA countries (or regions/areas thereof) and aggregation of different land cover types within the spatial areas and/or patches defined above
Facilitate generating networks of distances and/or adjacencies for these polygons
Facilitating sampling of random points based on landscapes and/or real points provided

The data comes from a combination of Eurostat, Corine, and other sources - all of which is freely available online. Much of the internal functionality is provided by the sf package.

Installation

The package isn't on CRAN (yet?), but the most recent stable (ish) version can be installed from our drat repository using:

install.packages("hexscape", repos=c(CRAN = "https://cran.rstudio.com/", 
                            `ku-awdc` = "https://ku-awdc.github.io/drat/"))

Or, for the most recent (but even less stable) version you can install from GitHub:

# remotes::install_github("ku-awdc/hexscape")

Then you should be able to load the package:

library("hexscape")

Note that this also currently loads tidyverse, as that is what I used to develop the package, and I was too lazy to import everything properly. At some point, this will change, so it is probably best to explicitly load tidyverse and sf (due to the heavy reliance on sf data frames) as well:

library("tidyverse")
library("sf")

(Unless of course you don't use tidyverse, in which case you should probably read https://r4ds.had.co.nz/index.html a couple more times)

Setup

The hexscape package uses local caching of intermediate data files, as most of the spatial operations take quite a long time to process from the raw data. So, the first step is to create a folder somewhere on your hard drive (NOT a network storage drive) to/from which files can be saved/loaded:

path_to_folder <- file.path(tempdir(check=TRUE), "hexscape_usage")
dir.create(path_to_folder)

path_to_folder <- "/some/path/to/a/folder"

Then run the following code:

set_storage_folder(path_to_folder)

You should now see that hexscape has created some subfolders:

list.files(path_to_folder)

This storage folder needs to be set every time you load hexscape. To avoid having to do this manually, you can put the following code in your .Rprofile:

## To open your .Rprofile try:
usethis::edit_r_profile()

## Then add the following to the bottom of that file:
Sys.setenv("HEXSCAPE_STORAGE"="/some/path/to/a/folder")
# [obviously replacing /some/path/to/a/folder with your path]

That way, the storage folder will be set for you automatically when hexscape is loaded.

set_storage_folder(Sys.getenv("HEXSCAPE_STORAGE"))
unlink(path_to_folder, recursive=TRUE)

NUTS

The hexscape package makes heavy use of NUTS (Nomenclature of territorial units for statistics; 2021 version) to define spatial areas. These are defined at levels 0 (national) to 3 (smallest scale), with the number of NUTS units depending on the country. Some datasets are inbuilt to help with cross-referencing:

nuts_codes
nuts_codes |> filter(Code=="DK")

There is also a list of countries indicating whether or not that country has Eurostat spatial information available (and can therefore be used with the hexscape package):

country_codes

Spatial data

Eurostat data

We use the spatial data provided by Eurostat to define the boundaries of NUTS areas. This can be downloaded from: https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/nuts#nuts21

You need to select the following options:

Year = NUTS 2021
File format = SHP
Geometry type = Polygons (RG)
Scale = 01M
CRS = EPSG: 3035

Then hit the download button and extract the resulting ZIP file. You should end up with a 29.2 MB folder called "NUTS_RG_01M_2021_3035.shp". Put this folder inside the "raw_data" folder of the hexscape storage folder you created above.

You should then be able to load a map for any NUTS area at any level, for example:

dk <- load_map("DK")
dk
ggplot(dk) + geom_sf() + theme_void()

Or for a sub-region of a country:

sj <- load_map("DK032")
ggplot(sj) + geom_sf() + theme_void()

The first time the maps for a specific country are fetched might take a couple of seconds to process from the raw data; subsequent loads will be from a country-specific cache (almost instantaneously).

Corine data

The hexscape package uses Corine Land Cover (CLC) data as the raw underlying data for land usage. To use this, you need to download the SQLite Database from here: https://land.copernicus.eu/pan-european/corine-land-cover/clc2018?tab=download

You will first need to register an account and log in, but it is free to do so.

Once the file has downloaded (beware: it is 3.5GB) you can un-zip it and hopefully end up with a folder named u2018_clc2018_v2020_20u1_geoPackage - move (or copy/symlink if you prefer) this folder inside the raw_data folder that you set up above. Make sure to keep the name of the folder exactly as it is above.

The CLC data classifies land usage under a number of different categories, grouped at 3 different levels. To see what codes are defined use the following lookup table:

clc_codes

The final column defines a suitable colour code suggested for plotting.

Combining Corine and Eurostat data

The raw Corine data contains data on all polygons for all countries (i.e. it is vector rather than raster). In order to use it, we need to process the raw data to intersect with the specific NUTS1 area(s) we are interested in, and simplify the polygons slightly (for easier processing and plotting downstream). This is done automatically the first time you request a map, then the data is cached and re-used for subsequent requests of the same NUTS1 area. Simplification of polygons from different CLC codes is done ensuring that the total area remains the same, i.e. none of the simplified CLC polygons overlap, and the returned result is still a valid sf object. This all takes some time to process... For example, caching data for Denmark (NUTS1 code DK0) takes around 3 mins to run on my arm64 laptop (and more than double that on my ageing Xeon desktop):

dk_corine <- load_corine("DK0")

But subsequent requests for any data from within Denmark is instantaneous, for example:

dk032_corine <- load_corine("DK032")
ggplot(dk032_corine, aes(fill=CLC)) + geom_sf(lwd=0, colour=NA) + theme_void() + theme(legend.pos="none") + scale_fill_manual(values=clc_codes$CLC_RGB)
## TODO: implement an autoplot method

The default return value is a modified sf data frame, with 1 row per combination of CLC and NUTS3 area code in the region requested, and columns giving the CLC code and descriptions, country code and name, NUTS3 code, total area of the CLC type in the original data (in km²), total area of the CLC type in the simplified data, and the geometry. If you prefer a single row per CLC type for the entire area, then you can specify union=TRUE to load_corine().

The number of NUTS1 codes per country varies from 1 (around half of the countries) up to 16 (Germany), and each NUTS1 area is cached separately. If you request data for a whole country, this might mean it takes a long time to process all of the relevant NUTS1 areas.

If you know that you need to process a lot of different NUTS1 areas then you can set use_cache=TRUE for load_corine(); this does some pre-processing of the entire raw Corine database which takes around 5 minutes the first time, but speeds up subsequent calls to load_corine (for any NUTS1 area). The downside is that this uses around 5GB of extra hard drive space to store the cached intermediate files. Or you can ask me nicely for a specific set NUTS1 areas and I will send the pre-processed cache files to you ... but please do register and download the Corine database yourself so that I am not violating their usage policies.

Summarising processed corine data

The load_corine() function provides spatial data for every CLC code separately, but you may want to filter and/or combine them. For now you have to do that manually - for example if we want to only extract farmland:

dk032_corine |>
  filter(CLC_Label1 == "Agricultural areas") |>
  summarise(Area = sum(Area), geometry = st_union(geometry)) ->
dk032_farmland

ggplot() +
  geom_sf(data=load_map("DK032")) +
  geom_sf(data=dk032_farmland, fill="dark green") +
  theme_void()

TODO: At some point I will probably write a helper function e.g. aggregate_corine() for this...

Danish-specific data

TODO: incorporate code for kommune, sogne, and postcode shape files from Covid19tools package

Future TODO: maybe include other info (e.g. population densities) from Danmark's Statistic via their API (or using https://github.com/rOpenGov/dkstat ??)

Discretising the spatial data

Voronoi tesselation

One option to discretise the spatial data is using Voronoi tesselation around a given number of spatial points. In real life this could be farm locations; let's simulate N=100 farms within the farmland of south Jutland:

N <- 100L
tibble(Index = 1:N, point=st_sample(dk032_farmland, N)) |>
  st_as_sf(sf_column_name="point") ->
fake_farms

ggplot() +
  geom_sf(data=load_map("DK032")) +
  geom_sf(data=dk032_farmland, fill="dark green", alpha=0.5) +
  geom_sf(data=fake_farms, col="dark green") +
  theme_void()

Then we can produce Voronoi tesselated polygons within this farmland as follows (where the first argument is the spatial mask for the area of interest, and the second is the points defining the Voronoi points):

dk032_voronoi <- discretise_voronoi(dk032_farmland, fake_farms)

This returns the original data frame, plus new columns for Area, centroid (of the Voronoi tesselated area after intersecting with the map), and geometry.

dk032_voronoi

These can be plotted to show the expected discrepancy between original farm location and centroid e.g.:

ggplot() +
  geom_sf(data=load_map("DK032")) +
  geom_sf(data=dk032_farmland, fill="dark green", alpha=0.5) +
  geom_sf(data=dk032_voronoi, fill="transparent", col="red") +
  geom_sf(data=dk032_voronoi, aes(geometry=centroid), col="dark green") +
  geom_sf(data=fake_farms, col="red") +
  theme_void()

Discretisation into hexagons

TODO

Landscape-oriented discretisation

TODO

Summarising polygons

Summarising content by CLC

TODO

Generating distances and adjacency graphs

TODO

Utility functions

A number of helper/utility functions are provided (mostly wrapping functionality in sf), some of which are described below.

Sampling random points

To easily sample random points within a data frame containing polygons (either Voronoi tesselations or hexagons or anything else), you can use the sample_points function. This takes 2 arguments: the first is an sf data frame with 1 or more rows and a column giving the Index, and the second is the number of points to sample within each row.

random_points <- sample_points(dk032_voronoi, size=5L, verbose=0L)
random_points

The points are generated by rejection sampling over a bounding box that iteratively decreases in size once enough points have been found for each individual polygon. To plot the points:

ggplot() +
  geom_sf(data=load_map("DK032")) +
  geom_sf(data=dk032_farmland, fill="dark green", alpha=0.5) +
  geom_sf(data=dk032_voronoi, fill="transparent", col="red") +
  geom_sf(data=random_points, col="black", size=0.3) +
  theme_void()

In this case, each farmland polygon should contain exactly 5 random points.

TODO: add a buffer argument for sample_points so that we can ensure that each sampled point is a minimum distance from all other sampled points (perhaps as a proportion of the observed minimum distance in the provided points)

Randomisation of spatial points

A utility function is also provided to randomly re-place each provided point into a new location based on Voronoi tesselation. For example, starting with the fake farms and Corine-based farmland that we had before, we can ask for a new spatial location for each farm somewhere within the nearest 5 Voronoi tesselated spatial areas like so:

random_farms <- randomise_voronoi(dk032_farmland, fake_farms, randomise_size=5L, additional_info=TRUE, verbose=0L)

The output is the same as the input data frame, except that we have an additional column RandomPoint that contains the new spatial location:

random_farms

This is done in a way that attempts to preserve farm density by picking from the same number of pre-sampled points per Voronoi tesselation unit (default 10), which reduces the chances of e.g. placing all new points within the same large area surrounding a single farm.

We can see how far each farm has moved using the following code:

ggplot() +
  geom_sf(data=load_map("DK032")) +
  geom_sf(data=dk032_farmland, fill="dark green", alpha=0.5) +
  geom_sf(aes(geometry=VoronoiMasked), random_farms, col="dark green", alpha=0.1) +
  geom_sf(aes(geometry=VoronoiCell), random_farms, col="dark blue", alpha=0.1) +
  geom_sf(aes(geometry=point), random_farms, col="red") +
  geom_sf(aes(geometry=RandomPoint), random_farms, col="green") +
  geom_sf(aes(geometry=RandomShift), random_farms) +
  theme_void()

Increasing the randomise_size argument and re-running the code should result in farms moving on average further:

random_farms <- randomise_voronoi(dk032_farmland, fake_farms, randomise_size=25L, verbose=0L)

ggplot() +
  geom_sf(data=load_map("DK032")) +
  geom_sf(data=dk032_farmland, fill="dark green", alpha=0.5) +
  geom_sf(aes(geometry=point), random_farms, col="red") +
  geom_sf(aes(geometry=RandomPoint), random_farms, col="green") +
  geom_sf(aes(geometry=RandomShift), random_farms) +
  theme_void()

TODO: when adding the buffer argument for sample_points make sure this is passed through from randomise_voronoi as well

Footnotes

The package is still under active development and is subject to some change, however we will try to keep the interfaces discussed above consistent. In particular, we acknowledge that the help files are fairly non-existent at the moment - sorry - but these will be updated ASAP. When new release versions are uploaded to drat, this guide will be updated with the new features.

At some point soon ish we will be summarising the main features of this package as part of a paper (linked to both Mossa's PhD and DigiVet). In the meantime, comments and suggestions are very welcome!

sessionInfo()

ku-awdc/HexScape documentation built on Jan. 10, 2025, 5:24 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

ku-awdc/HexScape
Aggregation of Spatial Data into Discrete Patches using Simple Features

In ku-awdc/HexScape: Aggregation of Spatial Data into Discrete Patches using Simple Features

Introduction

Installation

Setup

NUTS

Spatial data

Eurostat data

Corine data

Combining Corine and Eurostat data

Summarising processed corine data

Danish-specific data

Discretising the spatial data

Voronoi tesselation

Discretisation into hexagons

Landscape-oriented discretisation

Summarising polygons

Summarising content by CLC

Generating distances and adjacency graphs

Utility functions

Sampling random points

Randomisation of spatial points

Footnotes

R Package Documentation

Browse R Packages

We want your feedback!

ku-awdc/HexScape Aggregation of Spatial Data into Discrete Patches using Simple Features

In ku-awdc/HexScape: Aggregation of Spatial Data into Discrete Patches using Simple Features

Introduction

Installation

Setup

NUTS

Spatial data

Eurostat data

Corine data

Combining Corine and Eurostat data

Summarising processed corine data

Danish-specific data

Discretising the spatial data

Voronoi tesselation

Discretisation into hexagons

Landscape-oriented discretisation

Summarising polygons

Summarising content by CLC

Generating distances and adjacency graphs

Utility functions

Sampling random points

Randomisation of spatial points

Footnotes

R Package Documentation

Browse R Packages

We want your feedback!

ku-awdc/HexScape
Aggregation of Spatial Data into Discrete Patches using Simple Features