If you are reading this vignette, you most probably want to contribute
to the `{mapme.biodiversity}` package. This is great news, and we are
very happy to receive pull requests extending the package's functionality!
Below you will find important in-depth information about how to add
resources and indicators to make the process as seamless as possible for
both you and the package's maintainers. Please make sure to read and
understand this guide before opening a PR. If in doubt, especially if
you feel that the framework does not support your use case, always feel
free to raise an issue and we will happily discuss how we can support
your ideas. If you have not already done so, make sure to read the
Terminology vignette to get familiar with the most
important concepts of this package.
Note that we use the tidyverse style guide for the package. That
specifically means that function and variable names should follow the
snake case pattern. We also use the arrow assignment operator (`<-`).
When submitting a PR that does not consistently follow the tidyverse
style guide, the maintainers of the package might change the code to
adhere to this code style without further notice before accepting the PR.
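
As a brief illustration of these two conventions, consider the following sketch. The function and variable names are made up for this example and are not part of the package.

```r
# snake_case for function and variable names, `<-` for assignment
calc_mean_elevation <- function(elevation_values) {
  mean_value <- mean(elevation_values, na.rm = TRUE)
  mean_value
}

calc_mean_elevation(c(100, 250, NA))
```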
Ideally, you clone the GitHub repository via the `git` command in a command line on Linux and macOS systems or via the GitHub Desktop application on Windows. On Linux, the command would look like this:
```{bash, eval = FALSE}
git clone https://github.com/mapme-initiative/mapme.biodiversity
```
We do not accept pushes to main, thus the first step would be to create
a specific branch for your extension. In this tutorial, we will pretend
to re-implement the `nasa_srtm` resource and the associated `elevation`
indicator, so we will create a branch reflecting this. Don't forget to
check out the newly created branch!
```{bash, eval = FALSE}
git branch add-elevation
git checkout add-elevation
```
Below, we will assume that you develop your extension to the package in RStudio. The general guidelines also apply if you choose different tooling for your development process, but other tooling is not covered in this vignette. We assume that all R development dependencies are installed. The easiest way to ensure this is using `{devtools}`:
```r
devtools::install_dev_deps()
```
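In short, adding a resource involves the following steps:
- Add the resource function in a new file (`R/get_<resource_name>.R`)
- Use `make_footprints()` to return footprints for resources matching the portfolio
- Add a unit test in `tests/testthat/test-get_<resource_name>.R`
- Add a small sample of the resource to `inst/res/<resource_name>`
- Put the script that produces the sample into `data-raw`
- Register the resource via `register_resource()`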
A resource is a supported dataset that can be made available from a
user's perspective by specifying one or more functions to
`get_resources()`. Currently, the package supports only raster and
vector resources. If you wish to submit support for a new resource,
please be aware that we will only accept new resources if they are
associated with at least one indicator calculation. The very first step
to adding a resource is to create a new file that will hold the
required code.
Once you have checked out the new branch and opened the project in
RStudio, adapt the following command to open a new resource file:
```r
file.edit("R/get_<your-new-resource>.R") # e.g. file.edit("R/get_soilgrids.R")
```
In the first part of a resource function, make sure to include detailed documentation. This documentation should explain what this resource represents, where it comes from (including a citation), and the user-facing arguments that should be specified during runtime.
Importantly, this documentation MUST receive the roxygen tag
`@keywords resource`, so that the documentation will be identified as a
resource. Also, add the bare name of the resource as the `@name` tag
(e.g. in the case of our example that translates to `@name nasa_srtm`).
```r
#' Short title
#'
#' One or more description paragraphs might follow here. Please describe
#' the spatio-temporal structure of your resource here briefly.
#'
#' @name <the short name of your resource>
#' @param <any user-facing arguments>
#' @keywords resource <identifies the documentation as a resource>
#' @references <ideally a citable scientific publication>
#' @source <a link in the \url{} tag linking to an online documentation>
#' @returns A function that makes a resource available for a portfolio
#' @include register.R
#' @export
```
The last two tags are important to add as well. The include statement is mandatory for the register functionality (more on that below) to be loaded before your resource function. The export tag is important so that the resource is actually exposed to the users of the package.
Resource functions are constructed as closures, i.e. functions that return a function. The outer level exposes arguments to be set by users to fine-tune the behavior of the function. Note that it is important to check the user input for correctness in this outer level, so that warnings or errors about any misspecification are thrown immediately.
For `nasa_srtm`, this outer level does not look very exciting because
there are no user-facing arguments to be checked (we will see how to
check user-facing arguments when constructing the indicator below):
```r
get_nasa_srtm <- function() {
  # ... inner function level
}
```
Note that there are some exported helper functions for recurring
argument checks that you are free to use (e.g. `check_available_years()`
in case you query the user for a time frame). The arguments
defined in the outer level of the resource function are then ready to be
used in the inner level, which we will have a look at next.
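
The closure pattern with an immediate input check can be sketched in base R as follows. This is a minimal illustration only: `get_example_resource()`, its `years` argument, and the range of available years are hypothetical and not part of the package, and the inner signature is shortened for brevity.

```r
# Hypothetical resource constructor, for illustration only.
get_example_resource <- function(years = 2000:2020) {
  available_years <- 2000:2020 # assumed temporal coverage
  # validate user input in the outer level so that errors surface immediately
  if (!is.numeric(years) || !all(years %in% available_years)) {
    stop("`years` must be numeric and within 2000-2020.", call. = FALSE)
  }

  # the inner function captures the validated `years` from the outer level
  function(x, name = "example_resource", type = "raster") {
    paste0(name, "_", years, ".tif")
  }
}

fetch <- get_example_resource(years = 2001:2002)
fetch(x = NULL)
```

Calling `get_example_resource(years = 1990)` fails right away, before any inner function is created or any data is requested.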
The inner level of a resource function has a mandatory function
signature that will be checked during run-time. Your function is
required to exactly specify the below signature. For the `nasa_srtm`
resource, this looks like this:
```r
function(x,
         name = "nasa_srtm",
         type = "raster",
         outdir = mapme_options()[["outdir"]],
         verbose = mapme_options()[["verbose"]]) {
  # ... function body
}
```
The `x` argument here represents the portfolio object handed over by the
user when calling `get_resources()`, which is an `sf` object and can thus
be used to derive the spatial extent of the portfolio. Next come the
name and the type of the resource, which are required for the backend to
correctly handle the output and log the resource once it has been made
available.
The other arguments should default to their respective values in
`mapme_options()` and represent a character vector for the output
directory and a logical to control the verbosity. Note that the output
directory might be `NULL` in case a user wishes to access data directly
from a remote source. We will look into how these things come together
now as we peek into constructing the actual body of a resource function.
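
The following base R sketch shows how defaults taken from an options function are resolved at call time and how a function body can guard against a `NULL` output directory. `example_options()` is a hypothetical stand-in for `mapme_options()` so that the sketch runs without the package, and the body is invented for illustration; it does not show how the package itself treats `outdir`.

```r
# Hypothetical stand-in for mapme_options(), used only for this sketch.
example_options <- function() list(outdir = NULL, verbose = FALSE)

inner <- function(x,
                  name = "example_resource",
                  type = "raster",
                  outdir = example_options()[["outdir"]],
                  verbose = example_options()[["verbose"]]) {
  # `outdir` may be NULL, so never assume that it is a valid path
  target <- if (is.null(outdir)) "remote source" else file.path(outdir, name)
  if (verbose) message("Resource '", name, "' is read from: ", target)
  target
}

inner(x = NULL)                  # falls back to the option defaults
inner(x = NULL, outdir = "data") # an explicit value overrides the default
```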
The expected output of a resource function is an `sf` object with
geometries representing the bounding box for single resource elements
(e.g. tiles in case of a raster resource). We provide you with
functionality to seamlessly produce such a footprint object via
`make_footprints()`. Note that this function accepts either a character
vector of GDAL readable data sources or an `sf` object. In case you
provide a character vector, the bounding box information will be
retrieved automatically. This comes with a performance penalty for
remote sources, because each file has to be opened once, so always opt
for constructing an `sf` object in the resource function, if feasible.
The footprints `sf` object is expected to contain a column called
`source` that points to a GDAL readable data source, and the geometries
are expected to correspond to the bounding box of each single resource
element. You might opt to supply a `filename` argument to
`make_footprints()`. This defaults to `basename(srcs[["source"]])`, so
you might supply a better suited filename, e.g. in case the source
location ends with an API key value or similar.
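
To see why a custom file name can be useful, consider this small base R sketch. The URL is made up, and stripping the query string is just one possible way to derive a cleaner name that could then be supplied as the file name.

```r
# Made-up source URL whose last path segment carries an API key
src <- "https://example.com/data/tile_01.tif?key=abc123"

# a basename()-derived default would keep the query string
default_name <- basename(src)

# drop everything from the first "?" onwards to get a clean file name
clean_name <- sub("\\?.*$", "", default_name)
clean_name
```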
Next to specifying whether the resource you wish to turn into a footprint
object represents a raster or vector resource, you can specify opening
and creation options. Opening options are relevant if opening a remote
source requires location or driver dependent options (e.g. specifying
non-standard column names for longitude/latitude with the CSV driver).
Creation options are relevant if a user specified an `outdir` argument
and refer to the arguments used by `gdal_translate`. These
should specifically specify data type and compression algorithms for
raster resources, but you are otherwise free to optimize the data layout
for efficient access.
Both `oo` and `co` can be specified as a single character vector, in
which case the same options are applied to all elements of a resource,
or as a list in order to accommodate file specific options.
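
The two accepted shapes can be pictured with a small base R sketch. `expand_options()` is a hypothetical helper written for this example, not the package's implementation, and the option values are merely illustrative `gdal_translate`-style arguments.

```r
# Hypothetical helper: turn either input shape into one entry per element.
expand_options <- function(opts, n) {
  if (is.null(opts)) {
    return(vector("list", n)) # no options for any element
  }
  if (is.character(opts)) {
    return(rep(list(opts), n)) # same options for every element
  }
  stopifnot(is.list(opts), length(opts) == n)
  opts # already one entry per element
}

shared <- c("-co", "COMPRESS=DEFLATE", "-ot", "Int16")
per_file <- list(c("-ot", "Int16"), c("-ot", "Byte"))

expand_options(shared, 2)
expand_options(per_file, 2)
```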
Use the `verbose` argument to decide if informative messages should be
printed, e.g. to inform users about download progress. Errors or
warnings should be emitted in either case.
If there is no intersection between the `x` object and your resource,
make sure to return `NULL` as early as possible.
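
A minimal base R sketch of this logic is given below, assuming a resource organized in 1-degree tiles. The tile index, the URLs, and both helper names are hypothetical. In an actual resource function the table would rather be an `sf` object with bounding box geometries that is handed to `make_footprints()`.

```r
# Hypothetical tile index: one row per 1-degree tile with a made-up URL
make_tile_index <- function(xmin, xmax, ymin, ymax) {
  grid <- expand.grid(lon = seq(xmin, xmax - 1), lat = seq(ymin, ymax - 1))
  data.frame(
    source = sprintf("https://example.com/tile_%d_%d.tif", grid$lon, grid$lat),
    xmin = grid$lon, ymin = grid$lat,
    xmax = grid$lon + 1, ymax = grid$lat + 1
  )
}

# Keep only tiles overlapping the portfolio bounding box
select_tiles <- function(bbox, tiles, verbose = TRUE) {
  hits <- tiles$xmin < bbox[["xmax"]] & tiles$xmax > bbox[["xmin"]] &
    tiles$ymin < bbox[["ymax"]] & tiles$ymax > bbox[["ymin"]]
  if (!any(hits)) {
    return(NULL) # no intersection: return as early as possible
  }
  if (verbose) message("Found ", sum(hits), " intersecting tile(s).")
  tiles[hits, , drop = FALSE]
}

tiles <- make_tile_index(-72, -70, 18, 20)
inside <- c(xmin = -71.8, ymin = 18.5, xmax = -71.2, ymax = 18.9)
outside <- c(xmin = 10, ymin = 50, xmax = 11, ymax = 51)

select_tiles(inside, tiles)
select_tiles(outside, tiles)
```

The message is only printed when `verbose` is `TRUE`, while the `NULL` return for a non-intersecting portfolio happens regardless of the verbosity setting.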
We ask you to provide a small subset of your resource in
`inst/res/<resource_name>` so that indicators that depend on the resource
can be tested without the need to actually download the resource.
Because there are some restrictions to the final size of the package, we
ask you to put substantial effort into reducing the size of the files to a
minimum. This includes cropping all resource samples to the spatial
extent of the polygon provided in
`inst/extdata/sierra_de_neiba_478140.gpkg`, or of a polygon of similar
size supplied by you in case its spatial extent does not intersect with
your resource.
For raster resources, if the original raster is encoded as float, consider changing the data type to integer by introducing a scale factor. Also, please use a compression algorithm to further reduce the file size. For vector resources, consider reducing the number of vertices in case the geometries are very complex.
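
The effect of a scale factor can be illustrated with plain numeric vectors. This is a sketch with made-up values; which scale factor is appropriate depends on the precision your resource actually needs.

```r
values <- c(0.1234, 12.5678, 99.9999) # made-up float cell values
scale_factor <- 100                   # keeps two decimal places

# store as integer ...
encoded <- as.integer(round(values * scale_factor))
# ... and divide by the scale factor when reading the values back
decoded <- encoded / scale_factor

# the rounding error is bounded by 0.5 / scale_factor
max_error <- max(abs(decoded - values))
```

When writing the sample with `{terra}`, the data type and compression can be set via the `datatype` and `gdal` arguments of `terra::writeRaster()`.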
Finally, put your processing script of the resource into `data-raw` to
ensure reproducibility. Then, you are required to write a unit test for
your resource function, which should execute as much of your code as
possible without actually conducting a download.
Note, that a resource SHALL NOT add additional dependencies to the package. If you add dependencies we require you to add a supporting statement to your PR explaining why these dependencies are needed and why other approaches would fail. Before accepting your PR, we might request you to change your code to remove these dependencies, if it is feasible to achieve the same functionality without.
The process of adding an indicator is very similar to the one for resources. However, some input-output requirements are different. Note, that in case that you added a new resource we also expect a new indicator taking advantage of that resource in your PR.
As you will see, there are two new important concepts to have in mind when adding an indicator. These are the processing mode and computational engines. We will briefly explain these concepts below, however, you can also head over to the Terminology vignette if you are interested in a more comprehensive definition of these two terms.
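In short, adding an indicator involves the following steps:
- Add the indicator function in a new file (`R/calc_<indicator_name>.R`)
- Add a unit test in `tests/testthat/test-calc_<indicator_name>.R`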
An indicator is a logical routine depending on one or more resources
that extracts numeric outputs for all assets in a portfolio. From a
user's perspective, indicators are processed via the `calc_indicators()`
function. You as a developer will have to construct an indicator
function as a closure, i.e. a function that returns another function.
The outer level exposes user-facing arguments and checks that they are
correctly specified, while the inner level is required to follow a
specified signature and returns a tibble.
Once you have checked out the new branch and opened the project in RStudio, adapt the following command to open a new indicator file:
```r
file.edit("R/calc_<your-new-indicator>.R") # e.g. file.edit("R/calc_precipitation.R")
```
In the first part of an indicator function, make sure to include
detailed documentation. This documentation should explain which
resources are required to calculate the indicator, the user-facing
arguments that should be specified during runtime, and the structure of
the output tibble. Importantly, this documentation MUST receive the
roxygen tag `@keywords indicator`, so that the documentation will be
identified as an indicator. Also, add the bare name of the indicator as
the `@name` tag (e.g. `@name elevation`).
```r
#' Short title
#'
#' One or more description paragraphs might follow here. Please describe
#' required resource and user arguments here.
#' Please document which processing engines are available for your indicator
#' and briefly describe how the indicator is derived from its inputs.
#'
#' @name <the short name of your indicator, same as in the backlog>
#' @param <any user-facing arguments>
#' @keywords indicator <identifies the documentation as an indicator>
#' @returns A function that calculates an indicator for a portfolio
#' @include register.R
#' @export
```
The last two tags are important to add as well. The include statement is mandatory for the register functionality (more on that below) to be loaded before your indicator function. The export tag is important so that the indicator is actually exposed to the users of the package.
Indicator functions are constructed as closures, i.e. functions that return a function. The outer level exposes arguments to be set by users to fine-tune the behavior of the function. Note that it is important to check the user input for correctness in this outer level, so that warnings or errors about any misspecification are thrown immediately.
For `elevation`, this outer level could look something like this:
```r
calc_elevation <- function(engine = "extract", stats = "mean") {
  engine <- check_engine(engine)
  stats <- check_stats(stats)
  # ... inner function level
}
```
There are some exported helper functions for recurring argument
checks that you are free to use (e.g. `check_engine()`). Note that the
arguments defined this way in the outer level of the indicator function
are then ready to be used in the inner level, which we will have a look at
next.
The inner level of an indicator function has a mandatory function
signature that will be checked during run-time. Your function is
required to exactly specify the below signature. For the `elevation`
indicator, this looks like this:
```r
function(x,
         nasa_srtm = NULL,
         name = "elevation",
         mode = "asset",
         aggregation = "stat",
         verbose = mapme_options()[["verbose"]]) {
  # ... function body
}
```
The `x` argument here represents the portfolio object handed over by the
user when calling `calc_indicators()`, which is an `sf` object with
`'POLYGON'` as features. Next come the name(s) of the required
resource(s) and the name of the indicator. What follows is the
computation mode, which must be one of `"asset"` or `"portfolio"`.
We realized, that for large (potentially global) portfolios, depending on the spatial resolution of a resource, different processing modes substantially impact the time needed for a computation. For high to medium resolution raster resources, processing on the asset level benefits computation time. However, spatially cropping coarse resolution datasets for a high number of assets introduces significant overhead, thus processing these resources on a portfolio level is more efficient.
If neither of the two processing modes leads to satisfactory processing times for your indicator, please leave an issue/comment to discuss the addition of another processing mode with the maintainers of the package.
The argument `aggregation` governs how chunked results for large
polygons are to be combined into a single indicator. In case users supply
polygons larger than the specified chunk size or assets of type
`MULTIPOLYGON`, a code path is triggered that splits assets into
sub-components. The aggregation method specifies which statistic is used
to combine values that share the same values in the remaining indicator
columns (i.e. `datetime`, `variable`, and `unit`). The `stat` keyword is
a special keyword used for indicators where statistics are specified by
the user. It selects the respective user-specified statistic as the
aggregation statistic (e.g. take the sum of sums).
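
The idea of combining chunked results can be sketched in base R as follows. The data are made up, the `value` column name is an assumption for this example, and `aggregate()` merely stands in for whatever the package does internally.

```r
# Made-up results for one asset that was split into two chunks
chunks <- data.frame(
  datetime = as.Date(c("2000-01-01", "2000-01-01", "2001-01-01", "2001-01-01")),
  variable = "area",
  unit = "ha",
  value = c(10, 5, 8, 4)
)

# combine rows sharing datetime, variable and unit using the sum
combined <- aggregate(value ~ datetime + variable + unit, data = chunks, FUN = sum)
combined
```

Each combination of `datetime`, `variable`, and `unit` ends up with a single row holding the sum of the chunk values.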
The available statistics are:
```r
mapme.biodiversity:::available_stats
```
The argument `verbose` defaults to the corresponding package-wide option
and should control the verbosity of your indicator function.
The expected output of an indicator function is a tibble. Depending on
the `mode` specified for processing, it is a single tibble for
`mode = "asset"`, or a list of tibbles with one element per row of `x`
in case `mode = "portfolio"`.
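
The difference between the two output shapes can be pictured with this base R sketch. It uses plain data frames instead of tibbles to stay self-contained, and the column names, the unit, and all values are assumptions made for illustration.

```r
# Hypothetical helper building the result for a single asset
make_result <- function(value) {
  data.frame(
    datetime = as.Date("2000-01-01"),
    variable = "elevation_mean",
    unit = "m",
    value = value
  )
}

# mode = "asset": the function sees one asset and returns one table
asset_result <- make_result(512)

# mode = "portfolio": the function sees all assets and returns a list
# with one table per row of `x`
asset_values <- c(512, 640, 388)
portfolio_result <- lapply(asset_values, make_result)
length(portfolio_result)
```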
You may use helper functions provided by the package for a common
interface, e.g. for vector-raster zonal statistics (by using
`select_engine()`).
You are encouraged to write your own helper functions that are needed for your indicator processor. These should be located in the same file as the main processor, start with a dot, and should not be exported.
If you wish to include roxygen documentation for your helpers, make sure
to add the `@keywords internal` and `@noRd` tags to your functions. If
you feel that one or more of your helper functions would be of benefit
to more than just one indicator, please comment in an
issue/pull request to discuss with the package maintainers if your
helper function could be moved to `R/utils.R`.
Use the `verbose` argument to decide if informative messages should be
printed, e.g. to inform users about processing progress. Errors or
warnings should be emitted in either case.
If there is no intersection between the `x` object and the required
resources, or for any other reason why your indicator might not be
calculated with the given configuration, make sure to return `NA` as
early as possible.
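
Putting the pieces together, an indicator closure with an outer-level check, verbosity handling, and an early `NA` return could be sketched as below. `calc_example()` is hypothetical, `match.arg()` is used here in place of the package's own check helpers, and a plain numeric vector stands in for the values extracted from a resource.

```r
# Hypothetical indicator constructor, for illustration only.
calc_example <- function(stats = "mean") {
  # outer level: fail early on unsupported statistics
  stats <- match.arg(stats, c("mean", "median", "sd"), several.ok = TRUE)

  function(x, resource = NULL, name = "example", mode = "asset",
           aggregation = "stat", verbose = TRUE) {
    if (is.null(resource)) {
      if (verbose) message("No resource data available for this asset.")
      return(NA) # nothing to compute: return as early as possible
    }
    # apply each requested statistic to the extracted values
    values <- vapply(stats, function(s) do.call(s, list(resource)), numeric(1))
    data.frame(variable = paste0(name, "_", stats), value = unname(values))
  }
}

ind <- calc_example(stats = c("mean", "sd"))
ind(x = NULL, resource = c(1, 2, 3))
ind(x = NULL, resource = NULL, verbose = FALSE)
```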
You are required to add unit tests for your indicator using the package internal example data sets for resources. Make sure to properly test for misspecification of user-facing arguments and also check the correctness of the numerical results of your indicator.
You might not need to construct a portfolio from scratch to test your
indicator function. Instead, you can directly call the returned function
on an appropriate polygon with the respective required resource. For the
`elevation` indicator, this looks like this:
```r
# read the example polygon shipped with the package
x <- read_sf(system.file(
  "extdata", "sierra_de_neiba_478140.gpkg",
  package = "mapme.biodiversity"
))
# locate and open the sample raster of the resource
nasa_srtm <- list.files(
  system.file("res", "nasa_srtm", package = "mapme.biodiversity"),
  pattern = "\\.tif$", full.names = TRUE
)
nasa_srtm <- rast(nasa_srtm)
# call the function returned by the closure directly on the polygon
ce <- calc_elevation(stats = c("mean", "median", "sd"))
result_multi_stat <- ce(x, nasa_srtm)
expect_equal(
  names(result_multi_stat),
  c("elevation_mean", "elevation_median", "elevation_sd")
)
```