In michielvandijk/mapspamc: An R package to create crop distribution maps

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Use of GAMS to solve the `mapspamc` models

The mapspamc package depends on General Algebraic Modeling System (GAMS) [@GAMSDevelopmentCorporation2019] to solve the cross-entropy and fitness score models. GAMS is commercial software, designed for modelling and solving large complex optimization problems. To be able to solve the models in mapspamc a GAMS licence is required that includes full access (without size limitations) to CPLEX and IPOPT solvers (but see below for potential alternative solvers). The function run_mapspamc activates GAMS to run the model as defined by the user in the model setup step. The GAMS model log will be sent to the screen after the model run has finished. The log is a text file, which names starts with model_log_ and is saved in the processed_data/intermediate_output folder after the model has finished. The GAMS code (gms files) is stored in the gams folder in the mapspamc R library folder. Interested users might want to take a look and, if necessary, modify the code and run it directly in GAMS, separately from the mapspamc package.

Use of slack when solving models

The GAMS code involves the use of a slack variable to deal with data inconsistencies, which result in infeasible models. A slack variable transforms an inequality constraint in an equality constraint by adding 'slack' [@Boyd2004a]. This makes it possible to solve the model even though it is technically infeasible. Different weights (i.e. penalties) are used to capture the interaction and trade-offs between the different slack variables (e.g. a lower slack for irrigated area may result in a higher slack for available cropland). Relatively high weights are used to minimize the slack for cropland availability and irrigated area, whereas low weights are used for the subnational statistics, which values are considered less reliable (see GAMS gms files for which weights are used). After the model is run, the slack variables are saved in the model output gdx file, which is stored in the processed_data/intermediate_output folder.

Excessive slack points at inconsistencies between the (sub)national statistics and the available crop and irrigated area to which the statistics are allocated. To deal with these inconsistencies, the first step is to closely inspect the data, and in particular the subnational crop statistics, for data entry errors (e.g. missing values) and, if needed, correct them. If there are still problems, we recommend by adjusting the subnational statistics as they are sometimes inconsistent with the FAOSTAT national statistics.

GAMS solvers

Depending on the licence, GAMS is installed with several solvers. For each type of problem a default solver is pre-selected. Unless changed by the user, the run_mapspamc function will use the GAMS default solvers for linear-problems (i.e. the max_score model) and non-linear problems (i.e. the min_entropy model) to solve the models. To find out which solvers are available and which are the default, open the GAMS IDE: file -> options -> solvers. The user has the option to select one of the other linear- and non-linear solvers supported by GAMS: ANTIGONE, BARON, CPC, CPLEX, CONOPT4, CONOPT, GUROBI, IPOPT, IPOPTH, KNITRO, LGO, LINDO, LOCALSOLVER, MINOS, MOSEK, MSNLP, OSICPLEX, OSIGUROBI, OSIMOSEK, OSIXPRESS, PATHNLP, SCIP, SNOPT, SOPLEX, XA, XPRESS.

For the max_score model, which is a linear problem, it is recommended to use CPLEX. For non-linear problems, such as the min_entropy model, is not possible to predict at forehand, which solver performs best. It is recommended to start with using the IPOPT, which has shown good performance in solving cross-entropy models. An alternative option is CONOPT4, which, however, is often much slower, and in some cases is not able to solve the model.

Differences between `mapspamc` and SPAM

There are several differences between mapspamc and SPAM [@Yu2020] related to the allocation algorithm, calculation of priors and the input data (Table \@ref(tab:tab-spam-dif)). A main innovation is that apart from the SPAM cross-entropy model, mapspamc also provides the option to use an alternative 'fitness score' model. This model was developed to create crop distribution maps at a higher resolution of 30 arc seconds [@VanDijk2022b]. As a consequence of using priors based on socio-economic and biophysical suitability measures, the cross-entropy approach allocates relatively small shares of a large number of crops and farmings systems to grid cells. This is plausible for a 5 arc minute resolution when grid cells are relatively large (\~ 10x10 km) and a large diversity of crops and farmings systems is to be expected. This is, however, less likely for high resolutions of 30 arc seconds, where grid cells are much smaller (\~ 1x1 km) and it is more likely to observe clusters of grid cells that are populated by a small number of crops and production systems. To simulate this process, a 'fitness' score between 0 and 100, which measures both the socio-economic and biophysical suitability, is calculated for each grid cell. Crops and production systems will be allocated in such a way that the crop area weighted fitness score is maximized, subject to subnational crop area information and availability of cropland and irrigated area.

Another difference related to the allocation algorithm is the use of slacks in the constraints to better deal with data inconsistencies (see above). We also dropped the suitability constraint in mapspamc. In SPAM, the suitable crop area, calculated as the biophyiscal suitability times the cropland area in a grid cell, was used as a hard constraint. In practice, this constraint was often dropped because it severely limits the space to allocate crops, resulting in infeasible models. As an alternative, the cross-entropy and fitness score models in mapspam use the suitability information to inform the priors and fitness scores, in particular for the subsistence and low-input production systems. We also slightly modified the calculation of the high-input and irrigated crops, which are now based on the geometric average of accessibility and potential revenue indicators that are normalized by means of the min-max method.

Finally, we updated several input data sources with more recent and higher resolution products, in particular (a) accessibility, which is now based on travel time [@Weiss2018], (b) population, taken from @Tatem2017, (c) urban extent taken from @Schiavina2022, (d) irrigated area, which is a synergy product, based on GIA [@Siebert2013] and GMIA [@Meier2018], and (e) cropland, taken from several recent cropland products that can be combined to generate a synergy cropland map in the pre-processing step. We also reduced the number of standard crops from 42 in SPAM to 40 in mapspamc by merging the two millet (pearl and small millet) and coffee (arabica and robusta) species because statistical information is difficult to find at such detailed crop level (see below).

As a consequence of these modifications, the maps created with mapspamc will deviate from comparable information presented by SPAM. Nonetheless, output is expected to be comparable if similar model settings are used.

library(readxl)
library(kableExtra)
spamdif <- read_excel("tables_chart/tables.xlsx", sheet = "tab_spam_comparison")

spamdif %>%
  dplyr::select(item, SPAM, mapspamc) %>%
  kable(., booktabs = TRUE, linesep = "", caption = "Differences between `mapspamc` and SPAM. SPAM refers to the latest version, which was used to generate SPAM2010 [@Yu2020]") %>%
  kable_styling(latex_options = c("hold_position"), font_size = 9) %>%
  pack_rows("Allocation algorithm", 1, 4) %>%
  pack_rows("Calculation of priors", 5, 8) %>%
  pack_rows("Data", 9, 14)
# %>%
#    column_spec(c(1,2,3),width = c("4cm", "5cm", "5cm"))

List of `mapspamc` crops

mapspamc identifies 40 different crop (and crop groups) that together cover the full agricultural sector and are each identified by a four letter code (Table 1). The main reason for this classification is the limited availability of crop-specific biophysical suitability maps, which form a key input of the crop allocation process (see pre-processing spatial data for more information). It would be relatively easy to add new crops by splitting them off from broader crop groups (e.g. separate tomatoes from vegetables) if appropriate agricultural statistics and suitability maps are available. We plan to add an example on how to do this in future updates. The actual number of crops in the model is determined by the number of crops for which statistical information is provided.

library(readxl)
library(kableExtra)
spamdif <- read_excel("tables_chart/tables.xlsx", sheet = "tab_crop_list")

spamdif %>%
  dplyr::select(number, name, group) %>%
  kable(., booktabs = TRUE, linesep = "", caption = "List of `mapspamc` crops")