In michielvandijk/mapspamc: An R package to create crop distribution maps

Error

There will be errors when there are no irrigated crop as in this case ir_crop_gdx in combine_inputs will result in NA values that are not allowed in the GAMS file.

Error

harmonize_inputs - split_harmonized_inputs - select_grid_cells - rank_cl In case for a certain adm2, pa is very low - even lower than the available cropland in the first grid, there will be an error because no grid cell is selected, while the first grid cell should be selected. An extra rule is needed that adds a at least the grid cell with the highest rank.

Error

When there is no irrigation in a country (or adm1 if model_solve = 1), crop_area_gdx and/or crop_ir_gdx will be empty in combine_inputs_adm_level, resulting in an error => added if else statements in combine_inputs_adm_level. However GAMS needs positive values for ir_crop so this gives an error, need to update the GAMS code for these cases.

make ffunction visible

run_gams_adm_level() can be very useful to run a single adm1 model for large countries => rename into run_model_adm_level, document and export so it can be accessed as function.

Package protection

Do not allow people to change master

gdxrrd

package now has website on github

Artificial adm calculations

need to do a check if data is inconsistent when calculating artifiical adms.
EGY old file is an example, eg. EG24, where total maize ha in adm2 > adm1 total, while two adm2 have NAs...
Apparently I already checked for very small numbers. Perhaps issue warning if large negative numbers >2 are found?
Switched removing adms with 0 statistics when computing art_adm as this removed full adm when all statistics are zero. However, I added a note that having zeros might result in an error when calculating artificial adms. Need to investigate.

all cropping systems irrigated gives warnings as some priors assume values.

Add if statement to not calculate priors/scores when system is not present

Loading files, also when gdalwarp is called

Always need to add a file.exists statement and stop when it does not!

Import of packages

Only ggplot now but need to check what is really necessary

Doi badge to paper and sticker

https://github.com/GuangchuangYu/badger https://github.com/GuangchuangYu/hexSticker

create spam folders

creates two folders in raw_path, adm and subnational statistics, do we want this?

check if input is consistently zero (eg when there are no subsistence systems, rps is zero) in combine_inputs script.

If data is zero, it cannot be written to a GAMS file. Added rough code to filter out set and data related to subsistence but this check should be done for all. Most likely GAMS will not be able to read the file in case of fitness model as required data (e.g when subsistence is fully missing) is not present.

Joe used adm_level = 1 but with adm2 data, resulting duplicated grid ID in adm_map_r because of joing on adm1 instead of 2. We can add a warning that if adm_level2 is selected, adm2 data cannot be present in the map when rasterizing!

Small adms

Compare polygon size with statistics and single out those where area > polygon size which seems unrealistic (apart from multi harvests)
Create function that compares ADMs in adm_r with ADMs in statistics and signals when some of them drop out because they are too small.

Error messages

create_statistics_template("ha", param) gives error if adm_list does not exist => need to improve message.

add check that compares adm_level = 2 with existence of e.g. adm2 level data in ha_stat . If this is not the case, the create_artificial_adm gives and error.
remove res from names as this is only one parameter of the model
load_data and r functions include _tp1 instructions => need to be removed

KEY!! Overlay of grid and acc shows that the extent is the same BUT actual selected cells are different on the edges (5 arcmin grid)!!! This means that allign rasters in combination with cutline does not select the same cells> Probably easiest solution is to use gdal to create the raster....perhaps create (1) create raster of ALL cellS touched by polygon and then run gdal...

need to illustrate the case when no subnational statistics are available.
convert gather to pivot code and check if tidyr needs to be imported fully => function is better.
create function to wipe all model results
ensure that maps are deleted before the are created again. GDAL does not seem to overwrite!
note somewhere that all GAEZ maps are created to have substitute crops when data is NA throughout.
write all imported functions as dplyr::

Finish documentation and functions

load_data and add functions that load the data instead of repeating code.
make create_adm_template more general by looping over adm levels
Create check file for each file that is loaded (e.g. suitability raster file), and give warning if file is not there.
view_stack does not work for GMB due to an extend problem, which for some reason is not fully harmonized.
NB Create irrigation map: extend should be the same. See GMB for fix
NB lines 100-116 calibrate statistics.r refers to fips and adm, which should be adm_code and adm_name. See GMB for fix
NB When score is created and no substitute maps are available an error is produced. Need to make the error explicit by sending the message that a subsiture map is not specified (see oilp GMB for example)
NB need to add comment that adm_code nees to be a character string as otherwise bind_rows with adm0 (iso code) will not work! Or develope code that adm_code is always set as character when read in from the csv file. This might be cumbersome as sometimes we have adm1 and sometimes adm2 also...
add align_raster function based on gdalUtilities package. Check if GDAL is still needed then...
IF there are still NAs at the top level, when running at sol_level1, there will be an error in harmonize_inputs(param) as the cl_df file will be empty. Need to add a stop with message if this occurs. Also add this as a check in the check_ha_pa function that I need to write/improve.
Check why there is an indication of ir slack when harmonizing at adm1_level in MWI.
Check: with present data adm_art_map has art_adms that are approximately zero. This should not happen if the rebalancing would be succesful. Added code to remove them in combine model input to remove them!
check where to store all the mappings files. On the one hand we should store the in the package on the other hand we should provide a way to update them.
check where AQUASTAT and FAOSTAT version. Probably best is in setup file. Add and if statement to indicate when this is forgotten.
Create mapping of gaez when maps are missing!
clean up crop mapping GAEZ
Need to develop one cliping function using gdal that does the major lifting!. Should be applied to all maps incl gaez. This means we can remove the creation of a 5min grid in grid.r
Remove -9999 values from tt maps!!!
select gaez needs to be cleaned up
Go through spatial data coded and add consistent functions from mapspam_functions.r
Add warnings when user input in set_grid... and 01_process_adm are not set! Check if names and codes are unique?
create flow chart that connects all the scripts - see MAGNET example that Lindsay created once.
check https://github.com/JGCRI/metis/tree/master/R for ideas. Only installs packages if they are not already installed...
Reorganize GAEZ raw data folders. Better to use two folders: suitability and potential_yield. Possible add more crops.
Combine clip and reproject for GAEZ. DO we need the raw files for something.
when package is finished remove source functions and add mapspam in all library calls.
Make sure the model runs when no ADM data is available and only FAOSTAT is used. THis means that 03_process_subnational_statistics.R should be wrapped in an if statement that gives a warning that the file does not need to be created. Also the harmonization script should have an if statement.
Do we want to include the option of scaling to FAOSTAT?
Is it really necessary that an ADM2 model also has ADM1 level data in the shapefile?? CHECK
Need to have a file with full adm mapping that connects adm to lower adms.
Write all pdf reports to one folder "reports", e.g for adm maps, plots to compare stats etc.
use temp_path when saving to temp path as this makes it possible to quickly changes names of folders without changing too much code.
CHeck that there are no duplicate names in adm_name
check if GAEZ uses -999 or something to indicate NA values. If so delete at forehand.
Run system through for ADM1 optimization
check what to do with the check_na in 01_prepare_gaez.r
Add check after creation of each pre-gdx file to see if there are no NAs and no negative values.
Add rounding and rebalancing upwards before creation final pa file.
Check model input script needs to be reworked.
Create script/functions to write grid, year etc to csv file, which then is loaded before scripts are run.
Create function to set folders and (2) function to create all subfolders.. This will automatically create all folders and subfolders, after which all temp folders scripts can be deleted.
check creation of artificial units. case adm_sel = 1 and solve_sel = 1 not covered. Also compare code with code used for selecting pa and pa_fs, which might be applied and is more efficient.
add trycatch when running gams to catch errors
CHECK: SPAM uses additional constraint on artificial adms, for which no slack is allowed => implement!
Add check if there are NAs in score! THis often causes problems.
Related: what will happen if all suit is zero at an ADM1 level when the model is run at ADM1. Need to investigate this.
If 5 arcmin input such as GMIA and GAEZ are first clipped and then reprojected to 30 arcsec, many border cells will be lost because the clip will only include grid cells of which the center is included. Not clear if gdalwarp first clips or first warps. CHECK. For GMAI, we need to change the process, when creating 30arcsec maps, perhap cropping at res+2 grid cells to ensure all grid cells are included.
Decide what to do with all the tables in mappings_spam. Now they are loaded from an EXCEL sheet in the respective functions. Probably better to add to load_data function = introduced from in split_score onwards. Still do replace in replace Gaez function.
Put stricted check on load_data functions as there is no warning when an input is not found.
not smart to use adm_code for folder names of adms as this interfers with selecting of adm_code from files like pa and pa_fs. In due time change all to another variable name. adm_code_sel?
Add some kind of check, which compares the list of adms in the rasterized version with those in statistics. It is possible that a very small adm drops out if it is smaller than the grid cells. In this case things might go wrong. User should merge this with another adm and add statistics.
Add res to name of pa and pa_fs so it is clear when models with different res are run.
perhaps change the name of adm_level in param as this is the same name as a header which might be confusing, adm_depth? or adm_detail?.
https://github.com/wri/MAPSPAM has meta data with fixed colors for each crop!! Can be used to make a nice map that depicts the dominant crop per grid cell!
Also has FAO-spam concordance. Add those tables to the mappings Excel file!
Remove x, y from results file
CHeck allocations. E.g vege_L seems to be allocated to only two grid cells using max_score? Is this the same as on SPAM??
Add checks to all input, eg. Param has to be of class spamc_par, etc.
Check if creation of temporary folders when clipping spatial data can be made more easy. Perhaps create at teh same time as folder structure?
Remove WSG84 adm maps
Need to put model type and solve level in results file and tifs so they can be distinguished!
Add param$crs to gdalwarp of ESA so it is warped to the right projection in case the lc map uses a different crs.
make create_adm_map_pdf general so it can be used for any level of adms,
at the the moment cl and ia harm maps are saved in the adm subfolders. Better not to do this and create them expost from the merged database unless. But check if they are need as intermediate input.
Add option in Github to provide comments.
CHange color of messages! : https://stackoverflow.com/questions/57030452/how-to-change-the-colour-of-the-printed-output-in-base-r
check that replacement crops have to be %in% spam crop list of 40 (or more) loaded from mappings.
Prepare_priors_and scores checks if processed bs and py already exists. As creating priors and scores are combined now, this no longer seems necessary.
Need to add error checks for if somebody changes model_solve and then starts running any of the prep functions or model. This will result in an error because intermediate data is not produced yet.
say somewhere that we take the average of 3 years when scaling to FAOSTAT.
add flowchart from presentation
add figure with folder tree
add table with all sources from paper

Subnational statistics

check_statistics() has in the help that the input should be of class subnat => still need to implement this.
reaggregate statistics has limited documentation
Set all adm names to uppercase in map and in statistics tables

TODO FABLE

COMMENT how country teams can create mapspam2globiom_iso3c github repository
Add coffee as example.
Explain how to add crop to GLOBIOM.
Add github repository
Add link to FABLE-GLOBIOM github.
add CI to crop gdx
Need to make crop2globiom a parameter when creating gdx file so thic an be adjusted if new crop is added in GLOBIOM!
Contact USA team as they are interested in creating Crop maps

To do

update 04_synergy_cropland for now using SASAM product compare_processed statistics compare raw statistics remove cl maps and df in harmonization file when they are redundant to save memory space Create separated functions to calculate priors for each system so they can be easily adjusted. Check this part on using .data$x to make sure data is in the local environment: https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html Check what happens when CHECK: SPAM uses additional constraint on artificial adms, for which no slack is allowed. Not applied at the moment

To do for running over adm.

prep_gaez is prepared for all gridID cells. This will become very large for large countries and at 30 arcsec. It should be (1) split over ADMs. and (2) probably only create it for the relevant gridIDs that for which the landcover data is generated in the harmonization process.

Functions to add

puts wide pa, ci, fs to long
Replace gather by pivot_long
Functions to create stat object

Process raw subnational statistics file.

Code to rebalance ADM totals does not account for cases when, say, all ADM2 units for maize have a value except for one (or more) and the sum of the ADMs where values are not NA are nearly the same as the total of ADM2 but not identical due to rounding. In these cases ADM1 will not be updated as there are missing values. The result is that when the artificial ADMs are constructed the artificial ADM that represents all maize will have a very small value which creates problems in GAMS. Write a function to checks for these cases and presents a warning. Perhaps introduce a threshold. If |sum(ADM2)-ADM1| < 10 Replace and set missing ADM2 to 0. This also means that no artificial adms will be created with pa = 0 as is the case for tea in the present data set.

Wrap rebalancing code in function.

Balancing code is not correct as it does not replace, say ADM1 when there is missing information at ADM2. This is ONLY correct if ADM1 > sum(ADM2). If not there is an inconsistency. In previous code. We also had an extra rule:

Also replace cases when sum(ADM2) > ADM1 and is.na(ADM2) Set is.na(ADM2) to 0.

Also check if it has adm0, adm1 and adm2 (if specified) data. Sometimes adm0 data is not added by mistake, causing errors.

Check if adm0, adm1, adm adm2 are consistent cross ha, ci and fs!

Introduce class subnat stat?!

Perhaps introduce a class subnat_stat, which always consists of subnatstat in wide format.

And has passed a number of checks so it is always consistent.

CHECK that need to be added: When running solve_level = 1, if data is complete at that level, i.e. if data is available for all crops at ADM1 level!

CHeck if crop names are all in SPAM crop list csv file!

Scaling to FAOSTAT.

We do not scale if there are missing values and sum(ADM1)<faostat. Here the same rule should be applied. If the sum is very close to FAOSTAT, missing values should be set to 0.

Compare adm

Build check function that compares adm_map_list, which should be leading, with adm stat

production systems

Add in 02_agricultural statistics a script that creates overview of export shares of crops to determine system. Also add script that creates template for setting shares. Including pre-processing share of irrigated.

Running at ADM0

Need to start at beginning and add code to make this work, for example in Combine statistics. Best is to use MWI as test case and split into three parts, one for each ADM1.

Irrigated area

unclear how to deal with resolution higher than 30arcsec. At the moment gia is the preferred map but if we warp this to 30 arcsec. share of irrigated area will be very low so unclear how ranking will result. Also in case there is not ir area (CL > IR) SPAM uses the rule that CL = IR), which can be implemented at 5min arcsec but probably not at 30 arcsec

Maybe have to use same rule as used for gia to run GMIA as it it might be that warping a 5min map to 5min changes the results.

CHECK GMIA raster is smaller than GIA (although extent etc is the same). It does not seem to include all rasters that are located on the border. Probably is related with the fact that GMIA is already at 5 arcmin. CHECK if this has impact, probably so as this excludes border cells.

Load_inputs funtion

Should read file list from an internal file with names, not hardcode. Perhaps also create function that loads data as there is a lot of repetition.

Similarly there is a need to make the other paths variable. At the moment, load spatial has a hard coded folder, this should become a variable so it is taken from the param. Maybe better to add these to the parameters file

Check script

In MOZ lc appeared to be negative in GDX file, giving error that log <0. Need to create a check file that also checks if all inputs are positive. Add to GAMS code as well like in original scripts

A the moment we are only using replacement crops for same system (e.g. teas_I for coff_I, which both might have 0). Perhaps use coff_H as replacement for coff_I)

Add check if files exist before they are loaded!

Help information in prepare model functions

Functions to fix

NB: run all commands in backward order to see if errors are produced that show input data is missing. Add @title on top, then @decription and then @details (see mapspamc_par for example)

create_folders: add res subfolder to all maps and other processed folders where needed. aggregate_to_adm: ensure that param and alt_param are for the same model apart from adm_level align_raster: validate input format create_all_tif: check if results file exist. create_mapspamc_folders: check if all (too many) folders are created load_data: reprogram and check if data is available harmonize_inputs and split_harmonized_inputs: add much more help info, check all and in particular generation of log. Set maximim to slackp and slackn. Check why slack for ia= max and for cl = min prepara_physical_area: problem with cat which somehow seems to be surpressed. prepare_cropland_tp1": Move to development branch prepare_priors_and_scores: add documentation reaggregate_statistics: add documentation

Chick if all mapping files are relevant and complete

Calculation of priors

In the present case it is possible that a warning shows up that something is divided by zero when calculating the priors/max score: Problem while computing prior = 100 * (prior - min(prior, na.rm = T)) / .... i no non-missing arguments to min; returning Inf.

This is produced by split_priors, the function where the priors are calculated. The calculation of the prior uses the minimum and maximum of the function to create the prior/score ranking. In case all values are zero (because GAEZ does not have values for this crop/region), there is a divide by zero resulting in NAN values, resulting in a warning. The NAN values are replaced by 0 so there is no real error but it does mean that prior/scores are zero for all crops of the system for which the prior/score is calculated, in turn, resulting in a somewhat random distribution, or at least one that does not take the prior/score into account. The error has the most chance of occuring for irrigated crops and with model_solution 1 as it is possible that suitability is 0 for a certain region. Also suitability for irrigated crops is much often zero than for non-irrigated crops. An example when this happens is ooil_I in ETH6. We might need to add a warning for this in the log file so the user is aware. A solution for this would be the possibility to add more replace crops and also allow for example ooil_H to replace ooil_I. Having zero suitability for all grid cells for a crop that needs to be allocated in that area is a problem in any case so best would be the ensure that always suitability > 0 for a crop in a certain area.

Probably it would be better to create separate functions for each prior/score calculation so they can be easily updated.

Checking statistics

In check and calibrate statistics, FAO is compared to subnat stat and if FAO has data for a crop, it is added add the national level. NOTE that now adm1 level data might have become incomplete resulting in errors when running soolution_model = 1. We NEED to add a function that checks if the data is complete at adm1 level (and gives a warning when model is run at ADM1!!

Need to add a function that checks if adm_list from polygon is identical to subnat stat! Again, I had a case in which adm0_name was ET in the polygon and ETH in the statistics resulting in problems.

Also need a function that compares if adm_list is consistent between ci, fs and ha

Naming if files

Names of intermediate files is not completely consistent with model type (misses adm and solve level. Perhaps add to file name in some sort of short form?)

Log file

use log package
combine logs of ADM level solutions in one file as well
process_gaez does not add output to log file for solve_level 1 model 30sec?? Seems to be the case for ETH perhaps because things were rerun and files were allready there!! Maybe switch of this function or create separate log file for GAEZ.

Solve level 1

Before combining different GDX results file, check whether the gdx results file exists. If not give a warning that that ADM model was not properly solved (infeasible) and requires attention.
Export run_gams_adm_level as this is convenient to run a separate ADM when something went wrong.

zero biophysical suitability

Gives problems in max score model as zero bs implies zero rps for that crop. If subsistence crop needs to be allocated this will not work because the Constraint 6 - Subsistence allocation should be similar to rural population share in sample will be infeasible.

michielvandijk/mapspamc documentation built on April 17, 2025, 7:31 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

michielvandijk/mapspamc An R package to create crop distribution maps

In michielvandijk/mapspamc: An R package to create crop distribution maps

Error

Error

Error

make ffunction visible

Package protection

gdxrrd

Artificial adm calculations

all cropping systems irrigated gives warnings as some priors assume values.

Loading files, also when gdalwarp is called

Import of packages

Doi badge to paper and sticker

create spam folders

check if input is consistently zero (eg when there are no subsistence systems, rps is zero) in combine_inputs script.

Small adms

Error messages

Finish documentation and functions

Subnational statistics

TODO FABLE

To do

To do for running over adm.

Functions to add

Process raw subnational statistics file.

Introduce class subnat stat?!

Perhaps introduce a class subnat_stat, which always consists of subnatstat in wide format.

And has passed a number of checks so it is always consistent.

Scaling to FAOSTAT.

Compare adm

production systems

Running at ADM0

Irrigated area

Load_inputs funtion

Check script

A the moment we are only using replacement crops for same system (e.g. teas_I for coff_I, which both might have 0). Perhaps use coff_H as replacement for coff_I)

Add check if files exist before they are loaded!

Help information in prepare model functions

Functions to fix

Chick if all mapping files are relevant and complete

Calculation of priors

Checking statistics

Naming if files

Log file

Solve level 1

zero biophysical suitability

R Package Documentation

Browse R Packages

We want your feedback!

michielvandijk/mapspamc
An R package to create crop distribution maps