In ldemaz/rcropmod: Crop model simulations in R

knitr::opts_chunk$set(echo = TRUE, message = FALSE, collapse = T, 
                      comment = "#>", fig.width = 5, fig.height = 5, 
                      fig.align = "center")
library(rcropmod)

Overview

A demonstration of how to set up and run DSSAT using rcropmod functions. DSSAT CSM must be compiled and able to be run from command line on the computer you are using.

Set-up

Prepare necessary files

rcropmod installs with a subset of soil data from HarvestChoice's WISE v.1. profile data, corresponding to Zambia. The reference grid (stored in csv format), raster map of cell identifiers, and .SOL file are accessible as system.files. We also have included a maize cultivar file that was bundled with the pSims gridded crop modeling framework, with some coefficients adjusted for simulations performed in recent work (Estes et al., 2016).

We'll access these now to run this demonstration.

First, let's replace DSSAT's cultivar file with ours. This of course assumes that you already have a functioning DSSAT CSM install. For this example, we have DSSAT45 installed in ~/DSSAT45. We want to replace the existing MZCER045.CUL file in the GENOTYPE folder.

p_geno <- "~/DSSAT45/GENOTYPE/"  # path to genotype folder

# back-up existing file
file.rename(fp(p_geno, "MZCER045.CUL"), fp(p_geno, "MZCER045.BAK"))

# copy new one over
file.copy(system.file("extdata", "MZCER045.CUL", package = "rcropmod"),
          fp(p_geno, "MZCER045.CUL"))

Now let's do the same with the Zambia soil data. The .SOL installs as a zipped file, so we will have to copy it unzip it to DSSAT's SOIL directory.

p_soil <- "~/DSSAT45/SOIL/"
file.copy(system.file("extdata", "WI.SOL.zip", package = "rcropmod"),
          fp(p_soil, "WI.SOL.zip"))
unzip(fp(p_soil, "WI.SOL.zip"), exdir = p_soil) # unzip
file.remove(fp(p_soil, "WI.SOL.zip"))  # remove zip file

We are need to use the soil reference raster and table that accompanies the profiles, which will allow us to figure out which soil profiles correspond to the location of our weather dataset, the construction of which is demonstrated in the DSSAT weather file vignette.

# read soil raster and csv into data.table
zmsolref <- fread(system.file("extdata", "zamsoils.csv", package = "rcropmod"))
zmsolgrid <- raster(system.file("extdata", "zmsoilgrid.grd", 
                                package = "rcropmod"))

We now have a soil dataset that we can query, and a weather file (TEST7932.WTH) will be available if you run the DSSAT weather file vignette.

Set up experiment file

We are going to use the weather and soil files, along with some management parameters, to set up an input table that will be used to drive simulations.

First, we are going to select the soils data we need. We are interestd in a location in the southern Province of Zambia. We are going to use the coordinate of the weather file TEST7932.WTH to extract the names of the soil profiles that we need. Since that file was built from the data in the weathdat file, we can simply find those coordinates from that dataset.

xy <- cbind("x" = unname(weathdat$xyz["lon"]), 
            "y" = unname(weathdat$xyz["lat"]))
pt <- cbind.data.frame("x" = xy[, "x"], "y" = xy[, "y"]) 
coordinates(pt) <- ~x + y  # convert to spatialPoints

# Identify which soil grid cell grid the point intersects
soilid <- extract(zmsolgrid, pt)  # intersects point with raster of cell IDs 
profs <- zmsolref[CELL5M == soilid]  # select corresponding profile names

This shows us that there are three profiles within the 10 km (0.08333 degree) grid cell intersected by the weather file location. The SharePct column gives the amount of the cell covered by that soil type.

We want to next extract the horizon depths and horizon and water holding capacities of these soils, which we will use in initializing DSSAT model runs. This is provided by the read_sol_hor function, which reaches into WI.SOL to extract the variables of interest for these particular profiles.

profdat <- read_sol_hor(solfile = "~/DSSAT45/SOIL/WI.SOL",
                        profiles = profs[, SoilProfile])
profdat$prof  # the first set of profile data

Now we'll set up an input table that will specify parameters we want to pass to DSSAT. The first step is to join the input field file with the weather station (TEST7932) ID, to define the "field", or unique climate-soil combination for the site. In this case there are three, because the grid cell in question has three soil profiles associated with it.

A number of additional variables are made, which are described in the comments, several with built-in utility functions. Among these are the main experimental control variables, which include:

crop type (CR)
planting density (PPOP): how many plants per m^2
planting date (PDATE): Date of planting. We enter this here as a fixed variable to allow some other date parameters to initialize, both we are go add in others in this case also.
initial conditions and starting date (ICDAT, SDATE): Dates at which soil initial conditions are read, and when simulation starts, respectively. We want these to be slightly before the first planting date in this simulation.
row spacing (PLRS): in cm
N fertilizer (FAMN): in kg/ha, 5 in this example (representing smallholder practices)
Initial soil moisture (H20_s): not a DSSAT parameter, but used here with profile depth and water holding parameters to calculate how much water is in the soil upon initialization.
Number of years/seasons to simulate (NYERS): 31 in this example

This list above, which is only a partial one, are for variables that will remain fixed in this example. The following variables are ones we are interested in changing, which will add complexity to our treatments.

Cultivar (INGENO): The crop cultivar, referenced by a numeric code. In this example we will use two cultivars: a generic short season hybrid, and a generic medium season hybrid.
Planting date (PDATE): When the crop is sown. Here we will examine four different planting dates, separated by two-week intervals, beginning on November 1.

# combine WTH name, coordinates, soil grid id, and profile data
field <- cbind("WTH" = "TEST7932", xy, soilid, profdat)

# Create unique field ID (FID) and name for output X file that drives DSSAT
# experiments
field[, c("ID_FIELD", "XNAME") := list(fid(field$WTH), xname(field$WTH, .N))] 

# Fixed parameters
field$CR <- "MZ"  # crop to plant (maize)
field$PPOP <- 3.7  # maize planting density
field$PDATE <- 79305 # planting dates (have to provide)
field$SDATE <- field$ICDAT <- 79298 # initial condition and starting date 
field$PLRS <- 90   # row spacing
field$FAMN <- 5  # per Thornton et al (2009)
field$H20_s <- 0.05  # initial soil moisture content of soil
field$NYERS <- 31  # number of years to run each simulation

# create table with linking fixed field parameters to weather and soil data
xtab <- x_tab(fields = field)  

# Now varying treatment parameters
cult <- c("HY0001", "HY0006")  # cultivar names
CLNUM <- 1:length(cult)  # number for each cultivar
pdates <- strftime(seq.Date(as.Date("1979-11-01"), as.Date("1979-12-31"), 14),
                   "%y%j")  # planting date vector, converted by YYDOY

# combine treatments, use t_tab function to assign additional X file parameters
tcomb <- expand.grid(list("PDATE" = pdates, "INGENO" = cult), 
                     stringsAsFactors = FALSE)  
ttab <- cbind("N" = 1:nrow(tcomb), tcomb, 
              t_tab(tvars = c("PDATE", "INGENO"), topts = c("MP", "CU"), 
                    ttab = tcomb))  # use t_tab func

# Now join the variable treatment table to the fixed table, splitting fields 
# into a list
xtabl <- lapply(1:nrow(xtab), function(x) { # x <- 1
  print(x)
  d <- xtab[x, ]  # split field off (1 row == 1 field)
  d2 <- do.call(rbind, lapply(1:nrow(ttab), function(x) d))  # expand rows
  upd_col <- colnames(d2)[which(colnames(d2) %in% colnames(ttab))]  
  d2[, c(upd_col) := ttab[, upd_col]]  # update columns having variable values
  xt <- cbind(data.table(ttab), d2[, !colnames(d2) %in% upd_col, with = FALSE])
})

We now have a list of data.tables specifying necessary inputs for running DSSAT.

Run DSSAT

Having the input parameters in a list (each list element representing a different field), we are now going to set up a run of DSSAT, which has three four stages, each of which is one function run within an lapply function:

Use x_file to create an X file (here an .MZX file that lives in the Maize sub-folder of DSSAT);
Use batch_file to write a batch file that CSM uses to find the correct X files;
Execute CSM using exec_csm;
Collect results from each run using read_csm_outfile.

The help files will provide more details on the arguments passed to each function.

xrun <- lapply(xtabl, function(x) {  # x <- xtabl[[2]]
  print(x$XNAME[1])
  xf <- copy(x)
  xfnm <- x_file(xtab = xf, outdir = "~/DSSAT45/Maize", z = "01", 
                 xtype = ".MZX")
  bname <- batch_file(xl = list(xf), xfiles = xfnm, 
                    outdir = "~/DSSAT45/Maize", btype = "MAIZE")
  exec_csm(projdir = getwd(), csmdir = "~/DSSAT45/", rundir = "~/DSSAT45/Maize",
           bname = bname)
  sdat <- read_csm_outfile(rundir = "~/DSSAT45/Maize", type = "summary", 
                           vars = c("RUNNO", "TRNO", "FNAM", "SOIL_ID...", 
                                    "PDAT", "MDAT", "HWAH", "PRCM"))
})

# save results as system file because can't run CSM in normal vignette building
# process
setwd(p_wd)
save(xrun, file = "inst/extdata/xrun.rda")

# load system file results from above, for vignette building
load(system.file("extdata", "xrun.rda", package = "rcropmod"))

Each of the four functions running within the lapply above can be broken out separately, but the structure is useful for condensing the process. It also establishes a framework for parallelizing the simulations, as the lapply can be replaced with something like doMC::foreach(), which is a functionality that will be added at a later stage.

Examine Results

Now that we have results, we can have a look at them. We'll join the list into a single data.table containing all three fields. We are going to pick up the soil percentages for weighting results. We also have to be able to analyze by treatment type. There are 10 treatments per field (2 cultivars X 5 planting dates), and these are replicated for each of the 31 years of analysis and 3 soil types, so 10 * 31 * 3 = 930 yield estimates. We want to analyze those in a variety of manners, but are most interested in seeing how yield varies as a function of planting date and cultivar, so we want to average over soils.

This requires a little bit of extra preparation first. The most difficult part is setting up and index to identify the different cultivar types. Treatment order in our setup goes cultivar first, followed by planting date, so cultivar 1 was tested against all fives planting dates, followed by cultivar 2.

sdat <- rbindlist(xrun)
setnames(sdat, names(sdat), gsub("\\_ID|NO|\\.", "", names(sdat)))

# merge with soil profiles 
sdat <- merge(sdat, profs[, -1, with = FALSE], by.x = "SOIL", 
              by.y = "SoilProfile")

# set up indices for identifying where cultivar treatments start and end
ind <- seq(1, 10, 5)  
cult <- rep(0, nrow(sdat))
for(i in 1:2) cult[which(sdat$TR %in% ind[i]:(ind[i] + 4))] <- i
sdat[, CU := cult]
setcolorder(sdat, c("RUN", "TR", "CU", "FNAM", "SOIL", "PDAT", "MDAT",
                    "HWAH", "PRCM", "SharePct"))  # reorder, for tidiness

This setup now allows us to weighted averages by soils, and analyze by treatment type. The following summarizes results by taking the weighted average across soils for each planting date/cultivar combination, preserving the inter-annual variability.

sdat_red <- sdat[, list("HWAH" = weighted.mean(HWAH, w = SharePct)), 
                 by = list(PDAT, CU)]
sdat_red[, PDY := substr(PDAT, 5, 7)]  # get planting dates with out year
sdat_red[, YR := substr(PDAT, 1, 4)]  # get planting dates with out year

Now you can look at different combinations within this. First let's take the annual mean yield by cultivar and planting date, which we'll plot

cols <- c("red", "blue")
yrng <- round(range(sdat_red$HWAH) / 1000) * 1000
plot(1:5, 1:5, ylim = yrng, pch = "", xlab = "PDATE", ylab = "HWAH",
     xaxs = "i", yaxs = "i", las = 2, xaxt = "n")
axis(1, at = 1:5, labels = unique(sdat_red[, PDY]), las = 2)
polygon(x = c(0, 0, 5, 5, 0), y = c(0, yrng[2], yrng[2], 0, 0), col = "grey")
for(i in 1:length(ind)) {
  lines(sdat_red[, mean(HWAH), by = list(PDY, CU)][CU == i][, V1], 
        col = cols[i])
}

Note that the medium season hybrid (blue) has higher yields overall than the the short-season (red), and that yields increase with later planting dates.

Next is a plot of mean annual results of cultivars by year (i.e. averaged across planting dates.

plot(1:31, 1:31, ylim = yrng, pch = "", xlab = "YEAR", ylab = "HWAH", 
     xaxs = "i", yaxs = "i", las = 2, xaxt = "n")
axis(1, at = 1:31, labels = unique(sdat_red[, YR]), las = 2)
polygon(x = c(0, 0, 31, 31, 0), y = c(0, yrng[2], yrng[2], 0, 0), col = "grey")
for(i in 1:length(ind)) {
  lines(sdat_red[, mean(HWAH), by = list(YR, CU)][CU == i][, V1], 
        col = cols[i])
}

Here's yield for each of the 5 planting dates, averaged across cultivars, for each of the 31 years in the simulation. Red is earliest, blue is latest, orange to light green intermediate between those.

cols <- RColorBrewer::brewer.pal(n = 5, name = "Spectral")
plot(1:31, 1:31, ylim = yrng, pch = "", xlab = "YEAR", ylab = "HWAH", 
     xaxs = "i", yaxs = "i", las = 2, xaxt = "n")
axis(1, at = 1:31, labels = unique(sdat_red[, YR]), las = 2)
polygon(x = c(0, 0, 31, 31, 0), y = c(0, yrng[2], yrng[2], 0, 0), col = "grey")
pdy <- unique(sdat_red$PDY)
for(i in 1:length(pdy)) {
  sdat_red[, mean(HWAH), by = list(PDY, YR)][PDY == pdy[i]][, 
           lines(V1, col = cols[i])] 
}