In FoRTExperiment/ed4forte: FoRTE-specific utilities for working with ED2

Prepare inputs

For this tutorial, in your working directory, create a folder called ed-input-data.

Download meteorology inputs from the GitHub releases page. For this tutorial, use the NARR-ED2.tar.gz file (North American Regional Reanalysis). Download the file into ed-input-data and extract it.

cd ed-input-data
tar -xf NARR-ED2.tar.gz
cd ..

You should now have a directory ed-input-data/NARR-ED2 containing a bunch of files like 1979JAN.h5 as well as a file called ED_MET_DRIVER_HEADER.

Open the ed-input-data/NARR-ED2/ED_MET_DRIVER_HEADER file in a text editor, and change the third line (/Users/shik544/...) so it matches the path of the directory (e.g. /path/to/current/directory/ed-input-data/NARR-ED2/). You can also do this with the following shell command:

# Use `gsed` instead of `sed` if on macOS
sed -i "3s:.*:$PWD/ed-input-data/NARR-ED2/:" ed-input-data/NARR-ED2/ED_MET_DRIVER_HEADER
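
If you would rather stay in R, the same edit can be sketched with base R. The helper name set_met_header_path is hypothetical (it is not part of ed4forte), and the sketch assumes, as described above, that the path belongs on the third line and ends with a trailing slash.

```r
# Hypothetical helper (not part of ed4forte): rewrite line 3 of the
# met driver header so it points at the directory holding the .h5 files.
set_met_header_path <- function(header_file, met_dir = dirname(header_file)) {
  lines <- readLines(header_file)
  stopifnot(length(lines) >= 3)
  # Absolute path with a trailing slash, matching the example path above
  lines[3] <- paste0(normalizePath(met_dir, winslash = "/"), "/")
  writeLines(lines, header_file)
  invisible(lines[3])
}

# Usage, run from the directory that contains ed-input-data:
# set_met_header_path("ed-input-data/NARR-ED2/ED_MET_DRIVER_HEADER")
```

All other lines of the header are written back unchanged.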

The remaining inputs required for a basic ED2 simulation ship with the ed4forte package. Install the package from your local clone (devtools::install("/path/to/ed4forte")) or from GitHub (devtools::install_github("FoRTExperiment/ed4forte")).

Basic ED2 run

First, locate the ED2 executable and set the R option ed4forte.ed2_exe to the absolute path to this executable.

options(ed4forte.ed2_exe = "/path/to/ED/build/ed_2.2-dbg")
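
Before starting a run, it can help to confirm that the option points at a real, executable file. The following is a minimal sketch using base R only; check_ed2_exe is a hypothetical helper, not an ed4forte function.

```r
# Hypothetical helper: fail early if the ED2 executable option is unusable.
check_ed2_exe <- function(path = getOption("ed4forte.ed2_exe")) {
  if (is.null(path)) stop("Option 'ed4forte.ed2_exe' is not set.")
  if (!file.exists(path)) stop("No file at: ", path)
  # file.access() returns 0 when the requested permission (1 = execute) is granted
  if (file.access(path, mode = 1) != 0) stop("File is not executable: ", path)
  invisible(normalizePath(path))
}

# Usage, after setting the option as shown above:
# check_ed2_exe()
```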


Use the following R code to perform a simple ED2 simulation at UMBS for the year 2000 starting from bare ground:

library(ed4forte)

outdir <- "test-ed-outputs"
ed_input_dir <- "ed-input-data"

p <- run_ed2(
  outdir,
  "2000-01-01",
  "2001-01-01",
  ED_MET_DRIVER_DB = file.path(ed_input_dir, "NARR-ED2", "ED_MET_DRIVER_HEADER")
)

The first argument to run_ed2 is the output directory (which will be created if it doesn't exist). The next two arguments are the start and end dates (or date-times), respectively. The last argument replaces the default ED2IN value of the ED_MET_DRIVER_DB tag with the value given. Any default value for an ED2IN tag can be modified in this way.

Note that the code above will return immediately. This is because it triggers ED2 to run in the background. The return object (p) is a processx process, which can be examined.

p$get_status()
p$is_alive()

For more details, see the processx::process documentation. To wait for the run to complete, use the wait() method.

p$wait()

You can tell that the run finished by examining the contents of the output directory.

list.files(outdir)

You should see the 12 monthly output files, along with a copy of the ED2IN file used for the run and two log files corresponding to the output (stdout) and error (stderr) streams.
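
A quick programmatic version of this check can be sketched in base R. The helper monthly_files_complete is hypothetical; it assumes monthly output file names contain "analysis-E" and that a one-year run should yield 12 of them.

```r
# Hypothetical helper: TRUE if the expected number of monthly files exists.
monthly_files_complete <- function(outdir, n_expected = 12) {
  files <- list.files(outdir, pattern = "analysis-E")
  length(files) == n_expected
}

# Usage, once the run has finished:
# monthly_files_complete("test-ed-outputs")
```

If this returns FALSE, the two log files in the output directory are the first place to look.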

Reading output

ED2 outputs are in HDF5 format, which is also the storage layer underlying NetCDF-4. Therefore, they can be read by NetCDF utilities, like the ncdf4 package.

outfiles <- list.files(outdir, "analysis-E-", full.names = TRUE)
nc <- ncdf4::nc_open(outfiles[1])
ncdf4::ncvar_get(nc, "AGB_CO")
ncdf4::ncvar_get(nc, "MMEAN_GPP_PY")

Unfortunately, ED2 produces only one file per timestep, which makes reading in results from long simulations a pain. ed4forte provides some utilities for reading a subset of ED2 file types and variables in bulk.

results <- read_monthly_dir(outdir)
results

scalar_results <- tidyr::unnest(results, df_scalar)
scalar_results

plot(MMEAN_GPP_PY ~ datetime, data = scalar_results, type = "o",
     main = "Total plot GPP")

pft_results <- tidyr::unnest(results, df_pft)
pft_results

plot(MMEAN_LAI_PY ~ datetime, data = pft_results, col = pft, pch = 19,
     main = "LAI by PFT")

Configuring ED2 runs

ED2 runs are configured using two files: ED2IN is a Fortran namelist file generally used for settings that affect the entire run (e.g. start and end dates; integration timestep; submodel configurations). A reference ED2IN with documentation and default values (lightly tweaked for FoRTE) is in the inst/ED2IN file. config.xml is an XML file that contains PFT-specific parameters, plus a few others. A complete example featuring all of the ED2 default values can be found in the inst/ed2-default-parameters.xml file. Confusingly, some parameters can be set from both the config.xml and the ED2IN -- where both are used, parameters in the config.xml override the ED2IN values.

Both sets of parameters can be modified through the run_ed2 function. To modify ED2IN values, pass the name of the tag (without the leading NL%) directly as an argument to run_ed2. For instance, in the example below, we modify the ED_MET_DRIVER_DB and CROWN_MOD tags.

run_ed2(
  outdir, start_dt, end_dt,
  ED_MET_DRIVER_DB = "/path/to/ED_MET_DRIVER_HEADER",
  CROWN_MOD = 1
)

To see the default values of the ED2IN file, and for some documentation about what they mean, see the reference ED2IN in the inst/ED2IN file.

The parameter configuration XML file (typically called config.xml) can be configured via the configxml argument, which can be either a path to a complete config.xml file or (probably easier) a list of values. For example:

run_ed2(
  outdir, start_dt, end_dt,
  configxml = list(
    pft = list(num = 9, SLA = 35),
    pft = list(num = 10, Vm0 = 27.4),
    phenology = list(theta_crit = -1.25)
  ),
  ED_MET_DRIVER_DB = "/path/to/ED_MET_DRIVER_HEADER"
)

Note the format: A nested list of lists, with the name of each first-level list indicating the type of parameter and each sub-list consisting of name-value pairs. In the above example, we are setting the SLA of PFT number 9 (Temperate Early-successional Hardwood) to 35 and the Vcmax of PFT 10 (Temperate Mid-successional Hardwood) to 27.4, as well as setting the PFT-independent phenology submodel parameter theta_crit to -1.25.
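
To make the mapping concrete, here is a rough sketch of how such a nested list corresponds to XML elements. The function list_to_config_xml is purely illustrative: it is not the writer that ed4forte uses, and it leaves out details such as the XML declaration.

```r
# Illustrative only: each first-level entry becomes a block named after
# the parameter type, and each name-value pair becomes a child element.
list_to_config_xml <- function(x) {
  blocks <- mapply(function(tag, params) {
    fields <- sprintf("    <%s>%s</%s>", names(params),
                      vapply(params, format, character(1)), names(params))
    paste(c(sprintf("  <%s>", tag), fields, sprintf("  </%s>", tag)),
          collapse = "\n")
  }, names(x), x, USE.NAMES = FALSE)
  paste(c("<config>", blocks, "</config>"), collapse = "\n")
}

cfg <- list(
  pft = list(num = 9, SLA = 35),
  pft = list(num = 10, Vm0 = 27.4),
  phenology = list(theta_crit = -1.25)
)
cat(list_to_config_xml(cfg), "\n")
```

Repeating the name pft is intentional: each entry produces its own block, and the num value inside it identifies the PFT.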

Because the most common use of the config.xml is setting PFT parameters, which fit naturally into a table, you can also pass parameters as a data.frame. This works only if you are setting PFT parameters exclusively and the same set of parameters is set for every PFT. If you have a mix of PFT parameters and other values, you need to give the explicit nested list. For example:

run_ed2(
  outdir, start_dt, end_dt,
  configxml = data.frame(
    num = c(9, 10),
    SLA = c(35.3, 38.2),
    Vm0 = c(27.4, 24.3)
  ),
  ED_MET_DRIVER_DB = "/path/to/ED_MET_DRIVER_HEADER"
)
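
The relationship between the two forms can be sketched in base R as follows. The helper df_to_pft_list is hypothetical and not necessarily how ed4forte converts the table internally; it only shows why every row (PFT) ends up with the same set of parameters.

```r
# Hypothetical helper: turn each row of a PFT table into one "pft" entry
# of the nested-list form.
df_to_pft_list <- function(df) {
  out <- lapply(seq_len(nrow(df)), function(i) as.list(df[i, , drop = FALSE]))
  names(out) <- rep("pft", nrow(df))
  out
}

pft_table <- data.frame(
  num = c(9, 10),
  SLA = c(35.3, 38.2),
  Vm0 = c(27.4, 24.3)
)
str(df_to_pft_list(pft_table))
```

Because each column applies to every row, a table cannot set SLA for one PFT and only Vm0 for another; for that, use the nested list directly.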

If ED2 finds and reads a config.xml file, it will also write out a file called history.xml in the same directory that contains a complete list of all parameters and values used. This is useful for figuring out what ED2's default parameters are, and confirming that parameters from config.xml have been read in correctly. (The history.xml file from a test run is actually the basis for the ed2-default-parameters.xml file that ships with this package. Also, for a tabular version of the default PFT parameters, see the inst/ed2-default-pft-parameters.csv file).

NOTE: A key difference between the ED2IN and config.xml files is how strictly they are parsed and how values unknown to ED2 are treated. The ED2IN file must be complete and have no values that ED2 does not know about -- both missing entries and entries that ED2 doesn't know about will result in errors during run initialization. On the other hand, unknown values in config.xml are silently ignored -- i.e. ED2 will only read in parameters that it knows about. This is why it's a good idea to check the history.xml file to make sure that your parameters from the config.xml are being correctly set.
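
Since unknown ED2IN entries cause errors at initialization, it may be worth checking tag names against the reference ED2IN before launching a run. The sketch below uses base R only. The helper unknown_ed2in_tags is hypothetical, and it assumes that active entries start with NL% (possibly after whitespace) and that commented-out lines do not.

```r
# Hypothetical helper: return the requested tag names that do not appear
# as active NL% entries in a reference ED2IN file.
unknown_ed2in_tags <- function(tags, ed2in_file) {
  lines <- readLines(ed2in_file)
  # Keep only lines that begin with NL%, skipping comments and delimiters
  active <- grep("^\\s*NL%", lines, value = TRUE)
  known <- sub("^\\s*NL%([A-Za-z0-9_]+).*$", "\\1", active)
  setdiff(tags, known)
}

# Usage; inst/ED2IN is installed at the package root, so system.file() finds it:
# ref <- system.file("ED2IN", package = "ed4forte")
# unknown_ed2in_tags(c("ED_MET_DRIVER_DB", "CROWN_MOD"), ref)
```

An empty result means every requested tag was found in the reference file. It does not validate the values themselves.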