library(DGVMTools, quietly = TRUE, warn.conflicts = FALSE)
#library(raster, quietly = TRUE, warn.conflicts = FALSE)
#library(data.table, quietly = TRUE, warn.conflicts = FALSE)
#library(plyr, quietly = TRUE, warn.conflicts = FALSE)
#library(viridis, quietly = TRUE, warn.conflicts = FALSE)
#library(Cairo, quietly = TRUE, warn.conflicts = FALSE)

Introduction

This vignette describes how to manipulate and create Format objects in DGVMTools. A Format is an S4-class (like all objects in DGVMTools), and it has 5 slots (the slots are described below). Format objects are pretty important because the contain a lot of metadata about a model output or data set files, so without it DGVMTools won't be able to read the data from disk. One simple modification to a Format object is to define a ne Layer. This is a fairly common for a DGVM development task, so this will be described seperately in a short, first section.

Writing code to describe a new Format object is essential for reading a new DGVM or other data source. So the second (and much longer) part of this vignette describes how to do that. It is also a reference for making more complicated modifications to an existing Format (for example reading new model variables)

Note: It doesn't really make sense to write a new Format for one dataset (that is probably overkill and it would probably rather make sense to bash the data into something compatible with the pre-supplied a NetCDF Format, rather than define a new Format), but it might make sense for many datatsets with a standard file format, for example MODIS or Fluxnet.

Simple case: Adding a new Layer to a Format object

A Format is an S4-class, and like all S4 classes the slots are accessed using the @ symbol. So, for example, to access the id slot of the pre-defined GUESS format:

GUESS@id

The @defined.layers slot is just a list of DGVMTools::Layer objects.

original.Layers <- GUESS@predefined.layers

str(original.Layers)

To add a new PFT we can simple take our copy of the list, add a new PFT, and copy it back into the Format object.

new.Layer =  new("Layer",
                 id = "NewTree",
                 name = "Some New Tree PFT Layer",
                 colour = "red",
                 properties = list(type = "PFT",
                      growth.form = "Tree",
                      leaf.form = "Broadleaved",
                      phenology = "Summergreen",
                      climate.zone = "Temperate")
)

new.Layer.list <- append(original.Layers, new.Layer)

print(str(new.Layer.list))

GUESS@predefined.layers <- new.Layer.list

And done!

Note that this change will only remain active in this particular R session (it doesn't modify the Format in the package), so you will need to do it once at the top of your analysis script.

Writing a new Format (or more involved modifications)

Writing a completely new Format is obviously more involved, and in this case details will depends very much on the way the data files are structered. If you are making a new Format to read ASCII data then the Format-GUESS.R is a good reference. If you are writing a format for netCDF data, use Format-NetCDF.R for reference.

This section of the vignette also gives some idea for how to do 'intermediate level' modifications to a Format object, for example added a new Quantity.

The function of each of the 6 slots in a Format is discussed below.

1. The id slot

This is very simple, just a character string to uniquely identify this format.

id_NewFormat <- "Example_Format"

2. The defined.layers slot

This is a list of DGVMTools::Layer objects. If the Format is for a DGVM, these Layers should correspond to the standard default PFTs, soil C or N pools or fluxes etc. in the DGVM. It is possible that if the Format is for a data set, the concept of Layers might not make sense so this can just be an empty list.

Layers_NewFormat <- list(

  # A couple of tree PFTs

  TeBE = new("Layer",
             id = "TeBE",
             name = "Temperate Broadleaved Evergreen Tree",
             colour = "darkgreen",
             properties = list(type = "PFT",
                  growth.form = "Tree",
                  leaf.form = "Broadleaved",
                  phenology = "Evergreen",
                  climate.zone = "Temperate")
  ),


  TeBS = new("Layer",
             id = "TeBS",
             name = "Temperate Broadleaved Summergreen Tree",
             colour = "darkolivegreen3",
             properties = list(type = "PFT",
                  growth.form = "Tree",
                  leaf.form = "Broadleaved",
                  phenology = "Summergreen",
                  climate.zone = "Temperate")
  ),


  # And a couple of grasses

  C3G = new("Layer",
            id = "C3G",
            name = "Boreal/Temperate Grass",
            colour = "lightgoldenrod1",
            properties = list(type = "PFT",
                 growth.form = "Grass",
                 leaf.form = "Broadleaved",
                 phenology = "GrassPhenology",
                 climate.zone = "NA")
  ),

  C4G = new("Layer",
            id = "C4G",
            name = "Tropical Grass",
            colour = "sienna2",
            properties = list(type = "PFT",
                 growth.form = "Grass",
                 leaf.form = "Broadleaved",
                 phenology = "GrassPhenology",
                 climate.zone = "NA")
  )

)

# Now take a look at them
for(Layer in Layers_NewFormat) {
  print(Layer)
}

3. The quantities slot

This is a list of DGVMTools::Quantity objects. These correspond to the variables that might be in the dataset/model output that one might want to read. For example LAI, biomass, evapotranspiation, GPP/NPP, soil water content. Note that they don't need to be there in every run, (and you don't need to define all variables straight away), but if you want to use them in DGVMTools you need to define them.

quantities_NewFormat <- list(

  # Just a couple of example quantities

  new("Quantity",
      id = "LAI",
      name = "LAI",
      units = "m^2/m^2",
      colours = function(n) rev(viridis::viridis(n)),
      format = c("Example_Format"),
      standard_name = "leaf_area_index"),

  new("Quantity",
      id = "Veg_C",
      name = "Vegetation Carbon Mass",
      units = "kgC/m^2",
      colours = viridis::viridis,
      format = c("Example_Format"))
)

# Now take a look at them
for(quant in quantities_NewFormat ) {
  print(quant)
}

4. The determineQuantities slot

This is a function which will look at a some output files on a disk (normally output from a DGVM run) and return a list of Quantities available to read. The first argument should be the Source object. The second argument "names", is logical and if TRUE, the function should return a list of the ids of the quantities, otherwise it returns the list of DGVMTools::Quantity objects. Additionally arguments for Format-specific details can also be added.

availableQuantities_NewFormat <- function(x, names = TRUE, additional.args){ 

  # typical stuff 
  # * get a list of files in the directory
  # * scan the files for different variables
  # * build a list of qunatities present in the model output / dataset

  # dummy code
  Quantities.present <- list()
  return(Quantities.present)

} 

5. The getField slot

This is the biggie! This function (which will be long, but can of course make use of other functions) is responsible for reading data from disk and bashing it into a data.table; for some cropping to the required spatial-temporal-annual extent (optionally) and for storing the spatial-temporal-annual extent in an STAInfo object. As arguments it takes:

It should return a DGVMTools::Field object. For more details of the Field class see the classes.R file and the main package documentatation ("Field-class)".

Note that the input "sta.info" argument and the sta.info slot of the returned Field don't need to match, the rest of the the cropping/aggregating is done automatically by the package, but it maybe more efficient to read only a subset of the data from disk (eg. for netCDF files where you can easily cut out exactly the 'hyperslab' of data that you need), rather than read the whole file and then crop it.

getField_NewFormat <- function(source, quant, sta.info, verbose, ...){ 

  # open the file and read the data with ncvar_get() or read.table() or whatever

  # code needs to get the data as a data.table so reformat it
  # data.tables should be melted down to one column for Lat, Lon, Year, Month and Day (as appropriate).
  # days run from 1-365, and if data is daily no Month column should be used

  # dummy code
  dt <- data.table()


  # also an STAInfo object
  # define this properly based on the actual data in the data.table
  return.sta.info <- new("STAInfo",
                         first.year = 1901, # let's say
                         last.year = 2018, # let's say
                         year.aggregate.method = "none", # see NOTE 1 below
                         spatial.extent = extent(dt), # raster::extent function has been define for data.tables, might be useful
                         spatial.extent.id = "Full", # see NOTE 2 below
                         spatial.aggregate.method = "none", # see NOTE 1 below
                         subannual.resolution = "monthly", # let's say
                         subannual.aggregate.method = "none", # see NOTE 1 below
                         subannual.original = "monthly" # let's say
  )

  # NOTE 1: normally there is no reason to do any aggregation in this function since it will be done later (and doesn't save any disk reading)
  # NOTE 2: if no spatial cropping at this stage set this to be the character string "Full", if spatial cropping was done, set it to the spatail.extent.id argument of the target sta.info object


  # make the Field ID based on the STAInfo and 
  field.id <- makeFieldID(source = source, var.string = quant@id, sta.info = sta.info)

  # build a new Field
  return.Field <- new("Field",
                      id = field.id,
                      quant = quant,
                      data = dt,
                      sta.info = return.sta.info,
                      source = source)


  return(return.field)

} 

Combining

Finally we can combine the six parts that we just made to make the new Format object,

NewFormat <- new("Format", 
                 id = id_NewFormat,
                 predefined.layers = Layers_NewFormat, 
                 quantities = quantities_NewFormat, 
                 availableQuantities = availableQuantities_NewFormat, 
                 getField =getField_NewFormat)

And there we go. A new Format object. Obviously the trickness is defining the functions, particularly getField()

{r Final print, echo=TRUE} print(NewFormat)



MagicForrest/DGVMTools documentation built on Aug. 23, 2024, 8:05 a.m.