library(DGVMTools, quietly = TRUE, warn.conflicts = FALSE) #library(raster, quietly = TRUE, warn.conflicts = FALSE) #library(data.table, quietly = TRUE, warn.conflicts = FALSE) #library(plyr, quietly = TRUE, warn.conflicts = FALSE) #library(viridis, quietly = TRUE, warn.conflicts = FALSE) #library(Cairo, quietly = TRUE, warn.conflicts = FALSE)
This vignette describes how to manipulate and create Format
objects in DGVMTools. A Format
is an S4-class (like all objects in DGVMTools), and it has 5 slots (the slots are described below). Format
objects are pretty important because the contain a lot of metadata about a model output or data set files, so without it DGVMTools won't be able to read the data from disk. One simple modification to a Format
object is to define a ne Layer. This is a fairly common for a DGVM development task, so this will be described seperately in a short, first section.
Writing code to describe a new Format
object is essential for reading a new DGVM or other data source. So the second (and much longer) part of this vignette describes how to do that. It is also a reference for making more complicated modifications to an existing Format
(for example reading new model variables)
Note: It doesn't really make sense to write a new Format for one dataset (that is probably overkill and it would probably rather make sense to bash the data into something compatible with the pre-supplied a NetCDF Format
, rather than define a new Format
), but it might make sense for many datatsets with a standard file format, for example MODIS or Fluxnet.
Format
objectA Format
is an S4-class, and like all S4 classes the slots are accessed using the @ symbol. So, for example, to access the id slot of the pre-defined GUESS
format:
GUESS@id
The @defined.layers slot is just a list of DGVMTools::Layer objects.
original.Layers <- GUESS@predefined.layers str(original.Layers)
To add a new PFT we can simple take our copy of the list, add a new PFT, and copy it back into the Format
object.
new.Layer = new("Layer", id = "NewTree", name = "Some New Tree PFT Layer", colour = "red", properties = list(type = "PFT", growth.form = "Tree", leaf.form = "Broadleaved", phenology = "Summergreen", climate.zone = "Temperate") ) new.Layer.list <- append(original.Layers, new.Layer) print(str(new.Layer.list)) GUESS@predefined.layers <- new.Layer.list
And done!
Note that this change will only remain active in this particular R session (it doesn't modify the Format
in the package), so you will need to do it once at the top of your analysis script.
Format
(or more involved modifications)Writing a completely new Format
is obviously more involved, and in this case details will depends very much on the way the data files are structered. If you are making a new Format
to read ASCII data then the Format-GUESS.R is a good reference. If you are writing a format for netCDF data, use Format-NetCDF.R for reference.
This section of the vignette also gives some idea for how to do 'intermediate level' modifications to a Format
object, for example added a new Quantity
.
The function of each of the 6 slots in a Format
is discussed below.
id
slotThis is very simple, just a character string to uniquely identify this format.
id_NewFormat <- "Example_Format"
defined.layers
slotThis is a list of DGVMTools::Layer objects. If the Format is for a DGVM, these Layers should correspond to the standard default PFTs, soil C or N pools or fluxes etc. in the DGVM. It is possible that if the Format is for a data set, the concept of Layers might not make sense so this can just be an empty list.
Layers_NewFormat <- list( # A couple of tree PFTs TeBE = new("Layer", id = "TeBE", name = "Temperate Broadleaved Evergreen Tree", colour = "darkgreen", properties = list(type = "PFT", growth.form = "Tree", leaf.form = "Broadleaved", phenology = "Evergreen", climate.zone = "Temperate") ), TeBS = new("Layer", id = "TeBS", name = "Temperate Broadleaved Summergreen Tree", colour = "darkolivegreen3", properties = list(type = "PFT", growth.form = "Tree", leaf.form = "Broadleaved", phenology = "Summergreen", climate.zone = "Temperate") ), # And a couple of grasses C3G = new("Layer", id = "C3G", name = "Boreal/Temperate Grass", colour = "lightgoldenrod1", properties = list(type = "PFT", growth.form = "Grass", leaf.form = "Broadleaved", phenology = "GrassPhenology", climate.zone = "NA") ), C4G = new("Layer", id = "C4G", name = "Tropical Grass", colour = "sienna2", properties = list(type = "PFT", growth.form = "Grass", leaf.form = "Broadleaved", phenology = "GrassPhenology", climate.zone = "NA") ) ) # Now take a look at them for(Layer in Layers_NewFormat) { print(Layer) }
quantities
slotThis is a list of DGVMTools::Quantity objects. These correspond to the variables that might be in the dataset/model output that one might want to read. For example LAI, biomass, evapotranspiation, GPP/NPP, soil water content. Note that they don't need to be there in every run, (and you don't need to define all variables straight away), but if you want to use them in DGVMTools you need to define them.
quantities_NewFormat <- list( # Just a couple of example quantities new("Quantity", id = "LAI", name = "LAI", units = "m^2/m^2", colours = function(n) rev(viridis::viridis(n)), format = c("Example_Format"), standard_name = "leaf_area_index"), new("Quantity", id = "Veg_C", name = "Vegetation Carbon Mass", units = "kgC/m^2", colours = viridis::viridis, format = c("Example_Format")) ) # Now take a look at them for(quant in quantities_NewFormat ) { print(quant) }
determineQuantities
slotThis is a function which will look at a some output files on a disk (normally output from a DGVM run) and return a list of Quantities available to read. The first argument should be the Source object. The second argument "names", is logical and if TRUE, the function should return a list of the ids of the quantities, otherwise it returns the list of DGVMTools::Quantity objects. Additionally arguments for Format-specific details can also be added.
availableQuantities_NewFormat <- function(x, names = TRUE, additional.args){ # typical stuff # * get a list of files in the directory # * scan the files for different variables # * build a list of qunatities present in the model output / dataset # dummy code Quantities.present <- list() return(Quantities.present) }
getField
slotThis is the biggie! This function (which will be long, but can of course make use of other functions) is responsible for reading data from disk and bashing it into a data.table; for some cropping to the required spatial-temporal-annual extent (optionally) and for storing the spatial-temporal-annual extent in an STAInfo object. As arguments it takes:
It should return a DGVMTools::Field object. For more details of the Field class see the classes.R file and the main package documentatation ("Field-class)".
Note that the input "sta.info" argument and the sta.info slot of the returned Field don't need to match, the rest of the the cropping/aggregating is done automatically by the package, but it maybe more efficient to read only a subset of the data from disk (eg. for netCDF files where you can easily cut out exactly the 'hyperslab' of data that you need), rather than read the whole file and then crop it.
getField_NewFormat <- function(source, quant, sta.info, verbose, ...){ # open the file and read the data with ncvar_get() or read.table() or whatever # code needs to get the data as a data.table so reformat it # data.tables should be melted down to one column for Lat, Lon, Year, Month and Day (as appropriate). # days run from 1-365, and if data is daily no Month column should be used # dummy code dt <- data.table() # also an STAInfo object # define this properly based on the actual data in the data.table return.sta.info <- new("STAInfo", first.year = 1901, # let's say last.year = 2018, # let's say year.aggregate.method = "none", # see NOTE 1 below spatial.extent = extent(dt), # raster::extent function has been define for data.tables, might be useful spatial.extent.id = "Full", # see NOTE 2 below spatial.aggregate.method = "none", # see NOTE 1 below subannual.resolution = "monthly", # let's say subannual.aggregate.method = "none", # see NOTE 1 below subannual.original = "monthly" # let's say ) # NOTE 1: normally there is no reason to do any aggregation in this function since it will be done later (and doesn't save any disk reading) # NOTE 2: if no spatial cropping at this stage set this to be the character string "Full", if spatial cropping was done, set it to the spatail.extent.id argument of the target sta.info object # make the Field ID based on the STAInfo and field.id <- makeFieldID(source = source, var.string = quant@id, sta.info = sta.info) # build a new Field return.Field <- new("Field", id = field.id, quant = quant, data = dt, sta.info = return.sta.info, source = source) return(return.field) }
Finally we can combine the six parts that we just made to make the new Format
object,
NewFormat <- new("Format", id = id_NewFormat, predefined.layers = Layers_NewFormat, quantities = quantities_NewFormat, availableQuantities = availableQuantities_NewFormat, getField =getField_NewFormat)
And there we go. A new Format
object. Obviously the trickness is defining the functions, particularly getField()
{r Final print, echo=TRUE}
print(NewFormat)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.