knitr::opts_chunk$set(echo = TRUE)

About this tutorial

In this vignette you will learn how to use the VegX R package to map, integrate and harmonize vegetation data using the Veg-X standard (v. 2.0). For the examples, we use the data sets provided in the package. If you do not know what the Veg-X standard is, please refer to the vignette "The Veg-X exchange standard". Here we refer to elements of the Veg-X standard; readers should refer to the same vignette for the definitions of these elements and their logical relationships.

Package users and their main interests

We envisage two different kinds of users of the VegX package:

Installing the package and loading source data

The VegX package is currently distributed from GitHub. To install it, you need the devtools package; then run: devtools::install_github("iavs-org/VegX", build_vignettes=TRUE). Once the package is installed, load it (this also loads the required XML package):

library(VegX)

Users of the VegX package are expected to know how to import their source data into R, either through a database connection or by reading files in diverse formats (e.g. txt, csv, xlsx, ...). For the examples in this manual, we will use three data sets extracted from the New Zealand National Vegetation Survey (NVS) Databank. These are subsets of the original data sets, prepared for demonstration purposes only.

Each of the three data sets contains different tables, corresponding to plot locations, site observations, taxon observations, etc. For simplicity, we reduced the number of plots in each example data set to five, although some data sets also contain sub-plots. As the three data sets are included with the package, we load them into the R workspace using:

data(mokihinui)
data(mtfyffe)
data(takitimu)
ls()

Creating a new Veg-X document

Before mapping any data to the Veg-X standard, we need to create a new (empty) document for each data set, using newVegX():

moki_vegx = newVegX()
mtfyffe_vegx = newVegX()
taki_vegx = newVegX()

The output of the print() command reveals that a Veg-X document is represented in R by an S4 class, each slot being a vector of one of the main elements of the Veg-X document:

print(moki_vegx)

Printing a Veg-X object will normally result in too much data being shown in the console output. More user-friendly information about the Veg-X object can be obtained using the function summary(), which tells us how many instances we have of each of the main elements:

summary(moki_vegx)

Of course, moki_vegx is now empty (as are the other two VegX objects).

In the following sections we will progressively add content to the VegX objects. When using the VegX package, the order in which we introduce data to VegX documents is not particularly important, as elements are created as needed. Nevertheless, we will introduce the different functions that add data following a logical sequence. Thus, we begin by introducing plot and survey information, followed by observations of individual organisms, taxa, strata, etc. Later sections of the manual deal with functions that facilitate data integration and harmonization.

Adding plot and survey data to Veg-X documents

| Function | Description |
| --------------- | --------------------------------------------------------------- |
| addPlotObservations() | Adds plot observation records to a VegX object from a data table where rows are plot observations. |
| addPlotLocations() | Adds/replaces static plot location information (spatial coordinates, elevation, place names, ...) in plot elements of a VegX object. |
| addPlotGeometries() | Adds/replaces static plot geometry information (plot shape, dimensions, ...) in plot elements of a VegX object. |
| addSiteCharacteristics() | Adds/replaces static site characteristics (topography, geology, ...) in plot elements of a VegX object. |
| fillProjectInformation() | Fills the information for a given research project. |

Project, plot and observation dates

In this subsection we show how to introduce information about plot names and survey dates. We start with the Mokihinui forest data set, by inspecting the data in the data frame moki_site:

head(moki_site, 3)

The data frame has many columns, encompassing plot shape, site characteristics, experimental treatments, etc. The most important columns to parse at the beginning are Plot, Subplot and PlotObsStartDate, because these specify the space and time context of the vegetation observations. Other columns specify identifiers (IDs), but these are specific to the source database. As Veg-X documents have their own internal IDs, it is not necessary to import the source identifiers.

To import data into Veg-X documents, we almost always need a mapping between the names of elements in the Veg-X standard and the names of columns in the data table used as input. For example, in the following code we define that column "Project" in the source data table contains the information about the projectTitle in Veg-X, column "Plot" contains the information about the plotName element, and so on:

mapping = list(projectTitle = "Project", plotName = "Plot", subPlotName = "Subplot",
               obsStartDate = "PlotObsStartDate", obsEndDate = "PlotObsStopDate")
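Conceptually, such a mapping is just a named list: the names are Veg-X element names and the values are column names of the source table. The following base-R sketch illustrates that idea with a hypothetical helper, applyMapping(), which is not part of the VegX package; it may be useful for checking a mapping against a source table before calling the import functions.

```r
# Hypothetical helper (not part of VegX): select and rename source columns
# according to a mapping of Veg-X element names to source column names.
applyMapping <- function(df, mapping) {
  cols <- unlist(mapping)
  missingCols <- setdiff(cols, names(df))
  if (length(missingCols) > 0) {
    stop("Columns not found in source data: ", paste(missingCols, collapse = ", "))
  }
  out <- df[, cols, drop = FALSE]
  names(out) <- names(mapping)  # Veg-X element names become column names
  out
}

# Toy source table using source-specific column names (made-up values)
src <- data.frame(Project = "Demo", Plot = c("P1", "P2"),
                  PlotObsStartDate = c("2011-01-10", "2011-01-11"),
                  stringsAsFactors = FALSE)
applyMapping(src, list(projectTitle = "Project", plotName = "Plot",
                       obsStartDate = "PlotObsStartDate"))
```

A misspelled column name in the mapping then fails early with an explicit error instead of silently producing empty elements.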

Once the mapping is defined, we can import the data using addPlotObservations():

moki_vegx = addPlotObservations(moki_vegx, moki_site, mapping = mapping)

The console output of the add function reports the steps that took place and the modifications made to our Veg-X R object (note that we could store the result in a different object instead of replacing moki_vegx). Twenty-five plots were identified, all belonging to the same research project, and one plot observation was read for each plot. If we call summary() again we will see a change in the number of data elements:

summary(moki_vegx)

Note that among the 25 plots there are 20 sub-plots (i.e. 4 quadrants for each parent plot). If we want to inspect, at any time, the content of a Veg-X object in more detail, we can use the function showElementTable(), indicating which of the main Veg-X elements we want to inspect:

head(showElementTable(moki_vegx, "plotObservation"),6)

When sub-plots are added to a VegX object, the package automatically names them by concatenating the name of the plot with the name of the subplot, with an underscore '_' to separate both strings.
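The naming rule just described can be reproduced in base R, for example to predict the names under which sub-plots will appear in showElementTable(). This is a sketch with a hypothetical helper, fullPlotName(), and made-up plot names; it is not the package's internal code.

```r
# Hypothetical helper (not part of VegX): sub-plot name = "<plot>_<subplot>";
# records without a sub-plot keep the parent plot name.
fullPlotName <- function(plotName, subPlotName) {
  ifelse(is.na(subPlotName) | subPlotName == "",
         plotName,
         paste(plotName, subPlotName, sep = "_"))
}

fullPlotName(c("A1", "A1", "A2"), c("1", "2", NA))
# "A1_1" "A1_2" "A2"
```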

Let's now read plots and plot observations for the Mt Fyffe forest data set. Since it comes from the same vegetation database (NVS), the data tables have similar column names and we will not show them again. In this case, however, there is no information about the sampling end date, only the start date. We modify our mapping accordingly and call addPlotObservations():

mapping = list(projectTitle = "Project", plotName = "Plot", subPlotName = "Subplot", 
               obsStartDate = "PlotObsStartDate")
mtfyffe_vegx = addPlotObservations(mtfyffe_vegx, mtfyffe_site, mapping)
summary(mtfyffe_vegx)

In this source data set there are many more sub-plots for each parent plot, and each plot was visited twice (in 1980 and the austral summer of 2007-2008). Moreover, each survey corresponds to a different project. Veg-X does not require projects to be equated to surveys, but this data set is structured this way. We now turn our attention to the Takitimu grassland data set.

mapping = list(projectTitle = "Project", plotName = "Plot", subPlotName = "Subplot", 
               obsStartDate = "PlotObsStartDate")
taki_vegx = addPlotObservations(taki_vegx, taki_site, mapping)
summary(taki_vegx)

According to this summary, this third data set contains again 5 plots, but with no sub-plots, so even though we specified a mapping for sub-plots, there were no sub-plots in the source data table to populate the VegX object.

Project information

In the previous subsection, we specified a mapping for research project titles, and this led to the creation of project elements in Veg-X documents. However, we did not introduce any data describing the project:

showElementTable(moki_vegx, "project")

The VegX package provides the function fillProjectInformation() to fill project data. It can be used to fill the data for an existing project (identified by its title) or to define a new project. In this case the data is introduced directly as text to the parameters of the function, instead of being read from a data frame. As an example, we provide the information for the project that led to the collection of data in the Mokihinui forest:

moki_vegx = fillProjectInformation(moki_vegx, "MOKIHINUI HYDRO PROPOSAL - LOWER GORGE 2011",
              personnel = c(contributor = "Susan K. Wiser"),
              abstract = paste("Characterise the forest and riparian vegetation",
                               "in the lower Mokihinui gorge,",
                               "and compare this with the vegetation",
                               "in (a) North Branch gorge of Mokihinui",
                               "and (b) Karamea catchment."),
              studyAreaDescription = paste("Mokihinui and Karamea catchments.",
                                           "Forest riparian habitat."))

showElementTable(moki_vegx, "project")

Note that filling the information about the project led to the definition of personnel involved in the project. In the Veg-X standard any individual/organization/position involved in the creation of a data set is stored in a party element. We may fill contact information for party elements using the function fillPartyInformation().

Plot coordinates

The next piece of information we will introduce is the geographic location of plots (sampling dates were already mapped with addPlotObservations()). We thus take a look at the moki_loc data frame:

head(moki_loc, 3)

Locations can be expressed using different coordinate systems, but the easiest and most common way of exchanging geographic information is using longitude and latitude. Hence, we define a new mapping and use the function addPlotLocations():

mapping = list(plotName = "Plot", x = "Longitude", y = "Latitude")
moki_vegx = addPlotLocations(moki_vegx, moki_loc, mapping, 
                             proj4string = "+proj=longlat +datum=WGS84")

When defining the mapping, x and y are used to map coordinates. We should also include plotName, because otherwise the function does not know how to match coordinates with the plots already defined in moki_vegx (subPlotName should be included if coordinates are available for sub-plots). Parameter proj4string is used to supply the spatial reference system of the coordinates. The console output indicates that no new plots have been added (they were previously defined), but they would have been added had we started populating an empty Veg-X object with addPlotLocations(). We can inspect the newly entered data using the following command:

head(showElementTable(moki_vegx, "plot"),3)

When calling showElementTable() for plot elements we also see the plot/sub-plot relationships. Note that sub-plots have no explicit coordinates associated with them (they are not given in moki_loc); it is up to the user to provide them in the source data. Using the same mapping we can parse the coordinates of the Mt Fyffe forest data set:

mtfyffe_vegx = addPlotLocations(mtfyffe_vegx, mtfyffe_loc, mapping)

In this example, 8 records were parsed, but coordinates are available for four plots only. Coordinate records are duplicated in mtfyffe_loc, because they are provided independently for each survey. The function addPlotLocations() only keeps the most recently read location record of each plot. Finally, we parse plot coordinates for the Takitimu grassland data set, noting that they are missing for three of the plots.

taki_vegx = addPlotLocations(taki_vegx, taki_loc, mapping)
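Which of several duplicated location records prevails can be checked in advance with base R, since keeping the last record read per plot corresponds to duplicated(..., fromLast = TRUE). This is a minimal sketch on a made-up table (coordinates and survey years are hypothetical); it is not the package's internal code.

```r
# Toy location table (hypothetical values): coordinates repeated for two surveys
loc <- data.frame(Plot = c("P1", "P2", "P1", "P2"),
                  Longitude = c(173.1, 173.2, 173.1, 173.2),
                  Latitude  = c(-42.1, -42.2, -42.1, -42.2),
                  Survey    = c(1980, 1980, 2008, 2008),
                  stringsAsFactors = FALSE)

# Keep only the last record read for each plot
lastLoc <- loc[!duplicated(loc$Plot, fromLast = TRUE), ]
lastLoc
```

If the duplicated records disagree, sorting the source table (e.g. by survey date) before import controls which record is kept.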

The function addPlotLocations() accepts coordinates in any spatial reference system (which is specified using the parameter proj4string). Setting toWGS84 = TRUE will indicate to the function that it should attempt to translate the input coordinates into longitude and latitude, but this was not required in our examples.

Plot elevation

While x and y specify the horizontal position of a plot, its vertical position is specified using elevation (normally above sea level). Since plot elevation is a measurement, it is important to specify a measurement method (i.e. the instrument or procedure used) and a measurement scale (i.e. the units), because this metadata reduces the risk of errors when pooling data from different sources. In the Veg-X standard, this information is specified by defining method and attribute elements, whereas the VegX package has an S4 class named VegXMethod that encapsulates both. Users can define their own methods, but the package provides the function predefinedMeasurementMethod() to easily define methods for the most common variables. For example, we can define the measurement method for elevation in meters above sea level using:

elevMethod = predefinedMeasurementMethod("Elevation/m")
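The idea of bundling a method description with its measurement scale in a single S4 object can be illustrated with a toy class. This is a hypothetical sketch only: the class ToyMeasurementMethod, its slots and the elevation limits are assumptions made for illustration and do not reproduce the actual structure of VegX's VegXMethod class.

```r
library(methods)

# Hypothetical S4 class (not VegX's VegXMethod): a method description plus the
# attribute (units and valid range) used to record values.
setClass("ToyMeasurementMethod",
         slots = c(methodName = "character",
                   description = "character",
                   unit = "character",
                   lowerLimit = "numeric",
                   upperLimit = "numeric"))

# Assumed, illustrative limits for elevation in metres
elevToy <- new("ToyMeasurementMethod",
               methodName = "Elevation/m",
               description = "Elevation above sea level",
               unit = "m",
               lowerLimit = -500,
               upperLimit = 9000)

# Values can be checked against the attribute before import (NA is flagged too)
withinRange <- function(method, x) {
  !is.na(x) & x >= method@lowerLimit & x <= method@upperLimit
}
withinRange(elevToy, c(120, 15000, NA))
# TRUE FALSE FALSE
```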

Plot elevation is added to Veg-X documents using addPlotLocations(), as before. However, in our Mokihinui data set elevation is included in the data frame moki_site (and not in moki_loc), so it could not be added in the same call that we used for plot coordinates. Having defined our elevation method, we call addPlotLocations() again:

mapping = list(plotName = "Plot", elevation = "Altitude")
moki_vegx = addPlotLocations(moki_vegx, moki_site, mapping, 
                             methods = list(elevation = elevMethod))

Only the parent plots have elevation data (i.e. elevation values are missing for sub-plots). If we inspect the plot elements of our document again, we find that elevation data has been added to the plot coordinates:

head(showElementTable(moki_vegx, "plot"),3)

Analogous calls to addPlotLocations() can be made to fill elevation data for the Mt Fyffe forest and the Takitimu grassland data sets:

mtfyffe_vegx = addPlotLocations(mtfyffe_vegx, mtfyffe_site, mapping,
                                methods = list(elevation = elevMethod))
taki_vegx = addPlotLocations(taki_vegx, taki_site, mapping, 
                             methods = list(elevation = "Elevation/m"))

Note that in the call for the Takitimu data set we specified the method for elevation directly as a string. This avoids having to call predefinedMeasurementMethod() beforehand.

Plot geometry

By plot geometry, we refer to plot area, shape and dimensions. Veg-X allows different plot shapes (circle, rectangle, line or polygon), and each plot shape implies different dimensions. Plot geometry is specified using the function addPlotGeometries() and, analogously to addPlotLocations(), the function will replace any previous geometry information. We start by looking at the plot geometry fields in the Mokihinui forest data table moki_site:

names(moki_site)
table(moki_site$Shape)

After realizing that plot/subplot shapes are rectangular and both length and width are available, we define the following mapping for rectangular (or square) plots:

mapping = list(plotName = "Plot", subPlotName = "Subplot",
               area = "PlotArea", shape = "Shape",
               length = "PlotRectangleLength01", width = "PlotRectangleLength02")

Like elevation, plot area and plot dimensions are measurements, so we need to provide measurement methods for them. We are now ready to import plot geometry using addPlotGeometries(), where we specify both the mapping and the list of methods corresponding to the Veg-X element names of the mapping (i.e. area, length and width for rectangular plots):

moki_vegx = addPlotGeometries(moki_vegx, moki_site, mapping,
              list(area = "Plot area/m2", width = "Plot dimension/m", length = "Plot dimension/m"))
head(showElementTable(moki_vegx, "plot"),3)

As before, no new plots were added, as previous function calls had already defined them. The output of showElementTable() now includes plot geometry in addition to plot location and plot/sub-plot relationships. Importing plot geometry for the Takitimu grassland data set is analogous:

taki_vegx = addPlotGeometries(taki_vegx, taki_site, mapping,
              list(area = "Plot area/m2", width = "Plot dimension/m", length = "Plot dimension/m"))
head(showElementTable(taki_vegx, "plot"))
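Because area, length and width are mapped from independent columns, it may be worth checking before import that they agree for rectangular plots. A minimal base-R sketch on made-up values (column names follow moki_site; the 0.01 m² tolerance is an arbitrary assumption):

```r
# Toy geometry table (hypothetical values, metres and square metres)
geom <- data.frame(Plot = c("P1", "P2", "P3"),
                   PlotArea = c(400, 25, 100),
                   PlotRectangleLength01 = c(20, 5, 20),
                   PlotRectangleLength02 = c(20, 5, 4),
                   stringsAsFactors = FALSE)

# For rectangles, the recorded area should equal length x width
expectedArea <- geom$PlotRectangleLength01 * geom$PlotRectangleLength02
geom$areaMismatch <- abs(geom$PlotArea - expectedArea) > 0.01

# Plots whose recorded area disagrees with their dimensions
geom$Plot[geom$areaMismatch]
# "P3"
```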

In the case of the Mt Fyffe forest data set, shape is missing for many of the plots. Other plots are circular, but their radius is not given in the mtfyffe_site data frame:

names(mtfyffe_site)
table(mtfyffe_site$Shape)

Hence, we define the following mapping (without plot dimensions), and calling addPlotGeometries() produces the following result:

mapping = list(plotName = "Plot", subPlotName = "Subplot",
               area = "PlotArea", shape = "Shape")
mtfyffe_vegx = addPlotGeometries(mtfyffe_vegx, mtfyffe_site, mapping,
                                 list(area = "Plot area/m2"))
head(showElementTable(mtfyffe_vegx, "plot"), 3)
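Where a circular plot has a known area but no radius, as for some plots of mtfyffe_site, the radius follows from $A = \pi r^2$, i.e. $r = \sqrt{A/\pi}$, assuming the recorded area is exact. This sketch only shows the arithmetic; whether to derive and import such radii is left to the user, and the areas below are made up.

```r
# Radius of a circle from its area: A = pi * r^2  =>  r = sqrt(A / pi)
circleRadius <- function(area) sqrt(area / pi)

round(circleRadius(c(50, 100)), 2)
# 3.99 5.64
```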

Other static site characteristics

To finish with static plot information, the next data we add to our Veg-X documents is plot topography. This can be done using the function addSiteCharacteristics(), which also allows introducing other site attributes that can be considered static at the time scales of vegetation dynamics (e.g. geological parent material). Having already inspected the data frame moki_site, we suspect that an appropriate mapping is:

sitemapping = list(plotName = "Plot", subPlotName = "Subplot",
                   slope = "PlotSlope", aspect = "PlotAspect")

Since slope and aspect are again measurements, we also need to provide methods for them. After checking the units in the source data we are ready to import the data:

moki_vegx = addSiteCharacteristics(moki_vegx, moki_site, mapping = sitemapping,
                measurementMethods = list(slope = "Slope/degrees", aspect = "Aspect/degrees"))

head(showElementTable(moki_vegx, "plot"), 3)

Again, no new plots are added, and missing values correspond to sub-plots. When calling showElementTable() the topography information is shown along with the plot information previously added. Since the site data frames of the Mt Fyffe forest and Takitimu grassland data sets have the same structure as that of Mokihinui, adding topography information for these two data sets is straightforward:

mtfyffe_vegx = addSiteCharacteristics(mtfyffe_vegx, mtfyffe_site, mapping = sitemapping,
                measurementMethods = list(slope = "Slope/degrees", aspect = "Aspect/degrees"))
taki_vegx = addSiteCharacteristics(taki_vegx, taki_site, mapping = sitemapping,
                measurementMethods = list(slope = "Slope/degrees", aspect = "Aspect/degrees"))

Adding observation data

At the beginning of the previous section we specified plot observation dates for the plots of our examples, using the function addPlotObservations(). While this function defines survey events for plots, it does not add any observations or measurements made during plot visits. In this section we show how to add such information.

| Function | Description |
| --------------- | --------------------------------------------------------------- |
| addIndividualOrganismObservations() | Adds individual organism observation records (e.g. tree diameters or heights) to a VegX object. |
| addAggregateOrganismObservations() | Adds aggregate organism observation records (e.g. % cover of a particular taxon) to a VegX object. |
| addStratumObservations() | Adds stratum observation records (e.g. % cover of plants in the tree layer) to a VegX object. |
| addCommunityObservations() | Adds community observation records (e.g. stand age or total basal area) to a VegX object. |
| addSiteObservations() | Adds site observation records (e.g. abiotic measurements such as pH) to a VegX object. |
| addSurfaceCoverObservations() | Adds surface cover observation records (e.g. percent of ground covered by bare soil or rocks) to a VegX object. |

Individual organism observations

First we focus on observations made on individual organisms (e.g. diameter values measured on individual trees). Since individual organisms can be labelled and re-measured in different plot surveys, Veg-X uses the element individualOrganism to keep track of the organism itself. Then, separate individualOrganismObservation elements contain the measurements made on the individual organism each time the plot was observed (i.e. each time the plot was revisited). The individual organism (e.g. a particular tree) is uniquely identified by the plot name and an organism label (e.g. a tag on the specimen). Thus, the same label can be repeated in different plots without causing data integrity problems. Individual organisms and their observations are added to Veg-X using the function addIndividualOrganismObservations(). We first show how it works using the data frame moki_dia, which contains diameter measurements for trees in the Mokihinui forest data set:

head(moki_dia, 3)
unique(moki_dia$Identifier)

Note that there is a column called Identifier, but it contains no data. Fortunately, the data set includes a single survey, so there is no need to provide labels for individual organisms. Hence, we can define our mapping as follows:

mapping = list(plotName = "Plot", subPlotName = "Subplot", obsStartDate = "PlotObsStartDate",
               taxonName = "NVSSpeciesName", diameterMeasurement = "Diameter")

If no mapping is provided for individualOrganismLabel, the function addIndividualOrganismObservations() will assume that each record corresponds to a different organism. To define the identity of organisms we can map either organismName or taxonName. The first option is used to specify names that are not taxa (e.g. "tree #1", "tree #2", or morphospecies), while the second option explicitly identifies names as taxa. Calling the function produces the following output:

moki_vegx = addIndividualOrganismObservations(moki_vegx, moki_dia, mapping = mapping,
                                      methods = list(diameterMeasurement = "DBH/cm"))

where we see that the number of individual organisms is equal to the number of observations. We can inspect the added individual organism observations using:

head(showElementTable(moki_vegx, "individualOrganismObservation"), 3)

Note that the column individualOrganismLabel contains labels created by the function itself, by numbering all individuals of each plot. The call to addIndividualOrganismObservations() also led to the definition of organismName elements (used to store the different organism/taxon names used in the Veg-X document) and organismIdentity elements (which define the identity of organisms through links to organism names and taxon concepts). Let's inspect the latter:

head(showElementTable(moki_vegx, "organismIdentity"), 3)

In this case, the identity is simply the species name coming from the source data, but it could be another name considered nomenclaturally more valid for the same species.
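The automatic labelling described above (numbering the records of each plot) can be mimicked in base R with ave(), for example to anticipate how many individuals each plot will contain. This is a sketch on toy data; the exact label format produced by VegX may differ.

```r
# Toy diameter records: one row per tree, no organism labels available
dia <- data.frame(Plot = c("P1", "P1", "P2", "P1", "P2"),
                  Diameter = c(12.5, 30.1, 8.0, 15.2, 22.4),
                  stringsAsFactors = FALSE)

# Number individuals sequentially within each plot
dia$label <- ave(seq_len(nrow(dia)), dia$Plot, FUN = seq_along)
dia$label
# 1 2 1 3 2
```

Because numbering restarts in each plot, the pair (plot, label) is unique even though labels repeat across plots.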

The Mt Fyffe forest data set also contains tree diameter measurements, but in this case there have been two surveys, so in order to add individual tree observations we need the mapping individualOrganismLabel to specify which column identifies each tree in each plot:

head(mtfyffe_dia, 3)
mapping = list(plotName = "Plot", subPlotName = "Subplot", obsStartDate = "PlotObsStartDate",
               taxonName = "NVSSpeciesName", individualOrganismLabel = "Identifier", 
               diameterMeasurement = "Diameter")

Since the diameter measurement method is the same as before, we can directly run addIndividualOrganismObservations() and inspect the result:

mtfyffe_vegx = addIndividualOrganismObservations(mtfyffe_vegx, mtfyffe_dia, 
                                  mapping = mapping,
                                  methods = list(diameterMeasurement = "DBH/cm"))

head(showElementTable(mtfyffe_vegx, "individualOrganismObservation"), 3)

Note that in this case the number of individual tree observations is higher than the number of trees, because of the repeated measurements. Although we do not show it here, it is possible to associate organism observations with the particular heights at which organisms were observed, or with particular strata by linking them to stratum observations, in the same way as we do for aggregate organism observations in the next subsection.
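Since an individual organism is identified by its plot and label, the expected numbers of individuals and observations can be checked on the source table before import. A base-R sketch on toy data (column names follow mtfyffe_dia; values are made up):

```r
# Toy re-measurement data: the same tagged trees measured in two surveys
dia2 <- data.frame(Plot = c("P1", "P1", "P1", "P1", "P2"),
                   Identifier = c("T1", "T2", "T1", "T2", "T1"),
                   PlotObsStartDate = c("1980-01-15", "1980-01-15",
                                        "2008-01-20", "2008-01-20", "1980-01-16"),
                   stringsAsFactors = FALSE)

# Every row is one individual organism observation
nObservations <- nrow(dia2)
# An individual is plot name + label, so T1 in P1 and T1 in P2 are different trees
nIndividuals <- nrow(unique(dia2[, c("Plot", "Identifier")]))
c(observations = nObservations, individuals = nIndividuals)
# observations = 5, individuals = 3
```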

Aggregate organism observations

Aggregate organism observations include measurements that apply to a set of organisms collectively, normally all organisms of the same species identity. The most common examples are abundance values (e.g. cover) for species. Function addAggregateOrganismObservations() can be used to import such data into a VegX document. We first inspect the Mokihinui forest data frame moki_tcv to decide what information should be mapped:

head(moki_tcv,3)

As before, taxon names can be drawn from column NVSSpeciesName. Column Tier contains information about the stratum where species were recorded, whereas column Category contains cover values codified in a cover ordinal scale. First, we define a mapping for these variables as well as for plot and observation start date, which together specify a plotObservation (aggregate organism observations were not done in subplots for this data set).

mapping = list(plotName = "Plot", obsStartDate = "PlotObsStartDate", 
               taxonName = "NVSSpeciesName",
               stratumName = "Tier", cover = "Category")

In order to parse cover values, we could use a method of percent cover, but in this data set cover is specified using cover classes. Thus, we need to define an ordinal scale that can be used to interpret Category values; this can be done with function defineOrdinalScaleMethod():

coverscale = defineOrdinalScaleMethod(name = "Recce cover scale",
                   description = "Recce recording method by Hurst/Allen",
                   subject = "plant cover",
                   citation = paste("Hurst, JM and Allen, RB. (2007) The Recce method for describing",
                                    "New Zealand vegetation – Field protocols. Landcare Research, Lincoln."),
                   codes = c("P","1","2","3", "4", "5", "6"),
                   quantifiableCodes = c("1","2","3", "4", "5", "6"),
                   breaks = c(0, 1, 5, 25, 50, 75, 100),
                   midPoints = c(0.5, 3, 15, 37.5, 62.5, 87.5),
                   definitions = c("Presence", "<1%", "1-5%","6-25%", "26-50%", 
                                   "51-75%", "76-100%"))

As the source data specifies taxon abundances within vegetation strata, we also need to supply information on how the strata are defined. The VegX R package provides three different ways of defining strata: by heights, by categories and using a mixed approach. This last option is used in the following code:

moki_strataDef = defineMixedStrata(name = "Recce strata",
                   description = "Standard Recce stratum definition",
                   citation = paste("Hurst, JM and Allen, RB. (2007) The Recce method for describing",
                                    "New Zealand vegetation – Field protocols. Landcare Research, Lincoln."),
                   heightStrataBreaks = c(0, 0.3,2.0,5, 12, 25, 50),
                   heightStrataNames = paste0("Tier ",1:6),
                   categoryStrataNames = "Tier 7",
                   categoryStrataDefinition = "Epiphytes")
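To see how height breaks partition observations into tiers, the following base-R sketch assigns heights to the six height-defined Recce tiers with cut(). The helper assignTier() is hypothetical, and treating each interval as (lower, upper], with 0 included in Tier 1, is an assumption about boundary handling, not a statement about how VegX resolves boundaries. Tier 7 (epiphytes) is defined by category, not height, so it cannot be assigned this way.

```r
# Height breaks (m) and tier names copied from the Recce strata definition
heightBreaks <- c(0, 0.3, 2.0, 5, 12, 25, 50)
tierNames <- paste0("Tier ", 1:6)

# Assumed boundary rule: intervals (lower, upper], with 0 included in Tier 1;
# heights outside [0, 50] get NA
assignTier <- function(height) {
  as.character(cut(height, breaks = heightBreaks, labels = tierNames,
                   include.lowest = TRUE, right = TRUE))
}

assignTier(c(0.1, 1.5, 30, 60))
# "Tier 1" "Tier 2" "Tier 6" NA
```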

Having the mapping, the cover scale and the stratum definition we can proceed to import species cover values by strata using function addAggregateOrganismObservations():

moki_vegx = addAggregateOrganismObservations(moki_vegx, moki_tcv, mapping,
                        methods = list(cover=coverscale),
                        stratumDefinition = moki_strataDef)

Note that both the stratum definition and the cover scale contain methods that are added to the Veg-X document. The strata themselves are also added to the document (i.e. stratum elements). Other elements that are added are organism identities (i.e. taxon names), stratum observations (because species were observed while focusing on particular strata) and, finally, the aggregate organism observations themselves. Fewer organism names and organism identities were added than were parsed, because the Veg-X document already contained some of them from the individual organism observations. We can inspect the newly added taxon cover observations using:

head(showElementTable(moki_vegx, "aggregateOrganismObservation"),3)
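To make the role of the Recce cover scale concrete, the following base-R sketch translates cover codes into the percent-cover intervals implied by its breaks. The helper coverInterval() is hypothetical and not part of VegX, and pairing code i with the interval between the i-th and (i+1)-th break is an assumption consistent with the class definitions given above. The presence code "P" is not quantifiable and yields missing bounds.

```r
# Codes, quantifiable codes and breaks copied from the Recce cover scale
codes <- c("P", "1", "2", "3", "4", "5", "6")
quantifiable <- c("1", "2", "3", "4", "5", "6")
breaks <- c(0, 1, 5, 25, 50, 75, 100)

# Quantifiable code i is assumed to span [breaks[i], breaks[i + 1]] percent
lower <- setNames(breaks[-length(breaks)], quantifiable)
upper <- setNames(breaks[-1], quantifiable)

coverInterval <- function(x) {
  x <- as.character(x)
  bad <- setdiff(x, codes)
  if (length(bad) > 0) stop("Unknown cover codes: ", paste(bad, collapse = ", "))
  # Non-quantifiable codes such as "P" have no entry in lower/upper, so give NA
  data.frame(code = x, lower = unname(lower[x]), upper = unname(upper[x]),
             stringsAsFactors = FALSE)
}

coverInterval(c("3", "P", "6"))
```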

The Mt Fyffe forest data set includes individual counts by species and stratum (i.e. another kind of aggregate organism observation) in the data frame mtfyffe_counts, which has a structure similar to that of moki_tcv, but with counts in column Value:

head(mtfyffe_counts, 3)
mapping = list(plotName = "Plot", subPlotName = "Subplot", obsStartDate = "PlotObsStartDate", 
               taxonName = "NVSSpeciesName", stratumName = "Tier", counts = "Value")

Analogously to the previous case, we need to specify a measurement method for counts, and in this case we can use function predefinedMeasurementMethod():

countscale = predefinedMeasurementMethod("Individual plant counts")

Then we also need to provide the strata definition, which is different from that of the previous data set. Here all strata are defined by height, so we can use a function called defineHeightStrata():

mtfyffe_strataDef = defineHeightStrata(name = "Standard seedling/sapling strata",
                              description = "Seedling/sapling stratum definition",
                              heightBreaks = c(0, 15, 45, 75, 105, 135, 200),
                              strataNames = as.character(1:6),
                              strataDefinitions = c("0-15 cm", "16-45 cm", "46-75 cm", 
                                                    "76-105 cm", "106-135 cm", "> 135 cm"))

Now, we are ready to import the data:

mtfyffe_vegx = addAggregateOrganismObservations(mtfyffe_vegx, mtfyffe_counts, mapping,
                        methods = list(counts=countscale),
                        stratumDefinition = mtfyffe_strataDef)
head(showElementTable(mtfyffe_vegx, "aggregateOrganismObservation"),3)

Again, elements of several kinds are added to our Veg-X document. The process for the Takitimu grassland data set is similar, but in this case the observations are not organized by strata, and the abundance values are frequencies of occurrence:

head(taki_freq, 3)
mapping = list(plotName = "Plot", obsStartDate = "PlotObsStartDate", 
               taxonName = "NVSSpeciesName", freq = "Value")

Hence, we specify a new measurement method for frequency (here directly as a string) and call addAggregateOrganismObservations() again:

taki_vegx = addAggregateOrganismObservations(taki_vegx, taki_freq, mapping,
                        methods = list(freq="Plant frequency/%"))
head(showElementTable(taki_vegx, "aggregateOrganismObservation"), 3)

As expected, neither stratum definitions nor stratum observations are added to the Veg-X document, but we still see the addition of organism names, organism identities and aggregate organism observations.

While aggregate organism observations are often related to strata, it is also possible to indicate that measurements of cover or counts were made focusing on a particular height, by mapping heightMeasurement instead of stratumName.

Stratum observations

In the previous subsections we stated that both individual and aggregate organism observations can be positioned in a particular vegetation stratum (e.g. the moss layer). However, one can also imagine measurements that apply to the stratum itself, like the overall cover or basal area of all organisms in the stratum, regardless of their identity. Other common stratum measurements are those that define its vertical limits (e.g. at what height does the tree layer start?). Veg-X stores this information in stratumObservation elements. We saw that elements of this kind were automatically created and added when importing aggregate organism observations; here we show how to add measurements that specifically refer to strata using the function addStratumObservations().

To illustrate how to add stratum observations to a Veg-X document, we again take the Mokihinui forest data set as data source and inspect the data frame moki_str, which contains strata cover measurements:

head(moki_str, 3)

The data table also contains stratum height limits, although our definition of strata to import taxon cover data already contained height limits for most strata. We will assume that the data in moki_str indeed contains actual measurements and define the mapping accordingly:

mapping = list(plotName = "Plot", obsStartDate = "PlotObsStartDate", stratumName = "Tier",
               lowerLimitMeasurement = "TierLower", upperLimitMeasurement = "TierUpper",
               cover = "CoverClass")

Both the cover ordinal scale and the strata definitions have been used before, so we do not need to redefine them. We do need, however, to create a definition of the method applying to height measurements, before calling addStratumObservations():

heightMethod = predefinedMeasurementMethod("Stratum height/m")

moki_vegx = addStratumObservations(moki_vegx, moki_str, mapping = mapping,
                        methods = list(lowerLimitMeasurement = heightMethod,
                                       upperLimitMeasurement = heightMethod,
                                       cover=coverscale),
                        stratumDefinition = moki_strataDef)

Note that no new strata definitions are added, as they were already included when adding aggregate organism observations. We do have some new stratum observations, which can be shown using:

head(showElementTable(moki_vegx, "stratumObservation"), 3)

Community observations

Veg-X stores in communityObservation elements all biotic observations and measurements that are naturally defined at the plant community (or vegetation stand) level, such as basal area, species richness or stand age. Since our example source data sets did not include any such measurements, we start by adding a column BA with simulated basal area values to the moki_site data frame, drawn from a Normal distribution (mean 10, standard deviation 5) and truncated at zero:

moki_site$BA = pmax(0, rnorm(nrow(moki_site), 10, 5))

Adding community observations requires, as usual, a mapping where in addition to mapping plots and surveys we can specify mappings for measurements defined at the community level:

# Define mapping
mapping = list(plotName = "Plot", subPlotName = "Subplot",
               obsStartDate = "PlotObsStartDate", basal_area = "BA")

Of course, for each measurement we will need to provide a method that describes the measured subject, units, etc. Function addCommunityObservations() is used to add community observations to a VegX object:

# Add basal area measurements to the VegX object
moki_vegx = addCommunityObservations(moki_vegx, moki_site, mapping = mapping,
                        methods = list(basal_area = "basal area"))
# Inspect the result
head(showElementTable(moki_vegx, "communityObservation"),3)

Site observations

Veg-X stores in siteObservation elements all observations and measurements that do not refer to vegetation itself, i.e. abiotic measurements, soil type classifications, etc. Since our example source data sets did not include any such measurements, we created a column pH with constant values in the moki_site data frame. The function that allows adding site observations is addSiteObservations(), and the following code should be rather self-explanatory by now:

mapping = list(plotName = "Plot", subPlotName = "Subplot", obsStartDate = "PlotObsStartDate")
moki_vegx = addSiteObservations(moki_vegx, moki_site,
                         plotObservationMapping = mapping,
                         soilMeasurementMapping = list(a = "pH"),
                         soilMeasurementMethods = list(a = "pH/0-14"))

Unlike the measurement names used in other add... functions, 'a' is only used here within the addSiteObservations() call (i.e., there will be no variable called 'a' in the Veg-X document). When displaying site observations, columns soil_1_*, soil_2_* only indicate the numbering of soil variables:

head(showElementTable(moki_vegx, "siteObservation"))

It is important to distinguish the subject of a method from the method itself. For example, the subject could be the pH of the upper soil solution, whereas particular methods for this subject would be measurement in water or measurement in 0.01 M CaCl2. In the example above we added variable 'pH' of the input data to the VegX document and defined the measurement method as pH/0-14, which simply specifies the measurement of pH (the subject) on a 0-14 scale. Let's look at its definition:

predefinedMeasurementMethod("pH/0-14")

Surface cover observations

Surface cover observations are measurements of the percentage of the plot's surface that is covered (i.e. when projected onto the ground) by different surface types, such as rocks, bare soil, vegetation, etc. Veg-X allows defining surface types as surfaceType elements, and storing cover values for them in surfaceCoverObservation elements. We use the Mt Fyffe forest data set to illustrate how observations of this kind are added to a Veg-X document. First we inspect table mtfyffe_groundcover and define a mapping:

head(mtfyffe_groundcover, 3)
mapping = list(plotName = "Plot", obsStartDate = "PlotObsStartDate",
               surfaceName = "PlotGroundCover", coverMeasurement = "Value")

In this case, cover values are specified as percent cover of ground surface, so we need to define an appropriate method:

coverMethod = predefinedMeasurementMethod("Surface cover/%")

We inspect the surface types used in the data set and call function defineSurfaceTypes() as we have done previously for strata:

unique(mtfyffe_groundcover$PlotGroundCover)
surfaceTypes = defineSurfaceTypes(name = "Default surface types",
                     description = "Five surface categories",
                     surfaceNames = c("Vegetation", "Moss", "Litter", "Exposed Soil", 
                                      "Rock"))
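Before importing, it can be worth checking that every surface name present in the data appears among the defined surface types. A minimal base-R check with setdiff() is sketched below; the observed names are made up and only stand in for unique(mtfyffe_groundcover$PlotGroundCover):

```r
# Surface names passed to defineSurfaceTypes() above
definedNames <- c("Vegetation", "Moss", "Litter", "Exposed Soil", "Rock")

# Made-up observed values standing in for mtfyffe_groundcover$PlotGroundCover
observedNames <- c("Moss", "Litter", "Rock", "Moss", "Bare ground")

# Surface names present in the data but missing from the definitions
undefined <- setdiff(unique(observedNames), definedNames)
undefined  # "Bare ground"
```

With the real data, the same check would be setdiff(unique(mtfyffe_groundcover$PlotGroundCover), definedNames); an empty result means that no observed surface type is left undefined.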

We can now import surface cover observations using function addSurfaceCoverObservations():

mtfyffe_vegx = addSurfaceCoverObservations(mtfyffe_vegx, mtfyffe_groundcover, mapping,
                                coverMethod, surfaceTypes)

head(showElementTable(mtfyffe_vegx, "surfaceCoverObservation"), 3)

As in the case of strata, the function added surface type definitions to the Veg-X document, in addition to the cover values themselves.

The Takitimu grassland data set also includes surface cover observations, although the surface types are slightly different:

head(taki_groundcover, 3)
unique(taki_groundcover$PlotGroundCover)

Therefore, we must define a new set of surface types before calling function addSurfaceCoverObservations():

surfaceTypes = defineSurfaceTypes(name = "Default surface types",
                     description = "Five surface categories",
                     surfaceNames = c("Vegetation", "Soil", "Erosion Pavement", "Litter",
                                      "Rock"))

taki_vegx = addSurfaceCoverObservations(taki_vegx, taki_groundcover, mapping,
                                coverMethod, surfaceTypes)

Combining and harmonizing Veg-X documents

One of the purposes of importing data into Veg-X is the possibility to combine and harmonize documents from different sources. In this section we illustrate how documents should be merged, and some functions that can be used to harmonize their contents.

Adding unique identifiers

When combining VegX objects from different sources it is important to pay attention to plot names, because plots from two different sources may have been given the same name while in fact they correspond to different sampled areas. To combine two vegetation sources while avoiding confusion in plot identity one should use plot unique identifiers, i.e. sub-element plotUniqueIdentifier of plot. When populating a Veg-X object from a single source data set, unique identifiers are not normally available nor needed, and the functions that add observations to the object will only look at plotName to identify plots. However, when merging VegX objects unique identifiers should be defined, and two plots should be considered the same only if both their plot names and their unique identifiers are equal. While less critical than plot unique identifiers, the Veg-X standard also allows unique identifiers for plot observations, via the sub-element plotObservationUniqueIdentifier of plotObservation.
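The difference between matching plots by name alone and matching them by name plus unique identifier can be illustrated in plain R. The two plot tables below are hypothetical (names and identifiers are made up), and this is only an illustration of the rule, not the code used by the VegX package:

```r
# Two hypothetical plot tables from different sources
plotsA <- data.frame(plotName = c("P1", "P2"),
                     plotUniqueIdentifier = c("A-001", "A-002"),
                     stringsAsFactors = FALSE)
plotsB <- data.frame(plotName = c("P1", "P3"),
                     plotUniqueIdentifier = c("B-001", "B-003"),
                     stringsAsFactors = FALSE)

# Matching on plot name alone suggests that "P1" is the same plot in both sources
sameName <- intersect(plotsA$plotName, plotsB$plotName)

# Matching on name AND unique identifier shows that no plot is shared
plotKey <- function(d) paste(d$plotName, d$plotUniqueIdentifier, sep = "|")
sameNameAndId <- intersect(plotKey(plotsA), plotKey(plotsB))

sameName       # "P1"
sameNameAndId  # character(0)
```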

The VegX package provides two ways to supply unique identifiers. The function addPlotObservations() allows specifying mappings for both plotUniqueIdentifier and plotObservationUniqueIdentifier:

mapping = list(projectTitle = "Project", plotName = "Plot", subPlotName = "Subplot",
               obsStartDate = "PlotObsStartDate", obsEndDate = "PlotObsStopDate",
               plotUniqueIdentifier = "PlotID", plotObservationUniqueIdentifier = "PlotObsID")
vegx_ids = addPlotObservations(newVegX(), moki_site, mapping = mapping, verbose = FALSE)

head(showElementTable(vegx_ids, "plot"), 3)
head(showElementTable(vegx_ids, "plotObservation"), 3)

We could use function addPlotObservations() to define unique identifiers because these were available in our source data. Note, however, that IDs coming from NVS are only unique within the context of this databank. In cases where the source data do not include unique identifiers, or the available ones may not be unique in all situations, one can generate universally unique identifiers (or replace the current identifiers) using function fillUniqueIdentifiers():

moki_vegx = fillUniqueIdentifiers(target = moki_vegx, element = "plot")
head(showElementTable(moki_vegx, "plot"),3)

A UUID (Universally Unique Identifier) is a 128-bit number used to uniquely identify some object or entity. When generated according to the standard methods, UUIDs are for practical purposes unique, without depending for their uniqueness on a central registration authority or on coordination between the parties generating them, unlike most other numbering schemes. While the probability that a UUID will be duplicated is not zero, it is close enough to zero to be negligible. If we are interested in merging different documents, it is important to ensure that unique identifiers are defined for plots. Function fillUniqueIdentifiers() generates UUIDs by calling function UUIDgenerate() from the R package uuid.
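How small "close enough to zero" is can be estimated with the birthday approximation. Assuming random (version 4) UUIDs, which carry 122 random bits, the probability of at least one duplicate among n identifiers is approximately 1 - exp(-n^2 / 2^123). Whether UUIDgenerate() produces random or time-based UUIDs depends on its arguments and platform, so treat this as an illustration rather than a property of fillUniqueIdentifiers(); the function name uuidCollisionProb() is our own:

```r
# Birthday approximation for the chance of at least one duplicate among n
# random version-4 UUIDs (122 random bits each).
uuidCollisionProb <- function(n, bits = 122) {
  # -expm1(-x) computes 1 - exp(-x) accurately when x is tiny
  -expm1(-n^2 / (2 * 2^bits))
}

uuidCollisionProb(1e9)  # about 9.4e-20 for one billion identifiers
```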

As we did for the Mokihinui VegX object, we generate universally unique identifiers for the other two VegX objects:

mtfyffe_vegx = fillUniqueIdentifiers(target = mtfyffe_vegx, element = "plot")
taki_vegx = fillUniqueIdentifiers(target = taki_vegx, element = "plot")

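Updating taxon nomenclature

Before merging documents, it is often useful to harmonize taxon names. Function setPreferredTaxonNomenclature() takes a lookup table (here moki_lookup, mtfyffe_lookup and taki_lookup) and a named vector indicating which column holds the original organism names (NVSSpeciesName) and which holds the preferred taxon names (PreferredSpeciesName). After each call, the organism identities whose identityName differs from their originalOrganismName are those whose names were changed: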

head(showElementTable(moki_vegx, "organismIdentity"),10)
moki_vegx = setPreferredTaxonNomenclature(moki_vegx, moki_lookup,
                   c(originalOrganismName = "NVSSpeciesName", preferredTaxonName = "PreferredSpeciesName"))
a = showElementTable(moki_vegx, "organismIdentity")
a[which(a$identityName!= a$originalOrganismName),]
mtfyffe_vegx = setPreferredTaxonNomenclature(mtfyffe_vegx, mtfyffe_lookup,
                   c(originalOrganismName = "NVSSpeciesName", preferredTaxonName = "PreferredSpeciesName"))
a = showElementTable(mtfyffe_vegx, "organismIdentity")
a[which(a$identityName!= a$originalOrganismName),]
taki_vegx = setPreferredTaxonNomenclature(taki_vegx, taki_lookup,
                   c(originalOrganismName = "NVSSpeciesName", preferredTaxonName = "PreferredSpeciesName"))
a = showElementTable(taki_vegx, "organismIdentity")
a[which(a$identityName!= a$originalOrganismName),]
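Conceptually, the lookup table pairs each original name with a preferred name. The base-R sketch below illustrates this pairing with match(); the species names are hypothetical placeholders, unmatched names are simply left unchanged in this sketch, and it is not the code used inside setPreferredTaxonNomenclature():

```r
# Hypothetical lookup table with the same column names as moki_lookup
lookup <- data.frame(NVSSpeciesName = c("Taxon oldname", "Taxon b"),
                     PreferredSpeciesName = c("Taxon newname", "Taxon b"),
                     stringsAsFactors = FALSE)
originalNames <- c("Taxon oldname", "Taxon b", "Taxon c")

# Look up each original name; unmatched names (NA) keep their original value here
preferred <- lookup$PreferredSpeciesName[match(originalNames, lookup$NVSSpeciesName)]
identityName <- ifelse(is.na(preferred), originalNames, preferred)

# Names that changed, analogous to the a[which(...), ] filter above
originalNames[identityName != originalNames]  # "Taxon oldname"
```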

Merging two Veg-X documents

Function mergeVegX() is used to merge two Veg-X documents into a single one. This function puts all the input elements into the same containers and, whenever elements are considered to be the same, they are merged. Each element kind has its own way to determine when two instances refer to the same entity. For example, two plots will be considered equal if they have the same plot name and, if defined, the same plotUniqueIdentifier. By default, however, plots and organism identities are not merged. This is a safety measure to prevent plots with the same name but from different sources from being identified as equal. A call to mergeVegX() to merge the moki_vegx and mtfyffe_vegx VegX objects produces the following output:

comb_vegx = mergeVegX(moki_vegx, mtfyffe_vegx)

Plots were all kept separate because allowMergePlots = FALSE by default. However, in this case the plots all had different names and unique identifiers, so even if we had set allowMergePlots = TRUE, they would all have been kept separate. The two objects shared some methods, so the function identified them as equal and avoided repetitions. The decisions to merge (i.e. pool) information of other elements can be interpreted similarly. A special case concerns organismName vs. organismIdentity. Note that some organism names were merged, but identities were not. While merging equal names is always safe, merging identities should be done with extreme caution, because two data sets may have employed the same taxon name but with different taxon concepts. Therefore, by default mergeVegX() does not merge organism identities. If we want to specify that identities can be merged (when considered equal) we can set parameter allowMergeOrganismIdentities = TRUE:

# comb_vegx = mergeVegX(moki_vegx, mtfyffe_vegx, allowMergeOrganismIdentities = TRUE)

In the output of this second call, the number of merges in organismIdentity is the same as in organismName, because identities with equal names are now allowed to be merged into a single identity.

Note that function mergeVegX() can also be used to merge two documents that refer to the same data source, i.e. if one has imported different parts of the same source data into different Veg-X objects. In this case the user should specify allowMergePlots = TRUE.

Merging rules

Users of the VegX package should be aware that organism identities are by default kept separate when merging Veg-X objects. If the user chooses to merge identities, the decision to actually merge two given organism identities is complex, depending on both nomenclature and taxon concepts. Let '1' and '2' be two organism identities being compared. If original taxon concepts (i.e. element originalIdentificationConcept) are missing for both of them, the following table explains the decision according to nomenclature and, in case of merging, the nomenclature of the resulting identity (an asterisk indicates that a warning will be raised by the merging function):

Case | Orig.1 | Pref.1 | Orig.2 | Pref.2 | Merge | Orig.Res. | Pref.Res.
---- | ------ | ------ | ------ | ------ | ----- | --------- | ---------
1 | X | - | Y | - | No | - | -
2 | X | - | X | - | Yes | X | -
3 | X | Y | X | - | No | - | -
4 | X | Y | Y | - | Yes | - | Y
5 | X | Y | X | Z | No | - | -
6 | X | Y | V | Z | No | - | -
7 | X | Y | Z | Y | Yes | - | Y
8 | X | Y | X | Y | Yes | X | Y
9 | - | X | - | Y | No | - | -
10 | - | X | - | X | Yes | - | X
11 | X | - | - | X | Yes | - | X
12 | X | - | - | Y | No | - | -

Orig. = original organism name; Pref. = preferred taxon name; Res. = resulting identity after merging; - = not defined (or not applicable).

In the more general case where original taxon concepts may have been specified for identity '1', identity '2' or both, the following table explains whether or not those organism identities are merged, depending on the values of their taxon concepts and on whether merging is possible according to nomenclature (an asterisk indicates that a warning will be raised by the merging function):

Case | Nom. Merge? | Tax.Con.1 | Tax.Con.2 | Merge | Tax.Con. Res.
---- | ----------- | --------- | --------- | ----- | -------------
1 | No | - | - | No | -
2 | No | X | - | No | -
3 | No | X | Y | No | -
4 | No | X | X | No | -
5 | Yes | - | - | Yes | -
6 | Yes | X | - | Yes | -
7 | Yes | X | Y | No | -
8 | Yes | X | X | Yes | X
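The two tables can be summarized compactly. For the nomenclature table, each identity can be represented by its preferred name if it has one, and by its original name otherwise; two identities merge when these names coincide. The taxon concept table then blocks a merge only when both concepts are defined and differ. The R sketch below is our own restatement of the tables (the function names are hypothetical and this is not the code used by mergeVegX()); it does not reproduce the warnings mentioned above:

```r
# Hypothetical helpers (not part of the VegX package) restating the two
# merging tables. NA stands for a missing value ("-" in the tables).
effectiveName <- function(orig, pref) if (is.na(pref)) orig else pref

# Nomenclature table: whether to merge, and the resulting names
nomenclatureMerge <- function(orig1, pref1, orig2, pref2) {
  eff1 <- effectiveName(orig1, pref1)
  eff2 <- effectiveName(orig2, pref2)
  if (is.na(eff1) || is.na(eff2) || eff1 != eff2) {
    return(list(merge = FALSE, orig = NA_character_, pref = NA_character_))
  }
  # The original name is kept only if both identities share it
  orig <- if (!is.na(orig1) && !is.na(orig2) && orig1 == orig2) orig1 else NA_character_
  # A preferred name is kept if at least one identity had one
  pref <- if (!is.na(pref1) || !is.na(pref2)) eff1 else NA_character_
  list(merge = TRUE, orig = orig, pref = pref)
}

# Taxon concept table: refines a nomenclatural decision
conceptMerge <- function(nomMerge, con1, con2) {
  if (!nomMerge) return(list(merge = FALSE, concept = NA_character_))
  if (!is.na(con1) && !is.na(con2)) {
    if (con1 != con2) return(list(merge = FALSE, concept = NA_character_))
    return(list(merge = TRUE, concept = con1))
  }
  list(merge = TRUE, concept = NA_character_)
}

# Case 7 of the nomenclature table: merge, no original name, preferred name "Y"
str(nomenclatureMerge("X", "Y", "Z", "Y"))
# Case 7 of the taxon concept table: different concepts prevent merging
conceptMerge(TRUE, "X", "Y")$merge  # FALSE
```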

Adding observations to VegX objects that have identities defined

The general workflow when working with the VegX package is: (1) create a new VegX document; (2) add plot data and plot observations; (3) perform nomenclature corrections; (4) merge documents. However, users may attempt to add observations to a VegX object that already contains identities, possibly with nomenclatural revisions and associated taxon concepts. When adding organism observations to VegX objects, the add function assumes that the name supplied is an originalOrganismName and applies the same rules explained above to determine whether it refers to an existing identity; if not, a new organism identity is created. For example, even if the supplied name matches the original organism name of an existing identity, a new identity will be created if an original taxon concept has been asserted for the existing one, because there is no way to check that the two organisms involved indeed have the same identity. If the existing identity does not have an original taxon concept, the decision whether to create a new organism identity follows the nomenclature rules of the first table above.

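Transforming quantitative scales

Function transformQuantitativeScale() converts the values recorded with a quantitative measurement method (here "Stratum height/m") so that they refer to another method (here heightMethod2, stratum height in cm), using a user-supplied transformation function: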

heightMethod2 = predefinedMeasurementMethod("Stratum height/cm")
# Convert stratum heights from metres to centimetres (1 m = 100 cm)
trans_vegx = transformQuantitativeScale(comb_vegx, "Stratum height/m", heightMethod2,
                               function(x){return(x*100)}, replaceValues = TRUE)
head(showElementTable(trans_vegx, "stratumObservation"),3)
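Since the transformation is an ordinary R function, its conversion factor can be checked on a few values before passing it to transformQuantitativeScale(). A minimal check for a metres-to-centimetres conversion (1 m = 100 cm); the name m2cm is hypothetical:

```r
# Metres to centimetres: multiply by 100
m2cm <- function(x) x * 100

m2cm(c(0.3, 2, 12.5))  # returns c(30, 200, 1250)
```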

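Transforming ordinal scales

Ordinal scales can be transformed in a similar way using function transformOrdinalScale(). Here, values recorded with the ordinal "Recce cover scale" are converted into percent plant cover (method "Plant cover/%"):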

percentScale = predefinedMeasurementMethod("Plant cover/%")
trans_vegx = transformOrdinalScale(comb_vegx, "Recce cover scale", percentScale)
head(showElementTable(trans_vegx, "stratumObservation"),3)

head(showElementTable(trans_vegx, "organismIdentity"))
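One common way to map an ordinal cover scale onto percent cover is to replace each class by the midpoint of its lower and upper limits. The sketch below uses hypothetical class codes and limits (not necessarily those of the Recce cover scale), and we have not verified whether transformOrdinalScale() uses midpoints or another rule:

```r
# Hypothetical ordinal cover scale: class codes with lower/upper limits in %
coverClasses <- data.frame(code  = c("1", "2", "3", "4"),
                           lower = c(0, 5, 25, 50),
                           upper = c(5, 25, 50, 100),
                           stringsAsFactors = FALSE)

# Map observed class codes to the midpoint of their class interval
# (codes not found in the scale become NA)
classToPercent <- function(codes, scale) {
  i <- match(codes, scale$code)
  (scale$lower[i] + scale$upper[i]) / 2
}

classToPercent(c("2", "4", "1"), coverClasses)  # returns c(15, 75, 2.5)
```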

Writing and reading Veg-X documents

The Veg-X exchange standard is currently implemented as an XML schema (but other physical implementations of the standard could be possible). The VegX package provides functions writeVegX() and readVegX() that are used, respectively, to write and read XML files with Veg-X documents. An advantage of XML is that it is text that can be read and understood by humans, but its disadvantage is that files tend to be very large, because of the redundancy of text. One possibility to overcome this is to compress XML files (into zip or tar.gz files) for more efficient storage. However, compressing XML files does not avoid the problem that writing/reading XML files can be slow in large data sets.

An alternative to XML is to save and read Veg-X documents directly as R objects, using functions saveRDS() and readRDS(). This option is fast and produces much smaller files. The only drawback of saving R objects arises if the S4 definition of Veg-X documents changes in future versions of the package. We tried to avoid this potential problem by defining S4 Veg-X objects as lists of the main elements, without defining the internal structure of each main element. If the version of the standard changes, functions to convert R objects from old to new versions of the standard should be made available to preserve backward compatibility, in the same way that function readVegX() should be modified to allow reading XML documents that follow old versions of the standard.
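A round trip with saveRDS() and readRDS() is sketched below. A plain list stands in for a Veg-X document so that the example runs without the VegX package; with the package loaded, the same two calls could be applied to an object such as comb_vegx (the file name is up to the user):

```r
# A plain list standing in for a Veg-X document
doc <- list(plot = data.frame(plotName = c("P1", "P2"), stringsAsFactors = FALSE))

f <- tempfile(fileext = ".rds")
saveRDS(doc, f)       # compressed (gzip) by default
doc2 <- readRDS(f)

identical(doc, doc2)  # TRUE
```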



miquelcaceres/VegX documentation built on Sept. 18, 2022, 7:04 p.m.