
1. Introduction

darleq3 is an R package for the assessment of river and lake ecological status using diatom data obtained by light microscopy (LM) or Next Generation Sequencing (NGS). The package contains functions to import diatom and associated environmental data from Excel worksheets, perform simple data validation checks, calculate various water quality metrics, EQRs and Water Framework Directive (WFD) quality classes for samples, and classification uncertainty for sites. The package can calculate Trophic Diatom Index TDI5LM, TDI4 and TDI3 scores for light microscopy river diatom samples, TDI5NGS for NGS river diatom samples, Lake Trophic Diatom Index LTDI2 and LTDI1 scores for light microscopy lake diatom samples, and Diatom Acidification Metric (DAM) scores for lake and river light microscopy samples. Details of the TDI / LTDI metrics, algorithm and derivation of the status class boundaries are given in Kelly et al. (2008) for rivers and in Bennion et al. (2014) for lakes. The DAM acidification metric is described in Juggins et al. (2016). Calculation of uncertainty of classification is described in Kelly et al. (2009). At the date of publication, formal WFD classification across the UK uses TDI5LM for rivers and LTDI2 for lakes.

darleq3 can be run in two ways, either as an interactive shiny app, or as a series of R functions issued from the R console or an R script. The first method attempts to mimic the old DARLEQ2 software and will be the easiest for most users. The second method will be more convenient for processing multiple data sets, for automating darleq calculations, or for including them in a longer chain of analysis.

2. Installation

The easiest way to install darleq3 is from its GitHub repository. To do this, first install the package devtools with the following command, omitting the prompt ("> "):

install.packages("devtools")

Then install darleq3. Note that this will also automatically install some additional packages on which darleq3 depends.

library(devtools)
install_github("nsj3/darleq3", build_vignettes=TRUE)

darleq3 also contains an example Excel data file. Its location can be found in an R session with the following:

library(darleq3)
fn <- system.file("extdata/DARLEQ2TestData.xlsx", package="darleq3")

On Windows, the file can be opened in Excel using the following command:

# note running the following lines will open the file in Excel (if installed)
shell.exec(fn)
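
shell.exec exists only in R for Windows, so the command above will fail on macOS and Linux. A minimal sketch of a portable alternative is shown below. The helpers open_command and open_file are hypothetical and not part of darleq3, and the choice of "open" (macOS) and "xdg-open" (Linux desktops) as system openers is an assumption about the local setup.

```r
# Hypothetical helper (not part of darleq3): choose an opener for the OS.
open_command <- function(sysname = Sys.info()[["sysname"]]) {
  switch(sysname,
         Windows = "shell.exec",  # R function, available on Windows only
         Darwin  = "open",        # macOS system command
         "xdg-open")              # assumed default for Linux desktops
}

# Hypothetical helper: open a file with the default application.
open_file <- function(path) {
  cmd <- open_command()
  if (cmd == "shell.exec") {
    shell.exec(path)              # only evaluated on Windows
  } else {
    system2(cmd, shQuote(path))   # delegate to the system opener
  }
  invisible(path)
}

# Usage (commented out so that nothing is opened unintentionally):
# open_file(fn)
```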

An Excel version of the DARLEQ3 taxon list can be opened with the following commands:

library(darleq3)
fn <- system.file("extdata/DarleqTaxonList2017_Master.xlsx", package="darleq3")
shell.exec(fn)

3. Using the darleq3 Shiny app

darleq3 can run on a remote Shiny server or locally on a desktop PC running RStudio. The app will function in exactly the same way in both situations. To run darleq3 on a remote Shiny server, open a web browser and point it to:

https://nsj3.shinyapps.io/darleq3/

This host has been set up for testing purposes and may change.

To run the app locally, simply start RStudio, load the darleq3 package and run the command runDARLEQ():

library(darleq3)
runDARLEQ()

This should open a browser and display the DARLEQ3 shiny app.

Figure 1. darleq3 shiny app

To use the app follow these simple steps:

To quit the app, simply close the browser and/or press Escape in the RStudio Console window.

4. Using the darleq3 R package

darleq3 contains a number of functions for importing diatom data, calculating various sample and site-based metrics, EQRs and WFD quality classes, and saving the results in Excel format. The main functions are:

Type ?function_Name at the R prompt to get help and example usage for these functions.

The darleq3 functions have been designed to allow the user to perform each step of the data analysis sequence separately, for example importing diatom data, calculating a particular diatom metric from LM or NGS data, or calculating EQRs from a metric and site information. These low-level functions are useful for embedding darleq3 in a longer data analysis chain using R. The package also includes "wrapper" functions that "wrap" multiple low-level functions to perform a complete analysis with a single function call.

4.1 darleq3 wrapper functions

The most useful wrapper function is darleq. This function imports data from an Excel file, calculates multiple metrics, EQRs and WFD classes, and saves the results to another Excel file in one step.

fn <- system.file("extdata/DARLEQ2TestData.xlsx", package="darleq3")
darleq(fn)

darleq will, by default, import data from the first sheet in the Excel file, and calculate TDI3, TDI4 and TDI5LM. If the output filename is not given the function will generate a name by concatenating "DARLEQ3_Results_" with the original filename, the sheet name and the current date.

To specify the sheet name, a different metric, and an output file name:

fn <- system.file("extdata/DARLEQ2TestData.xlsx", package="darleq3")
darleq(fn, sheet="Lakes LTDI Test Data", metrics="LTDI2", outFile="Results.xlsx")

To calculate and save results for multiple metrics:

fn <- system.file("extdata/DARLEQ2TestData.xlsx", package="darleq3")
darleq(fn, sheet="Lakes LTDI Test Data", metrics=c("LTDI1", "LTDI2"), outFile="Results.xlsx")
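
Because darleq performs a complete analysis in one call, it is straightforward to loop over several worksheets. The sketch below pairs each sheet of the example file with the metrics to calculate and writes one results file per sheet. The jobs list, the output naming scheme and the use of tempdir() are illustrative assumptions; only the darleq arguments shown above (sheet, metrics, outFile) are used.

```r
# Each job pairs a worksheet with the metrics to calculate for it.
jobs <- list(
  list(sheet = "Rivers TDI Test Data", metrics = c("TDI4", "TDI5LM")),
  list(sheet = "Lakes LTDI Test Data", metrics = c("LTDI1", "LTDI2"))
)

# Build one output file name per sheet (hypothetical naming scheme).
out_files <- vapply(jobs, function(j) {
  file.path(tempdir(), paste0("Results_", gsub(" ", "_", j$sheet), ".xlsx"))
}, character(1))

# Run the analyses only if darleq3 is installed.
if (requireNamespace("darleq3", quietly = TRUE)) {
  fn <- system.file("extdata/DARLEQ2TestData.xlsx", package = "darleq3")
  for (i in seq_along(jobs)) {
    darleq3::darleq(fn, sheet = jobs[[i]]$sheet,
                    metrics = jobs[[i]]$metrics, outFile = out_files[i])
  }
}
```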

4.2 darleq3 low-level functions

darleq3 low-level functions are useful for calculating partial results or for embedding darleq3 in a longer data analysis sequence. The key functions are described in turn below. read_DARLEQ imports data from a DARLEQ-formatted data file (see Section 6 below for guidelines on how to format the data correctly). It returns a list with two elements: diatom_data, a data frame of the diatom count or relative abundance data, and header, a data frame of sample, site and environmental data from the header of the Excel file.

fn <- system.file("extdata/DARLEQ2TestData.xlsx", package="darleq3")
d <- read_DARLEQ(fn, "Rivers TDI Test Data")
head(d$diatom_data[, 1:8])
head(d$header)

calc_Metric_EQR calculates one or more diatom metrics and the corresponding sample and site EQRs, WFD classes and class uncertainties. The function returns a list with an element for each metric. Each element is itself a list containing sample EQRs, site EQRs and uncertainties, and a job summary.

fn <- system.file("extdata/DARLEQ2TestData.xlsx", package="darleq3")
d <- read_DARLEQ(fn, "Rivers TDI Test Data")
results <- calc_Metric_EQR(d, metrics=c("TDI4", "TDI5LM"))
head(results$TDI5LM$EQR[, 9:15])
head(results$TDI5LM$Uncertainty)

save_DARLEQ saves the output from calc_Metric_EQR in an Excel file:

fn <- system.file("extdata/DARLEQ2TestData.xlsx", package="darleq3")
d <- read_DARLEQ(fn, "Rivers TDI Test Data")
results <- calc_Metric_EQR(d, metrics=c("TDI4", "TDI5LM"))
save_DARLEQ(results, outFile="Results.xlsx")

calc_Metric calculates a single metric from a data frame of diatom count or relative abundance data.

fn <- system.file("extdata/DARLEQ2TestData.xlsx", package="darleq3")
d <- read_DARLEQ(fn, "Rivers TDI Test Data")
x <- calc_Metric(d$diatom_data, metric="TDI4")
head(x$Metric)

calc_EQR calculates sample and site EQRs, WFD classes and class uncertainties from a list of sample metrics.

fn <- system.file("extdata/DARLEQ2TestData.xlsx", package="darleq3")
d <- read_DARLEQ(fn, "Rivers TDI Test Data")
x <- calc_Metric(d$diatom_data, metric="TDI4")
eqr <- calc_EQR(x, d$header)
head(eqr$EQR[, 9:15])
head(eqr$Uncertainty)

5. Understanding darleq3 output

The DARLEQ shiny app and R functions produce output that is similar in structure and content to that produced by the DARLEQ2 program. Specifically, the shiny app and the functions darleq and save_DARLEQ save data in an Excel file with the following content. For each metric, the output file will contain three worksheets, named Code_Job_Summary, Code_Sample_Summary and Code_Uncertainty (where Code is the code for each metric). These three sheets contain the following information:

5.1 Job summary

This sheet contains the input file name, worksheet name and a summary of the number of samples and taxa in the file. It also contains a list of taxa included in the file but excluded from the metric calculations, either because they are planktic or because they are not included in the DARLEQ list of indicator values for that metric. The list also contains the number of occurrences (N), Hill's N2 effective number of occurrences (Hill 1973) and maximum abundance of these taxa. The list is useful for checking the data for coding errors and for identifying abundant taxa excluded from the metric calculations. For TDI3/4 and LTDI1/2 the output also contains a list of taxa with indicator values included in DARLEQ3 but not in the DARLEQ2 software. This is useful in understanding the reasons for differences in metric scores between DARLEQ versions 2 and 3 for the same metric.
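
For readers unfamiliar with Hill's N2, the sketch below shows the standard calculation for a single taxon: the reciprocal of the sum of squared proportions of its abundance across samples. A taxon spread evenly over k samples has N2 = k, while a taxon dominated by one sample has N2 close to 1 regardless of N. The function name hill_n2 is hypothetical, and this is not necessarily how darleq3 implements the calculation internally.

```r
# Hypothetical helper: Hill's N2 for one taxon's abundances across samples.
hill_n2 <- function(abund) {
  abund <- abund[!is.na(abund) & abund > 0]  # keep occurrences only
  if (length(abund) == 0) return(0)          # taxon absent everywhere
  p <- abund / sum(abund)                    # proportion in each sample
  1 / sum(p^2)                               # effective number of occurrences
}

even   <- c(10, 10, 10, 10)  # evenly distributed over four samples
skewed <- c(50, 1, 1, 0)     # three occurrences, one dominant sample

hill_n2(even)    # equals the number of occurrences
hill_n2(skewed)  # much lower than the three occurrences
```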

5.2 Sample_Summary

This sheet contains metric, EQR and quality class results for each sample. First, the sample information listed in the original input file is repeated, and then the results of the analysis are listed as follows (where CODE is the metric code):

Percent_in_CODE: Percentages of the total count of taxa that are matched to taxa in the master taxon list and included in the metric calculations. If all taxa are matched this will be the same as the Total_count but will be less if, for example, planktic taxa are present. Comparison of these two fields will indicate if there are important taxa present in the sample but not included in the status calculations.
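
As an illustration of how a field such as Percent_in_CODE relates to the total count, the sketch below computes the percentage of a sample's count that belongs to taxa used in a metric. The taxon names, counts and the in_metric flags are invented for the example and do not come from the DARLEQ3 taxon list.

```r
# Toy sample: counts for three taxa (hypothetical names and values).
counts <- c(TAXON_A = 120, TAXON_B = 60, TAXON_C = 20)

# TRUE where the taxon has an indicator value for the metric (assumed here);
# TAXON_C stands in for an excluded taxon, for example a planktic one.
in_metric <- c(TAXON_A = TRUE, TAXON_B = TRUE, TAXON_C = FALSE)

total_count <- sum(counts)
percent_in_metric <- 100 * sum(counts[in_metric]) / total_count
percent_in_metric
```

A value well below 100 suggests that abundant taxa in the sample are not contributing to the status calculation.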

After the metric and classification fields a series of summary fields are listed containing the percentage of various ecological groups of diatoms:

5.3 Uncertainty

Multiple samples from each site are combined and an uncertainty analysis is performed using the mean EQR and number of samples according to Kelly et al. (2009):

6. Input data format

read_DARLEQ and the shiny app import diatom data from an Excel file in either .xls or .xlsx format. An example Excel file is included in this package (see Section 2 on how to view it). The required data and layout are slightly different for river and lake samples. Figure 2 below shows the required format for performing TDI calculations for river samples.

The first four header rows are mandatory and must contain the following information:

Note that the second column of the header information must be left blank.

Figure 2. Example format for river diatom samples

Identifiers for each row of the sample header information should be listed in column 1. Diatom data then follow the header information and may be in count or percentage format. The first column must contain the taxon code in either NBS or DiatCode (http://www.ecrc.ucl.ac.uk/?q=databases/diatcode) format. The codes in this column are used to link the data to the DARLEQ3 taxon list and ecological information and cannot be empty (an empty cell indicates the end of the data). The second column must include either the taxon name or code (i.e. a repeat of column 1).

The remaining columns to the right of the taxon name contain diatom counts or percentages. Empty (blank) cells in the matrix will be read as zero. Character data in the diatom matrix will generate an error. A full list of diatom codes (either NBS or DiatCodes) is available in the data frame darleq3_taxa.
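
The rules for the diatom matrix can be checked before import. The sketch below mimics the behaviour described above on a toy data frame: blank cells (NA) become zero and any non-numeric entry raises an error. The function clean_counts and the toy data are hypothetical, and this is not the code used by read_DARLEQ.

```r
# Toy matrix as it might arrive from a spreadsheet; NA marks an empty cell.
raw <- data.frame(
  S1 = c("12", NA, "3"),
  S2 = c("5", "7", "x"),  # "x" is deliberately invalid
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert to numeric, blanks to zero, stop on text.
clean_counts <- function(df) {
  out <- lapply(df, function(col) {
    num <- suppressWarnings(as.numeric(col))
    bad <- !is.na(col) & is.na(num)  # non-empty cells that are not numbers
    if (any(bad)) {
      stop("Non-numeric value(s) found: ", paste(col[bad], collapse = ", "))
    }
    num[is.na(num)] <- 0             # empty cells are read as zero
    num
  })
  as.data.frame(out)
}

ok <- clean_counts(raw["S1"])              # valid column: blank becomes 0
res <- try(clean_counts(raw), silent = TRUE)  # fails because of "x" in S2
```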

If the Diatom Acidification Metric (DAM) is to be calculated, the header must contain estimates of mean annual calcium and DOC concentrations, in rows named Calcium and DOC, in µeq l⁻¹ and mg l⁻¹ respectively. Figure 3 shows an example formatted for calculation of TDI and DAM. Note that if only DAM scores are required the Alkalinity field may be left blank. Sample Date is not used for calculating DAM and may be left blank.

Figure 3. Example format for river diatom TDI and DAM samples

The required input format for lake samples is shown in Figure 4. This is exactly the same as for river data except that the fourth row must be named LAKE_TYPE and contain a code indicating lake type according to the GB lake typology alkalinity classes. Marl lakes are included in the high alkalinity (HA) group. Peat and brackish lakes are not covered by the tool. Sample date for lake samples is not used in the class calculations and can contain missing values.

Figure 4. Example format for lake diatom LTDI samples

7. Acknowledgements

8. References

Bennion, H., Kelly, M.G., Juggins, S., Yallop, M.L., Burgess, A., Jamieson, J., Krokowski, J., 2014. Assessment of ecological status in UK lakes using benthic diatoms. Freshwater Science 33, 639-654.

Juggins, S., Kelly, M., Allott, T., Kelly-Quinn, M., Monteith, D., 2016. A Water Framework Directive-compatible metric for assessing acidification in UK and Irish rivers using diatoms. Science of The Total Environment 568, 671-678.

Kelly, M., Bennion, H., Burgess, A., Ellis, J., Juggins, S., Guthrie, R., Jamieson, J., Adriaenssens, V., Yallop, M., 2009. Uncertainty in ecological status assessments of lakes and rivers using diatoms. Hydrobiologia 633, 5-15.

Kelly, M., Juggins, S., Guthrie, R., Pritchard, S., Jamieson, J., Rippey, B., Hirst, H., Yallop, M., 2008. Assessment of ecological status in UK rivers using diatoms. Freshwater Biology 53, 403-422.



nsj3/darleq3 documentation built on Oct. 11, 2023, 4:37 a.m.