In aquaMetrics/macroinvertebratesMetrics: Macroinvertebrate Biotic indices

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "",
  echo = FALSE
)
library(macroinvertebrateMetrics)
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
library(magrittr)
library(knitr)
library(DiagrammeR)
library(DT)

# format taxa table types
taxa <- macroinvertebrateTaxa
taxa$TAXON_NAME <- as.character(taxa$TAXON_NAME)
taxa$TL2_5_TAXON <- as.character(taxa$TL2_5_TAXON)
taxa$LAMM_TAXON <- as.character(taxa$LAMM_TAXON)
taxa$AWIC_TAXON <- as.character(taxa$AWIC_TAXON)

Intro

What taxa lists to use for identification?

According to WFD100 document, there are five standardised taxa lists. In particular, Taxa List 2 and 5 (TL2/TL5) have been widely adopted for reporting purposes. TL2 is a family-level(ish) list used for routine WHPT/RICT reporting. TL5 is used for species-level(ish) reporting including AWIC.

These taxa lists allow changes in the environment to be detected by applying consistent levels of analysis - making changes between samples directly comparable.

Data Sources

The combine taxa table for freshwater macro-invertebrates table is called INVERT-TAXON-DICTIONARY and is stored in the macroinverebrateMetrics R package. This table provides the basis for taxa lists used in analysis including WFD100 reference lists. Subsets of this table are then provided to the Labware based LIMS and ESRI's Survey123 for data entry.

The INVERT-TAXON-DICTIONARY is a copy of previous table held in NEMS (predecessor to LIMS). The R package is used when calculating metrics for reporting purposes.

Implementor's Trap

We must implement taxonomy, keys, metrics and data sharing standards that are independently produced by different organisations and a not maintained and updated in sync.

In an ideal world, we would download a globally managed list of taxa, identification keys, metrics and reporting tools which are tightly coordinated with each other and allow sharing of data.

However, there are a number of independent sources of truth. These include WFD100, National Biodiversity Network (NBN), Environmental Change Network (ECN), metrics and identification keys.

To solve this implementation dilemma, we combine all sources of truth into a taxa table (INVERT-TAXON-DICTIONARY), and co-ordinate updates to keys, data entry, training, auditing and data-sharing as required. This is repeated in similar way for other lists, for example diatoms, macrophytes and marine lists.

Summary of macroinvertebrate taxa data flow

mermaid("
graph TB
    WFD100 --> INVERT-TAXON-DICTIONARY
    ECN --> INVERT-TAXON-DICTIONARY
    NBN --> INVERT-TAXON-DICTIONARY
    Metrics --> INVERT-TAXON-DICTIONARY
    ID_KEYS -.- INVERT-TAXON-DICTIONARY
    INVERT-TAXON-DICTIONARY --> LIMS
    INVERT-TAXON-DICTIONARY --> SURVEY123
    SURVEY123 -.- LIMS
    LIMS -.- id1(SEPA Database)
    id1 -.-> ECN_DATABASE
    id1 -.-> NBN_DATABASE
    id1 --> METRICS/REPORTING
    INVERT-TAXON-DICTIONARY --> internal(LIST FOR METHOD DOCUMENTATION & AUDITORS)
    METRICS/REPORTING -.-> id1
    ", height = '100%', width = '100%')

(..>) Dotted line with arrow shows irregular usage.
(..) Dotted line with no arrow shows planned but not implemented .
(->) Solid line with arrow shows data has previously or currently flows but majority of these connections are not automated.

Metadata

The INVERT-TAXON-DICTIONARY is a single table with 49 columns. Covering taxonomy, ECN and NBN identifiers, metric scores, taxa lists, presentation and record-keeping. The columns in this table are described below:

 meta <- read.csv(system.file("extdat", "METADATA_INVERT_TAXON_DICTIONARY.csv", package="macroinvertebrateMetrics"))
DT::datatable(meta)

Requirements

By filtering on the columns in INVERT-TAXON-DICTIONARY, we can filter the table to provide taxa lists matching our requirements:

Taxa lists should limit the level of analysis to keep reporting and training consistent.
LIMS requires three taxa lists, TL2, TL5 (including LAMM) and TL5_ECN (including LAMM and ECN).
LIMS requires a single column/list of unique taxa names for the user to pick from.
Survey123 requires TL2 table with Taxon name, WHPT and BMWP scores.
Auditing at the TL5 and TL2 must be possible (even if extra taxa are included).
Lists for TL2 and TL5 must be available for external auditors for reference.
The taxa names should match the names in the preferred identification keys.
Taxa names should match with requirements of metric/modelling tools (in separate column if required).
Taxa results should be share-able with the NBN gateway.
Taxa results should be share-able with ECN.
Include additional taxa to the TL2 list which are routinely identified.
To calculate LAMM metric, additional taxa must also be added to the TL5 list.
For sites in the ECN network, additional taxa are needed to be added to TL5 to match with ECN taxa list.
Three target lists for analysis at TL2 (including extra SEPA taxa) and TL5 (including LAMM) and ECN levels are required for documentation providing a reference for analysts.

Prepare Lists

Below we work through a number of steps to a create target lists for various levels of analysis.

TL5 including LAMM

# get TL5 taxa list
tl5 <- filter(taxa, TAXON_NAME == TL5_TAXON)

# get LAMM taxa list
lamm <-  taxa[taxa$LAMM_TL == TRUE, ]

Some duplicate LAMM taxa because LAMM list includes non-standard groups/aggregates not included in WFD100 taxa lists.

# duplicates?
lamm_dups <- lamm$LAMM_TAXON[duplicated(lamm$LAMM_TAXON) ]

kable(taxa[taxa$TAXON_NAME %in% lamm$TAXON_NAME[lamm$LAMM_TAXON %in% lamm_dups]  , c("TAXON_NAME", "LAMM_TAXON")])

Combine LAMM and TL5

TL5_LAMM <- bind_rows(tl5, lamm)
TL5_LAMM <- distinct(TL5_LAMM)

TL5_ECN

What's in TL5 but not in TL_ECN

Mainly Beetle and Corixidae species

tl5 <- filter(taxa, TAXON_NAME == TL5_TAXON)
tl5_not_in_ecn <- filter(tl5, ECN_TL == FALSE)
tl5_not_in_ecn$TAXON_NAME

What's in ECN and not in TL5

Mainly Oligochaeta families and Beetle families not found in TL5

ecn <- filter(taxa, ECN_TL == TRUE)
ecn_not_in_tl5 <- filter(ecn, !TAXON_NAME %in% tl5$TAXON_NAME)
ecn_not_in_tl5$TAXON_NAME

Note on TL4

tl4 <- filter(taxa, TAXON_NAME == TL4_TAXON)
ecn <- filter(taxa, ECN_TL == TRUE)

TL4 is double the length of ECN list (r length(tl4$TAXON_NAME) Vs r length(ecn$TAXON_NAME)). TL4 is much more resource intensive than ECN list. And...yes, unexpectedly TL4 is a larger list than TL5. So in terms of number of taxa: TL1 < TL2 < TL3 < TL4 > TL5!

What to use?

What we need is TL5 + ECN? For a ECN analysis which can be 'downgraded' to TL5 and used for AWIC/LAMM or other TL5 level metrics.

However, we need to take away family level options where genus/species options are available in ECN/TL5, for instance:

Remove Oligochaeta (TL5) as option and replace with sub families from ECN list (Naididae, Lumbriculidae etc)
Add Corixa affinis (TL5) and remove Corixidae (ECN)

ECN families to keep:

ecn_family <- filter(ecn, RANK == "Family") 
tl5_genus_species <- filter(tl5, FAMILY %in% ecn$TAXON_NAME)
ecn_family_keep <- filter(ecn_family, !TAXON_NAME %in% tl5_genus_species$FAMILY)
ecn_family_keep$TAXON_NAME

TL5 families to keep

tl5_family <- filter(tl5, RANK == "Family") 
ecn_genus_species <- filter(ecn, FAMILY %in% ecn$TAXON_NAME)
tl5_family_keep <- filter(tl5_family, !TAXON_NAME %in% ecn_genus_species$FAMILY)
tl5_family_keep$TAXON_NAME

Not Active

Check column active_data_entry = FALSE. This column was used to filter which taxa could be entered in NEMS. Even if particular taxa are part of an official taxa list, there were sometimes reasons to prevent entry to use a different (usually NBN preferred / most recent name used in identification keys) named.

ECN taxa not active:

filter(ecn, ACTIVE_DATA_ENTRY == FALSE) %>% 
  select(TAXON_NAME) %>% 
  kable(row.names = FALSE)

Nemurella pictetii > Should be Nemurella picteti - with one 'i'. So use correctly spelt version. This can be converted back to 'pictetii' for ECN reporting if required.

Enchytraeidae (including Propappidae) & Lumbricidae (including Glossoscolecidae) don't have NBN codes which constrains data sharing, additionally these superfamilies were not previously recorded in NEMS. Suggest we include all separate families instead of superfamilies i.e. Enchytraeidae, Propappidae, Lumbricidae, Glossoscolecidae. When reporting for ECN we can re-combine into super families if required.

TL5 taxa not active:

filter(tl5, ACTIVE_DATA_ENTRY == FALSE) %>%  
  select(TAXON_NAME) %>% 
  kable(row.names = FALSE)

Heptagenia lateralis is not active, include Electrogena lateralis instead, as preferred by NBN and used by ECN.

Armiger crista not used but this looks like a mistake, include Armiger crista. Gyraulus (Armiger) crista is used by ECN. Armiger crista can be converted to Gyraulus (Armiger) crista if required for ECN reporting.

TL5_LAMM <- TL5_LAMM[!TL5_LAMM$TAXON_NAME %in% c("Nemurella pictetii",
                                              "Heptagenia lateralis"), ]

ecn_tl5 <- bind_rows(TL5_LAMM , ecn)
dups <- ecn_tl5[duplicated(ecn_tl5$TAXON_NAME), ]
familes_in_both <- dups[dups$RANK == "Family", ]

ecn_tl5 <- unique(ecn_tl5)

worm_families <- taxa[taxa$TAXON_NAME %in% c("Enchytraeidae", 
                                             "Propappidae", 
                                             "Lumbricidae", 
                                             "Glossoscolecidae"), ]

keep_families <- bind_rows(tl5_family_keep, ecn_family_keep, worm_families, familes_in_both)

ecn_tl5 <-  ecn_tl5[!ecn_tl5$RANK %in% c("Family",
                                          "Superfamily", 
                                          "Class"), ]

ecn_tl5 <- bind_rows(ecn_tl5, keep_families)

ecn_tl5 <- ecn_tl5[!ecn_tl5$TAXON_NAME %in% c("Planorbis crista"), ]


ecn_tl5 <- ecn_tl5[!ecn_tl5$TAXON_NAME %in% c("Tipulidae",
                                              "Limoniidae",
                                              "Pediciidae"), ]

AWIC taxa

Any AWIC taxa missing from TL5?

missing_awic <- unique(taxa$AWIC_TAXON)[!unique(taxa$AWIC_TAXON) %in% ecn_tl5$TAXON_NAME]
missing_awic[missing_awic != ""]

These missing genus are well covered at species level in TL5, so don't need to be added. Therefore the analyst should target these species rather than the genus level:

ecn_tl5$TAXON_NAME[grep("Caenis|Sialis|Protonemura", 
                        ecn_tl5$TAXON_NAME)]

Taxa Lists

Three completed lists for analysis.

TL2 + extra taxa/sub-families

For routine 'family' level lab and bankside samples.

tl2 <- taxa[!is.na(taxa$TL2_5_ORDER), ]
display_tl2 <- tl2[, c("TAXON_NAME","TL2_TAXON","BMWP", "WHPT_A", "WHPT_B", "WHPT_C", "WHPT_D")]
DT::datatable(display_tl2, rownames = FALSE, 
              extensions = 'Buttons',
              options = list(
    dom = 'Blfrtip',
    buttons = c('copy', 'csv', 'excel'),
    lengthMenu = list(c(10,30, 50, -1), 
                      c('10', '30', '50', 'All')),
    paging = T))

TL5 + LAMM

For sampling points with AWIC or LAMM, core surveillance (and perhaps other purposes).

display_tl5 <- TL5_LAMM[, c("TAXON_NAME","TL5_TAXON", "LAMM_TAXON")]
DT::datatable(display_tl5, 
              rownames = FALSE, 
              extensions = 'Buttons',
              options = list(
    dom = 'Blfrtip',
    buttons = c('copy', 'csv', 'excel'),
    lengthMenu = list(c(10,30, 50, -1), 
                      c('10', '30', '50', 'All')),
    paging = T))

TL5 + LAMM + ECN

For sampling points with ECN purpose, a new analysis in LIMS with these lists.

display_ecn_tl5 <- ecn_tl5[, c("TAXON_NAME",
                               "TL5_TAXON",
                               "LAMM_TAXON",
                               "ECN_TL")]
DT::datatable(display_ecn_tl5,  rownames = FALSE, 
              extensions = 'Buttons',
              options = list(
    dom = 'Blfrtip',
    buttons = c('copy', 'csv', 'excel'),
    lengthMenu = list(c(10,30, 50, -1), 
                      c('10', '30', '50', 'All')),
    paging = T))

aquaMetrics/macroinvertebratesMetrics documentation built on Feb. 3, 2024, 2:35 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

aquaMetrics/macroinvertebratesMetrics
Macroinvertebrate Biotic indices

In aquaMetrics/macroinvertebratesMetrics: Macroinvertebrate Biotic indices

Intro

Data Sources

Implementor's Trap

Summary of macroinvertebrate taxa data flow

Metadata

Requirements

Prepare Lists

TL5 including LAMM

TL5_ECN

Note on TL4

What to use?

Not Active

AWIC taxa

Taxa Lists

TL2 + extra taxa/sub-families

TL5 + LAMM

TL5 + LAMM + ECN

R Package Documentation

Browse R Packages

We want your feedback!

aquaMetrics/macroinvertebratesMetrics Macroinvertebrate Biotic indices

In aquaMetrics/macroinvertebratesMetrics: Macroinvertebrate Biotic indices

Intro

Data Sources

Implementor's Trap

Summary of macroinvertebrate taxa data flow

Metadata

Requirements

Prepare Lists

TL5 including LAMM

TL5_ECN

Note on TL4

What to use?

Not Active

AWIC taxa

Taxa Lists

TL2 + extra taxa/sub-families

TL5 + LAMM

TL5 + LAMM + ECN

R Package Documentation

Browse R Packages

We want your feedback!

aquaMetrics/macroinvertebratesMetrics
Macroinvertebrate Biotic indices