knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "",
  echo = FALSE
)
library(macroinvertebrateMetrics)
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
library(magrittr)
library(knitr)
library(DiagrammeR)
library(DT)
# format taxa table types
taxa <- macroinvertebrateTaxa
taxa$TAXON_NAME <- as.character(taxa$TAXON_NAME)
taxa$TL2_5_TAXON <- as.character(taxa$TL2_5_TAXON)
taxa$LAMM_TAXON <- as.character(taxa$LAMM_TAXON)
taxa$AWIC_TAXON <- as.character(taxa$AWIC_TAXON)

Intro

What taxa lists to use for identification?

According to WFD100 document, there are five standardised taxa lists. In particular, Taxa List 2 and 5 (TL2/TL5) have been widely adopted for reporting purposes. TL2 is a family-level(ish) list used for routine WHPT/RICT reporting. TL5 is used for species-level(ish) reporting including AWIC.

These taxa lists allow changes in the environment to be detected by applying consistent levels of analysis - making changes between samples directly comparable.

Data Sources

The combine taxa table for freshwater macro-invertebrates table is called INVERT-TAXON-DICTIONARY and is stored in the macroinverebrateMetrics R package. This table provides the basis for taxa lists used in analysis including WFD100 reference lists. Subsets of this table are then provided to the Labware based LIMS and ESRI's Survey123 for data entry.

The INVERT-TAXON-DICTIONARY is a copy of previous table held in NEMS (predecessor to LIMS). The R package is used when calculating metrics for reporting purposes.

Implementor's Trap

We must implement taxonomy, keys, metrics and data sharing standards that are independently produced by different organisations and a not maintained and updated in sync.

In an ideal world, we would download a globally managed list of taxa, identification keys, metrics and reporting tools which are tightly coordinated with each other and allow sharing of data.

However, there are a number of independent sources of truth. These include WFD100, National Biodiversity Network (NBN), Environmental Change Network (ECN), metrics and identification keys.

To solve this implementation dilemma, we combine all sources of truth into a taxa table (INVERT-TAXON-DICTIONARY), and co-ordinate updates to keys, data entry, training, auditing and data-sharing as required. This is repeated in similar way for other lists, for example diatoms, macrophytes and marine lists.

Summary of macroinvertebrate taxa data flow

mermaid("
graph TB
    WFD100 --> INVERT-TAXON-DICTIONARY
    ECN --> INVERT-TAXON-DICTIONARY
    NBN --> INVERT-TAXON-DICTIONARY
    Metrics --> INVERT-TAXON-DICTIONARY
    ID_KEYS -.- INVERT-TAXON-DICTIONARY
    INVERT-TAXON-DICTIONARY --> LIMS
    INVERT-TAXON-DICTIONARY --> SURVEY123
    SURVEY123 -.- LIMS
    LIMS -.- id1(SEPA Database)
    id1 -.-> ECN_DATABASE
    id1 -.-> NBN_DATABASE
    id1 --> METRICS/REPORTING
    INVERT-TAXON-DICTIONARY --> internal(LIST FOR METHOD DOCUMENTATION & AUDITORS)
    METRICS/REPORTING -.-> id1
    ", height = '100%', width = '100%')

Metadata

The INVERT-TAXON-DICTIONARY is a single table with 49 columns. Covering taxonomy, ECN and NBN identifiers, metric scores, taxa lists, presentation and record-keeping. The columns in this table are described below:

 meta <- read.csv(system.file("extdat", "METADATA_INVERT_TAXON_DICTIONARY.csv", package="macroinvertebrateMetrics"))
DT::datatable(meta)

Requirements

By filtering on the columns in INVERT-TAXON-DICTIONARY, we can filter the table to provide taxa lists matching our requirements:

Prepare Lists

Below we work through a number of steps to a create target lists for various levels of analysis.

TL5 including LAMM

# get TL5 taxa list
tl5 <- filter(taxa, TAXON_NAME == TL5_TAXON)

# get LAMM taxa list
lamm <-  taxa[taxa$LAMM_TL == TRUE, ]

Some duplicate LAMM taxa because LAMM list includes non-standard groups/aggregates not included in WFD100 taxa lists.

# duplicates?
lamm_dups <- lamm$LAMM_TAXON[duplicated(lamm$LAMM_TAXON) ]

kable(taxa[taxa$TAXON_NAME %in% lamm$TAXON_NAME[lamm$LAMM_TAXON %in% lamm_dups]  , c("TAXON_NAME", "LAMM_TAXON")])

Combine LAMM and TL5

TL5_LAMM <- bind_rows(tl5, lamm)
TL5_LAMM <- distinct(TL5_LAMM)

TL5_ECN

What's in TL5 but not in TL_ECN

tl5 <- filter(taxa, TAXON_NAME == TL5_TAXON)
tl5_not_in_ecn <- filter(tl5, ECN_TL == FALSE)
tl5_not_in_ecn$TAXON_NAME

What's in ECN and not in TL5

ecn <- filter(taxa, ECN_TL == TRUE)
ecn_not_in_tl5 <- filter(ecn, !TAXON_NAME %in% tl5$TAXON_NAME)
ecn_not_in_tl5$TAXON_NAME

Note on TL4

tl4 <- filter(taxa, TAXON_NAME == TL4_TAXON)
ecn <- filter(taxa, ECN_TL == TRUE)

TL4 is double the length of ECN list (r length(tl4$TAXON_NAME) Vs r length(ecn$TAXON_NAME)). TL4 is much more resource intensive than ECN list. And...yes, unexpectedly TL4 is a larger list than TL5. So in terms of number of taxa: TL1 < TL2 < TL3 < TL4 > TL5!

What to use?

What we need is TL5 + ECN? For a ECN analysis which can be 'downgraded' to TL5 and used for AWIC/LAMM or other TL5 level metrics.

However, we need to take away family level options where genus/species options are available in ECN/TL5, for instance:

ECN families to keep:

ecn_family <- filter(ecn, RANK == "Family") 
tl5_genus_species <- filter(tl5, FAMILY %in% ecn$TAXON_NAME)
ecn_family_keep <- filter(ecn_family, !TAXON_NAME %in% tl5_genus_species$FAMILY)
ecn_family_keep$TAXON_NAME

TL5 families to keep

tl5_family <- filter(tl5, RANK == "Family") 
ecn_genus_species <- filter(ecn, FAMILY %in% ecn$TAXON_NAME)
tl5_family_keep <- filter(tl5_family, !TAXON_NAME %in% ecn_genus_species$FAMILY)
tl5_family_keep$TAXON_NAME

Not Active

Check column active_data_entry = FALSE. This column was used to filter which taxa could be entered in NEMS. Even if particular taxa are part of an official taxa list, there were sometimes reasons to prevent entry to use a different (usually NBN preferred / most recent name used in identification keys) named.

ECN taxa not active:

filter(ecn, ACTIVE_DATA_ENTRY == FALSE) %>% 
  select(TAXON_NAME) %>% 
  kable(row.names = FALSE)

Nemurella pictetii > Should be Nemurella picteti - with one 'i'. So use correctly spelt version. This can be converted back to 'pictetii' for ECN reporting if required.

Enchytraeidae (including Propappidae) & Lumbricidae (including Glossoscolecidae) don't have NBN codes which constrains data sharing, additionally these superfamilies were not previously recorded in NEMS. Suggest we include all separate families instead of superfamilies i.e. Enchytraeidae, Propappidae, Lumbricidae, Glossoscolecidae. When reporting for ECN we can re-combine into super families if required.

TL5 taxa not active:

filter(tl5, ACTIVE_DATA_ENTRY == FALSE) %>%  
  select(TAXON_NAME) %>% 
  kable(row.names = FALSE)

Heptagenia lateralis is not active, include Electrogena lateralis instead, as preferred by NBN and used by ECN.

Armiger crista not used but this looks like a mistake, include Armiger crista. Gyraulus (Armiger) crista is used by ECN. Armiger crista can be converted to Gyraulus (Armiger) crista if required for ECN reporting.

TL5_LAMM <- TL5_LAMM[!TL5_LAMM$TAXON_NAME %in% c("Nemurella pictetii",
                                              "Heptagenia lateralis"), ]

ecn_tl5 <- bind_rows(TL5_LAMM , ecn)
dups <- ecn_tl5[duplicated(ecn_tl5$TAXON_NAME), ]
familes_in_both <- dups[dups$RANK == "Family", ]

ecn_tl5 <- unique(ecn_tl5)

worm_families <- taxa[taxa$TAXON_NAME %in% c("Enchytraeidae", 
                                             "Propappidae", 
                                             "Lumbricidae", 
                                             "Glossoscolecidae"), ]

keep_families <- bind_rows(tl5_family_keep, ecn_family_keep, worm_families, familes_in_both)

ecn_tl5 <-  ecn_tl5[!ecn_tl5$RANK %in% c("Family",
                                          "Superfamily", 
                                          "Class"), ]

ecn_tl5 <- bind_rows(ecn_tl5, keep_families)

ecn_tl5 <- ecn_tl5[!ecn_tl5$TAXON_NAME %in% c("Planorbis crista"), ]


ecn_tl5 <- ecn_tl5[!ecn_tl5$TAXON_NAME %in% c("Tipulidae",
                                              "Limoniidae",
                                              "Pediciidae"), ]

AWIC taxa

Any AWIC taxa missing from TL5?

missing_awic <- unique(taxa$AWIC_TAXON)[!unique(taxa$AWIC_TAXON) %in% ecn_tl5$TAXON_NAME]
missing_awic[missing_awic != ""]

These missing genus are well covered at species level in TL5, so don't need to be added. Therefore the analyst should target these species rather than the genus level:

ecn_tl5$TAXON_NAME[grep("Caenis|Sialis|Protonemura", 
                        ecn_tl5$TAXON_NAME)]

Taxa Lists

Three completed lists for analysis.

TL2 + extra taxa/sub-families

For routine 'family' level lab and bankside samples.

tl2 <- taxa[!is.na(taxa$TL2_5_ORDER), ]
display_tl2 <- tl2[, c("TAXON_NAME","TL2_TAXON","BMWP", "WHPT_A", "WHPT_B", "WHPT_C", "WHPT_D")]
DT::datatable(display_tl2, rownames = FALSE, 
              extensions = 'Buttons',
              options = list(
    dom = 'Blfrtip',
    buttons = c('copy', 'csv', 'excel'),
    lengthMenu = list(c(10,30, 50, -1), 
                      c('10', '30', '50', 'All')),
    paging = T))

TL5 + LAMM

For sampling points with AWIC or LAMM, core surveillance (and perhaps other purposes).

display_tl5 <- TL5_LAMM[, c("TAXON_NAME","TL5_TAXON", "LAMM_TAXON")]
DT::datatable(display_tl5, 
              rownames = FALSE, 
              extensions = 'Buttons',
              options = list(
    dom = 'Blfrtip',
    buttons = c('copy', 'csv', 'excel'),
    lengthMenu = list(c(10,30, 50, -1), 
                      c('10', '30', '50', 'All')),
    paging = T))

TL5 + LAMM + ECN

For sampling points with ECN purpose, a new analysis in LIMS with these lists.

display_ecn_tl5 <- ecn_tl5[, c("TAXON_NAME",
                               "TL5_TAXON",
                               "LAMM_TAXON",
                               "ECN_TL")]
DT::datatable(display_ecn_tl5,  rownames = FALSE, 
              extensions = 'Buttons',
              options = list(
    dom = 'Blfrtip',
    buttons = c('copy', 'csv', 'excel'),
    lengthMenu = list(c(10,30, 50, -1), 
                      c('10', '30', '50', 'All')),
    paging = T))


aquaMetrics/macroinvertebratesMetrics documentation built on Feb. 3, 2024, 2:35 a.m.