knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "",
  echo = FALSE
)
library(macroinvertebrateMetrics)
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
library(magrittr)
library(knitr)
library(DiagrammeR)
library(DT)
# format taxa table types
taxa <- macroinvertebrateTaxa
taxa$TAXON_NAME <- as.character(taxa$TAXON_NAME)
taxa$TL2_5_TAXON <- as.character(taxa$TL2_5_TAXON)
taxa$LAMM_TAXON <- as.character(taxa$LAMM_TAXON)
taxa$AWIC_TAXON <- as.character(taxa$AWIC_TAXON)
What taxa lists to use for identification?
According to the WFD100 document, there are five standardised taxa lists. In particular, Taxa Lists 2 and 5 (TL2/TL5) have been widely adopted for reporting purposes. TL2 is a family-level(ish) list used for routine WHPT/RICT reporting. TL5 is a species-level(ish) list used for reporting that includes AWIC.
These taxa lists allow changes in the environment to be detected by applying consistent levels of analysis - making changes between samples directly comparable.
The combined taxa table for freshwater macroinvertebrates is called INVERT-TAXON-DICTIONARY and is stored in the macroinvertebrateMetrics R package. This table provides the basis for taxa lists used in analysis, including the WFD100 reference lists. Subsets of this table are then provided to the Labware-based LIMS and ESRI's Survey123 for data entry.
The INVERT-TAXON-DICTIONARY is a copy of the previous table held in NEMS (the predecessor to LIMS). The R package is used when calculating metrics for reporting purposes.
We must implement taxonomy, keys, metrics and data-sharing standards that are produced independently by different organisations and are not maintained and updated in sync.
In an ideal world, we would download a globally managed list of taxa, identification keys, metrics and reporting tools which are tightly coordinated with each other and allow sharing of data.
However, there are a number of independent sources of truth. These include WFD100, National Biodiversity Network (NBN), Environmental Change Network (ECN), metrics and identification keys.
To solve this implementation dilemma, we combine all sources of truth into a single taxa table (INVERT-TAXON-DICTIONARY), and co-ordinate updates to keys, data entry, training, auditing and data sharing as required. The same approach is repeated for other lists, for example diatoms, macrophytes and marine lists.
mermaid("
graph TB
  WFD100 --> INVERT-TAXON-DICTIONARY
  ECN --> INVERT-TAXON-DICTIONARY
  NBN --> INVERT-TAXON-DICTIONARY
  Metrics --> INVERT-TAXON-DICTIONARY
  ID_KEYS -.- INVERT-TAXON-DICTIONARY
  INVERT-TAXON-DICTIONARY --> LIMS
  INVERT-TAXON-DICTIONARY --> SURVEY123
  SURVEY123 -.- LIMS
  LIMS -.- id1(SEPA Database)
  id1 -.-> ECN_DATABASE
  id1 -.-> NBN_DATABASE
  id1 --> METRICS/REPORTING
  INVERT-TAXON-DICTIONARY --> internal(LIST FOR METHOD DOCUMENTATION & AUDITORS)
  METRICS/REPORTING -.-> id1
", height = '100%', width = '100%')
The INVERT-TAXON-DICTIONARY is a single table with 49 columns, covering taxonomy, ECN and NBN identifiers, metric scores, taxa lists, presentation and record-keeping. The columns in this table are described below:
meta <- read.csv(system.file("extdat",
                             "METADATA_INVERT_TAXON_DICTIONARY.csv",
                             package = "macroinvertebrateMetrics"))
DT::datatable(meta)
By filtering on the columns in INVERT-TAXON-DICTIONARY, we can produce taxa lists that match the following requirements:
Taxa lists should limit the level of analysis to keep reporting and training consistent.
LIMS requires three taxa lists, TL2, TL5 (including LAMM) and TL5_ECN (including LAMM and ECN).
LIMS requires a single column/list of unique taxa names for the user to pick from.
Survey123 requires TL2 table with Taxon name, WHPT and BMWP scores.
Auditing at the TL5 and TL2 levels must be possible (even if extra taxa are included).
Lists for TL2 and TL5 must be available for external auditors for reference.
The taxa names should match the names in the preferred identification keys.
Taxa names should match with requirements of metric/modelling tools (in separate column if required).
Taxa results should be share-able with the NBN gateway.
Taxa results should be share-able with ECN.
Add to the TL2 list any additional taxa which are routinely identified.
To calculate LAMM metric, additional taxa must also be added to the TL5 list.
For sites in the ECN network, additional taxa need to be added to TL5 to match the ECN taxa list.
Three target lists for analysis are required for documentation, providing a reference for analysts: TL2 (including extra SEPA taxa), TL5 (including LAMM) and ECN.
Below we work through a number of steps to create target lists for the various levels of analysis.
# get TL5 taxa list
tl5 <- filter(taxa, TAXON_NAME == TL5_TAXON)
# get LAMM taxa list
lamm <- taxa[taxa$LAMM_TL == TRUE, ]
There are some duplicate LAMM taxa because the LAMM list includes non-standard groups/aggregates that are not included in the WFD100 taxa lists.
# duplicates?
lamm_dups <- lamm$LAMM_TAXON[duplicated(lamm$LAMM_TAXON)]
kable(taxa[taxa$TAXON_NAME %in% lamm$TAXON_NAME[lamm$LAMM_TAXON %in% lamm_dups],
           c("TAXON_NAME", "LAMM_TAXON")])
Combine LAMM and TL5
TL5_LAMM <- bind_rows(tl5, lamm)
TL5_LAMM <- distinct(TL5_LAMM)
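The combine-then-deduplicate step can be sketched on a toy table. This is a minimal illustration only: the rows and taxon names below are invented placeholders, not entries from INVERT-TAXON-DICTIONARY, and base R (`rbind()` and `unique()`) is used instead of dplyr so that the snippet runs without any packages.

```r
# Toy stand-in for the taxa table. All rows are invented for illustration;
# only the column names follow the real table.
toy_taxa <- data.frame(
  TAXON_NAME = c("Species a", "Group ab", "Species c", "Family D"),
  TL5_TAXON  = c("Species a", "Species a", "Species c", "Family D"),
  LAMM_TL    = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# TL5 rows are those whose own name is the TL5 target name
toy_tl5 <- toy_taxa[toy_taxa$TAXON_NAME == toy_taxa$TL5_TAXON, ]

# LAMM rows are flagged by LAMM_TL; "Group ab" is a LAMM-only aggregate here
toy_lamm <- toy_taxa[toy_taxa$LAMM_TL, ]

# Stack both lists, then drop rows that appear in both
toy_tl5_lamm <- unique(rbind(toy_tl5, toy_lamm))

toy_tl5_lamm$TAXON_NAME
```

"Species a" is in both lists but appears once in the result, while the LAMM-only aggregate "Group ab" is added to the TL5 rows.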
What's in TL5 but not in TL_ECN
tl5 <- filter(taxa, TAXON_NAME == TL5_TAXON)
tl5_not_in_ecn <- filter(tl5, ECN_TL == FALSE)
tl5_not_in_ecn$TAXON_NAME
What's in ECN and not in TL5
ecn <- filter(taxa, ECN_TL == TRUE)
ecn_not_in_tl5 <- filter(ecn, !TAXON_NAME %in% tl5$TAXON_NAME)
ecn_not_in_tl5$TAXON_NAME
tl4 <- filter(taxa, TAXON_NAME == TL4_TAXON)
ecn <- filter(taxa, ECN_TL == TRUE)
TL4 is double the length of the ECN list (`r length(tl4$TAXON_NAME)` vs `r length(ecn$TAXON_NAME)`). TL4 is much more resource intensive than the ECN list. And yes, unexpectedly, TL4 is a larger list than TL5. So in terms of number of taxa: TL1 < TL2 < TL3 < TL4 > TL5!
What we need is TL5 + ECN? That is, an ECN analysis which can be 'downgraded' to TL5 and used for AWIC/LAMM or other TL5-level metrics.
However, we need to remove family-level options where genus/species options are available in ECN/TL5, for instance:
ECN families to keep:
ecn_family <- filter(ecn, RANK == "Family")
tl5_genus_species <- filter(tl5, FAMILY %in% ecn$TAXON_NAME)
ecn_family_keep <- filter(ecn_family, !TAXON_NAME %in% tl5_genus_species$FAMILY)
ecn_family_keep$TAXON_NAME
TL5 families to keep
tl5_family <- filter(tl5, RANK == "Family")
ecn_genus_species <- filter(ecn, FAMILY %in% ecn$TAXON_NAME)
tl5_family_keep <- filter(tl5_family, !TAXON_NAME %in% ecn_genus_species$FAMILY)
tl5_family_keep$TAXON_NAME
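The family-removal rule can be illustrated with a small example: a family-level entry in one list is kept only if the other list offers nothing finer within that family. The sketch below mirrors the "ECN families to keep" step on two invented toy lists (placeholder names, not real taxa) and uses base R only.

```r
# Invented toy lists; only the column names follow the real table.
toy_ecn <- data.frame(
  TAXON_NAME = c("Family X", "Family Y"),
  RANK       = c("Family", "Family"),
  FAMILY     = c("Family X", "Family Y"),
  stringsAsFactors = FALSE
)
toy_tl5 <- data.frame(
  TAXON_NAME = c("Species x1", "Species x2"),
  RANK       = c("Species", "Species"),
  FAMILY     = c("Family X", "Family X"),
  stringsAsFactors = FALSE
)

# Family-level entries in the toy ECN list
toy_ecn_family <- toy_ecn[toy_ecn$RANK == "Family", ]

# Toy TL5 entries that sit inside a family named in the toy ECN list
toy_tl5_finer <- toy_tl5[toy_tl5$FAMILY %in% toy_ecn$TAXON_NAME, ]

# Keep only ECN families with no finer TL5 option
toy_ecn_family_keep <-
  toy_ecn_family[!toy_ecn_family$TAXON_NAME %in% toy_tl5_finer$FAMILY, ]

toy_ecn_family_keep$TAXON_NAME
```

Here "Family X" is dropped because the toy TL5 list already resolves it to species, and "Family Y" is kept because family level is the only option available.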
Check where the column active_data_entry = FALSE. This column was used to filter which taxa could be entered in NEMS. Even if a particular taxon is part of an official taxa list, there were sometimes reasons to prevent its entry so that a different name was used instead (usually the NBN-preferred name or the most recent name used in identification keys).
ECN taxa not active:
filter(ecn, ACTIVE_DATA_ENTRY == FALSE) %>% select(TAXON_NAME) %>% kable(row.names = FALSE)
Nemurella pictetii should be Nemurella picteti, with one 'i', so use the correctly spelt version. This can be converted back to 'pictetii' for ECN reporting if required.
Enchytraeidae (including Propappidae) and Lumbricidae (including Glossoscolecidae) don't have NBN codes, which constrains data sharing. Additionally, these superfamilies were not previously recorded in NEMS. We suggest including all the separate families instead of the superfamilies, i.e. Enchytraeidae, Propappidae, Lumbricidae and Glossoscolecidae. When reporting for ECN we can re-combine them into superfamilies if required.
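Re-combining the separate families into the ECN groups at reporting time could look something like the sketch below. It is only an illustration under stated assumptions: the `ABUNDANCE` column, the counts and the `ecn_group_lookup` vector are hypothetical and are not part of the package; only the family and group names come from the text above.

```r
# Hypothetical lookup from separately recorded families to the ECN groups
ecn_group_lookup <- c(
  Enchytraeidae    = "Enchytraeidae (including Propappidae)",
  Propappidae      = "Enchytraeidae (including Propappidae)",
  Lumbricidae      = "Lumbricidae (including Glossoscolecidae)",
  Glossoscolecidae = "Lumbricidae (including Glossoscolecidae)"
)

# Invented sample results; ABUNDANCE is an assumed column name
toy_counts <- data.frame(
  TAXON_NAME = c("Enchytraeidae", "Propappidae", "Lumbricidae"),
  ABUNDANCE  = c(3, 1, 2),
  stringsAsFactors = FALSE
)

# Attach the ECN group name to each record
toy_counts$ECN_GROUP <- unname(ecn_group_lookup[toy_counts$TAXON_NAME])

# Sum abundances within each ECN group
toy_ecn_counts <- aggregate(ABUNDANCE ~ ECN_GROUP, data = toy_counts, FUN = sum)

toy_ecn_counts
```

Recording the families separately keeps the NBN-compatible names in the database, and the grouping is applied only when a result set is prepared for ECN.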
TL5 taxa not active:
filter(tl5, ACTIVE_DATA_ENTRY == FALSE) %>% select(TAXON_NAME) %>% kable(row.names = FALSE)
Heptagenia lateralis is not active, include Electrogena lateralis instead, as preferred by NBN and used by ECN.
Armiger crista is not active, but this looks like a mistake, so include Armiger crista. Gyraulus (Armiger) crista is used by ECN. Armiger crista can be converted to Gyraulus (Armiger) crista if required for ECN reporting.
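The name conversions mentioned above for ECN reporting could be handled with a simple lookup. The following is a minimal sketch, not part of the package: the function `to_ecn_name()` and the vector `ecn_name_lookup` are hypothetical names introduced here, and the lookup contains only the two conversions described in the text.

```r
# Hypothetical lookup: name used for data entry -> name used by ECN
ecn_name_lookup <- c(
  "Nemurella picteti" = "Nemurella pictetii",
  "Armiger crista"    = "Gyraulus (Armiger) crista"
)

# Hypothetical helper: swap in the ECN name where one is defined,
# and leave every other name unchanged
to_ecn_name <- function(taxon_names, lookup = ecn_name_lookup) {
  matched <- taxon_names %in% names(lookup)
  taxon_names[matched] <- unname(lookup[taxon_names[matched]])
  taxon_names
}

to_ecn_name(c("Nemurella picteti", "Armiger crista", "Electrogena lateralis"))
```

Names without an entry in the lookup, such as Electrogena lateralis, pass through untouched, so the same function can be applied to a whole results column.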
# remove the inactive names discussed above
TL5_LAMM <- TL5_LAMM[!TL5_LAMM$TAXON_NAME %in%
                       c("Nemurella pictetii", "Heptagenia lateralis"), ]
# combine TL5 (including LAMM) with ECN
ecn_tl5 <- bind_rows(TL5_LAMM, ecn)
# families that appear in both lists
dups <- ecn_tl5[duplicated(ecn_tl5$TAXON_NAME), ]
familes_in_both <- dups[dups$RANK == "Family", ]
ecn_tl5 <- unique(ecn_tl5)
# worm families recorded separately rather than as superfamilies
worm_families <- taxa[taxa$TAXON_NAME %in%
                        c("Enchytraeidae", "Propappidae",
                          "Lumbricidae", "Glossoscolecidae"), ]
keep_families <- bind_rows(tl5_family_keep, ecn_family_keep,
                           worm_families, familes_in_both)
# drop all higher-rank rows, then add back the families to keep
ecn_tl5 <- ecn_tl5[!ecn_tl5$RANK %in% c("Family", "Superfamily", "Class"), ]
ecn_tl5 <- bind_rows(ecn_tl5, keep_families)
ecn_tl5 <- ecn_tl5[!ecn_tl5$TAXON_NAME %in% c("Planorbis crista"), ]
ecn_tl5 <- ecn_tl5[!ecn_tl5$TAXON_NAME %in%
                     c("Tipulidae", "Limoniidae", "Pediciidae"), ]
Any AWIC taxa missing from TL5?
missing_awic <- unique(taxa$AWIC_TAXON)[!unique(taxa$AWIC_TAXON) %in% ecn_tl5$TAXON_NAME]
missing_awic[missing_awic != ""]
These missing genera are well covered at species level in TL5, so they don't need to be added. The analyst should therefore target these species rather than the genus level:
ecn_tl5$TAXON_NAME[grep("Caenis|Sialis|Protonemura", ecn_tl5$TAXON_NAME)]
Three completed lists for analysis.
For routine 'family' level lab and bankside samples.
tl2 <- taxa[!is.na(taxa$TL2_5_ORDER), ]
display_tl2 <- tl2[, c("TAXON_NAME", "TL2_TAXON", "BMWP",
                       "WHPT_A", "WHPT_B", "WHPT_C", "WHPT_D")]
DT::datatable(
  display_tl2,
  rownames = FALSE,
  extensions = 'Buttons',
  options = list(
    dom = 'Blfrtip',
    buttons = c('copy', 'csv', 'excel'),
    lengthMenu = list(c(10, 30, 50, -1), c('10', '30', '50', 'All')),
    paging = TRUE
  )
)
For sampling points with AWIC or LAMM, core surveillance (and perhaps other purposes).
display_tl5 <- TL5_LAMM[, c("TAXON_NAME", "TL5_TAXON", "LAMM_TAXON")]
DT::datatable(
  display_tl5,
  rownames = FALSE,
  extensions = 'Buttons',
  options = list(
    dom = 'Blfrtip',
    buttons = c('copy', 'csv', 'excel'),
    lengthMenu = list(c(10, 30, 50, -1), c('10', '30', '50', 'All')),
    paging = TRUE
  )
)
For sampling points with an ECN purpose, a new analysis is needed in LIMS using these lists.
display_ecn_tl5 <- ecn_tl5[, c("TAXON_NAME", "TL5_TAXON", "LAMM_TAXON", "ECN_TL")]
DT::datatable(
  display_ecn_tl5,
  rownames = FALSE,
  extensions = 'Buttons',
  options = list(
    dom = 'Blfrtip',
    buttons = c('copy', 'csv', 'excel'),
    lengthMenu = list(c(10, 30, 50, -1), c('10', '30', '50', 'All')),
    paging = TRUE
  )
)