
View source: R/verify_taxa.R

verify_taxa    R Documentation

Verify taxa that the GBIF Backbone Taxonomy does not recognize or will lump

Description

Verify taxa that the GBIF Backbone Taxonomy does not recognize (no backbone match) or will lump under another name (synonyms). This is done by adding a verificationKey to the input dataframe, populated with:

  • For ACCEPTED and DOUBTFUL taxa: the backbone taxon key for that taxon (taxon is its own unit and won't be lumped).

  • For other taxa: a manually chosen and thus verified backbone taxon key. This can be the taxon key of:

    • the accepted taxon suggested by GBIF: the backbone synonymy is accepted and the taxon will be lumped.

    • another accepted taxon: the backbone synonymy is rejected, but the taxon will be lumped under another name.

    • the taxon itself: the backbone synonymy is rejected and the taxon will be considered a separate taxon.

    • one or more other taxa: the automatic backbone match failed, but the taxon can be lumped with one or more manually found taxa (e.g. a hybrid formula considered equal to its hybrid parents).
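The assignment rule above can be sketched in base R as follows. This is a hypothetical helper, not the trias implementation, and it assumes that a checklist taxon/backbone taxon/accepted taxon combination is identified by taxonKey, bb_key and bb_acceptedKey.

```r
# Hypothetical helper illustrating the rule; not the trias implementation.
sketch_verification_key <- function(taxa, verification) {
  # ACCEPTED and DOUBTFUL taxa are their own unit: keep their backbone key
  own_unit <- taxa$bb_taxonomicStatus %in% c("ACCEPTED", "DOUBTFUL")
  # Assumption: a combination is identified by taxonKey, bb_key, bb_acceptedKey
  taxa_id <- paste(taxa$taxonKey, taxa$bb_key, taxa$bb_acceptedKey, sep = "|")
  verification_id <- paste(
    verification$taxonKey, verification$bb_key, verification$bb_acceptedKey,
    sep = "|"
  )
  # All other taxa get the manually chosen key (NA if not verified yet)
  manual_key <- as.character(
    verification$verificationKey[match(taxa_id, verification_id)]
  )
  ifelse(own_unit, as.character(taxa$bb_key), manual_key)
}

taxa <- data.frame(
  taxonKey = c(1, 2, 3),
  bb_key = c(10, 20, NA),
  bb_taxonomicStatus = c("ACCEPTED", "SYNONYM", NA),
  bb_acceptedKey = c(NA, 21, NA),
  stringsAsFactors = FALSE
)
verification <- data.frame(
  taxonKey = c(2, 3),
  bb_key = c(20, NA),
  bb_acceptedKey = c(21, NA),
  verificationKey = c("21", "30,31"),
  stringsAsFactors = FALSE
)
sketch_verification_key(taxa, verification)
# returns c("10", "21", "30,31")
```

The accepted taxon keeps its own backbone key, the synonym takes the key chosen by the expert, and the unmatched taxon is lumped with two manually found taxa.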

The manually chosen verificationKey should be provided in verification: a dataframe (typically read from a file) listing all checklist taxon/backbone taxon/accepted taxon combinations that require verification. The function updates a provided verification based on the input taxa, or creates a new one if none is provided. Any changes to the verification are also returned as ancillary information.
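A minimal sketch of reading verification from a tab-separated file, assuming that file layout (no particular format is prescribed here). Forcing verificationKey to character keeps comma-separated multiple keys such as "2805420,2805363" intact. read_verification_sketch() is a hypothetical helper, and the sample values are borrowed from the Examples section.

```r
# Hypothetical helper: read a tab-separated verification file.
read_verification_sketch <- function(path) {
  verification <- read.delim(
    path,
    # keep keys such as "2805420,2805363" as text, even if all keys are single
    colClasses = c(verificationKey = "character"),
    na.strings = c("", "NA"),
    stringsAsFactors = FALSE
  )
  verification$dateAdded <- as.Date(verification$dateAdded)
  verification$outdated <- as.logical(verification$outdated)
  verification
}

# Round trip with a small verification table
path <- tempfile(fileext = ".tsv")
write.table(
  data.frame(
    taxonKey = c(141264835, 143920280),
    verificationKey = c("2805420,2805363", "6979"),
    dateAdded = c("2018-07-16", "2018-07-01"),
    outdated = c(FALSE, TRUE),
    stringsAsFactors = FALSE
  ),
  path, sep = "\t", row.names = FALSE
)
verification <- read_verification_sketch(path)
str(verification)
```

After calling verify_taxa(), the returned verification could be written back the same way with write.table(..., sep = "\t", row.names = FALSE).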

Usage

verify_taxa(
  taxa,
  verification = NULL,
  taxonKey = "taxonKey",
  scientificName = "scientificName",
  datasetKey = "datasetKey",
  bb_key = "bb_key",
  bb_scientificName = "bb_scientificName",
  bb_kingdom = "bb_kingdom",
  bb_rank = "bb_rank",
  bb_taxonomicStatus = "bb_taxonomicStatus",
  bb_acceptedKey = "bb_acceptedKey",
  bb_acceptedName = "bb_acceptedName",
  verification_taxonKey = "taxonKey",
  verification_scientificName = "scientificName",
  verification_datasetKey = "datasetKey",
  verification_bb_key = "bb_key",
  verification_bb_scientificName = "bb_scientificName",
  verification_bb_kingdom = "bb_kingdom",
  verification_bb_rank = "bb_rank",
  verification_bb_taxonomicStatus = "bb_taxonomicStatus",
  verification_bb_acceptedKey = "bb_acceptedKey",
  verification_bb_acceptedName = "bb_acceptedName",
  verification_bb_acceptedKingdom = "bb_acceptedKingdom",
  verification_bb_acceptedRank = "bb_acceptedRank",
  verification_bb_acceptedTaxonomicStatus = "bb_acceptedTaxonomicStatus",
  verification_verificationKey = "verificationKey",
  verification_remarks = "remarks",
  verification_verifiedBy = "verifiedBy",
  verification_dateAdded = "dateAdded",
  verification_outdated = "outdated"
)

Arguments

taxa

df. Dataframe with at least the following (default) columns for each taxon:

  • taxonKey: numeric. Non-backbone checklist taxon key assigned by GBIF.

  • scientificName: character. Scientific name as interpreted by GBIF.

  • datasetKey: character. Dataset key (UUID) assigned by GBIF to the originating checklist.

  • bb_key: numeric. Taxon key of matching backbone taxon (if any).

  • bb_scientificName: character. Scientific name of matching backbone taxon.

  • bb_kingdom: character. Kingdom of matching backbone taxon.

  • bb_rank: character. Rank of matching backbone taxon.

  • bb_taxonomicStatus: character. Taxonomic status of matching backbone taxon.

  • bb_acceptedKey: numeric. Taxon key of accepted backbone taxon in case matching backbone taxon is considered a synonym.

  • bb_acceptedName: character. Scientific name of accepted backbone taxon in case matching backbone taxon is considered a synonym.
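A sketch of checking taxa against the default column names and types listed above before calling verify_taxa(). check_taxa_columns() is hypothetical and only covers the default names. An all-NA column is logical in R, so it is accepted for either type.

```r
# Hypothetical helper (not part of trias): check default column names and types.
check_taxa_columns <- function(taxa) {
  expected <- c(
    taxonKey = "numeric", scientificName = "character",
    datasetKey = "character", bb_key = "numeric",
    bb_scientificName = "character", bb_kingdom = "character",
    bb_rank = "character", bb_taxonomicStatus = "character",
    bb_acceptedKey = "numeric", bb_acceptedName = "character"
  )
  missing_cols <- setdiff(names(expected), names(taxa))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  type_ok <- vapply(names(expected), function(col) {
    x <- taxa[[col]]
    # An all-NA column is logical in R; accept it for either type
    if (all(is.na(x))) return(TRUE)
    if (expected[[col]] == "numeric") is.numeric(x) else is.character(x)
  }, logical(1))
  if (!all(type_ok)) {
    stop("Unexpected type for column(s): ",
         paste(names(type_ok)[!type_ok], collapse = ", "))
  }
  invisible(TRUE)
}
```

With the my_taxa dataframe from the Examples section, check_taxa_columns(my_taxa) passes silently.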

verification

df. Dataframe with at least the following columns for each checklist taxon/backbone taxon/accepted taxon combination:

  • taxonKey: numeric. Non-backbone checklist taxon key assigned by GBIF.

  • scientificName: character. Scientific name as interpreted by GBIF.

  • datasetKey: character. Dataset key (UUID) assigned by GBIF to the originating checklist.

  • bb_key: numeric. Taxon key of matching backbone taxon (if any).

  • bb_scientificName: character. Scientific name of matching backbone taxon.

  • bb_kingdom: character. Kingdom of matching backbone taxon.

  • bb_rank: character. Rank of matching backbone taxon.

  • bb_taxonomicStatus: character. Taxonomic status of matching backbone taxon.

  • bb_acceptedKey: numeric. Taxon key of accepted backbone taxon in case matching backbone taxon is considered a synonym.

  • bb_acceptedName: character. Scientific name of accepted backbone taxon in case matching backbone taxon is considered a synonym.

  • bb_acceptedKingdom: character. Kingdom of accepted taxon. Expected to be equal to bb_kingdom.

  • bb_acceptedRank: character. Rank of accepted taxon.

  • bb_acceptedTaxonomicStatus: character. Taxonomic status of accepted taxon. Expected to be ACCEPTED.

  • verificationKey: character. Backbone taxon key(s) manually set by an expert. Multiple keys are separated by commas, e.g. "2805420,2805363".

  • remarks: character. Remarks provided by the expert.

  • verifiedBy: character. Name of the person who assigned verificationKey.

  • dateAdded: date. Date on which the combination was added.

  • outdated: logical. TRUE when the combination is not found in the input taxa.
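Two hypothetical helpers (not part of trias) illustrating the column descriptions above. flag_unexpected_accepted() returns synonym combinations whose accepted taxon has a different kingdom from bb_kingdom or is not ACCEPTED. split_verification_keys() turns a verificationKey such as "2805420,2805363" (as used in the Examples section) into numeric keys.

```r
# Hypothetical helpers, not part of trias.

# Synonym combinations whose accepted taxon breaks the stated expectations
flag_unexpected_accepted <- function(verification) {
  has_accepted <- !is.na(verification$bb_acceptedKey)
  kingdom_differs <- has_accepted &
    verification$bb_acceptedKingdom != verification$bb_kingdom
  not_accepted <- has_accepted &
    verification$bb_acceptedTaxonomicStatus != "ACCEPTED"
  # which() drops rows where the combined result is NA (e.g. a missing kingdom)
  verification[which(kingdom_differs | not_accepted), , drop = FALSE]
}

# Split comma-separated verificationKey values into numeric keys
split_verification_keys <- function(verification_key) {
  lapply(
    strsplit(as.character(verification_key), ",", fixed = TRUE),
    function(keys) as.numeric(trimws(keys))
  )
}

split_verification_keys(c("2805420,2805363", "6979", NA))
# returns list(c(2805420, 2805363), 6979, NA_real_)
```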

taxonKey, scientificName, datasetKey, bb_key, bb_scientificName, bb_kingdom, bb_rank, bb_taxonomicStatus, bb_acceptedKey, bb_acceptedName

Column names of required columns of taxa. They have to be passed as strings, e.g. "taxon_keys". Default: column names as specified above in taxa.

verification_taxonKey, verification_scientificName, verification_datasetKey, verification_bb_key, verification_bb_scientificName, verification_bb_kingdom, verification_bb_rank, verification_bb_taxonomicStatus, verification_bb_acceptedKey, verification_bb_acceptedName, verification_bb_acceptedKingdom, verification_bb_acceptedRank, verification_bb_acceptedTaxonomicStatus, verification_verificationKey, verification_remarks, verification_verifiedBy, verification_dateAdded, verification_outdated

Column names of required columns of verification. They have to be passed as strings, e.g. "verification_taxon_keys". Default: column names as specified above in verification.
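One way to handle custom column names is to map them back to the defaults before processing. The sketch below assumes a named character vector whose names are the default column names and whose values are the names actually used. to_default_names() is hypothetical and not necessarily how verify_taxa() handles these arguments internally.

```r
# Hypothetical helper (not part of trias): rename custom columns to defaults.
# `cols`: names are default column names, values are the names used in `df`.
to_default_names <- function(df, cols) {
  idx <- match(cols, names(df))
  found <- !is.na(idx)  # silently skip names not present in df
  names(df)[idx[found]] <- names(cols)[found]
  df
}

my_cols <- c(datasetKey = "checklist", scientificName = "scientific_names")
df <- data.frame(
  checklist = "9ff7d317-609b-4c08-bd86-3bc404b77c42",
  scientific_names = "Begonia x semperflorens hort.",
  stringsAsFactors = FALSE
)
names(to_default_names(df, my_cols))
# returns c("datasetKey", "scientificName")
```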

Value

list. List with three objects:

  • taxa: df. Provided dataframe with additional column verificationKey.

  • verification: df. New or updated dataframe with verification information.

  • info: list. Dataframes with ancillary information regarding changes to the verification.

    • new_synonyms: df. Subset of verification with synonym taxa found in taxa but not in the provided verification.

    • new_unmatched_taxa: df. Subset of verification with unmatched taxa found in taxa but not in the provided verification.

    • outdated_synonyms: df. Subset of verification with synonyms found in the provided verification but not in taxa.

    • outdated_unmatched_taxa: df. Subset of verification with unmatched taxa found in the provided verification but not in taxa.

    • updated_bb_scientificName: df. Entries of the provided verification whose bb_scientificName has since been updated in the backbone.

    • updated_bb_acceptedName: df. Entries of the provided verification whose bb_acceptedName has since been updated in the backbone.

    • duplicates: df. Taxa present in more than one checklist.

    • check_verificationKey: df. Result of checking whether the provided verificationKeys can be found in the backbone.
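The set logic behind the new_*, outdated_* and duplicates elements can be sketched as follows. These hypothetical helpers are not the trias implementation. They assume that combinations are identified by taxonKey, bb_key and bb_acceptedKey, and that duplicates are scientific names occurring in more than one datasetKey. Splitting the results by is.na(bb_key) would separate unmatched taxa from synonyms.

```r
# Hypothetical helpers, not the trias implementation.
# Assumption: a combination is identified by taxonKey, bb_key, bb_acceptedKey.
combination_id <- function(df) {
  paste(df$taxonKey, df$bb_key, df$bb_acceptedKey, sep = "|")
}

# Taxa needing verification whose combination is missing from verification
find_new <- function(taxa, verification) {
  needs_verification <- !taxa$bb_taxonomicStatus %in% c("ACCEPTED", "DOUBTFUL")
  is_new <- !combination_id(taxa) %in% combination_id(verification)
  taxa[needs_verification & is_new, , drop = FALSE]
}

# Verification rows whose combination no longer occurs in taxa
find_outdated <- function(taxa, verification) {
  in_taxa <- combination_id(verification) %in% combination_id(taxa)
  verification[!in_taxa, , drop = FALSE]
}

# Taxa whose scientificName occurs in more than one checklist (datasetKey)
find_duplicates <- function(taxa) {
  n_checklists <- tapply(
    taxa$datasetKey, taxa$scientificName, function(x) length(unique(x))
  )
  duplicated_names <- names(n_checklists)[n_checklists > 1]
  taxa[taxa$scientificName %in% duplicated_names, , drop = FALSE]
}

taxa <- data.frame(
  taxonKey = c(1, 2, 3, 4),
  scientificName = c("A", "A", "B", "C"),
  datasetKey = c("d1", "d2", "d1", "d1"),
  bb_key = c(10, 10, NA, 30),
  bb_taxonomicStatus = c("SYNONYM", "SYNONYM", NA, "ACCEPTED"),
  bb_acceptedKey = c(11, 11, NA, NA),
  stringsAsFactors = FALSE
)
verification <- data.frame(
  taxonKey = c(1, 9), bb_key = c(10, 90), bb_acceptedKey = c(11, 91)
)
find_new(taxa, verification)$taxonKey       # returns c(2, 3)
find_outdated(taxa, verification)$taxonKey  # returns 9
find_duplicates(taxa)$taxonKey              # returns c(1, 2)
```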

Examples

## Not run: 
my_taxa <- data.frame(
  taxonKey = c(
    141117238,
    113794952,
    141264857,
    100480872,
    141264614,
    100220432,
    141264835,
    140563014,
    140562956,
    145953989,
    148437916,
    114445583,
    141264849,
    101790530
  ),
  scientificName = c(
    "Aspius aspius",
    "Rana catesbeiana",
    "Polystichum tsus-simense J.Smith",
    "Apus apus (Linnaeus, 1758)",
    "Begonia x semperflorens hort.",
    "Rana catesbeiana",
    "Spiranthes cernua (L.) Richard x S. odorata (Nuttall) Lindley",
    "Atyaephyra desmaresti",
    "Ferrissia fragilis",
    "Ferrissia fragilis",
    "Ferrissia fragilis",
    "Rana blanfordii Boulenger",
    "Pterocarya x rhederiana C.K. Schneider",
    "Stenelmis williami Schmude"
  ),
  datasetKey = c(
    "98940a79-2bf1-46e6-afd6-ba2e85a26f9f",
    "e4746398-f7c4-47a1-a474-ae80a4f18e92",
    "9ff7d317-609b-4c08-bd86-3bc404b77c42",
    "39653f3e-8d6b-4a94-a202-859359c164c5",
    "9ff7d317-609b-4c08-bd86-3bc404b77c42",
    "b351a324-77c4-41c9-a909-f30f77268bc4",
    "9ff7d317-609b-4c08-bd86-3bc404b77c42",
    "289244ee-e1c1-49aa-b2d7-d379391ce265",
    "289244ee-e1c1-49aa-b2d7-d379391ce265",
    "3f5e930b-52a5-461d-87ec-26ecd66f14a3",
    "1f3505cd-5d98-4e23-bd3b-ffe59d05d7c2",
    "3772da2f-daa1-4f07-a438-15a881a2142c",
    "9ff7d317-609b-4c08-bd86-3bc404b77c42",
    "9ca92552-f23a-41a8-a140-01abaa31c931"
  ),
  bb_key = c(
    2360181,
    2427092,
    2651108,
    5228676,
    NA,
    2427092,
    NA,
    4309705,
    2291152,
    2291152,
    2291152,
    2430304,
    NA,
    1033588
  ),
  bb_scientificName = c(
    "Aspius aspius (Linnaeus, 1758)",
    "Rana catesbeiana Shaw, 1802",
    "Polystichum tsus-simense (Hook.) J.Sm.",
    "Apus apus (Linnaeus, 1758)",
    NA,
    "Rana catesbeiana Shaw, 1802",
    NA,
    "Atyaephyra desmarestii (Millet, 1831)",
    "Ferrissia fragilis (Tryon, 1863)",
    "Ferrissia fragilis (Tryon, 1863)",
    "Ferrissia fragilis (Tryon, 1863)",
    "Rana blanfordii Boulenger, 1882",
    NA,
    "Stenelmis williami Schmude"
  ),
  bb_kingdom = c(
    "Animalia",
    "Animalia",
    "Plantae",
    "Animalia",
    NA,
    "Animalia",
    NA,
    "Animalia",
    "Animalia",
    "Animalia",
    "Animalia",
    "Animalia",
    NA,
    "Animalia"
  ),
  bb_rank = c(
    "SPECIES",
    "SPECIES",
    "SPECIES",
    "SPECIES",
    NA,
    "SPECIES",
    NA,
    "SPECIES",
    "SPECIES",
    "SPECIES",
    "SPECIES",
    "SPECIES",
    NA,
    "SPECIES"
  ),
  bb_taxonomicStatus = c(
    "SYNONYM",
    "SYNONYM",
    "SYNONYM",
    "ACCEPTED",
    NA,
    "SYNONYM",
    NA,
    "HOMOTYPIC_SYNONYM",
    "SYNONYM",
    "SYNONYM",
    "SYNONYM",
    "SYNONYM",
    NA,
    "SYNONYM"
  ),
  bb_acceptedKey = c(
    5851603,
    2427091,
    4046493,
    NA,
    NA,
    2427091,
    NA,
    6454754,
    9520065,
    9520065,
    9520065,
    2430301,
    NA,
    1033553
  ),
  bb_acceptedName = c(
    "Leuciscus aspius (Linnaeus, 1758)",
    "Lithobates catesbeianus (Shaw, 1802)",
    "Polystichum luctuosum (Kunze) Moore.",
    NA,
    NA,
    "Lithobates catesbeianus (Shaw, 1802)",
    NA,
    "Hippolyte desmarestii Millet, 1831",
    "Ferrissia californica (Rowell, 1863)",
    "Ferrissia californica (Rowell, 1863)",
    "Ferrissia californica (Rowell, 1863)",
    "Nanorana blanfordii (Boulenger, 1882)",
    NA,
    "Stenelmis Dufour, 1835"
  ),
  taxonID = c(
    "alien-fishes-checklist:taxon:c937610f85ea8a74f105724c8f198049",
    "88",
    "alien-plants-belgium:taxon:57c1d111f14fd5f3271b0da53c05c745",
    "4512",
    "alien-plants-belgium:taxon:9a6c5ed8907ff169433fe44fcbff0705",
    "80-syn",
    "alien-plants-belgium:taxon:29409d1e1adc88d6357dd0be13350d6c",
    "alien-macroinvertebrates-checklist:taxon:54cca150e1e0b7c0b3f5b152ae64d62b",
    "alien-macroinvertebrates-checklist:taxon:73f271d93128a4e566e841ea6e3abff0",
    "rinse-checklist:taxon:7afe7b1fbdd06cbdfe97272567825c09",
    "ad-hoc-checklist:taxon:32dc2e18733fffa92ba4e1b35d03c4e2",
    "a80caa33-da9d-48ed-80e3-f76b0b3810f9",
    "alien-plants-belgium:taxon:56d6564f59d9092401c454849213366f",
    "193729"
  ),
  stringsAsFactors = FALSE
)

my_verification <- data.frame(
  taxonKey = c(
    113794952,
    141264857,
    143920280,
    141264835,
    141264614,
    140562956,
    145953989,
    114445583,
    128897752,
    101790530,
    141265523
  ),
  scientificName = c(
    "Rana catesbeiana",
    "Polystichum tsus-simense J.Smith",
    "Lemnaceae",
    "Spiranthes cernua (L.) Richard x S. odorata (Nuttall) Lindley",
    "Begonia x semperflorens hort.",
    "Ferrissia fragilis",
    "Ferrissia fragilis",
    "Rana blanfordii Boulenger",
    "Python reticulatus Fitzinger, 1826",
    "Stenelmis williami Schmude",
    "Veronica austriaca Jacq."
  ),
  datasetKey = c(
    "e4746398-f7c4-47a1-a474-ae80a4f18e92",
    "9ff7d317-609b-4c08-bd86-3bc404b77c42",
    "e4746398-f7c4-47a1-a474-ae80a4f18e92",
    "9ff7d317-609b-4c08-bd86-3bc404b77c42",
    "9ff7d317-609b-4c08-bd86-3bc404b77c42",
    "289244ee-e1c1-49aa-b2d7-d379391ce265",
    "3f5e930b-52a5-461d-87ec-26ecd66f14a3",
    "3772da2f-daa1-4f07-a438-15a881a2142c",
    "7ddf754f-d193-4cc9-b351-99906754a03b",
    "9ca92552-f23a-41a8-a140-01abaa31c931",
    "9ff7d317-609b-4c08-bd86-3bc404b77c42"
  ),
  bb_key = c(
    2427092,
    2651108,
    6723,
    NA,
    NA,
    2291152,
    2291152,
    2430304,
    7587934,
    1033588,
    NA
  ),
  bb_scientificName = c(
    "Rana catesbeiana Shaw, 1802",
    "Polystichum tsus-tsus-tsus (Hook.) Captain",
    "Lemnaceae",
    NA,
    NA,
    "Ferrissia fragilis (Tryon, 1863)",
    "Ferrissia fragilis (Tryon, 1863)",
    "Rana blanfordii Boulenger, 1882",
    "Python reticulatus Fitzinger, 1826",
    "Stenelmis williami Schmude",
    NA
  ),
  bb_kingdom = c(
    "Animalia",
    "Plantae",
    "Plantae",
    NA,
    NA,
    "Animalia",
    "Animalia",
    "Animalia",
    "Animalia",
    "Animalia",
    NA
  ),
  bb_rank = c(
    "SPECIES",
    "SPECIES",
    "FAMILY",
    NA,
    NA,
    "SPECIES",
    "SPECIES",
    "SPECIES",
    "SPECIES",
    "SPECIES",
    NA
  ),
  bb_taxonomicStatus = c(
    "SYNONYM",
    "SYNONYM",
    "SYNONYM",
    NA,
    NA,
    "SYNONYM",
    "SYNONYM",
    "SYNONYM",
    "SYNONYM",
    "SYNONYM",
    NA
  ),
  bb_acceptedKey = c(
    2427091,
    4046493,
    6979,
    NA,
    NA,
    9520065,
    9520065,
    2427008,
    9260388,
    1033553,
    NA
  ),
  bb_acceptedName = c(
    "Lithobates dummyus (Batman, 2018)",
    "Polystichum luctuosum (Kunze) Moore.",
    "Araceae",
    NA,
    NA,
    "Ferrissia californica (Rowell, 1863)",
    "Ferrissia californica (Rowell, 1863)",
    "Hylarana chalconota (Schlegel, 1837)",
    "Malayopython reticulatus (Schneider, 1801)",
    "Stenelmis Dufour, 1835",
    NA
  ),
  bb_acceptedKingdom = c(
    "Animalia",
    "Plantae",
    "Plantae",
    NA,
    NA,
    "Animalia",
    "Animalia",
    "Animalia",
    "Animalia",
    "Animalia",
    NA
  ),
  bb_acceptedRank = c(
    "SPECIES",
    "SPECIES",
    "FAMILY",
    NA,
    NA,
    "SPECIES",
    "SPECIES",
    "SPECIES",
    "SPECIES",
    "GENUS",
    NA
  ),
  bb_acceptedTaxonomicStatus = c(
    "ACCEPTED",
    "ACCEPTED",
    "ACCEPTED",
    NA,
    NA,
    "ACCEPTED",
    "ACCEPTED",
    "ACCEPTED",
    "ACCEPTED",
    "ACCEPTED",
    NA
  ),
  verificationKey = c(
    2427091,
    4046493,
    6979,
    "2805420,2805363",
    NA,
    NA,
    NA,
    NA,
    9260388,
    NA,
    3172099
  ),
  remarks = c(
    "dummy example 1: bb_acceptedName should be updated.",
    "dummy example 2: bb_scientificName should be updated.",
    "dummy example 3: not used anymore. Set outdated = TRUE.",
    "dummy example 4: multiple keys in verificationKey are allowed.",
    "dummy example 5: nothing should happen.",
    "dummy example 6: datasetKey should not be modified. If new taxa come in
    with same name from other checklists, they should be added as new rows.
    Report them in info$duplicates.",
    "dummy example 7: datasetKey should not be modified. If new taxa come in
    with same name from other checklists, they should be added as new rows.
    Report them in info$duplicates.",
    "dummy example 8: outdated synonym. Set outdated = TRUE.",
    "dummy example 9: outdated synonym. outdated is already TRUE. No actions.",
    "dummy example 10: outdated synonym. Not outdated anymore. Change outdated
    back to FALSE.",
    "dummy example 11: outdated unmatched taxa. Set outdated = TRUE."
  ),
  verifiedBy = c(
    "Damiano Oldoni",
    "Peter Desmet",
    "Stijn Van Hoey",
    "Tanja Milotic",
    NA,
    NA,
    NA,
    NA,
    "Lien Reyserhove",
    NA,
    "Dimitri Brosens"
  ),
  dateAdded = as.Date(
    c(
      "2018-07-01",
      "2018-07-01",
      "2018-07-01",
      "2018-07-16",
      "2018-07-16",
      "2018-07-01",
      "2018-11-20",
      "2018-11-29",
      "2018-12-01",
      "2018-12-02",
      "2018-12-03"
    )
  ),
  outdated = c(
    FALSE,
    FALSE,
    FALSE,
    FALSE,
    FALSE,
    FALSE,
    FALSE,
    FALSE,
    TRUE,
    TRUE,
    FALSE
  ),
  stringsAsFactors = FALSE
)

# output
verify_taxa(taxa = my_taxa, verification = my_verification)
verify_taxa(taxa = my_taxa)

# you can also provide your own column names for one or more required columns:
library(dplyr)
my_taxa_other_colnames <-
  rename(
    my_taxa,
    checklist = datasetKey,
    scientific_names = scientificName
  )

my_verification_other_colnames <-
  rename(
    my_verification,
    backbone_scientific_names = bb_scientificName,
    backbone_accepted_names = bb_acceptedName,
    is_outdated = outdated,
    author_verification = verifiedBy
  )

# output
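verify_taxa(
  taxa = my_taxa_other_colnames,
  verification = my_verification_other_colnames,
  # map the renamed taxa columns
  datasetKey = "checklist",
  scientificName = "scientific_names",
  # map the renamed verification columns
  verification_bb_scientificName = "backbone_scientific_names",
  verification_bb_acceptedName = "backbone_accepted_names",
  verification_outdated = "is_outdated",
  verification_verifiedBy = "author_verification"
)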

## End(Not run)

trias-project/trias documentation built on Sept. 18, 2024, 11:50 a.m.