get_accepted_names: get_accepted_names

get_accepted_names    R Documentation

get_accepted_names

Description

Matches the names in namelist against master and fetches the accepted names using the linkages (id and accid) provided within the master data.

Usage

get_accepted_names(
  namelist,
  master,
  gen_syn = NA,
  namelookup = NA,
  mastersource = NA,
  match_higher = FALSE,
  fuzzymatch = TRUE,
  fuzzydist = 2,
  canonical = NA,
  genus = NA,
  species = NA,
  subspecies = NA,
  prefix = "",
  verbose = TRUE
)

Arguments

namelist

Data frame of names to be resolved. It must contain either a single column of canonical names (binomial or trinomial, without markers such as spp. or var.) or separate columns for genus, species and subspecies (any sub-specific unit). The names of these columns are passed through the canonical, genus, species and subspecies parameters.

master

Data frame with required columns id, canonical and accid. Other columns such as order and family are optional. Column id typically holds a running id for each record. Column accid contains 0 if the name is the currently accepted name, or the id of the accepted name if the name is a synonym. Column canonical contains a binomial or trinomial without spp., var. etc.

gen_syn

Data frame with columns Original_Genus and Valid_Genus, where Original_Genus is the synonym and Valid_Genus is the genus present in the master. Default: NA when gen_syn is not used.

namelookup

Lookup data frame for names that need manual lookup. The required columns are binomial and validname, where binomial is the new name and validname is the name present in the master. Default: NA when namelookup is not used.

mastersource

Vector of sources to be used for assignment with priority. Default: NA

match_higher

Whether to match genus and family names present in the canonical field. Default: FALSE

fuzzymatch

Whether to attempt fuzzy matching. Default: TRUE

fuzzydist

Fuzzy distance used while matching. Default: 2

canonical

Name of the column containing names to be resolved to accepted names. Default: NA when columns for genus and species are specified.

genus

Name of the column containing genus names to be resolved to accepted names, typically accompanied by species and subspecies columns. Default: NA when the canonical parameter is supplied.

species

Name of the column containing species names to be resolved to accepted names, accompanied by genus. Default: NA

subspecies

Name of the column containing subspecies names to be resolved to accepted names, accompanied by genus and species. Default: NA

prefix

Prefix to be added to all the returned fields. Default: ""

verbose

Whether to display process messages. Default: TRUE
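
The id and accid linkage described for master can be illustrated with base R alone. This is a minimal sketch under stated assumptions, not the package's implementation: accepted_for() is a hypothetical helper introduced here, and it handles exact matches only.

```r
# Toy master list: accid == 0 marks an accepted name,
# otherwise accid holds the id of the accepted name.
master <- data.frame(id = 1:4,
                     canonical = c("Hypochlorosis ancharia",
                                   "Hypochlorosis tenebrosa",
                                   "Pseudonotis humboldti",
                                   "Hypochlorosis lorquinii"),
                     accid = c(0, 1, 1, 0),
                     stringsAsFactors = FALSE)

# Hypothetical helper (not part of taxotools): follow accid to the accepted name.
accepted_for <- function(name, master) {
  i <- match(name, master$canonical)
  if (is.na(i)) return(NA_character_)                      # name not in master
  if (master$accid[i] == 0) return(master$canonical[i])    # already accepted
  master$canonical[match(master$accid[i], master$id)]      # synonym: look up accepted record
}

accepted_for("Pseudonotis humboldti", master)   # synonym of the record with id 1
accepted_for("Abrothrix longipilis", master)    # not present in this toy master
```

The first call follows accid = 1 back to the record with id 1; the second returns NA because the name is absent from the toy master.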

Details

Name resolution methods:

  • direct - was a direct match with name or a synonym

  • direct2 - was a direct match with name or a synonym from a source not in mastersource

  • fuzzy - used fuzzy matching

  • gensyn - genus substitution with known genus level synonyms

  • lookup - resolved through manual lookup (namelookup) in earlier processing

  • sppdrop - subspecies was dropped

  • sub2sp - subspecies elevated to species

  • genus - genus was matched

  • family - family was matched

  • NA - could not be resolved

Note: Make sure all the data frames have the same character encoding to prevent errors.
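
Two of the methods listed above, gensyn and fuzzy, can be sketched in base R to show the idea behind them. The helpers swap_genus() and fuzzy_pick() are hypothetical names introduced for illustration only. The sketch assumes edit distance as computed by utils::adist() as the fuzzy measure, which may differ from what the package uses internally.

```r
# Genus-level synonyms in the layout expected for gen_syn.
gen_syn <- data.frame(Original_Genus = c("Pseudonotis", "Myrina"),
                      Valid_Genus = c("Hypochlorosis", "Hypochlorosis"),
                      stringsAsFactors = FALSE)

# Hypothetical helper: replace the genus (first word) when it is a known synonym.
swap_genus <- function(name, gen_syn) {
  parts <- strsplit(name, " ", fixed = TRUE)[[1]]
  i <- match(parts[1], gen_syn$Original_Genus)
  if (!is.na(i)) parts[1] <- gen_syn$Valid_Genus[i]
  paste(parts, collapse = " ")
}

# Hypothetical helper: pick the closest candidate within maxdist edits, else NA.
fuzzy_pick <- function(query, candidates, maxdist = 2) {
  d <- utils::adist(query, candidates)[1, ]
  best <- which.min(d)
  if (d[best] <= maxdist) candidates[best] else NA_character_
}

candidates <- c("Hypochlorosis ancharia",
                "Hypochlorosis tenebrosa",
                "Pseudonotis humboldti")

swap_genus("Myrina lorquinii", gen_syn)            # genus replaced by its valid genus
fuzzy_pick("Hypochlorosis ancharii", candidates)   # one edit away from a candidate
fuzzy_pick("Myrinana anchariana", candidates)      # no candidate within 2 edits
```

With a distance limit of 2, a single misspelled letter is tolerated, while a name that needs several insertions or substitutions stays unresolved.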

Value

Data frame containing all the original columns with the following additional columns:

  • accepted_name - Accepted name present in the master. NA if not resolved

  • method - Method used to resolve the name. See Details for an explanation of each method

See Also

Other Name functions: build_gen_syn(), cast_canonical(), cast_scientificname(), check_scientific(), expand_name(), guess_taxo_rank(), list_higher_taxo(), melt_canonical(), melt_scientificname(), resolve_names(), taxo_fuzzy_match()

Examples


# Master list: accid is 0 for accepted names, otherwise the id of the accepted name
master <- data.frame("id" = c(1,2,3,4,5,6,7),
                    "canonical" = c("Hypochlorosis ancharia",
                                    "Hypochlorosis tenebrosa",
                                    "Pseudonotis humboldti",
                                    "Myrina ancharia",
                                    "Hypochlorosis ancharia tenebrosa",
                                    "Hypochlorosis ancharia obiana",
                                    "Hypochlorosis lorquinii"),
                     "family" = c("Lycaenidae", "Lycaenidae", "Lycaenidae",
                                  "Lycaenidae", "Lycaenidae", "Lycaenidae",
                                  "Lycaenidae"),
                    "accid" = c(0,1,1,1,0,0,0),
                    "source" = c("itis","itis","wiki","wiki","itis",
                                 "itis","itis"),
                    stringsAsFactors = FALSE)

# Names to be resolved
mylist <- data.frame("id"= c(11,12,13,14,15,16,17,18,19),
                    "scname" = c("Hypochlorosis ancharia",
                                 "Hypochlorosis ancharii",
                                 "Hypochlorosis tenebrosa",
                                 "Pseudonotis humboldtii",
                                 "Abrothrix longipilis",
                                 "Myrinana anchariana",
                                 "Hypochlorosis ancharia ancharia",
                                 "Myrina lorquinii",
                                 "Sithon lorquinii"),
                    stringsAsFactors = FALSE)

# Resolve names against the master list only
res <- get_accepted_names(namelist = mylist,
                         master=master,
                         canonical = "scname")

gen_syn_list <- data.frame("Original_Genus"=c("Pseudonotis",
                                             "Myrina"),
                          "Valid_Genus"=c("Hypochlorosis",
                                          "Hypochlorosis"),
                          stringsAsFactors = FALSE)

# Also substitute known genus-level synonyms
res <- get_accepted_names(namelist = mylist,
                         master=master,
                         gen_syn = gen_syn_list,
                         canonical = "scname")

lookup_list <- data.frame("binomial"=c("Sithon lorquinii",
                                      "Hypochlorosis humboldti"),
                         "validname"=c("Hypochlorosis lorquinii",
                                       "Hypochlorosis lorquinii"),
                         stringsAsFactors = FALSE)

# Also use a manual lookup table
res <- get_accepted_names(namelist = mylist,
                         master=master,
                         gen_syn = gen_syn_list,
                         namelookup = lookup_list,
                         canonical = "scname")

# Split canonical names into genus, species and subspecies columns
mylist_s <- melt_canonical(mylist,canonical = "scname",
                          genus = "genus",
                          species = "species",
                          subspecies = "subspecies")

res <- get_accepted_names(namelist = mylist_s,
                         master=master,
                         gen_syn = gen_syn_list,
                         namelookup = lookup_list,
                         genus = "genus",
                         species = "species",
                         subspecies = "subspecies")

# Same call, with itis supplied as the master source
res <- get_accepted_names(namelist = mylist_s,
                         master=master,
                         gen_syn = gen_syn_list,
                         namelookup = lookup_list,
                         mastersource = c("itis"),
                         genus = "genus",
                         species = "species",
                         subspecies = "subspecies")

mylist <- data.frame("id"= c(11,12,13,14,15,16,17,18),
                    "scname" = c("Hypochlorosis ancharia",
                                 "Hypochlorosis ancharii",
                                 "Hypochlorosis",
                                 "Pseudonotis",
                                 "Lycaenidae",
                                 "Pseudonotis humboldtii",
                                 "Abrothrix longipilis",
                                 "Myrinana anchariana"),
                    stringsAsFactors = FALSE)

# Match genus and family names present in the canonical field
res <- get_accepted_names(namelist = mylist,
                         master=master,
                         match_higher = TRUE,
                         canonical = "scname")


taxotools documentation built on Jan. 23, 2023, 5:24 p.m.