validateTax | R Documentation |
This function assigns different categories of confidence level (i.e. high, medium, low or unknown) to the identification of species records, based on the name of the person who provided the species identification and on type specimens.
validateTax(
  x,
  col.names = c(family = "family.new", det.name = "identifiedBy.new",
                col.name = "recordedBy.new", types = "typeStatus",
                rec.ID = "numTombo", rec.type = "basisOfRecord"),
  special.collector = TRUE,
  generalist = FALSE,
  generalist.class = "medium",
  other.records = NULL,
  miss.taxonomist = NULL,
  taxonomist.list = "plantR",
  voucher.list = NULL,
  noName = c("semdeterminador", "anonymus", "anonymous", "anonimo",
             "incognito", "unknown", "s.d.", "s.n."),
  top.det = 10,
  print = TRUE
)
x |
a data frame with the species records. |
col.names |
Vector. A named vector containing the names of the columns in the input data frame that hold each piece of information needed to assign confidence levels to species identifications. Default to the plantR output column names. |
special.collector |
Logical. Should specimens collected by a family specialist but with an empty determiner field be classified as high confidence level? Default to TRUE. |
generalist |
Logical. Should family generalists be considered for taxonomic validation? Default to FALSE. |
generalist.class |
Character. Confidence level to be assigned to family generalists. Default to "medium". |
other.records |
Character or Integer. The confidence level (if character) or the number of downgrading steps (if integer) to be assigned to records which are not preserved specimens. Default to NULL (all record types are treated the same). |
miss.taxonomist |
Vector. Any missing combination of family x taxonomist that should be added to the validation? |
taxonomist.list |
a data.frame containing the list of taxonomist names.
The default is "plantR", the internal |
voucher.list |
Vector. One or more unique record identifiers (i.e. combination of collection code and number) that should be flagged with a high confidence level? Default to NULL. |
noName |
Vector. One or more character strings (in lower case) with the standard notation for missing data in the field 'det.name'. Default to some typical notations found in herbarium data. |
top.det |
Numerical. How many of the top missing identifiers should be printed? Default to 10. |
print |
logical. Should the table of missing identifiers be printed? Default to TRUE. |
rec.ID |
Character. The name of the columns containing the unique record
identifier (see function |
The input data frame x must contain at least the columns with the record
family and the name of the person who provided the species identification.
Preferably, this data frame should also contain information on type specimens
and collector names. If the user provides a list of records to be flagged as
having a high confidence level in the identification, the user must also
provide the column where the unique record identifiers are stored. The names
of these columns should be provided as a named vector to the argument
col.names, as follows:
'family': the botanical family (default: 'family.new')
'det.name': the identifier name (default: 'identifiedBy.new')
'col.name': the collector name (default: 'recordedBy.new')
'types': type specimens (default: 'typeStatus')
'rec.ID': the unique record identifier (default: 'numTombo')
'rec.type': the type of record (default: 'basisOfRecord')
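As a rough illustration of the mapping above, the sketch below builds a col.names vector for a data frame whose columns do not use the plantR default names and checks that every named column exists. The data frame my.records and the column names fam, detBy and collBy are hypothetical, and the check is only a convenience written for this example, not part of plantR.

```r
# Hypothetical input whose column names differ from the plantR defaults
my.records <- data.frame(
  fam = "Bignoniaceae",
  detBy = "Gentry, A.H.",
  collBy = NA_character_,
  typeStatus = NA_character_,
  numTombo = "a_1",
  basisOfRecord = "PreservedSpecimen",
  stringsAsFactors = FALSE
)

# Named vector: names are the roles expected by validateTax(),
# values are the actual column names in the input data frame
my.cols <- c(family = "fam", det.name = "detBy", col.name = "collBy",
             types = "typeStatus", rec.ID = "numTombo",
             rec.type = "basisOfRecord")

# Columns named in 'my.cols' that are absent from the input (ideally none)
missing.cols <- my.cols[!my.cols %in% names(my.records)]
missing.cols

# The vector would then be passed as follows (requires plantR):
# validateTax(my.records, col.names = my.cols)
```

Running such a check first makes a misspelled column name easier to spot before the validation itself is attempted.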
As for other functions in plantR, using a data frame x that has already
passed through the editing steps of the plantR workflow should result in more
accurate outputs.
The function classifies as high confidence level all records whose species identifications were performed by a family specialist and all type specimens (isotypes, paratypes, etc.). By default, the names of family specialists are obtained from a global list of about 8,500 plant taxonomist names constructed by Lima et al. (2020) and provided with plantR. This list was built based on information from the Harvard University Herbaria, the Brazilian Herbaria Network and the American Society of Plant Taxonomists. The dictionary was manually complemented with missing taxonomist names, and it includes common variants of those names (e.g., missing initials, typos, married or maiden names).
If a column containing the Darwin Core field 'basisOfRecord' or equivalent is
provided ('rec.type' in argument col.names), then by default, all occurrences
that are not preserved specimens (i.e. human/machine observations, photos,
living specimens, etc.) are classified as having a low confidence level.
Some specimens are collected by a specialist of the family, but the
identifier information is missing. By default, we assume the same confidence
level for these specimens as that assigned to specimens where the identifier
is the family specialist. Users can choose otherwise by setting the argument
special.collector to FALSE.
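The records affected by special.collector can be pictured with the following base R sketch. It is an illustration of the rule described above, not the code used inside plantR: it treats an identifier as missing when it is NA or matches one of the noName notations, and it assumes a one-name specialist list.

```r
# Subset of the default 'noName' notations shown in the usage
noName <- c("semdeterminador", "anonymous", "unknown", "s.d.", "s.n.")

# Identifier and collector names, as in the example data of this help page
det <- c("Gentry, A.H.", "Hatschbach, G.", NA, NA, NA, "Hatschbach, G.")
col <- c(NA, NA, NA, "Gentry, A.H.", NA, NA)

# Assumption for illustration: a specialist list with a single name
specialists <- "Gentry, A.H."

# Records without identifier information
no.det <- is.na(det) | tolower(det) %in% noName
# Records collected by a family specialist
by.specialist <- !is.na(col) & col %in% specialists

# Records that special.collector = TRUE would treat as specialist-identified
candidates <- which(no.det & by.specialist)
candidates
```

In this toy data only the fourth record has a missing identifier and a specialist as collector.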
The arguments generalist and generalist.class define whether taxonomists who
provide identifications for many different families outside their specialty,
often referred to as generalists, should be considered in the validation and
which confidence level they should receive. There are some names of
generalists in the plantR default taxonomist database; however, this list of
generalist names is currently biased towards plant collectors in South
America, particularly in Brazil.
The argument other.records controls what to do with types of records which
are not preserved specimens (Darwin Core field basisOfRecord). If the
argument is NULL (default), all record types are treated the same. Users can
set the argument to one of the confidence levels (i.e. 'unknown', 'low',
'medium' or 'high') to assign the same class to all non-preserved specimens,
or to a value (i.e., 1 or 2), which corresponds to the number of downgrading
steps among levels. For instance, if other.records is one, the 'high' level
becomes 'medium' and the 'medium' level becomes 'low' ('unknown' and 'low'
levels remain the same).
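The downgrading rule can be made concrete with a small base R sketch. The helper downgrade() is hypothetical and written only to mirror the description above; it is not a plantR function and may differ from how plantR implements the rule internally.

```r
# Hypothetical helper: move each confidence level down by 'steps',
# never going below 'low' and leaving 'unknown' untouched
downgrade <- function(conf, steps = 1L) {
  ranked <- c("low", "medium", "high")
  pos <- match(conf, ranked)        # NA for 'unknown'
  new.pos <- pmax(pos - steps, 1L)  # floor at 'low'; NA stays NA
  out <- conf
  keep <- !is.na(pos)
  out[keep] <- ranked[new.pos[keep]]
  out
}

levels.in <- c("high", "medium", "low", "unknown")
one.step <- downgrade(levels.in, 1)
two.steps <- downgrade(levels.in, 2)
one.step
two.steps
```

With one step, 'high' becomes 'medium' and 'medium' becomes 'low'; with two steps, both 'high' and 'medium' end at 'low'.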
If one or more taxonomists are missing from the validation, you can include
them using the argument miss.taxonomist. The format should be the name of the
family of specialty followed by an underscore and then the taxonomist name in
the TDWG format (e.g. "Bignoniaceae_Gentry, A.H.").
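A minimal sketch of building an entry in that format with base R, assuming the family and the TDWG-formatted name are already available as character vectors (the object names are illustrative only):

```r
# Family of specialty and taxonomist name in the TDWG format
family <- "Bignoniaceae"
tdwg.name <- "Gentry, A.H."

# Join them with an underscore, as expected by 'miss.taxonomist'
miss.tax <- paste(family, tdwg.name, sep = "_")
miss.tax

# The result would then be passed as follows (requires plantR):
# validateTax(df, miss.taxonomist = miss.tax)
```

Because paste() is vectorized, the same call works for several family and name pairs at once.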
A database of taxonomists different from the plantR default can be used. This
database must be provided using the argument taxonomist.list and it must
contain the columns 'family' and 'tdwg.name'. The first column is the family
of specialty of the taxonomist and the second one is her/his name in the TDWG
format. See the plantR function prepName() on how to get names in the TDWG
format from a list of people's names.
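As a sketch of what such a database might look like, the code below builds a one-row data frame with the two required columns and verifies that both are present. The object name my.taxonomists and the check are assumptions made for this example; any further columns that plantR may use are not covered here.

```r
# Minimal custom taxonomist database with the two required columns
my.taxonomists <- data.frame(
  family = "Bignoniaceae",
  tdwg.name = "Gentry, A.H.",
  stringsAsFactors = FALSE
)

# Confirm that the required columns are present before using the database
required <- c("family", "tdwg.name")
has.required <- all(required %in% names(my.taxonomists))
has.required

# The database would then be passed as follows (requires plantR):
# validateTax(df, taxonomist.list = my.taxonomists)
```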
Finally, the user can provide a list of records that should be flagged as
having a high confidence level in their identification. This list should be
provided using the argument voucher.list, and the information that should be
provided is the unique record identifier (i.e. combination of collection code
and number). It is important that the way in which the list of unique
identifiers was generated matches the one used to construct the identifiers
in the input data frame x (see help of function getTombo()). If a list of
records is provided, the user must also provide a valid column name in x
containing the unique record identifiers in col.names.
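One way to catch a mismatch between the two sets of identifiers is to compare them before running the validation. The sketch below uses the identifiers from the example data of this help page; the second voucher, "F-4", is a hypothetical entry built with a different convention to show what a mismatch looks like.

```r
# Unique record identifiers as stored in the input data frame
ids.in.x <- c("a_1", "b_3", "c_7", "d_5", "e_3", "f_4")

# Vouchers to be flagged; "F-4" is a hypothetical, differently formatted entry
vouchers <- c("f_4", "F-4")

# Vouchers with no exact match among the identifiers in the input
unmatched <- setdiff(vouchers, ids.in.x)
unmatched
```

Any identifier returned here would need to be regenerated with the same convention used for the input data before it can be matched.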
The input data frame x, plus a new column 'tax.check' containing the classes
of confidence in species identifications.
Lima, R.A.F. et al. 2020. Defining endemism levels for biodiversity conservation: Tree species in the Atlantic Forest hotspot. Biological Conservation, 252: 108825.
prepName and getTombo.
# Example records, all from the family Bignoniaceae
(df <- data.frame(
  family.new = c("Bignoniaceae", "Bignoniaceae", "Bignoniaceae",
                 "Bignoniaceae", "Bignoniaceae", "Bignoniaceae"),
  identifiedBy.new = c("Gentry, A.H.", "Hatschbach, G.", NA, NA, NA,
                       "Hatschbach, G."),
  recordedBy.new = c(NA, NA, NA, "Gentry, A.H.", NA, NA),
  typeStatus = c(NA, NA, "isotype", NA, NA, NA),
  numTombo = c("a_1", "b_3", "c_7", "d_5", "e_3", "f_4"),
  stringsAsFactors = FALSE))

# Validation with the default settings
validateTax(df)
# Also consider family generalists
validateTax(df, generalist = TRUE)
# Flag record "f_4" as having a high confidence level
validateTax(df, voucher.list = "f_4")