enrichment | R Documentation |
Calculates trait enrichment factors (i.e. trait records/all records) for each unique element of a specified variable (e.g. taxon, location, year, etc.).
enrichment(
  all_rec,
  trait_rec,
  ext_var = c("new_kingdom", "new_phylum", "new_class", "new_order", "new_family",
    "new_genus", "new_species"),
  by = "new_full_name",
  coll_bias = FALSE,
  cores = 1,
  status_feed = FALSE
)
all_rec |
Data.frame of fungal occurrence records in Darwin Core format |
trait_rec |
Data.frame of records within all_rec that are associated with the trait of interest. See: |
ext_var |
Character vector specifying additional variables to keep with the "by" variable in the output data frame. Useful for retaining taxonomic hierarchy information for species. Note that each element of the "by" variable must have only one unique set of values for ext_var. |
by |
Character string specifying the variable in the all_rec/trait_rec data sets, for which enrichment factors will be calculated. Default is "new_full_name" (i.e. full taxon names). |
coll_bias |
Logical. If TRUE, collector bias for total records and trait records is calculated for each variable element. Based on collector information in the "recordedBy" field of Darwin Core data sets. |
cores |
Integer specifying the number of cores to use for processing. Default is 1. Values greater than 1 use parallel processing (not allowed on Windows systems). Parallel processing is not recommended in a GUI setting. See
status_feed |
Logical. If TRUE, the status of the collector bias analysis is printed to the console. |
Collector bias calculations help determine which variable elements may
have biased or untrustworthy enrichment values. This is done by determining what
proportions of records were collected by one specific collector or associated
group of collectors (e.g., a research team making collections together). If
one variable element (e.g., species) was collected excessively by one collector
or collector group, there is a higher chance that the enrichment value is skewed.
For example, if our enrichment value was based on association with fire-affected environments
and 90% of the records for one species were collected by one collector in a burned environment,
the fire-associated enrichment for that species will be high but highly biased.
If other collectors had also found this species in fire-affected environments we
may have greater confidence that this species does have a high fire-associated enrichment value.
While collector bias calculations can be useful, there are some caveats. Most notably,
collector information is not always reported, and this may be heavily location dependent
(e.g., some countries like the UK or Japan don't seem to report collector names).
In these scenarios, it may be impossible to determine accurate bias values because
collector information is missing. If you suspect that your data does not have
consistent collector information (e.g., "recordedBy" field in Darwin Core data sets)
you should use the collector bias analysis with caution.
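The idea behind the bias values can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy records and the helper max_share are hypothetical, and each distinct "recordedBy" string is simply assumed to be its own collector group, whereas enrichment() may link associated collectors into groups differently.

```r
# Toy records: hypothetical species and collectors (not from the package data)
recs <- data.frame(
  new_full_name = c("Sp. A", "Sp. A", "Sp. A", "Sp. A", "Sp. B", "Sp. B"),
  recordedBy    = c("Smith", "Smith", "Smith", "", "Lee", "Ito"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: largest share of records held by any one collector value.
# Blank strings are counted as a group of their own.
max_share <- function(collectors) {
  max(table(collectors)) / length(collectors)
}

# Split collector values by species, then summarise each species
by_species <- split(recs$recordedBy, recs$new_full_name)

max_bias    <- sapply(by_species, max_share)                       # top collector share
blanks_bias <- sapply(by_species, function(x) mean(x == ""))       # share of blank collectors
coll_groups <- sapply(by_species, function(x) length(unique(x)))   # distinct collector values

print(data.frame(max_bias, blanks_bias, coll_groups))
```

In this toy data, three of the four "Sp. A" records come from one collector, so a cutoff such as max_bias <= 0.5 would flag that species, while "Sp. B" would pass.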
Data.frame containing unique variable elements in the input data set (e.g. unique taxa) with the following output fields appended for each variable element.
freq |
Numeric. Number of records in the full dataset |
trait_freq |
Numeric. Number of records in the trait dataset |
trait_ratio |
Numeric. trait_freq/freq |
coll_blanks |
Numeric. Number of total records with blank collector info. |
blanks_bias |
Numeric. Proportion of total records that have blank collector info. |
coll_blanks_t |
Numeric. Number of trait records with blank collector info. |
blanks_bias_t |
Numeric. Proportion of trait records that have blank collector info. |
max_bias |
Numeric. Max proportion of total records associated with one collector group. Blanks are treated as a collector group. |
coll_groups |
Numeric. Number of unique collector groups for total records. Blanks are treated as a collector group. |
max_bias_t |
Numeric. Max proportion of trait records associated with one collector group. Blanks are treated as a collector group. |
coll_groups_t |
Numeric. Number of unique collector groups for trait records. Blanks are treated as a collector group. |
Blank values of the "by" variable are automatically removed and are not treated as a unique variable element.
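For readers who want to see how freq, trait_freq and trait_ratio relate to each other, the following is a rough base R sketch under stated assumptions. The name vectors are made-up stand-ins for the "by" column of all_rec and trait_rec, and the code is not taken from the package source.

```r
# Hypothetical "by" values for all records and for the trait subset
all_names   <- c("Sp. A", "Sp. A", "Sp. A", "Sp. A", "Sp. B", "Sp. B", "")
trait_names <- c("Sp. A", "Sp. A", "Sp. A")

# Drop blank names so they are not counted as a variable element
all_names   <- all_names[all_names != ""]
trait_names <- trait_names[trait_names != ""]

# Count records per taxon, using the same taxon levels for both data sets
taxa       <- sort(unique(all_names))
freq       <- as.vector(table(factor(all_names, levels = taxa)))
trait_freq <- as.vector(table(factor(trait_names, levels = taxa)))

out <- data.frame(
  new_full_name = taxa,
  freq          = freq,
  trait_freq    = trait_freq,
  trait_ratio   = trait_freq / freq   # enrichment factor: trait records / all records
)
print(out)
```

Sharing the factor levels between the two counts keeps taxa with no trait records in the output with a trait_freq of 0, rather than dropping them.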
library(fungarium)
data(agaricales_updated) # import sample data set with updated taxon names

# apply filters
agaricales_updated <- agaricales_updated[agaricales_updated$error == "", ]
agaricales_updated <- agaricales_updated[agaricales_updated$occurrenceRemarks != "" |
                                           agaricales_updated$habitat != "" |
                                           agaricales_updated$associatedTaxa != "", ]

# regex that finds fire-associated records
string1 <- "(?i)charred|burn(t|ed)|scorched|fire.?(killed|damaged|scarred)|killed.by.fire"

# regex that removes records falsely identified as fire-associated
string2 <- "(?i)un.?burn(t|ed)"

# find trait-relevant records
trait_rec <- find_trait(agaricales_updated, pos_string = string1, neg_string = string2)

# get trait enrichment
trait_enrichment <- enrichment(all_rec = agaricales_updated, trait_rec = trait_rec,
                               status_feed = FALSE, coll_bias = TRUE)

# filter taxa based on collector bias (optional)
trait_enrichment <- trait_enrichment[trait_enrichment$max_bias <= 0.75, ]

# filter taxa based on total number of records (optional)
trait_enrichment <- trait_enrichment[trait_enrichment$freq >= 5, ]