enrichment: Get trait enrichment factors

View source: R/enrichment.R

enrichment {fungarium}    R Documentation

Get trait enrichment factors

Description

Calculates trait enrichment factors (i.e., the number of trait records divided by the number of all records) for each unique element of a specified variable (e.g., taxon, location, or year).
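The ratio described above can be illustrated with the base-R sketch below, which computes freq, trait_freq and trait_ratio for each unique value of a grouping column. It is not the package's implementation. The helper name enrichment_sketch and the toy data are hypothetical; only the column name "new_full_name" is taken from the documented default for "by".

```r
# Hypothetical helper (not part of fungarium): enrichment factor per element,
# i.e. trait records / all records for each unique value of `by`.
enrichment_sketch <- function(all_rec, trait_rec, by = "new_full_name") {
  # drop blank values so they are not treated as a variable element
  all_vals <- all_rec[[by]][all_rec[[by]] != ""]
  trait_vals <- trait_rec[[by]][trait_rec[[by]] != ""]
  freq <- table(all_vals)
  # count trait records using the same set of elements as the full data
  trait_freq <- table(factor(trait_vals, levels = names(freq)))
  data.frame(
    element = names(freq),
    freq = as.numeric(freq),
    trait_freq = as.numeric(trait_freq),
    trait_ratio = as.numeric(trait_freq) / as.numeric(freq),
    stringsAsFactors = FALSE
  )
}

# Toy data: 3 records of Taxon A, 1 of Taxon B, 1 blank
all_rec <- data.frame(
  new_full_name = c("Taxon A", "Taxon A", "Taxon A", "Taxon B", ""),
  stringsAsFactors = FALSE
)
trait_rec <- all_rec[c(1, 2), , drop = FALSE]  # two Taxon A records carry the trait
res <- enrichment_sketch(all_rec, trait_rec)
res
# Taxon A: trait_ratio = 2/3; Taxon B: trait_ratio = 0
```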

Usage

enrichment(
  all_rec,
  trait_rec,
  ext_var = c("new_kingdom", "new_phylum", "new_class", "new_order", "new_family",
    "new_genus", "new_species"),
  by = "new_full_name",
  coll_bias = FALSE,
  cores = 1,
  status_feed = FALSE
)

Arguments

all_rec

Data.frame of fungal occurrence records in Darwin Core format

trait_rec

Data.frame of records within all_rec that are associated with the trait of interest. See: find_trait

ext_var

Character vector specifying additional variables to keep alongside the "by" variable in the output data frame. Useful for retaining taxonomic hierarchy information for species. Note that each element of the "by" variable must have only one unique combination of values across the ext_var columns.
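Because each "by" element must map to a single set of ext_var values, it may help to check this before calling enrichment. The sketch below uses a hypothetical helper, find_inconsistent_ext_var (not part of fungarium), that lists the "by" values having more than one distinct combination of ext_var columns; the toy data are invented.

```r
# Hypothetical helper: return `by` values that map to more than one
# distinct combination of the ext_var columns.
find_inconsistent_ext_var <- function(df, by, ext_var) {
  combos <- unique(df[, c(by, ext_var), drop = FALSE])  # distinct by/ext_var rows
  counts <- table(combos[[by]])                          # combinations per element
  names(counts)[counts > 1]
}

recs <- data.frame(
  new_full_name = c("Taxon A", "Taxon A", "Taxon B", "Taxon B"),
  new_genus     = c("GenusX",  "GenusX",  "GenusY",  "GenusZ"),
  stringsAsFactors = FALSE
)
bad <- find_inconsistent_ext_var(recs, by = "new_full_name", ext_var = "new_genus")
bad
# returns "Taxon B" (listed under two different genera)
```

An empty result (character(0)) means every "by" element has a single consistent set of ext_var values.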

by

Character string specifying the variable in the all_rec/trait_rec data sets for which enrichment factors will be calculated. Default is "new_full_name" (i.e., full taxon names).

coll_bias

Logical. If TRUE, collector bias is calculated for the total records and the trait records of each variable element, based on collector information in the "recordedBy" field of Darwin Core data sets.

cores

Integer specifying the number of cores to use for processing. Default is 1. Values greater than 1 enable parallel processing, which is not supported on Windows systems. Parallel processing is not recommended in a GUI setting. See parallel::mclapply.
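A minimal sketch, assuming you want a value for cores that works across platforms: the hypothetical helper choose_cores returns 1 on Windows, where the documentation says parallel processing is not allowed, and otherwise caps the request at the count reported by parallel::detectCores (which can return NA when the count is unknown).

```r
# Hypothetical helper: choose a safe value for the `cores` argument.
choose_cores <- function(requested) {
  # parallel processing via mclapply is not available on Windows
  if (.Platform$OS.type == "windows") return(1L)
  available <- parallel::detectCores()
  if (is.na(available)) available <- 1L  # detectCores() may return NA
  as.integer(max(1L, min(requested, available)))
}

n_cores <- choose_cores(4)
# then, e.g.: enrichment(all_rec, trait_rec, coll_bias = TRUE, cores = n_cores)
```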

status_feed

Logical. If TRUE, the status of the collector bias analysis is printed to the console.

Details

Collector bias calculations help identify variable elements whose enrichment values may be biased or untrustworthy. This is done by determining the proportion of records collected by one specific collector or associated group of collectors (e.g., a research team making collections together). If the records for one variable element (e.g., a species) come disproportionately from one collector or collector group, there is a higher chance that its enrichment value is skewed. For example, if the enrichment value is based on association with fire-affected environments and 90% of the records for one species were collected by one collector in a burned environment, the fire-associated enrichment for that species will be high but strongly biased. If other collectors had also found this species in fire-affected environments, we could have greater confidence that the species truly has a high fire-associated enrichment value.
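The core idea behind the max_bias field can be sketched for a single variable element as below. This is a simplification under a loudly stated assumption: each distinct recordedBy string (after trimming and lower-casing) is treated as its own collector group, whereas the description above also groups associated collectors such as research teams, which is not reproduced here. Blank collector values are kept as their own group, matching the documented treatment of blanks. The helper name max_collector_share is hypothetical.

```r
# Hypothetical helper: largest share of one element's records coming from a
# single collector group.
# ASSUMPTION: one group = one distinct trimmed, lower-cased recordedBy string;
# grouping of associated collectors (e.g. teams) is NOT reproduced.
max_collector_share <- function(recorded_by) {
  groups <- tolower(trimws(recorded_by))
  groups[is.na(groups)] <- ""               # blanks form their own group
  shares <- table(groups) / length(groups)  # proportion of records per group
  c(max_bias = max(shares), coll_groups = length(shares))
}

# Hypothetical element with 10 records: 9 from one collector, 1 blank
r <- max_collector_share(c(rep("J. Smith", 9), ""))
r
# returns c(max_bias = 0.9, coll_groups = 2)
```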

While collector bias calculations can be useful, there are some caveats. Most notably, collector information is not always reported, and reporting may depend heavily on location (e.g., records from some countries, such as the UK or Japan, often do not appear to include collector names). In these cases, it may be impossible to determine accurate bias values because collector information is missing. If you suspect that your data do not have consistent collector information (i.e., in the "recordedBy" field of Darwin Core data sets), interpret the collector bias analysis with caution.
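Before relying on the collector bias fields, a quick audit of how often recordedBy is blank can show whether the caveat above applies to your data. The sketch assumes the records carry the Darwin Core "countryCode" column (the location column is a parameter and can be changed); the helper blank_collector_share and the toy records are hypothetical.

```r
# Hypothetical helper: proportion of records with blank recordedBy per location.
# ASSUMPTION: the data contain a location column such as Darwin Core "countryCode".
blank_collector_share <- function(df, loc_col = "countryCode") {
  # treat both NA and empty/whitespace-only strings as blank
  blank <- is.na(df$recordedBy) | trimws(df$recordedBy) == ""
  tapply(blank, df[[loc_col]], mean)
}

recs <- data.frame(
  countryCode = c("GB", "GB", "US", "US"),
  recordedBy  = c("", NA, "J. Smith", ""),
  stringsAsFactors = FALSE
)
s <- blank_collector_share(recs)
s
# returns GB = 1, US = 0.5
```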

Value

Data.frame containing unique variable elements in the input data set (e.g. unique taxa) with the following output fields appended for each variable element.

freq

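Numeric. Number of records for the variable element in the full data set (all_rec).

trait_freq

Numeric. Number of records for the variable element in the trait data set (trait_rec).

trait_ratio

Numeric. Enrichment factor, calculated as trait_freq/freq.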

coll_blanks

Numeric. Number of total records with blank collector info.

blanks_bias

Numeric. Proportion of total records that have blank collector info.

coll_blanks_t

Numeric. Number of trait records with blank collector info.

blanks_bias_t

Numeric. Proportion of trait records that have blank collector info.

max_bias

Numeric. Max proportion of total records associated with one collector group. Blanks are treated as a collector group.

coll_groups

Numeric. Number of unique collector groups for total records. Blanks are treated as a collector group.

max_bias_t

Numeric. Max proportion of trait records associated with one collector group. Blanks are treated as a collector group.

coll_groups_t

Numeric. Number of unique collector groups for trait records. Blanks are treated as a collector group.
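To make the collector fields concrete, the sketch below computes coll_blanks, blanks_bias, max_bias and coll_groups for each variable element, and the "_t" versions from the trait records. It assumes that each distinct trimmed, lower-cased recordedBy value is one collector group, so it will not reproduce any grouping of associated collectors (e.g., research teams) that the package performs. The helper collector_fields and the toy data are hypothetical.

```r
# Hypothetical helper: collector fields per variable element.
# ASSUMPTION: one collector group = one distinct trimmed, lower-cased recordedBy.
collector_fields <- function(df, by = "new_full_name", suffix = "") {
  df <- df[df[[by]] != "", , drop = FALSE]  # blank elements are removed
  coll <- tolower(trimws(df$recordedBy))
  coll[is.na(coll)] <- ""                   # blanks count as one group
  per_element <- lapply(split(coll, df[[by]]), function(x) {
    shares <- table(x) / length(x)
    c(coll_blanks = sum(x == ""),
      blanks_bias = mean(x == ""),
      max_bias    = max(shares),
      coll_groups = length(shares))
  })
  out <- as.data.frame(do.call(rbind, per_element))
  names(out) <- paste0(names(out), suffix)
  out[[by]] <- rownames(out)
  rownames(out) <- NULL
  out
}

all_rec <- data.frame(
  new_full_name = c("Taxon A", "Taxon A", "Taxon A", "Taxon B"),
  recordedBy    = c("J. Smith", "j. smith", "", "K. Lee"),
  stringsAsFactors = FALSE
)
trait_rec <- all_rec[1:2, ]
res <- merge(collector_fields(all_rec),
             collector_fields(trait_rec, suffix = "_t"),
             by = "new_full_name", all.x = TRUE)
res
```

Taxon B has no trait records here, so its "_t" fields are NA in the merged result.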

Note

Blank values of the "by" variable are automatically removed and are not treated as a unique variable element.

Examples

library(fungarium)
data(agaricales_updated) #import sample data set with updated taxon names

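# apply filters: keep records with a blank error field that also have
# descriptive text in occurrenceRemarks, habitat, or associatedTaxa
agaricales_updated <- agaricales_updated[agaricales_updated$error == "", ]
agaricales_updated <- agaricales_updated[agaricales_updated$occurrenceRemarks != "" |
                                           agaricales_updated$habitat != "" |
                                           agaricales_updated$associatedTaxa != "", ]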

# regex matching fire-associated records
string1 <- "(?i)charred|burn(t|ed)|scorched|fire.?(killed|damaged|scarred)|killed.by.fire"

# regex excluding records falsely matched as fire-associated (e.g., "unburned")
string2 <- "(?i)un.?burn(t|ed)"

#find trait-relevant records
trait_rec <- find_trait(agaricales_updated,pos_string=string1, neg_string=string2)

#get trait enrichment
trait_enrichment <- enrichment(all_rec=agaricales_updated, trait_rec=trait_rec, status_feed=FALSE, coll_bias=TRUE)

#filter taxa based on collector bias (optional)
trait_enrichment <- trait_enrichment[trait_enrichment$max_bias<=0.75,]

#filter taxa based on total number of records (optional)
trait_enrichment <- trait_enrichment[trait_enrichment$freq>=5,]

hjsimpso/fungarium documentation built on Aug. 23, 2023, 3:59 p.m.