library(knitr)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
knit_print.data.frame = function(x, ...) {
    res = paste(c("", "", knitr::kable(x, output = FALSE)), collapse = "\n")
    asis_output(res)
}
# register the method
registerS3method("knit_print", "data.frame", knit_print.data.frame)
library("FUNGuildR")

FUNGuildR


FUNGuildR is a tool for assigning trait information based on matching to a taxonomic classification, using the FUNGuild database. In normal use, the database is downloaded anew for each query, because FUNGuild is continually updated as new information is submitted. However, FUNGuildR also includes functions to download the database and store it as an R object to speed up repeated queries, make queries reproducible over time, and allow local queries without internet access.

Installation

For the moment, FUNGuildR can only be installed from GitHub. To install, be sure you have installed the devtools package, and then type:

devtools::install_github("brendanf/FUNGuildR")

Guild assignment

The main function is funguild_assign(). It takes as its only required argument a data.frame, which should contain a character column named "Taxonomy". It returns a version of the same data.frame (as a tibble) with additional columns:

taxon: The name of the matched taxon.
guid: Globally Unique Identifier for the database record.
mbNumber: The MycoBank number of the taxon.
taxonomicLevel: A numeric representation of the taxonomic rank; higher numbers are lower ranks.
trophicMode: A very general overview of how the organism gets its nutrition; one or more of Saprotroph, Pathotroph, and Symbiotroph.
guild: One or more narrower categories for how the organism gets its nutrition.
confidenceRanking: The confidence level of the guild assignment; Possible, Probable, or Highly Probable.
growthForm: The general growth morphology of the organism (or its fruiting body). Multiple values may be given.
trait: Additional traits about the organism, such as wood decay type or toxicity.
note: Additional notes about the entry.
citationSource: Citation(s) for the information about the taxon.

That's it!

Here's how it works on a sample dataset (scroll to see additional columns):

sample_fungi

sample_guilds <- funguild_assign(sample_fungi)
sample_guilds

For more information about the meaning of the new columns, see the FUNGuild manual.
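
Once the new columns are attached, ordinary data.frame tools can be used to work with them. The sketch below filters on confidenceRanking using a small hand-made table rather than real funguild_assign() output: the rows and the id column are invented for illustration, and it assumes that rows with no match carry NA in the new columns.

```r
# Hand-made stand-in for a few columns of funguild_assign() output.
# The rows are invented; real output has more columns and real records.
toy_guilds <- data.frame(
  id = c("otu1", "otu2", "otu3", "otu4"),
  guild = c("Ectomycorrhizal", "Plant Pathogen", "Wood Saprotroph", NA),
  confidenceRanking = c("Highly Probable", "Possible", "Probable", NA),
  stringsAsFactors = FALSE
)

# Keep rows whose confidence is at least "Probable".
# %in% returns FALSE for NA, so unmatched rows (assumed NA) are dropped too.
keep <- toy_guilds$confidenceRanking %in% c("Probable", "Highly Probable")
confident_guilds <- toy_guilds[keep, ]
confident_guilds$guild

# Count assignments per confidence level, including unmatched rows.
table(toy_guilds$confidenceRanking, useNA = "ifany")
```

Whether to discard "Possible" assignments is an analysis decision; the threshold used here is only an example.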

Input data format

Each value in the Taxonomy column of the input data.frame should consist of a comma-, colon-, underscore-, or semicolon-delimited list of the taxa to which the organism on that row belongs. You can see examples in the sample_fungi data presented above. Taxonomy strings which include taxonomic rank indicators in the styles used by Sintax ("k:", "p:", ...) or Unite ("k__", "p__", ...) are also accepted.
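
As a rough illustration of these formats, the following base R sketch builds a small input data.frame by hand. The id column and the particular taxonomy strings are made up for this example; only the Taxonomy column name comes from the description above. Each row uses a different style purely for demonstration; a real dataset will usually use one style throughout. The call to funguild_assign() is left commented out because it needs FUNGuildR and, by default, internet access.

```r
# Hypothetical input: one taxonomy string per row, in several delimiter styles.
my_fungi <- data.frame(
  id = c("otu1", "otu2", "otu3", "otu4"),
  Taxonomy = c(
    # semicolon-delimited, no rank indicators
    "Fungi;Basidiomycota;Agaricomycetes;Agaricales;Amanitaceae;Amanita",
    # comma-delimited
    "Fungi,Ascomycota,Sordariomycetes,Hypocreales,Nectriaceae,Fusarium",
    # Sintax-style rank indicators
    "k:Fungi,p:Basidiomycota,c:Agaricomycetes,o:Russulales,f:Russulaceae,g:Russula",
    # Unite-style rank indicators; ranks below order are simply left out
    "k__Fungi;p__Ascomycota;c__Leotiomycetes;o__Helotiales"
  ),
  stringsAsFactors = FALSE  # keep Taxonomy as character rather than factor
)

# The Taxonomy column should be a character vector:
is.character(my_fungi$Taxonomy)

# With FUNGuildR installed and a connection available:
# my_guilds <- FUNGuildR::funguild_assign(my_fungi)
```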

Such taxonomic classifications are frequently arranged from the most inclusive taxon (e.g., Kingdom) to the least inclusive taxon (e.g., Species), but this is not actually required for FUNGuild. Not all taxonomic ranks are required; for each row, the algorithm returns results only for the least inclusive taxon which is present in the database.
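
The "least inclusive taxon present in the database" rule can be pictured with a toy example. This is a minimal sketch of the idea, not the code FUNGuildR actually runs: toy_db, its rank codes, and least_inclusive_match() are all invented here, and the sketch ignores complications such as species binomials.

```r
# Toy stand-in for the database: taxon names with a numeric rank code,
# where a higher number means a lower (less inclusive) rank.
# These rows and codes are invented for illustration.
toy_db <- data.frame(
  taxon = c("Agaricales", "Amanitaceae", "Amanita"),
  taxonomicLevel = c(7, 9, 13),
  stringsAsFactors = FALSE
)

# Return the least inclusive taxon in `taxonomy` that is present in `db`,
# or NA if none of the listed taxa is in `db`.
least_inclusive_match <- function(taxonomy, db) {
  # Split on comma, colon, underscore, or semicolon.
  taxa <- strsplit(taxonomy, "[,:_;]+")[[1]]
  hits <- db[db$taxon %in% taxa, , drop = FALSE]
  if (nrow(hits) == 0) return(NA_character_)
  hits$taxon[which.max(hits$taxonomicLevel)]
}

least_inclusive_match("Fungi;Basidiomycota;Agaricales;Amanitaceae;Amanita", toy_db)
# returns "Amanita"

# The order of taxa does not matter, and ranks may be missing:
least_inclusive_match("Amanitaceae,Fungi", toy_db)
# returns "Amanitaceae"

# Rank indicators such as "k__" split into extra tokens ("k", "o")
# that match nothing in the toy database:
least_inclusive_match("k__Fungi;o__Agaricales", toy_db)
# returns "Agaricales"
```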

Database caching

By default, funguild_assign() downloads the FUNGuild database each time it is invoked. In many analysis workflows, where guilds need to be assigned only once, this is not a problem; because the database is continuously updated, it is good to use the most current version. However, if you are going to call the function many times, or if you plan to assign guilds in a situation where you have no internet access, you can cache the database locally using the function get_funguild_db(). This returns the database as a tibble, which can be passed as a second argument to funguild_assign().

fung <- get_funguild_db()

# This isn't necessary for a single query, but it works.
fung_guilds <- funguild_assign(sample_fungi, db = fung)

# It might be more useful in this situation
data_guilds <- lapply(many_datasets, funguild_assign, db = fung)

# Or you can save it for later offline use
saveRDS(fung, "funguild.rds")

# And then load it again
fung <- readRDS("funguild.rds")

This strategy can also be used for reproducible research, to store the same version of the database which was used in the original analysis.
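
One way to combine these steps is a small "load if cached, otherwise download and save" helper. The sketch below is only a suggestion: cached_db() and its arguments are hypothetical and not part of FUNGuildR. The download function is passed in as an argument so that the helper itself depends only on base R.

```r
# Load a cached database from `path` if the file exists; otherwise call
# `fetch()` to obtain it and save the result for next time.
# `cached_db` is a hypothetical helper, not a FUNGuildR function.
cached_db <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  db <- fetch()
  saveRDS(db, path)
  db
}

# Intended use (requires FUNGuildR and, on the first call, internet access):
# fung <- cached_db("funguild.rds", FUNGuildR::get_funguild_db)
# sample_guilds <- FUNGuildR::funguild_assign(sample_fungi, db = fung)
```

Keeping the .rds file alongside the analysis scripts, and deleting it only when a fresh copy of the database is wanted, means every rerun uses the same database version.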

Database queries

From the current development version, FUNGuildR also allows queries to the FUNGuild web API. The fields taxon, guid, mbNumber, trophicMode, guild, growthForm, and trait are searchable. For instance, to find all fungi annotated as causing brown rot (a kind of wood decay):

brownrotters <- funguild_query("brown rot", "trait")
nrow(brownrotters)

Here are the first few:

head(brownrotters)

The characters "%" and "*" can be used as wildcards. For instance, we can search for all fungi where the wood decay type is listed:

allrotters <- funguild_query("* rot", "trait")
nrow(allrotters)
unique(allrotters$trait)

Queries can also be run against a locally cached database.

fungi <- get_funguild_db()
allrotters <- funguild_query("* rot", "trait", db = fungi)

NEMAGuild

As of April 2021, the NEMAGuild database is temporarily offline. However, FUNGuildR has functions nemaguild_assign() and get_nemaguild_db() to access it in exactly the same ways as the FUNGuild database. These functions will not work for the time being (unless you already have a cached local copy of NEMAGuild!) but they should work again when NEMAGuild is back online.



brendanf/FUNGuildR documentation built on Oct. 18, 2023, 4:26 p.m.