R/chemodiv.R

#' chemodiv: A package for analysing phytochemical diversity
#'
#' *chemodiv* is an R package for analysing the chemodiversity of
#' phytochemical data. The package includes a number of functions that enable
#' quantification and visualization of phytochemical diversity and
#' dissimilarity for any type of phytochemical (and similar) samples, such as
#' herbivore defence compounds, volatiles and similar. Importantly,
#' calculations of diversity and dissimilarity can incorporate biosynthetic
#' and/or structural properties of the phytochemical compounds, resulting
#' in more comprehensive quantifications of diversity and dissimilarity.
#' Functions in the R package will work best for sets of data, commonly
#' generated by chemical ecologists using GC-MS, LC-MS or similar, where all
#' or most compounds in the samples have been confidently identified.
#' See Petren et al. 2023a for a detailed description of the package,
#' and Petren et al. 2023b for a more in-depth discussion and review
#' of plant chemodiversity.
#'
#' Two datasets are needed to use the full set of analyses
#' included in the package.
#'
#' The first dataset should contain data on the relative
#' abundance/concentration (i.e. proportion) of different compounds (columns)
#' in different samples (rows). See the included
#' dataset \code{\link{minimalSampData}} for a basic example.
#' Note that all calculations of diversity, and most calculations of
#' dissimilarity, are only performed on relative, rather than absolute, values.
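#'
#' As a minimal sketch of this format, assuming hypothetical compound names
#' and made-up peak areas (not data from the package), absolute values can be
#' converted to row-wise proportions with base R:
#'
#' ```r
#' # Hypothetical peak areas: samples in rows, compounds in columns
#' sampAbs <- data.frame(
#'   limonene   = c(120, 30, 0),
#'   linalool   = c(60, 30, 50),
#'   betaPinene = c(20, 40, 50)
#' )
#'
#' # Divide each row by its total so that every sample sums to 1
#' sampProp <- sampAbs / rowSums(sampAbs)
#'
#' # Sanity check: all row sums equal 1 (up to floating point error)
#' all(abs(rowSums(sampProp) - 1) < 1e-12)
#' ```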
#'
#' The second dataset should be a data frame with three columns containing
#' the compound name, SMILES and InChIKey of every compound
#' present in the first dataset. See the included dataset
#' \code{\link{minimalCompData}} for a basic example. SMILES and InChIKey
#' are chemical identifiers that are easily obtained for each compound
#' by searching for it in PubChem \url{https://pubchem.ncbi.nlm.nih.gov/}.
#' Here, a search with a common name will bring up the compound's
#' record in the database, where the (isomeric/canonical) SMILES and
#' InChIKey are included. Various automated tools such as
#' the PubChem Identifier Exchange Service
#' \url{https://pubchem.ncbi.nlm.nih.gov/idexchange/idexchange.cgi} or
#' the Chemical Translation Service \url{https://cts.fiehnlab.ucdavis.edu/}
#' can also be used. The user is intentionally required to compile the
#' chemical identifiers manually to ensure that they are correct. Lists of
#' compounds very often contain compounds that are wrongly named, wrongly
#' formatted or listed under various synonyms, which prevents reliable
#' automatic translation of compound names to SMILES and InChIKey.
#' Note that SMILES IDs might contain the character combination \code{"\\C"}.
#' If SMILES are entered manually directly in R, this is interpreted as an
#' unrecognized escape and results in an error. In this case, an extra
#' backslash has to be added: \code{"\\\C"}. If the dataset is instead
#' imported into R as a csv-file or txt-file (recommended), this is done
#' automatically and no manual edits are needed.
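#'
#' The escaping behaviour can be illustrated with a short sketch. The SMILES
#' string (cis-2-butene) and the temporary csv-file are chosen here purely
#' for illustration, and the raw string syntax assumes R 4.0.0 or later.
#'
#' ```r
#' # Typing the SMILES with a single backslash is a parse error in R,
#' # so the backslash is doubled when entered directly:
#' smiEscaped <- "C/C=C\\C"
#'
#' # A raw string (R >= 4.0.0) avoids the manual escaping
#' smiRaw <- r"(C/C=C\C)"
#'
#' identical(smiEscaped, smiRaw)  # TRUE: both hold one backslash
#' nchar(smiEscaped)              # 7
#'
#' # When read from a file, the backslash is taken literally
#' tmp <- tempfile(fileext = ".csv")
#' writeLines(c("compound,smiles", "cis-2-butene,C/C=C\\C"), tmp)
#' compFromFile <- read.csv(tmp)
#' identical(compFromFile$smiles, smiEscaped)
#' ```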
#'
#' The second dataset with the chemical IDs is primarily used to construct
#' one or more dissimilarity matrices with pairwise dissimilarities between
#' chemical compounds, which can then be used in calculations of phytochemical
#' diversity and dissimilarity. As noted above, to do this, the compounds
#' in the samples have to be identified and their chemical IDs listed.
#' If some compounds in a dataset are unknown, these can be handled in
#' different ways decided by the user; see \code{\link{compDis}} for details.
#' If many or all compounds are unknown, as is common for metabolomics-type
#' datasets, phytochemical diversity and dissimilarity can still be
#' calculated using indices that do not consider compound dissimilarities.
#' Alternatively, other ways to calculate compound dissimilarities,
#' not based on knowing compound identities, can be used.
#' For example, cosine dissimilarities between tandem (MS/MS) mass spectra of
#' metabolomic features can be calculated in the GNPS
#' framework \url{https://gnps.ucsd.edu} (Wang et al. 2016).
#' A dissimilarity matrix of such dissimilarities can then be used
#' for the \code{compDisMat} argument in various functions in the package,
#' thereby enabling comprehensive quantification of phytochemical diversity
#' and dissimilarity also for datasets consisting of unidentified compounds.
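#'
#' A minimal sketch of how such a matrix might be prepared is given below.
#' The feature names and cosine similarity scores are hypothetical, and
#' converting similarity to dissimilarity as one minus the score is an
#' assumption about how the exported scores are defined. The call to
#' \code{calcDiv} only runs if *chemodiv* is installed, and its arguments
#' should be checked against \code{\link{calcDiv}}.
#'
#' ```r
#' # Hypothetical cosine similarities between three metabolomic features
#' feat <- c("feature1", "feature2", "feature3")
#' cosSim <- matrix(
#'   c(1.0, 0.8, 0.1,
#'     0.8, 1.0, 0.3,
#'     0.1, 0.3, 1.0),
#'   nrow = 3, byrow = TRUE, dimnames = list(feat, feat)
#' )
#'
#' # Convert to dissimilarities: symmetric, with zeros on the diagonal
#' featDis <- 1 - cosSim
#' stopifnot(isSymmetric(featDis), all(diag(featDis) == 0))
#'
#' # Hypothetical sample data with the same features, in the same order
#' featSamp <- data.frame(
#'   feature1 = c(0.5, 0.2),
#'   feature2 = c(0.3, 0.2),
#'   feature3 = c(0.2, 0.6)
#' )
#' stopifnot(identical(colnames(featSamp), colnames(featDis)))
#'
#' if (requireNamespace("chemodiv", quietly = TRUE)) {
#'   chemodiv::calcDiv(sampleData = featSamp, compDisMat = featDis,
#'                     type = "FuncHillDiv", q = 1)
#' }
#' ```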
#'
#' Once the dataset with samples and the dataset with compounds are prepared,
#' these should be imported/constructed as separate data frames in R,
#' and all analyses in the package can then be performed, in largely the
#' same order as they appear in the list below.
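#'
#' The first steps of such a workflow might look like the sketch below,
#' which uses the two example datasets included in the package. The argument
#' names (\code{sampleData}, \code{compoundData}, \code{type}, \code{q}) are
#' assumptions and should be checked against the help page of each function.
#' The code only runs if *chemodiv* is installed.
#'
#' ```r
#' if (requireNamespace("chemodiv", quietly = TRUE)) {
#'   # Load the example datasets shipped with the package
#'   data("minimalSampData", package = "chemodiv")
#'   data("minimalCompData", package = "chemodiv")
#'
#'   # Check that the two data frames are correctly formatted
#'   chemodiv::chemoDivCheck(sampleData = minimalSampData,
#'                           compoundData = minimalCompData)
#'
#'   # Hill diversity, not considering compound dissimilarities
#'   divHill <- chemodiv::calcDiv(sampleData = minimalSampData,
#'                                type = "HillDiv", q = 1)
#'   print(divHill)
#' }
#' ```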
#'
#' @section Data format checks:
#' \code{\link{chemoDivCheck}}
#'
#' @section Compound classification and dissimilarity:
#' \code{\link{NPCTable}}
#' \code{\link{compDis}}
#'
#' @section Diversity calculations:
#' \code{\link{calcDiv}}
#' \code{\link{calcBetaDiv}}
#' \code{\link{calcDivProf}}
#'
#' @section Sample dissimilarities:
#' \code{\link{sampDis}}
#'
#' @section Molecular network and properties:
#' \code{\link{molNet}}
#'
#' @section Chemodiversity and network plots:
#' \code{\link{molNetPlot}}
#' \code{\link{chemoDivPlot}}
#'
#' @section Shortcut function:
#' \code{\link{quickChemoDiv}}
#'
#' @author Hampus Petren, Tobias G. Koellner, Robert R. Junker
#'
#' @references
#' Petren H, Koellner TG, Junker RR. 2023a. Quantifying chemodiversity
#' considering biochemical and structural properties of compounds with the
#' R package *chemodiv*. New Phytologist 237: 2478-2492.
#'
#' Petren H, Anaia RA, Aragam KS, Braeutigam A, Eckert S, Heinen R,
#' Jakobs R, Ojeda L, Popp M, Sasidharan R, Schnitzler J-P, Steppuhn A,
#' Thon F, Tschikin S, Unsicker SB, van Dam NM, Weisser WW, Wittmann MJ,
#' Yepes S, Ziaja D, Mueller C, Junker RR. 2023b. Understanding the
#' phytochemical diversity of plants: Quantification, variation and
#' ecological function. bioRxiv doi: 10.1101/2023.03.23.533415.
#'
#' Wang M, Carver JJ, Phelan VV, et al. 2016. Sharing and community
#' curation of mass spectrometry data with Global Natural Products
#' Social Molecular Networking. Nature Biotechnology 34: 828-837.
#'
#' @seealso \url{https://github.com/hpetren/chemodiv}
#'
#' @importFrom rlang .data
#'
#' @docType package
#' @name chemodiv
NULL