R/calc_metrics.R
In ecospace: Simulating Community Assembly and Ecological Diversification Using Ecospace Frameworks

Documented in calc_metrics

#'Calculate Ecological Disparity (Functional Diversity) Dynamics as Function of
#'Sample Size.
#'
#'Wrapper to \code{FD::\link[FD]{dbFD}} that calculates common ecological
#'disparity and functional diversity statistics. When used with species-wise
#'simulations of community assembly or ecological diversification (and default
#'\code{increm = 'TRUE'}), calculates statistical dynamics incrementally as a
#'function of species richness. Avoids file-sharing errors so that can be used
#'in 'embarrasingly parallel' implementations in a high-performance computing
#'environment.
#'
#'@param nreps Sample number to calculate statistics for. Default is the first
#'  sample \code{nreps = 1}, but statistics can be calculated for other samples
#'  (i.e., second sample if \code{nreps = 2}), or multiple samples if assigned a
#'  vector (sequence) of integers and function is applied within \code{lapply}
#'  or related function.
#'@param samples Data frame (if \code{nreps = 1}) or list of data frames (if
#'  \code{nreps = seq()} or \code{nreps! = 1}), with each data frame a
#'  species-by-trait matrix with species as rows and traits as columns. Traits
#'  can be binary, numeric, ordered numeric, factor, or ordered factor types.
#'  Each sample is converted to a distance metric (see \code{method} below)
#'  before calculating statistics.
#'@param Smax Maximum number of \code{samples} rows (species) to include in
#'  calculations, incremented starting with first row. Default (\code{NA}) is to
#'  increment to the maximum number of \code{samples} rows (calculated
#'  separately for each data frame sample, if a list of data frames). If
#'  \code{Smax} is greater than the size of a sample, then calculation stops
#'  after calculating the sample statistics and issues a warning.
#'@param Model Optional character string or numeric value naming simulation
#'  model. A warning issues if left blank.
#'@param Param Optional numeric value or character string naming strength
#'  parameter used in simulation. A warning issues if left blank.
#'@param m The number of PCoA axes to keep as 'traits' for calculating FRic and
#'  FDiv in \code{FD::\link[FD]{dbFD}}. Default \code{m = 3} is justified below,
#'  but any integer value greater than 1 is possible. See 'details' for more
#'  information.
#'@param corr Character string specifying the correction method to use in
#'  \code{FD::\link[FD]{dbFD}} when the species-by-species distance matrix
#'  cannot be represented in a Euclidean space. Default \code{corr = 'lingoes'} is
#'  justified below, but see \code{FD::\link[FD]{dbFD}} for other possible
#'  values.
#'@param method Distance measure to use when calculating functional distances
#'  between species. Default is \code{method = 'Euclidean'} using
#'  \code{stats::\link[stats]{dist}}. \code{method = 'Gower'} or any other value
#'  uses Gower distance (using \code{FD::\link[FD]{gowdis}}). Presence of factor
#'  or ordered factor character types forces use of Gower distance, triggering a
#'  warning to notify user when changed internally.
#'@param increm Default \code{increm = 'TRUE'} calculates statistics incrementally
#'  as a function of species richness. \code{increm = 'FALSE'} only calculates a
#'  single set of statistics for the entire sample.
#'@param ... Additional parameters for controlling \code{FD::\link[FD]{dbFD}}.
#'  Common uses include setting \code{calc.FRic = FALSE} or \code{calc.FDiv = FALSE}
#'  to exclude calculation of FRic and FDiv. Note that the arguments \code{m},
#'  \code{corr}, and \code{method} above have different defaults than used in
#'  \code{FD::\link[FD]{dbFD}}, and \code{w.abun = FALSE} and
#'  \code{messages = FALSE} are also internally changed to different defaults.
#'  These and others can be controlled here.
#'
#'@details The primary goal of this function is to describe the statistical
#'  dynamics of common ecological disparity (functional diversity) metrics as a
#'  function of species richness (sample size). Statistics are calculated
#'  incrementally within samples, first for the first row (species), second for
#'  the first and second rows, ..., ending with the entire sample (by default,
#'  or terminating with \code{Smax} total species). The function assumes that
#'  supplied samples are ecologically or evolutionary cohesive assemblages in
#'  which there is a logical order to the rows (such that the sixth row is the
#'  sixth species added to the assemblage) and that such incremental
#'  calculations are sensible. See Novack-Gottshall (2016a,b) for additional
#'  context. Samples must have species as rows and traits as columns (of many
#'  allowed character types), and have \code{class(data.frame)} or a list of
#'  such data frames, with each data frame a separate sample.
#'
#'  Statistics calculated include four widely used in ecological disparity
#'  studies (adapted from studies of morphological disparity) and four used in
#'  functional diversity studies. See Foote (1993), Ciampaglio et al. (2001),
#'  and Wills (2001) for definitions and details on morphological disparity
#'  measures and Novack-Gottshall (2007; 2016a,b) for implementation as measures
#'  of ecological disparity. See Mason et al. (2005), Anderson et al. (2006),
#'  Villeger et al. (2008), Laliberte and Legendre (2010), Mouchet et al.
#'  (2010), Mouillot et al. (2013) for definitions and details on functional
#'  diversity statistics. For computation details of functional diversity
#'  metrics, see Laliberte and Shipley (2014) package FD, and especially
#'  \code{FD::\link[FD]{dbFD}}, which this function wraps around to calculate
#'  the functional diversity statistics.
#'
#'  Statistic (\code{S}) is species (taxonomic) richness, or sample size.
#'
#'  When \code{increm = 'FALSE'}, the function calculates statistics for the
#'  entire sample(s) instead of doing so incrementally. In this case, the
#'  implementation is essentially the same as \code{FD::\link[FD]{dbFD}} with
#'  default arguments (e.g., \code{m, corr}) that reduce common calculation
#'  errors, plus inclusion of common morphological disparity statistics.
#'
#'  \strong{Statistics that measure diversity (unique number of life habits /
#'  trait combinations within ecospace / functional-trait space):} \describe{
#'  \item{H}{Life habit richness, the number of functionally unique trait
#'  combinations.}
#'
#'  \strong{Statistics that measure disparity (or dispersion of species within
#'  ecospace / functional-trait space) (Note these statistics are sensitive to
#'  outliers and sample size):} \item{M}{Maximum pairwise distance between
#'  species in functional-trait space, measured using the distance \code{method}
#'  specified above.} \item{FRic}{Functional richness, the minimal convex-hull
#'  volume in multidimensional principal coordinates analysis (PCoA) trait-space
#'  ordination.}\item{FDiv}{Functional divergence, the mean distance of species
#'  from the PCoA trait-space centroid.}  }
#'
#'  \strong{Statistics that measure internal structure (i.e., clumping or
#'  inhomogeneities within the trait-space):} \describe{ \item{D}{Mean pairwise
#'  distance between species in functional-trait space, measured using the
#'  distance \code{method} specified above.} \item{V}{Total variance, the sum of
#'  variances for each functional trait across species; when using factor or
#'  ordered factor character types, this statistic cannot be measured and is
#'  left blank, with a warning.} \item{FDis}{Functional dispersion, the total
#'  deviance of species from the circle with radius equal to mean distance from
#'  PCoA trait-space centroid.} }
#'
#'  \strong{Statistics that measure spacing among species within the
#'  trait-space:} \describe{ \item{FEve}{Functional evenness, the evenness of
#'  minimum-spanning-tree lengths between species in PCoA trait-space.} }
#'
#'  The default number of PCoA axes used in calculating of FRic and FDiv equals
#'  \code{m = 3}. Because their calculation requires more species than traits
#'  (here the \code{m = 3} PCoA axes), the four functional diversity statistics
#'  are only calculated when a calculated sample contains a minimum of \code{m}
#'  species (S) or unique life habits (H). \code{qual.FRic} is appended to the
#'  output to record the proportion ('quality') of PCoA space retained by this
#'  loss of dimensionality. Although including more PCoA axes allows greater
#'  statistical power (Villeger et al. 2011, Maire et al. 2015), the use of
#'  \code{m = 3} here is computationally manageable, ecologically meaningful, and
#'  allows standardized measurement of statistical dynamics across the wide
#'  range of sample sizes typically involved in simulations of
#'  ecological/evolutionary assemblages, especially when functionally redundant
#'  data occur. Other integers greater than 1 can also be specified. See the
#'  help file for \code{FD::\link[FD]{dbFD}} for additional information.
#'
#'  Lingoes correction \code{corr = 'lingoes'}, as recommended by Legendre and
#'  Anderson (1999), is called when the species-by-species distance matrix
#'  cannot be represented in a Euclidean space. See the help file for
#'  \code{FD::\link[FD]{dbFD}} for additional information.
#'
#'  Note that the ecological disparity statistics are calculated on the raw
#'  (unstandardized) distance matrix. The functional diversity statistics are
#'  calculated on standardized data using standardizations in
#'  \code{FD::\link[FD]{dbFD}}. If all traits are numeric, they by default are
#'  standardized to mean 0 and unit variance. If not all traits are numeric,
#'  Gower's (1971) standardization by the range is automatically used.
#'
#'@return Returns a data frame (if \code{nreps} is a single integer or
#'  \code{samples} is a single data frame) or a list of data frames. Each
#'  returned data frame has \code{Smax} rows corresponding to incremental
#'  species richness (sample size) and 12 columns, corresponding to:
#'
#'  \item{Model}{(optional) \code{Model} name} \item{Param}{(optional) strength
#'  parameter} \item{S}{Species richness (sample size)} \item{H}{Number of
#'  functionally unique life habits} \item{D}{Mean pairwise distance}
#'  \item{M}{Maximum pairwise distance} \item{V}{Total variance}
#'  \item{FRic}{Functional richness} \item{FEve}{Functional evenness}
#'  \item{FDiv}{Functional divergence} \item{FDis}{Functional dispersion}
#'  \item{qual.FRic}{proportion ('quality') of total PCoA trait-space used when
#'  calculating FRic and FDiv}
#'
#'
#'@note A bug exists within \code{FD::\link[FD]{gowdis}} where nearest-neighbor
#'  distances can not be calculated when certain characters (especially numeric
#'  characters with values other than 0 and 1) share identical traits across
#'  species. The nature of the bug is under investigation, but the current
#'  implementation is reliable under most uses. If you run into problems because
#'  of this bug, a work-around is to manually change the function to call
#'  \code{cluster::\link[cluster]{daisy}} using \code{metric = "gower"} instead.
#'
#'  If calculating statistics for more than several hundred samples, it is
#'  recommended to use a parallel-computing environment. The function has been
#'  written to allow usage (using \code{\link{lapply}} or some other list-apply
#'  function) in 'embarrassingly parallel' implementations in such HPC
#'  environments. Most importantly, overwriting errors during calculation of
#'  convex hull volume in FRic are avoided by creating CPU-specific temporarily
#'  stored vertices files.
#'
#'  See Novack-Gottshall (2016b) for recommendations for using random forest
#'  classification trees to conduct multi-model inference.
#'
#'@author Phil Novack-Gottshall \email{pnovack-gottshall@@ben.edu}
#'
#'@references Anderson, M. J., K. E. Ellingsen, and B. H. McArdle. 2006.
#'  Multivariate dispersion as a measure of beta diversity. \emph{Ecology
#'  Letters} 9(6):683-693.
#'@references Ciampaglio, C. N., M. Kemp, and D. W. McShea. 2001. Detecting
#'  changes in morphospace occupation patterns in the fossil record:
#'  characterization and analysis of measures of disparity. \emph{Paleobiology}
#'  27(4):695-715.
#'@references Foote, M. 1993. Discordance and concordance between morphological
#'  and taxonomic diversity. \emph{Paleobiology} 19:185-204.
#'@references Gower, J. C. 1971. A general coefficient of similarity and some of
#'  its properties. \emph{Biometrics} 27:857-871.
#'@references Laliberte, E., and P. Legendre. 2010. A distance-based framework
#'  for measuring functional diversity from multiple traits. \emph{Ecology}
#'  91(1):299-305.
#'@references Legendre, P., and M. J. Anderson. 1999. Distance-based redundancy
#'  analysis: testing multispecies responses in multifactorial ecological
#'  experiments. \emph{Ecological Monographs} 69(1):1-24.
#'@references Maire, E., G. Grenouillet, S. Brosse, and S. Villeger. 2015. How
#'  many dimensions are needed to accurately assess functional diversity? A
#'  pragmatic approach for assessing the quality of functional spaces.
#'  \emph{Global Ecology and Biogeography} 24(6):728-740.
#'@references Mason, N. W. H., D. Mouillot, W. G. Lee, and J. B. Wilson. 2005.
#'  Functional richness, functional evenness and functional divergence: the
#'  primary components of functional diversity. \emph{Oikos} 111(1):112-118.
#'@references Mouchet, M. A., S. Villeger, N. W. H. Mason, and D. Mouillot.
#'  2010. Functional diversity measures: an overview of their redundancy and
#'  their ability to discriminate community assembly rules. \emph{Functional
#'  Ecology} 24(4):867-876.
#'@references Mouillot, D., N. A. J. Graham, S. Villeger, N. W. H. Mason, and D.
#'  R. Bellwood. 2013. A functional approach reveals community responses to
#'  disturbances. \emph{Trends in Ecology and Evolution} 28(3):167-177.
#'@references Novack-Gottshall, P.M. 2007. Using a theoretical ecospace to
#'  quantify the ecological diversity of Paleozoic and modern marine biotas.
#'  \emph{Paleobiology} 33: 274-295.
#'@references Novack-Gottshall, P.M. 2016a. General models of ecological
#'  diversification. I. Conceptual synthesis. \emph{Paleobiology} 42: 185-208.
#'@references Novack-Gottshall, P.M. 2016b. General models of ecological
#'  diversification. II. Simulations and empirical applications.
#'  \emph{Paleobiology} 42: 209-239.
#'@references Villeger, S., N. W. H. Mason, and D. Mouillot. 2008. New
#'  multidimensional functional diversity indices for a multifaceted framework
#'  in functional ecology. \emph{Ecology} 89(8):2290-2301.
#'@references Villeger, S., P. M. Novack-Gottshall, and D. Mouillot. 2011. The
#'  multidimensionality of the niche reveals functional diversity changes in
#'  benthic marine biotas across geological time. \emph{Ecology Letters}
#'  14(6):561-568.
#'@references Wills, M. A. 2001. Morphological disparity: a primer. Pp. 55-143.
#'  \emph{In} J. M. Adrain, G. D. Edgecombe, and B. S. Lieberman, eds.
#'  \emph{Fossils, phylogeny, and form: an analytical approach.} Kluwer
#'  Academic/Plenum Publishers, New York.
#'@references Laliberte, E., and B. Shipley. 2014. \emph{FD: Measuring
#'  functional diversity from multiple traits, and other tools for functional
#'  ecology}, Version 1.0-12.
#'
#'@seealso \code{FD::\link[FD]{dbFD}} for details on the core function wrapped
#'  here for calculating functional diversity statistics. \code{\link{neutral}},
#'  \code{\link{redundancy}}, \code{\link{partitioning}},
#'  \code{\link{expansion}} for building samples using simulations.
#'  \code{\link{rbind_listdf}} for efficient way to combine lists of data frames
#'  for subsequent analyses.
#'
#' @examples
#' # Build ecospace framework and a random 50-species sample using neutral rule:
#' ecospace <- create_ecospace(nchar = 15, char.state = rep(3, 15), char.type = rep("numeric", 15))
#' sample <- neutral(Sseed = 5, Smax = 50, ecospace = ecospace)
#' # Using Smax = 10 here for fast example
#' metrics <- calc_metrics(samples = sample, Smax = 10, Model = "Neutral", Param = "NA")
#' metrics
#'
#' # Plot statistical dynamics as function of species richness
#' op <- par()
#' par(mfrow = c(2,4), mar = c(4, 4, 1, .3))
#' attach(metrics)
#' plot(S, H, type = "l", cex = .5)
#' plot(S, D, type = "l", cex = .5)
#' plot(S, M, type = "l", cex = .5)
#' plot(S, V, type = "l", cex = .5)
#' plot(S, FRic, type = "l", cex = .5)
#' plot(S, FEve, type = "l", cex = .5)
#' plot(S, FDiv, type = "l", cex = .5)
#' plot(S, FDis, type = "l", cex = .5)
#'
#' par(op)
#'
#' # Argument 'increm' switches between incremental and entire-sample calculation
#' metrics2 <- calc_metrics(samples = sample, Smax = 10, Model = "Neutral",
#'                          Param = "NA", increm = FALSE)
#' metrics2
#' identical(tail(metrics, 1), metrics2) # These are identical
#'
#' # ... can further control 'FD::dbFD', here turning off calculation of FRic and FDiv
#' metrics3 <- calc_metrics(samples = sample, Smax = 10, Model = "Neutral",
#'                          Param = "NA", calc.FRic = FALSE, calc.FDiv = FALSE)
#' metrics3
#' rbind(metrics[10, ], metrics3[10, ])
#'
#' \dontrun{
#' # Can take a few minutes to run to completion
#' # Calculate for 5 samples
#' nreps <- 1:5
#' samples <- lapply(X = nreps, FUN = neutral, Sseed = 5, Smax = 50, ecospace)
#' metrics <- lapply(X = nreps, FUN = calc_metrics, samples = samples,
#'                   Model = "Neutral", Param = "NA")
#' alarm()
#' str(metrics)
#' }
#'@export
calc_metrics <-
  function(nreps = 1, samples = NA, Smax = NA, Model = "", Param = "", m = 3,
           corr = "lingoes", method = "Euclidean", increm = TRUE, ...) {
    if (is.logical(samples))
      stop("you must provide a list of samples to calculate\n")
    if (is.data.frame(samples)) {
      sample <- samples
    } else {
      sample <- samples[[nreps]]
    }
    if (!is.data.frame(sample))
      stop("samples is not a data frame or list of data frames.")
    if (Model == "")
      warning("you did not specify a model name. Model will be left empty.")
    if (Param == "")
      warning("you did not specify a parameter value. Param will be left empty.")
    if (!is.numeric(Smax)) {
      ns <- nrow(sample)
    } else {
      ns <- Smax
      if (ns < Smax) {
        warning("Smax specified is greater than sample size. Calculation
              stopped prematurely when reached complete sample size.")
      }
    }
    if (increm) {
      Smin <- 1
    } else {
      Smin <- ns
    }
    if (method != "Euclidean" |
        any(sapply(sample, data.class) == "factor") |
        any(sapply(sample, data.class) == "ordered")) {
      method <- "Gower"
      warning("'sample' converted to distance matrix using Gower distance.")
    }
    odir <-
      setwd(tempdir())     # Specify the pre-built (and CPU-process unique) temp directory for storage of vert.txt temp files for convex hull calculations
    on.exit(setwd(odir))
    # Modify arguments used in 'FD::dbFD':
    dots <- list(...)
    dot.names <- names(dots)
    args <- formals(dbFD)
    arg.names <- names(args)
    mod.arg <- match(dot.names, arg.names)
    if (any(is.na(mod.arg)))
      stop("'...' specifies arguments that are not arguments to 'FD::dbFD'")
    args$m <- m
    args$w.abun <- FALSE
    args$messages <- FALSE
    args$corr <- corr
    for (a in 1:length(mod.arg)) {
      args[mod.arg[a]] <- dots[a]
    }
    sam.out <-
      data.frame(Model = Model, Param = Param, S = numeric(ns),
        H = numeric(ns), D = numeric(ns), M = numeric(ns), V = numeric(ns),
        FRic = numeric(ns), FEve = numeric(ns), FDiv = numeric(ns),
        FDis = numeric(ns), qual.FRic = numeric(ns))
    for (s in Smin:ns) {
      sam <- sample[1:s, ]
      sam.out$S[s] <- s
      H <- nrow(unique(sam))
      sam.out$H[s] <- H
      if (method == "Gower") {
        dist <- FD::gowdis(sam)
      } else {
        dist <- dist(sam)
      }
      if (any(is.nan(dist)) | length(dist) == 0)
        next
      sam.out$D[s] <- mean(dist)
      sam.out$M[s] <- max(dist)
      sam.out$V[s] <- sqrt(sum(apply(sam, 2, var, na.rm = TRUE)))
      if (s <= m | H <= m)
        next
      args$x <- dist
      FD <- do.call("dbFD", args = args)
      sam.out$FDis[s] <- FD$FDis
      sam.out$FEve[s] <- FD$FEve
      sam.out$FRic[s] <- NA
      sam.out$qual.FRic[s] <- NA
      sam.out$FDiv[s] <- NA
      if (!is.null(FD$FRic)) {
        sam.out$FRic[s] <- FD$FRic
        sam.out$qual.FRic[s] <- FD$qual.FRic
      }
      if (!is.null(FD$FDiv)) {
        sam.out$FDiv[s] <- FD$FDiv
      }
    }
    if (!increm)
      sam.out <- sam.out[s, ]
    return(sam.out)
  }
Any scripts or data that you put into this service are public.
ecospace documentation built on July 8, 2020, 5:54 p.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
ecospace
Simulating Community Assembly and Ecological Diversification Using Ecospace Frameworks

R/calc_metrics.R
In ecospace: Simulating Community Assembly and Ecological Diversification Using Ecospace Frameworks

Defines functions calc_metrics

Documented in calc_metrics

Try the ecospace package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

ecospace Simulating Community Assembly and Ecological Diversification Using Ecospace Frameworks

R/calc_metrics.R In ecospace: Simulating Community Assembly and Ecological Diversification Using Ecospace Frameworks

Defines functions calc_metrics

Documented in calc_metrics

Try the ecospace package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

ecospace
Simulating Community Assembly and Ecological Diversification Using Ecospace Frameworks

R/calc_metrics.R
In ecospace: Simulating Community Assembly and Ecological Diversification Using Ecospace Frameworks