knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  dev = "png", 
  dev.args = list(type = "cairo-png")
)

Overview

The study of functional traits in ecology enables a greater understanding of the mechanisms underlying patterns of biodiversity. While trait-based research has traditionally focused on mean trait values for a species, there is a growing awareness of the need to pay greater attention to intraspecific trait variation (ITV; Violle et al. 2012). From an evolutionary standpoint, ITV is important because it reflects differences within a species on which natural selection acts, and ecologically these differences within and among species traits may give rise to differences in species interactions (Bolnick et al. 2011, Read et al. 2018).

When ITV is taken into account, trait values of individuals of each species may be represented as a distribution rather than as a single mean value. The degree of trait similarity between species can be measured as the median amount of overlap in trait space between all species pairs in a community. Lower overlap indicates greater trait partitioning between pairs of species in a community. The Ostats package calculates an overlap statistic (O-statistic) to measure the degree of community-level trait overlap by fitting nonparametric kernel density functions to each species’ trait distribution and calculating their areas of overlap (Mouillot et al. 2005, Geange et al. 2011, Read et al. 2018). The median pairwise overlap for a community is calculated by first determining the overlap of each species pair in trait space, and then taking the median overlap of each species pair in a community. Functions in this package can be used to assess the level of species trait overlap and compare across communities. Effect size statistics can also be calculated against local or regional null models.

Note that we have written an accompanying teaching module to Read et al. (2018), which walks through a figure set exercise for the manuscript (Grady et al. 2018). This teaching module may be helpful to review for interpretation of the overlap statistics and graphs.


The Ostats package can be used to do the following:

Step-by-step Examples

Univariate linear example - NEON small mammal communities: Calculating the overlap statistics for linear data against a local null model

The function Ostats() is the primary function in this package. This function first generates density estimates for species trait distributions and calculates the intersection of two density functions to produce pairwise overlap values. Ostats() then calculates a community overlap value from the pairwise overlap values of all species pairs in the community. Finally, Ostats() calculates effect sizes relative to a null model (z‐scores) to test whether the degree of overlap among species is greater than or less than expected by chance, allowing the user to explore the drivers of variation in body size (i.e., mass) overlap.

EXAMPLE: How to find the degree of overlap in body sizes for all small mammal species present in a particular community, and compare it with other communities?

In this example, the input of Ostats() is taken from a data frame with a column for species identification, a column that indicates which community the individual belongs to, and a column with trait measurements, often log-transformed. The code chunk below generates an input dataset for Ostats(), used to calculate the O-statistics for sites Harvard Forest (HARV) and Jornada (JORN) from the National Ecological Observatory Network (NEON) (Read et al., 2018). NEON is a National Science Foundation-funded network of 81 terrestrial and aquatic field sites strategically located across climatic domains in the United States where standardized protocols are used to sample a variety of ecological observations including small mammal body sizes. Harvard Forest is located in the northeastern U.S. (Massachusetts) and encompasses a temperate hardwood forest ecosystem. Jornada is located in the southwestern U.S. (New Mexico) and encompasses a desert ecosystem.

In the following code chunk we subset the data frame, remove incomplete observations, and log-transform the mass column (body mass in grams) so that we can compare body size distributions on a multiplicative scale.

# Load the Ostats package. 
library(Ostats)

dat <- small_mammal_data[small_mammal_data$siteID %in% c('HARV', 'JORN'), 
                         c('siteID', 'taxonID', 'mass')]
dat <- dat[!is.na(dat$mass), ]
dat$log_mass <- log10(dat$mass)

Below is a subset of the input dataset. Each row represents an individual small mammal that was trapped and weighed, with the site where it was captured (siteID), species identity (taxonID), body mass in grams (mass), and the log of the body mass in grams (log_mass). Here we show only a single row from each species at each site.

do.call(rbind, lapply(split(dat, interaction(dat$siteID, dat$taxonID), drop = TRUE), 
                      function(x) x[1,]))

Arguments to Ostats()

In the below example, all arguments are set to their default values.

To explore the drivers of variation in body size overlap, Ostats() can implement a null model to test whether individual species’ body size distributions are more evenly spaced along the trait axis than expected by chance. This approach evaluates the z‐score of each observed community against the distribution of a user defined number of null communities.

Running the function may take several minutes, depending on the size of the dataset and the number of null model permutations. A progress bar is provided to help the user monitor the time until the function completes the job. Note that in the code chunk below, the traits argument is a matrix with one column and as many rows as there are individuals in the dataset. The sp and plots arguments are each factor vectors with length equal to the number of individuals in the dataset. In this example, since the data are body sizes as measured by mass in grams, the data are considered linear for the argument data_type.

 Ostats_example <- Ostats(traits = as.matrix(dat[,'log_mass', drop = FALSE]),
                    sp = factor(dat$taxonID),
                    plots = factor(dat$siteID),
                    random_seed = 517)

The below code chunk shows an example of how the density_args inputs could be changed if so desired:

Ostats_example2 <- Ostats(traits = as.matrix(dat[,'log_mass', drop = FALSE]),
                    sp = factor(dat$taxonID),
                    plots = factor(dat$siteID), 
                    density_args=list(bw = 'nrd0', adjust = 2, n=200),
                    random_seed = 518)

The result of Ostats is a list containing four items. The first item in the list is overlaps_norm, a matrix with one column and the number of rows equal to the number of communities, showing community overlap values for each community with the area under all density functions normalized to 1.

Ostats_example$overlaps_norm

The second item in the list resulting from running the Ostats function is overlaps_unnorm, a matrix with one column and the number of rows equal to the number of communities showing community overlap values for each community with the area under all density functions proportional to the number of observations in that group.

Ostats_example$overlaps_unnorm

Elements in the overlaps_norm and overlaps_unnorm matrices are overlap values for each community, where one indicates complete overlap and zero indicates no overlap between species pairs within the community. A higher overlap value means greater similarity among species trait distributions. The difference between overlaps_norm and overlaps_unnorm is that overlaps_norm does not take species abundance into account when calculating overlap between each species pair. Harvard Forest (HARV) has a high community overlap value indicating greater similarity among species trait distributions, whereas Jornada (JORN) has a very low community overlap value indicating low similarity in species trait distributions. These results are consistent across both overlaps_norm and overlaps_unnorm.

The last two items in the list generated by Ostats() contain the effect size statistics from the null models. The third and fourth items in the list are overlaps_norm_ses and overlaps_unnorm_ses, which each consist of five matrices of effect size statistics against a null model. The difference between these two final items are that in overlaps_norm_ses the area under all density functions is normalized to 1, whereas in overlaps_unnorm_ses the area under all density functions is proportional to the number of observations per community. The code below displays the outputs of these third and fourth items:

# View normalized and non-normalized standardized effect size outputs from null model analysis
Ostats_example$overlaps_norm_ses
Ostats_example$overlaps_unnorm_ses

These effect size values are used to compare the observed overlap statistics with a local null model (z-score test). As mentioned, the upper and lower limits are set at 95% by default. If the ses (standard effect sizes) value is lower than the lower limit for that community, it suggests that the community overlap value observed is lower than expected by chance. Similarly, if the community overlap value is higher than the upper limit, the community has a higher overlap than expected by chance. In this example, regardless of whether normalized or non-normalized values are calculated, the community overlap of Harvard Forest (HARV) falls within the upper and lower quantiles, suggesting that the overlap value at HARV is not significantly different from the overlap values calculated from randomly generated community trait distributions. On the other hand, Jornarda (JORN) has a much lower value than the lower quantile limit, suggesting that Jornada small mammal body size distributions are less similar than expected by chance based on the null models.

Overlap statistics for circular data against a local null model

Ostats() can also be used to calculate overlap statistics of circular data, such as angular data (e.g., direction and orientation) or time. Two different kinds of circular calculations are available: continuous and discrete. To specify circular trait data, set the argument discrete = FALSE and circular = TRUE if the data are circular and continuous, or set discrete = TRUE and circular = TRUE if the data are circular and discrete (i.e., collected every hour).

EXAMPLE: How to find the degree of overlap in time occurences for ant species present in a chamber, and compare across chambers of different conditions?

To illustrate the calculation of overlap statistics applied to discrete, circular data the Ostats package provides a sample dataset called ant_data. This is a subset of the data analysed in Stuble et al. (2014). Stuble and colleagues measured the daily activity patterns of ants in chambers with different air temperatures to explore the effects of increased temperature on seed dispersal by ants. The ant_data dataframe consists of three columns: species (given as genus and species with a space between the two names), chamber (chamber ID 1 or 2), and time (integer values ranging from 0-23 to represent the 24 hours in a day). Chamber 1 was warmed 2.5 degrees Celsius, whereas Chamber 2 was kept at ambient temperature.

head(ant_data)

The arguments are similar to the linear data calculation. In the example below with ant_data, the input data are a record of ants in two chambers and their hour of occurrence. Because the time variable is discrete, we specify discrete = TRUE and circular = TRUE. We specify the argument unique_values to tabulate the density at all possible discrete values that the time of occurrence can take. In this case there are 24 possible values, 0:23. Finally, for continuous circular data (not shown here), circular_args can be used to pass additional arguments to the underlying function circular::circular() used to convert traits to objects of class circular.

# Calculate overlap statistics for hourly data using the ant_data dataset
circular_example <- Ostats(traits = as.matrix(ant_data[, 'time', drop = FALSE]),
                    sp = factor(ant_data$species),
                    plots = factor(ant_data$chamber),
                    discrete = TRUE,
                    circular = TRUE,
                    unique_values = 0:23,
                    random_seed = 519)

The output is the same as in the linear data calculation. The code below shows the normalized and non-normalized overlap values for the two chambers, respectively, in the first four lines followed by the standardized effect sizes from the null models.

circular_example$overlaps_norm
circular_example$overlaps_unnorm
circular_example$overlaps_norm_ses
circular_example$overlaps_unnorm_ses

Regardless of whether the normalized or non-normalized overlap values are considered, for both chamber communities, the ses value is lower than the lower limit for each community, suggesting that the community overlap values observed for each chamber are lower than expected by chance from a null model.

Overlap plots

The graphing function Ostats_plot() depends on ggplot2 and can be used to visualize species trait overlaps of each community for multiple communities.

The input dataset needs to have these information: plots community or site identity: a vector of names to indicate which community or site (i.e., population in a population-level analysis) the individual belongs to. sp taxon identification: a vector of species or taxa names. traits trait measurements: a vector of trait measurements for each individual, or a matrix with rows representing individuals and columns representing traits. overlap_dat This input information is optional. It is an object containing the output of Ostats for the same data. If provided, it is used to label the plot panels with the overlap values scaled up to be across species within a community or populations within a species.

There are various arguments to fine-tune the plot you produce:

Example: Plotting Univariate Linear O-Statistics

The following example plots the NEON small mammal data, showing only Harvard Forest and Jornada.

siteID <- small_mammal_data$siteID
taxonID <- small_mammal_data$taxonID
trait <- log10(small_mammal_data$mass)

sites2use<- c('HARV','JORN') 

Ostats_plot(plots = siteID, 
            sp = taxonID, 
            traits = trait, 
            overlap_dat = small_mammal_Ostats, 
            use_plots = sites2use, 
            name_x = expression(paste('log'[10], ' body mass (g)')), 
            means = TRUE)

The graphs to the left show the intraspecific trait variation as illustrated by density curves, and the graphs to the right show mean trait values. For all graphs probability density is shown on the y-axis and log-transformed body mass on the x-axis. Each color represents a different species and colors are consistent across graphs. The top row illustrates trait values for HARV (Harvard Forest), whereas the bottom row illustrates trait values for JORN (Jornada). The left column of graphs also show the raw O-statistic for the entire community. Note that there is more overlap in body mass for HARV than for JORN.

Example: Plotting Univariate Circular O-Statistics

If trait values are circular, a circular kernel density estimate for each species is plotted on a polar coordinate plot. If trait values are both circular and discrete, a "sunburst" plot is returned.

The following example plots the ant data for chambers 1 and 2. Note that when plotting circular data for the ant example, the arguments 'circular' and 'discrete' should be set to TRUE because ant behavioral traits are measured with periodic units of discrete time (i.e., hours in a day).

siteID <- factor(ant_data$chamber)
taxonID <- factor(ant_data$species)
trait <-  as.matrix(ant_data[, 'time', drop = FALSE])


Ostats_plot(plots = siteID, 
            sp = taxonID, 
            traits = trait, 
            overlap_dat = circular_example, 
            name_x = 'Time (hours)', 
            circular = TRUE,
            discrete = TRUE)

Boostrapping

Example for generating bootstrap of linear univariate O-statistics

This is an illustration of how to bootstrap O-statistics using the example data on small mammal body mass from Harvard Forest (HARV) and Jornada (JORN). The code to load a small example dataset (two sites of NEON small mammal body weights) is copied from the above in the vignette in case users are working through examples separately.

library(Ostats)
library(ggplot2)

### Code copied from vignette to load a small example dataset and calculate O-stats on the observed dataset.
dat <- small_mammal_data[small_mammal_data$siteID %in% c('HARV', 'JORN'),
                         c('siteID', 'taxonID', 'mass')]
dat <- dat[!is.na(dat$mass), ]
dat$log_mass <- log10(dat$mass)

Next, calculate O-stats on the dataset (don't do the null model, just calculate the observed overlap statistic). Then pull the normalized overlap out into a data frame for plotting later.

Ostats_example <- Ostats(traits = as.matrix(dat[,'log_mass', drop = FALSE]),
                         sp = factor(dat$taxonID),
                         plots = factor(dat$siteID),
                         run_null_model = FALSE)

# Convert observed normalized overlap to a data frame for plotting
Ostats_observed <- with(Ostats_example, data.frame(siteID = dimnames(overlaps_norm)[[1]], overlap_norm = round(overlaps_norm[,1], 4)))

Define a custom function to do the bootstrap. Following the suggestion of Maitner et al. (2023), we sample with replacement from each species' traits within each community proportional to that species' relative abundance in that community. We use the nonparametric bootstrap approach. In other words, we are sampling directly from the raw data, instead of fitting a distribution and sampling from that, which would be parametric because we would be using the parameters of a fitted distribution to generate bootstrap samples.

boot_fn <- function(i) {
  # Bootstrap sample of same size as the data, within each community each row is sampled with replacement with equal probability
  # This means the proportion of individuals of each species in the bootstrap sample will be similar to the observed data.

  bsample <- by(dat, dat$siteID, function(sitedat) sitedat[sample(1:nrow(sitedat), replace = TRUE), ])
  bsample <- do.call(rbind, bsample)

  Ostats_bsample <- Ostats(traits = as.matrix(bsample[,'log_mass', drop = FALSE]),
                           sp = factor(bsample$taxonID),
                           plots = factor(bsample$siteID),
                           run_null_model = FALSE)

  # Extract only normalized overlap
  with(Ostats_bsample, data.frame(rep = i, data.frame(siteID = dimnames(overlaps_norm)[[1]], overlap_norm = overlaps_norm[,1])))
}

Run the bootstrap function 100 times, each time sampling the observed data with replacement within each community. Put the results together into a data frame.

n_boot <- 100
set.seed(1)

Ostats_boot <- lapply(1:n_boot, boot_fn)
Ostats_boot <- do.call(rbind, Ostats_boot)

Plot the results. In the plot, the solid line is the observed O-stat and the histogram is the bootstrap distribution.

ggplot(Ostats_boot, aes(x = overlap_norm, group = siteID, color = siteID, fill = siteID)) +
  geom_histogram(position = 'identity', alpha = 0.5, color = NA) +
  geom_vline(aes(xintercept = overlap_norm, color = siteID), data = Ostats_observed, linewidth = 1) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.01))) +
  scale_color_manual(values = unname(palette.colors(3))[-1], aesthetics = c('colour', 'fill')) + # Colorblind accessible
  theme_bw()

Calculate median 95% quantile interval of the bootstrap distribution of O-stats. Display alongside the observed.

Ostats_boot_summary_stats <- with(Ostats_boot, aggregate(overlap_norm ~ siteID, FUN = function(x) round(quantile(x, probs = c(.5, .025, .975)), 4)))

merge(Ostats_observed, Ostats_boot_summary_stats, by = 'siteID')

Bibliography

Bolnick, D.I., P. Amarasekare, M.S. Araujo, R. Burger, J.M. Levine, M. Novak, V.H.W. Rudolf, S.J. Schreiber, M.C. Urban, and D.A. Vasseur. 2011. Why intraspecific trait variation matters in ecology. Trends in Ecology and Evolution 26(4):183-192. https://doi.org/10.1016/j.tree.2011.01.009

Geange, S.W., S. Pledger, K.C. Burns, and J.S. Shima. 2011. A unified analysis of niche overlap incorporating data of different types. Methods in Ecology and Evolution 2(2):175-184. https://doi.org/10.1111/j.2041-210X.2010.00070.x

Grady, J.M., Q.D. Read, S. Record, P.L. Zarnetske, B. Baiser, K. Thorne, and J. Belmaker. 2018. Size, niches, and the latitudinal diversity gradient. Teaching Issues and Experiments in Ecology 14: Figure Set #1.

Maitner, B.S., A.H. Halbritter, R.J. Telford, T. Strydom, J. Chacon, C. Lamanna, L.L. Sloat, A.J. Kerkhoff, J. Messier, N. Rasmussen, F. Pomati, E. Merz, V. Vandvik, and B.J. Enquist. 2023. Bootstrapping outperforms community-weighted approaches for estimating the shapes of phenotypic distributions. Methods in Ecology and Evolution 14:2592-2610.https://doi.org/10.1111/2041-210X.14160

Mouillot, D., W. Stubbs, M. Faure, O. Dumay, J.A. Tomasini, J.B. Wilson, and T. Do Chi. 2005. Niche overlap estimated based on quantitative functional traits: A new family of non-parametric indices. Oecologia* 145(3):345-353. https://doi.org/10.1007/s00442-005-0151-z

Read, Q.D., J.M. Grady, P.L. Zarnetske, S. Record, B. Baiser, J. Belmaker, M.-N. Tuanmu, A. Strecker, L. Beaudrot, and K.M. Thibault. 2018. Among-species overlap in rodent body size distributions predicts species richness along a temperature gradient. Ecography 41(10):1718–1727. https://doi.org/10.1111/ecog.03641

Stuble, K.L., C.M. Patterson, M.A. Rodriguez-Cabal, R.R. Ribbons, R.R. Dunn, and N.J. Sanders. 2014. Ant-mediated seed dispersal in a warmed world. PeerJ 2:e286.

Violle, C., B.J. Enquist, B.J. McGill, L. Jiang, C.H. Albert, C. Hulshof, V. Jung, and J. Messier. 2012. The return of the variance: Intraspecific variability in community ecology. Trends in Ecology & Evolution 27(4):244–252. https://doi.org/10.1016/j.tree.2011.11.014



NEON-biodiversity/Ostats documentation built on Nov. 21, 2024, 4:01 a.m.