In reumandc/tsvr: Timescale-Specific Variance Ratio for Use in Community Ecology

\newcommand{\mean}{\operatorname{mean}} \newcommand{\var}{\operatorname{var}} \newcommand{\cov}{\operatorname{cov}} \newcommand{\cor}{\operatorname{cor}}

The tsvr package provides an implementation of a timescale-specific extension and generalization of the variance ratio of @Peterson_75. The variance ratio is used commonly in community ecology. The extension implemented in the tsvr package is described in detail by @Zhao_inprep. The tsvr package supports that paper and provides an implemetation of the tools developed there for anyone to use. The mathematical formulas for the variance ratio and extensions are detailed elsewhere [@Peterson_75; @Hallett_14; @Zhao_inprep]. The mathematics are also summarized here, but the main purpose of this vignette is to provide a decription of how to use the tsvr package.

Preparing data

\noindent A typical dataset for analysis using tsvr is an $N \times T$ matrix of nonnegative numeric values where rows correspond to species in a community (so the number of species is $N$) and columns correspond to evenly spaced times during which sampling was conducted (so the number of times sampling was conducted is $T$). Matrix entries may be densities, or percent cover values for plant species within a quadrat, or biomasses, or other measures of abundance of the species. For instance:

library(tsvr)
class(JRGdat)
names(JRGdat)
d<-t(as.matrix(JRGdat[,2:dim(JRGdat)[2]]))
dim(d)

Here d is a $26 \times 28$ matrix containing percent cover measurements for each year from $1983$ to $2010$ for 26 species which occurred in a 1 m$^2$ plot in the Jasper Ridge Biological Preserve serpentine grassland site. Species names are in the row names of JRGdat, which is an example dataset embedded in the tsvr package. Documentation for the dataset can be viewed via ?JRGdat. See [@Hallett_14] for details of the Jasper Ridge ecosystem and of these and other related data.

Standard implementations of Fourier transforms require time series consisting of measurements taken at evenly spaced times, with no missing data. The core functions provided in tsvr make these same assumptions and throw an error if data are missing. The user is left to decide on and implement a reasonable way of filling missing data, if data are missing. We have previously used the simple approach of replacing missing values in a time series by the median of the non-missing values in the time series [@Sheppard_16]. This approach, and other related simple procedures [@Sheppard_16], seem unlikely to artefactually produce significant synchrony, or coherence relationships with other variables, but rely on the percentage of missing data being fairly low and may obscure detection of synchrony or significant coherence relationships if too many data are missing. For applications which differ meaningfully from the prior work for which the tools of this package were developed [@Zhao_inprep], different ways of filling missing data may be more appropriate.

The timescale-specific variance ratio techniques which are the focus of this package use Fourier methods to decompose by timescale the classic variance ratio and related quantities. Detrending and variance standardization across time series, techniques which are often applied before doing Fourier analysis, may not be approriate except in cases for which it makes sense to calculate the classic variance ratio and related quantities after performing those techniques.

The classic variance ratio and related quantities

\noindent Let $x_i(t)$ be a measure of the population of species $i$ at time $t$, for $i=1,\ldots,N$ and $t=1,\ldots,T$. Define $x_{\text{tot}}(t)=\sum_{i=1}^N x_i(t)$ to be the total of all species populaton measures. Define $\text{CV}{\text{com}}^2$ to be the square of the coefficient of variation of $x{\text{tot}}(t)$ over time, i.e., $\var_t(x_{\text{tot}}(t))/(\mean_t(x_{\text{tot}}(t)))^2$, where $\var_t$ and $\mean_t$ represent variance and mean computed through time. This equals $\sum_{i,j} \cov_t(x_i(t),x_j(t))/(\mean_t(x_{\text{tot}}(t)))^2$, where $\cov_t$ is covariance through time. The abbreviation "com" in $\text{CV}{\text{com}}^2$ stands for "community" since $\text{CV}{\text{com}}^2$ is the squared coefficient of variation of the whole-community population. Define $\text{CV}{\text{com_ip}}^2$ to be the value of $\text{CV}{\text{com}}^2$ that would pertain if the population dynamics of different species were independent, so that $\cov_t(x_i(t),x_j(t))=0$ for all $i \neq j$. The abbreviation "ip" stands for "independent populations". Thus $\text{CV}{\text{com_ip}}^2 = \sum{i} \cov_t(x_i(t),x_i(t))/(\mean_t(x_{\text{tot}}(t)))^2 = \sum_{i} \var_t(x_i(t))/(\mean_t(x_{\text{tot}}(t)))^2$. The classic variance ratio is defined as $\varphi=\text{CV}{\text{com}}^2/\text{CV}{\text{com_ip}}^2$, so that $\text{CV}{\text{com}}^2=\varphi \text{CV}{\text{com_ip}}^2$. A variance ratio greater than $1$ suggests "synchronous" dynamics of the species comprising the community, so that community variability ($\text{CV}{\text{com}}^2$) is greater than it would be if populations were independent ($\text{CV}{\text{com_ip}}^2$). A variance ratio less than $1$ suggests "compensatory" dynamics of the species comprising the community (i.e., increases/decreases in some species are partly compensated for by decreases/increases in others), so that community variability is less than it would be if populations were independent.

The quantities $\text{CV}{\text{com}}^2$, $\text{CV}{\text{com_ip}}^2$ and $\varphi$ can be computed using the vreq_classic function in the tsvr package, we here do so for the dataset d of the previous section, which we will continue to use throughout this vignette:

res<-vreq_classic(d)
class(res)
names(res)
summary(res)
print(res)
all.equal(res$com,res$comnull*res$vr)

The vreq_classic S3 class, of which vreq_classic is the generator function, inherits from the generic S3 class vreq (generator function vreq) and the list class, and has the three slots com, comnull, and vr. These slots correspond to $\text{CV}{\text{com}}^2$, $\text{CV}{\text{com_ip}}^2$ and $\varphi$. Both the vreq and vreq_classic classes have print and summary methods and set_* and get_* methods where * represents any of the class slot names. See documentation for vreq, vreq_classic, vreq_methods, vreq_classic_methods, setget_methods for details. The "classic" in vreq_classic references the fact that this version of the variance ratio is the original version used in community ecology [@Peterson_75], and is probably still the most commonly used. Alternative versions have been proposed [@Loreau_08] - see the next section of this vignette.

The Loreau-de Mazancourt variance ratio and related quantities

\noindent Define $\text{CV}{\text{pop}}^2=\left(\sum_i \sqrt{\var_t (x_i(t))}\right)^2/(\mean_t(x{\text{tot}}(t)))^2$. Another version of the variance ratio has been proposed by Loreau and de Mazancourt: $\varphi_{\text{LdM}} = \text{CV}{\text{com}}^2/\text{CV}{\text{pop}}^2$. Thus $\text{CV}{\text{com}}^2 = \varphi{\text{LdM}} \text{CV}_{\text{pop}}^2$. See [@Loreau_08] for details. The tsvr package implements the Loreau-de Mazancourt approach:

res<-vreq_LdM(d)
class(res)
names(res)
summary(res)
print(res)
all.equal(res$com,res$comnull*res$vr)
all.equal(res$com,vreq_classic(d)$com)

The vreq_LdM S3 class, of which vreq_LdM is the generator function, inherits from the generic S3 class vreq and the list class, and has slots com, comnull, and vr. These slots correspond to $\text{CV}{\text{com}}^2$, $\text{CV}{\text{pop}}^2$ and $\varphi_{\text{LdM}}$. The class vreq_LdM has print and summary methods and set_* and get_* methods where * represents any of the class slot names. See documentation for vreq_LdM, vreq_LdM_methods, and setget_methods for details.

The timescale-specific classic variance ratio and related quantities

\noindent Next we describe a timescale-specific extension of the classic variance ratio, and its related quantities, that uses spectral methods, a set of standard statistical tools in ecology and other fields. The power spectrum of the time series $x_i(t)$, here denoted $s_{ii}(\sigma)$ and defined for timescales $\sigma=T/(T-1),T/(T-2),\ldots,T/2,T$, decomposes $\var_t(x_i(t))$ by timescale in that $s_{ii}(\sigma)$ is nonnegative, will tend to be larger for timescales on which $x_i(t)$ shows greater variation through time, and $\sum_\sigma s_{ii}(\sigma) = \var_t(x_i(t))$. Likewise, the cospectrum $s_{ij}(\sigma)$ decomposes $\cov_t(x_i(t),x_j(t))$ by timescale in a similar way, being larger for timescales on which the two time series predominantly covary.

We define a timescale-specific generalization of $\text{CV}{\text{com}}^2$ as $\text{CV}{\text{com}}^2(\sigma) = \sum_{i,j} s_{ij}(\sigma)/(\mean_t(x_{\text{tot}}(t)))^2$. It is straightforward to see that $\sum_\sigma \text{CV}{\text{com}}^2(\sigma) = \text{CV}{\text{com}}^2$, so $\text{CV}{\text{com}}^2(\sigma)$ decomposes $\text{CV}{\text{com}}^2$ by timescale. $\text{CV}{\text{com}}^2(\sigma)$ reveals to what extent variation on each timescale contributes to community variability. Likewise, $\text{CV}{\text{com_ip}}^2 = \sum_{i} \var_t(x_i(t))/(\mean_t(x_{\text{tot}}(t)))^2$ can be decomposed by timescale as $\text{CV}{\text{com_ip}}^2(\sigma) = \sum{i} s_{ii}(\sigma)/(\mean_t(x_{\text{tot}}(t)))^2$. Finally, we define a timescale-specific version of the classic variance ratio as the quotient of these two quantities, i.e., $\varphi_{ts}(\sigma)=\left( \sum_{i,j} s_{ij}(\sigma) \right)/\left( \sum_i s_{ii}(\sigma) \right)$, so that $\text{CV}{\text{com}}^2(\sigma) = \varphi{ts}(\sigma) \text{CV}_{\text{com_ip}}^2(\sigma)$. The timescale-specific variance ratio quantifies the extent to which species' oscillations are synchronous ($>1$) or compensatory ($<1$) on a timescale-by-timescale basis. For further details, see [@Zhao_inprep].

The quantities $\text{CV}{\text{com}}^2(\sigma)$, $\varphi{ts}(\sigma)$ and $\text{CV}_{\text{com_ip}}^2(\sigma)$ can be computed using the tsvr package:

res<-tsvreq_classic(d)
class(res)
names(res)
summary(res)
print(res)
all.equal(res$com,res$tsvr*res$comnull)
all.equal(sum(res$com),vreq_classic(d)$com)
all.equal(sum(res$comnull),vreq_classic(d)$comnull)

Here, ts is a vector of timescales ($T/(T-1),T/(T-2),\ldots,T/2,T$), and the other elements are vectors of the same length containing timescale-specific information. The element com contains $\text{CV}{\text{com}}^2(\sigma)$, comnull contains $\text{CV}{\text{com_ip}}^2(\sigma)$, and tsvr contains $\varphi_{ts}(\sigma)$. The tsvreq_classic S3 class inherits from the generic class tsvreq and from the list class.

The timescale-specific variance ratio, $\varphi_{ts}(\sigma)$, is not a decomposition of $\varphi_{ts}$, i.e., summing $\varphi_{ts}(\sigma)$ across timescales does not yield $\varphi$. However, defining $w(\sigma)=\sum_i s_{ii}(\sigma)/(\sum_i \var_t(x_i(t)))$, one can show $\sum_\sigma w(\sigma)=1$, so the $w(\sigma)$ are weights, and $\sum_\sigma w(\sigma) \varphi_{ts}(\sigma) = \varphi$. So $\varphi$ is a weighted average of the values of $\varphi_{ts}(\sigma)$ across timescales. The wts field of a tsvreq_classic object contains $w(\sigma)$.

sum(res$wts)
res2<-vreq_classic(d)
all.equal(sum(res$tsvr),res2$vr)
all.equal(sum(res$wts*res$tsvr),res2$vr)

The plot method for the tsvreq_classic class displays the various components as functions of timescale:

plot(res,filename="Tsvreq_classic_demo")
knitr::include_graphics("Tsvreq_classic_demo.pdf")

The plots are symmetric about the middle, because Fourier transforms of real-valued time series have this property. The middle timescale is associated with the Nyquist frequency. The gray shading is a reminder of the symmetry - one should typically interpret the left, un-grayed side of plots. Both symmetric sides are plotted because sums must be computed over all displayed timescales to equal the corresponding frequency-nonspecific analogues. Additionally, non-smoothed Fourier transforms are used so that sums will exactly equal frequency-nonspecific analogues. Unsmoothed Fourier spectra and cospectra are jagged, so peaks of plots should not be given undue interpretive weight unless smoothing or averaging over timescales (see below) or significance testing is performed.

The lack of a timescale-specific Loreau-de Mazancourt variance ratio

\noindent It is difficult to envision an analogous timescale-specific version of the Loreau-de Mazancourt approach. The quantity $\text{CV}{\text{pop}}^2=\left(\sum_i \sqrt{\var_t (x_i(t))}\right)^2/(\mean_t(x{\text{tot}}(t)))^2$ cannot be decomposed by replacing the variances in the numerator by power spectra because of the square root.

Aggregating the timescale-specific classic variance ratio and related quantities to timescale bands

\noindent If $\Omega$ is a set of timescales, and defining $\text{CV}{\text{com}}^2(\Omega)=\sum{\sigma \in \Omega} \text{CV}{\text{com}}^2(\sigma)$, $\text{CV}{\text{com_ip}}^2(\Omega)=\sum_{\sigma \in \Omega} \text{CV}{\text{com_ip}}^2(\sigma)$, and $\bar{\varphi}{ts}(\Omega)=\frac{\sum_{\sigma \in \Omega} \varphi_{ts}(\sigma) w(\sigma)}{\sum_{\sigma \in \Omega} w(\sigma)}$, it has been shown [@Zhao_inprep] that $\text{CV}{\text{com}}^2(\Omega) = \bar{\varphi}{ts}(\Omega) \text{CV}_{\text{com_ip}}^2(\Omega)$. Aggregating over timescales mitigates the jaggedness resulting from unsmoothed Fourier transforms (see above). The tsvr package provides tools for aggregating over any collection of timescales:

res<-tsvreq_classic(d)
aggresLong<-aggts(res,res$ts[res$ts>=4])
aggresShort<-aggts(res,res$ts[res$ts<4])
class(aggresLong)
names(aggresLong)
print(aggresLong)
print(aggresShort)

The aggts function is the generator function for the vreq_classic_ag class, which inherits from the vreq class and from list.

Note that the best way to specify the timescales over which to aggregate is by using conditions such as aggts(res,res$ts[res$ts<4]). If you type in timescales, e.g., aggts(res,c(1.03,1.07,1.12)), and the timescales do not match exactly with timescales in res$ts, they will be removed. No error or warning will be triggered unless there are no remaining timescales. The reason for this is, the code removes all timescales that are not among the canonical Fourier timescales less than $2$, the Nyquist timescale. Remaining timescales are then reflected about the Nyquist timescale to account for the symmetry of Fourier transforms. This setup makes it possible to specify, say, aggregation over timescales less than 4 by aggts(res,res$ts[res$ts<4]) instead of aggts(res,res$ts[res$ts<4 & res$ts>4/3]) (which gives the same results if run, but is an inconvenient format with which to specify timescales). In other words, only timescales greater than or equal to the Nyquist timescale (2) need be specified in the argument ts, and the symmetric timescales on the other side of the Nyquist timescale are included automatically. See also the documentation for aggts.

Acknowledgements

\noindent This material is based upon work supported by the National Science Foundation under grant numbers 1714195 and 1442595, by the James S McDonnell Foundation, and by a working group grant from the Long Term Ecological Research Network Communications Office of the National Center for Ecological Analysis and Synthesis. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of these funders. We thank all users of the package who have reported or will later report ways in which the package could be improved.