In ajrominger/ssadAssociation: A package in support of the manuscript ""

knitr::opts_chunk$set(echo = FALSE, cache = FALSE, message = FALSE, warning = FALSE)

# function for cross-cache lazy loading
findChunk <- function(key, otherDoc) {
    f <- file.path(paste(otherDoc, 'cache', sep = '_'), 'latex')

    allChunk <- list.files(f, pattern = '.RData')

    thisChunk <- gsub('\\.RData', '', grep(paste0(key, '_'), allChunk, value = TRUE))
    thisChunk <- file.path(f, thisChunk)

    return(thisChunk)
}

# function for cross-doc figure referencing
refOtherFig <- function(x, otherDoc, prefix = '') {

}

# load packages
pkgs <- readLines('RarePlusComMinus_supp_cache/latex/__packages')
for(p in pkgs) library(p, character.only = TRUE)

# load plotting defaults
lazyLoad(findChunk('plot_defaults', 'RarePlusComMinus_supp'))
figW <- figH <- 3.5
knitr::opts_chunk$set(fig.width = figW, fig.height = figH, fig.align = 'center')

# load threading defaults
lazyLoad(findChunk('threading', 'RarePlusComMinus_supp'))

\linenumbers

Abstract

There is growing interest in inferring species associations, and possibly even interactions, from spatially replicated abundance data. \newline \newline

Introduction

Species interactions are central to the evolution of biodiversity and the maintenance of that diversity in biological assemblages []. With the vigorous growth in popularity of network theory in ecology and evolution, these interactions are often cast as ecological networks. Not surprisingly, there is a rich research agenda in developing methods to quantify species interactions [] and constructing ecological networks of interactions []. Abundance or presence/absence data are among the most prevalent and accessible ecological data, and have consequentially been frequently leveraged [] to attempt to infer species interactions, beginning at least in the early 1900s [], and presently to attempt to infer ecological networks []. For simplicity I will refer to both abundance and presence/absence data as simply "abundance data" and focus on true abundance data with the assumed implication that methodological limitations in the case of true abundances will be even more confounding for presence/absence scenarious [].

Two salient insights from decades of debate about inferring ecological processes from abundance data are: 1) interactions in the strict sense cannot be inferred, but species associations might be recoverable and these associations are assumed to map onto a combination of species-species or species-environment interactions []; and 2) robust null models are necessary for any inference. Thus, a current widespread practice is to reconstruct association networks aided by null models used to differentiate spurious associations from purputed real associations.

Here I raise a new issue with the approach of inferring association networks from abundance data in combination with null models. The issue arises from the near universal [] spatial aggregation of individuals within species. Spatial aggregation can arise from a large number of processes, both driven by deterministic mechanisms or complete demographic drift, thus aggregation in itself is not evidence of species association The presence of spatial aggregation causes all currently employed null models to have overly high type I error rates, i.e., incorrectly detecting associations when there are none. The present work complements a recent uptick in critical and cautionary assessments of approaches seeking to infer ecological network from abundance data []. The combined message of these contributions is that abundance-derived inferences of species interactions, and even species associations, should be viewed cautiously until other data can be brought to bear on the actuality of real interactions/associations.

Also maybe there are methods not based in null models, so might need to modify to specifically be about null models

Association networks

The construction of association networks from abundance data has been reviewed extensively elsewhere []. Here I aim to provide a very brief sketch such that the subsequent inquiry can be understood.

Firstly, it is of note that abundance-derived association networks are prevalent in studies across the tree of life, from microbes to plants and animals, and across ecosystems globally []. While preferred inference methods differ by subfield, the basic principles are the same. Investigators interrogate spatially replicated inter-specific abundance data to search for correlations in the abundances of each species pair.

Simple normalized covariance (i.e. linear correlation) is one metric used to quantify association, especially in microbial ecology studies where the software SpARCC is commony used []. It should further be noted that in microbial ecology studies, log transformed relative read frequency is used in the place of abundance. Other common metrics used to quantify association do not follow the exact formula of normalied covariance, but are non-the-less very close. These include xyz. I will therefore use the term "correlation" for all these metrics.

When interpreting pairwise species abundance correlations as species associations, positive correlations are assumed to indicate either shared adaptive responses to environmental conditions, mutualism, or possibly commensalism []. Negative correlations are assumed to indicate either divergent adaptive responses to environmental conditions, competition, or possibly predation/parasatism []. A lack of correlation is assumed to indicate no interaction []. The cases of asymmetric interaction (commensalism and predation/parasatism) are well known to be more difficult to detect []. To detect these processes some researchers have employed more nuanced approaches beyond correlation, such as XYZ entropy [].

Metrics of association by themselves do not deliver association networks. To construct networks, researchers need to filter out spurious associations (e.g. correlations produced by chance alone) from associations that are unlikely to have been produced by chance alone. This is the task of null models.

Null models

Null models are designed to produce sampling distributions of each pairwise abundance correlation under the null assumption that no association exists []. As simple as the intent, achieving this design goal is challenging and has spurred decades of debate about how to best construct null models []. At the root of this debate are the joint realities that null models for association networks require Monte Carlo simulation approaches because parametric approaches are not available [], and that Type I and II errors are very sensitive to how researchers choose to constrain null models []. Monte Carlo null model algorithms take as input a matrix of abundances where each cell represents the abundance of a specific species (usually the column ID) at a specific site (usually the row ID). The algorithm then permutes abundances across species and sites, thus simulating the condition of no association between species at sites. The permutation can be carried out within well-defined constraints.

Constraints on null models are necessary because early work revealed that null models that allowed permuted matrices to deviate from observed patterns of site richness and species abundance produced excessive Type I errors []. The conventional wisdom that has since emerged is that both row and column marginals should be constrained in the simulated null matrices []. These constrains correspond to maintaining the total abundance or richness observed at each site, and the total abundance across sites observed for each species.

Spatial aggregation

The goal of null models is to produce randomized abundance patterns where all non-random pattern deriving from species associations or interactions has been stripped away. However, non-random patterns in abundance can arise from many processes, including neutrality, that are not driven by species interactions or associations. As such, deviations of abundance patterns from null models might not, by themselves, indicate true associations or interactions. One critical, and widely observed, property of spatially replicated, intra-species abundances is that they are not evenly distributed across space, a pattern often referred to as spatial clustering [@mcgill2003; @engen2008; @zillio2010; @harte2011; @connolly2017]. Spatial clustering can be accounted for by purely probabilistic processes from neutral birth-death-immigration [@kendall1949; @hubbell2001] to mechanistically agnostic statistical-mechanical properties of large assemblages [@harte2011]. There is thurough consensus that such spatial clustering takes a specific mathematical form: the negative binomial distribution []. This is the distribution that @kendall1949, and subsequently @engen2008, derived for a population undergoing demographic drift due to birth, death, and immigration---and without any regard to species-species or species-environment interactions. The negative binomial distribution also receives widespread empirical validation [@connolly2017].

Here I show how negative binomial spatial aggregation obscures detection of species associations. I do use using a combination of reanalysis of exemplar datasets from microbes to animals and plants, in silico experiments showing how spatial aggregation, driven solely by demographic drift, produces spurious association patterns, and a mathematical investigation of why this happens.

Methods

Reanalysis of exemplar datasets

I reanalyze several exemplar datasets previously used to infer species associations for two reasons: 1) to re-affirm the pervasiveness of negative binomial spatial aggregation within species; and 2) to show that these negative binomially distributed spatial abundance distributions are sufficient by themselves (i.e. with no real species associations) to reproduce published inferences of species associations.

The datasets used are:

DS-1: Calatayud, J., Andivia, E., Escudero, A., Meláin, C.J., Bernardo-Madrid,R., Stoffel, M.et al. (2020). Positive associations among rare species andtheir persistence in ecological assemblages.Nat. Ecol. Evol.,4,40–45.

DS-2: ocean microbes

DS-3: soil microbes

Possible data: - Berry, D. & Widder, S. (2014). Deciphering microbial interactions anddetecting keystone species with co-occurrence networks.Frontiers inMicrobiology, 5, 219, 1-14. - Cardillo, M. & Meijaard, E. (2010). Phylogeny and co-occurrence ofmammal species on Southeast Asian islands.Global Ecol. Biogeog. 19,465-474. - Kay, G.M., Tulloch, A., Barton, P.S., Cunningham, S.A., Driscoll,D.A. & Lindenmayer, D.B. (2017). Species co-occurrence networksshow reptile community reorganization under agriculturaltransformation.Ecography, 41, 113-125 - Mandakovic, D., Rojas, C., Maldonado, J., Latorre, M., Travisany, D.,Delage, E. (2018). Structure and co-occurrence patterns in microbialcommunities under acute environmental stress reveal ecological factorsfostering resilience.Scientific Reports, 8. 5875, 1-12 - Steele, J.A., Countway, P.D., Xia, L., Vigil, P.D., Beman, J.M., Kim,D.Y.et al. (2011). Marine bacterial, archaeal and protistan associationnetworks reveal ecological linkages.ISME J., 5, 1414–1425 - Losapio, G., Schöb, C., Staniczenko, P. P., Carrara, F., Palamara, G. M., De Moraes, C. M., ... & Bascompte, J. (2021). Network motifs involving both competition and facilitation predict biodiversity in alpine plant communities. Proceedings of the National Academy of Sciences, 118(6), e2005759118.

In silico investigation of spatial aggregation, demographic drift, and spurious associations

Make a spatial neutral model - source pool - linked local comms - should it be rosindell's coal method?

Analyze for networks

Mathematical investigation of spatial aggregation causing spurious associations

Data and Code Availability

All data and code needed to reproduce the results of this manuscript are available at https://github.com/ajrominger/ssadAssociation and a detailed description of the analytically approach is available in the supplement.

\clearpage

Bib notes

Brazeau, H.A. & Schamp, B.S. (2019). Examining the link betweencompetition and negative co-occurrence patterns.Oikos, 128, 1358-1366.
Cazelles, K., Araujo, M.B., Mouquet, N. & Gravel, D. (2016). A theory forspecies co-occurrence in interaction networks.Theor. Ecol.,9,39–48
Connor, E.F., Collins, M.D. & Simberloff, D. (2013). The checkeredhistory of checkerboard distributions.Ecology, 94, 2403–2414.
Connor, E.F. & Simberloff, D. (1979). The assembly of speciescommunities: Chance or competition?Ecology, 60, 1132.
D’Amen, M., Mod, H.K., Gotelli, N.J. & Guisan, A. (2018). Disentanglingbiotic interactions, environmental filters, and dispersal limitation asdrivers of species co-occurrence.Ecography, 41, 1233–1244
Diamond, J.M. (1975). Assembly of species communities. In:Ecology andEvolution of Communities(eds Cody, M.L. & Diamond, J.M.). HarvardUniv Press, Cambridge, Mass, pp. 342–444
Faisal, A., Dondelinger, F., Husmeier, D. & Beale, C.M. (2010). Inferringspecies interaction networks from species abundance data: Acomparative evaluation of various statistical and machine learningmethods.Ecological Informatics, 5, 451–464
Faust, K. & Raes, J. (2012). Microbial interactions: From networks tomodels.Nat. Rev. Microbiol., 10, 538–550
Freilich, M.A., Wieters, E., Broitman, B.R., Marquet, P.A. & Navarrete,S.A. (2018). Species co-occurrence networks: Can they reveal trophicand non-trophic interactions in ecological communities?Ecology, 99,690–699
Gotelli, N.J., Graves, G.R. & Rahbek, C. (2010). Macroecological signalsof species interactions in the Danish avifauna.Proc. Natl Acad. Sci.,107, 5030–5035.
Gotelli, N.J. & Ulrich, W. (2010). The empirical Bayes approach as a toolto identify non-random species associations.Oecologia, 162, 463–477.
Harris, D.J. (2016). Inferring species interactions from co-occurrence datawith Markov networks.Ecology, 97, 3308–3314
Morueta-Holme, N., Blonder, B., Sandel, B., McGill, B.J., Peet, R.K.,Ott, J.E.et al. (2016). A network approach for inferring speciesassociations from co-occurrence data.Ecography, 39, 1139–1150.
Ovaskainen, O., Abrego, N., Halme, P. & Dunson, D. (2016). Usinglatent variable models to identify large networks of species-to-speciesassociations at different spatial scales.Methods Ecol. Evol., 7, 549–555.
Ovaskainen, O., Hottola, J. & Siitonen, J. (2010). Modeling species co-occurrence by multivariate logistic regression generates new hypotheseson fungal interactions.Ecology, 91, 2514–2521.
Ovaskainen, O., Tikhonov, G., Norberg, A., Guillaume Blanchet, F.,Duan, L., Dunson, D.et al. (2017). How to make more out ofcommunity data? A conceptual framework and its implementation asmodels and software.Ecol. Lett., 20, 561–576
Popovic, G.C., Warton, D.I., Thomson, F.J., Hui, F.K.C. & Moles, A.T.(2019). Untangling direct species associations from indirect mediatorspecies effects with graphical models.Methods Ecol. Evol., 10, 1571–1583
Thurman, L.L., Barner, A.K., Garcia, T.S. & Chestnut, T. (2019). Testingthe link between species interactions and species co-occurrence in atrophic network.Ecography, 42, 1658–1670

References

ajrominger/ssadAssociation documentation built on Sept. 29, 2023, 4:43 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

ajrominger/ssadAssociation
A package in support of the manuscript ""

In ajrominger/ssadAssociation: A package in support of the manuscript ""

Abstract

Introduction

Association networks

Null models

Spatial aggregation

Methods

Reanalysis of exemplar datasets

In silico investigation of spatial aggregation, demographic drift, and spurious associations

Mathematical investigation of spatial aggregation causing spurious associations

Data and Code Availability

Bib notes

References

R Package Documentation

Browse R Packages

We want your feedback!

ajrominger/ssadAssociation A package in support of the manuscript ""

In ajrominger/ssadAssociation: A package in support of the manuscript ""

Abstract

Introduction

Association networks

Null models

Spatial aggregation

Methods

Reanalysis of exemplar datasets

In silico investigation of spatial aggregation, demographic drift, and spurious associations

Mathematical investigation of spatial aggregation causing spurious associations

Data and Code Availability

Bib notes

References

R Package Documentation

Browse R Packages

We want your feedback!

ajrominger/ssadAssociation
A package in support of the manuscript ""