knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)

library(tidyverse)
library(ggtern)
library(ggpubr)
library(reshape2)
rastPlot <- function(data, title = "title", xlabel = "x", ylabel = "y"){
  ggplot(melt(data), aes(x = Var1, y = Var2, fill = value)) +
    geom_tile() +
    theme(axis.text = element_blank(),
          legend.title = element_blank()) +
    scale_fill_viridis_c() +
    labs(title = title,
         x = xlabel,
         y = ylabel) +
    theme_bw()
}

Description of Model and Simulation Scenarios

Patch locations and environmental variables

In our model, the metacommunity consists of $N$ patches distributed over a spatially heterogeneous landscape, with multiple environmental variables(although, the current simulations only have one environmental variable $D$ and 1000 patches) that could either be randomly distributed or spatially autocorrelated. Each patch has a set of coordinates in a two-dimensional space, and all possible coordinates are feasible such that this is a continuous space model that is not restricted to a lattice or some other kind of regular spatial arrangement of spatial units.

A patch may be empty, or be occupied by a single or by several species. We define $X_{i,z,t}$ as a stochastic variable representing the occurrence of species $i$ at location $z$ and time $t$. Occurrence, $X_{i,z,t}$, takes a value of 1 when species $i$ is present and a value of 0 when it is absent. Similarly, we define $\mathbf{Y}{z,t}=X{1,z,t},X_{2,z,t},…,X_{R,z,t}$ as a vector containing the presence-absence of each species from the regional pool $R$.
The model only tracks the occupancy of the patches (not population densities). Spatial dynamics occurs because of colonization events, in both empty patches and patches that are occupied by other species, and because of extinction events. The emerging species co-distributions are a result of a dynamic balance between these events. Ecological interactions can impact either or both the colonization and the extinction probabilities. For instance, the presence of a competitor pre-empting a patch can reduce the colonization probability by another competitor. Alternatively, the presence of a competitor in a patch could increase the extinction probability of another species. Similarly, the environment could influence both the colonization and the extinction probabilities.

Patch Colonization

We consider a discrete-time Markovian process to represent the dynamics of presence- absence of all species and we incorporate the effect of dispersal, environmental filtering and ecological interactions in such a way that we could cover all possible scenarios wherein species differ in any combination of ways. We include interspecific competition along with other types of spatial dynamics such as predator-prey interactions (Gravel et al. 2011), priority effects (Shurin et al. 2004), or mutualistic interactions (e.g. Gilarranz et al. 2015). Following a colonization event from time $t$ to $t + \Delta$ corresponds to: $$P(X_{i,z,t+\Delta t}=1│X_{i,z,t}=0)=I_{i,z,t} S_{i,z,t} C_{i,z,t}$$ where $I_{i,z,t}$ is the number of immigrants of species $i$ reaching patch $z$ at time $t$, $S_{i,z,t}$ is the effect of environmental filtering on the probability of establishing a viable local population and $C_{i,z,t}$ is the effect of ecological interactions on the establishment probability. We note that because we represent a stochastic process, the product of these three functions has to be bounded between 0 and 1. We consequently define these quantities:

The effect of immigration is given by:
$$I_{i,z,t} = \frac{\sum k(z,\omega)X_{i,\omega,t}}{\sum k(z,\omega)} $$

which is a weighted average of the occurrence probability of species $i$ in the neighborhood of $z$. The function $k(z,\omega)$ is a dispersal kernel that depends on the location of patch $z$ and the neighborhood $\omega$. For convenience, we consider an exponential function of the Euclidean distance between localities. We add to the kernel a low distance and neighborhood-independent constant $m$ in order to account from immigration from outside the simulated metacommunity. This assumption is required to prevent total extinction by drift under perfectly neutral dynamics.

The effect of the environment is given by a product of the establishment performance over all environmental variables $E_n$: $$S_{i,z,t} = \prod f(E_{n,z}\, \mu_{i,n}\, \sigma_{i,n})$$ In our simulations, for convenience, we consider that the function $f$ has a quadratic form for all species and all environmental variables, though the model is flexible enough to consider other responses that also differ among species.

Ecological interactions on establishment probability

To incorporate all possible ecological interactions, we start by representing the interaction network by a community matrix $\mathbf{A}$ of $R$ species. The elements $\alpha_{ij}$ of $\mathbf{A}$ quantify the effect of species $j$ on the dynamics of species $i$. When $\alpha_{ij}$ is negative, the colonization probability of species $i$ decreases and/or its extinction probability increases when $j$ is found locally. Inversely, when $\alpha_{ij}$ is positive, the colonization probability increases and/or the extinction probability decreases. To account for the cumulative effects of local interactions on transition probabilities, we make colonization and extinction probabilities community dependent. As explained above, at a time $t$, the $\mathbf{Y}{z,t}$ vector gives the local assemblages. We calculate the sum of interactions at any time and for each species as $\nu=\mathbf{A}{z,t}\mathbf{Y}{z,t}$. Our approach can be interpreted as a spatial analogue to the generalized Lotka–Volterra model because it takes into account the impact of the whole network of interactions on each species dynamics and can deal with any type of interaction. We now define the function $$C{i,z,t}=g(\nu_{i},z,t)$$ representing the total effect of ecological interactions on the colonization probability. For convenience, we will use a sigmoid function, with $g$ ranging between $c_{min}$ at high negative interactions and $c_{max}$ at high positive interactions, where $c_{max}$ should be interpreted as the maximal colonization probability when the environmental conditions are optimal and there are no dispersal limitations.

Patch Extinction

The definition of the extinction probability follows exactly the same rules as for colonization, except that extinction is independent of the neighborhood composition. We follow the same logic to define the effect of ecological interactions and of variation in the environment. Consequently, we get the Markovian process: $$P\left(X_{i,z,{t+\Delta t}}=1│X_{i,z,t}=0\right)=M_{i,z,t} E_{i,z,t}$$

where $M_{i,z,t}$ and $E_{i,z,t}$ are the responses of the extinction probability to the local environment and to ecological interactions, respectively. The difference with the colonization functions defined in the previous section is that the extinction probability must be larger when interactions are negative and smaller when they are positive. In addition, the extinction rate should be minimal (instead of maximal) at environmental optimum.

Interpretation

To interpret the model, note that, at steady state, for each species, we obtain the expected occurrence probability $( \hat{P})$ at each site as: $$\frac{\hat{P}^{iz}}{1-\hat{P}^{iz}} = \frac{I^{iz} \cdot S^{iz} \cdot C^{iz}}{M^{iz} \cdot E^{iz}}$$ After a log transformation, this yields:

$$\log \left( \frac{\hat{P}^{iz}}{1-\hat{P}^{iz}}\right) = \log\left(I^{iz}\right) + \log\left(\frac{S^{iz}}{M^{iz}}\right) + \log\left(\frac{C^{iz}}{E^{iz}}\right)$$

The last equation can be interpreted as a macroscopic description of the expected species distribution pattern (Thuiller et al. 2013). In this formulation, $\log(I)$ describes the tendency of a patch to resemble other nearby patches due to the spatial contagion by dispersal, $\log\left(\frac{S}{M}\right)$ describes the tendency of sites to be occupied by species with similar fitness responses to environmental gradients, and $\log\left(\frac{C}{E}\right)$ describes the remaining influence of other species on co-occurrence due to interactions among species. The values for these indices will depend on what choices are made for the components of eq. 1 (see Supporting Information for details on how we implemented this simulations model).

This modeling framework can represent the classical archetypes (Box 1), but also permits more intricate (and likely far more realistic) metacommunity scenarios and predictions. For example, we could use the model to examine how species traits (and environmental context) link to metacommunity dynamics. Moreover, continuous mixtures of different metacommunity extremes (archetypes) can be represented by appropriate choices for dispersal, competitive abilities, and environmental preferences. For instance, species sorting would require a relatively large colonization to extinction ratio along with species-specific environmental requirements and regional similarity (sensu Mouquet and Loreau, 2002). Alternatively, coexistence within competition-colonization trade-offs requires species to have similar responses to the environment and appropriate heterogeneities in the $I$, $C$ and $E$ functions, but no environmental preferences.

The implemented mechanisms in the simulation model can be partly mapped onto variation partitioning components. For instance, at equilibrium, we could expect dispersal limitation (the $\log(I)$ term in equation 3) to create positive spatial autocorrelation at the dispersal scale (the [S/E] fraction in variation partitioning). Environmental filtering (the $\log\left(\frac{S}{M}\right)$ term in equation 3) should lead to a correlation between composition and environment (the [E/S] fraction in variation partitioning). The last term in equation 3, however, describing the effect of interactions on distribution (the $\log\left(\frac{C}{E}\right)$), is novel and has no equivalent in the context of classical variation partitioning.

Parameterization and Simulation Scenarios

We simulated metacommunity dynamics with a landscape of 1000 patches over 200 time steps and an initial occupancy of 0.8. Patches were placed randomly in a two-dimensional plane with coordinates drawn from a uniform distribution with a minimum of 0 and a maximum of 1. The environment varied spatially, with values drawn from a random distribution between 0 and 1. In the specific simulations we studied in the paper, colonization was the only component of the species that were affected by the environment (i.e. $E_{i,z,t} = 1$). Specifically, colonization reacted to the environment following a quadratic curve.

For all scenarios considered, we simulated 12 species. Niche optimums for the species were evenly distributed between 0.1 and 0.9 while niche breadth was set to 0.8 for simulations with narrow niches (scenarios A, B, E, F, G), and to 2 for simulations where niche was assumed to be broad (scenarios C and D). For dispersal, we considered an exponential dispersal kernel, with a distance-independent immigration probability of 0.001 and an $\alpha$ parameter of 0.05. For scenario G, where we have variable dispersal kernels (Figure 3), $\alpha$ was 0.01 for 1/3 of species, 0.05 for 1/3 of species, and 0.1 for the other 1/3 of species. We used a sigmoid function to relate the total number of interactions with colonization and extinction coefficients following the implementation by Cazelles et al. (2016). Colonization probability in the absence of interactions was set at 0.4, which tends to zero as negative interactions tend to infinity, while it asymptotes at a 1 with infinite positive interactions. All other aspects of the colonization-interaction curve were the same for all scenarios. Similarly, extinction in the absence of interactions was set at 0.025, and tended to 1 with infinite negative interactions, while its asymptote tended to 0 with infinite positive interactions. In both cases, the parameter setting the shape of the sigmoid function was set to 0 for the scenarios without competition (scenarios A, B, and F), and 1.5 in the presence of competition (scenarios C, D, E, G) . If there were interactions, then a focal species only interacted with the two species that had the closest niches. For all scenarios, five sets of metacommunities were simulated and analyzed to obtain the results found in Figures 2, 3, and 4 in the main text, and supplementary figures.

All scenarios have been implemented in R and the project’s repository can be found here: github.com/javirudolph/testingHMSC

Supplementary Figure 2. Species interactions for all scenarios.

knitr::include_graphics(
  path = "manuscript_outputs/ tiff_files/Sup_F2_interactions.tiff",
  dpi = 600,
  error = TRUE
)

Supplementary Figure 3. In which we have half of the species with interactions and the other without.

knitr::include_graphics(
  path = "manuscript_outputs/ tiff_files/Sup_F3_scenE.tiff",
  dpi = 600,
  error = TRUE
)

Supplementary Figure 4. In which we change dispersal only, $\alpha$ was 0.01 for 1/3 of species, 0.05 for 1/3 of species, and 0.1 for the other 1/3 of species.

knitr::include_graphics(
  path = "manuscript_outputs/ tiff_files/Sup_F4_ScenF.tiff",
  dpi = 600,
  error = TRUE
)

Description of the Statistical Framework

Hierarchical Community Models

In their simplest form, Hierarchical Community Models (HCMs) resemble standard species distribution models that regress species presences/absences against environmental predictors. However, to reduce model complexity, HCMs assume that all species in a metacommunity will react to environmental heterogeneity following a similar response function (e.g., linear vs quadratic or Gaussian). To account for spatial autocorrelation, either spatial variables such as Moran’s eigenvectors maps (MEM, Dray et al. 2006) or spatially auto-correlated latent variables (Ovaskainen et al. 2016b) can be incorporated to the model. To account for biotic interactions, non-spatially auto-correlated latent variables are used. If we use a linear specification approach (here, this can also include quadratic terms that capture Gaussian responses to environment as imposed in our model), we can write:

$$\mathbf{L}^{zi}=\mathbf{X}^{zk} \mathbf{B}^{ki}+ \mathbf{\epsilon}^{zi}$$

with

$$\mathbf{B}^{ki} \tilde{} N(\mathbf{\mu},\mathbf{\Sigma})$$

where $\mathbf{L}^{zi}$ is the presence (or absence) of species $i$ (out of $m$ species) at patch $z$ (out of $n$ patches), $\mathbf{X}^{zk}$ is the value of the environmental variable $k$ (out of $p$ variables) at site $z$, $\mathbf{B}$ is a matrix of regression parameters, $\mathbf{\mu}$ is a vector of length $p$ that describes the mean environmental response of all species, $\Sigma$ is a $p \times p$ covariance matrix that describes how species vary (diagonal) and co-vary (off-diagonal) around the mean environmental response (Ovaskainen and Soininen 2011), and $\mathbf{\epsilon}^{zi}$ is a residual value. Estimating species parameters hierarchically around a community mean reduces the degrees of freedom and makes the model easier to fit with limited data. Both $\mathbf{\mu}$ and $\Sigma$ can be further informed or constrained by species traits or phylogeny if desired (Ovaskainen et al. 2017). To account for biotic interactions, we consider latent variables $\mathbf{H}^{zl}$ (where $l$ refers to a latent variable measured at site $z$) and their associated parameters $\mathbf{\Lambda}^{li}$ (Ovaskainen et al. 2016a). This yields:

$$\mathbf{L}^{zi}=\mathbf{X}^{zk} \mathbf{B}^{ki} + \mathbf{H}^{zl}\mathbf{\Lambda}^{li} + \mathbf{\epsilon}^{zi}$$

Note that it is not necessary to always include all of these components in one model; they can be considered in any combination deemed relevant for a particular question. In this paper, we used Moran’s Eigenvector Maps (MEMs; Dray et al. 2006), a powerful and commonly used method to account for space in statistical models involving species distributions. Note that we could have also used autocorrelated latent variables (Ovaskainen et al. 2016b) to account for the space into the model.
Calculating Variation Partitioning for the HCM As in any generalized linear mixed effect model, we can now partition the explained variation into different components, notably environmental heterogeneity, co-distribution (biotic interactions), space and unexplained variation (Figure S1). To estimate the unique contributions of each of these four fractions for each species, we calculated semi-partial coefficients of determination (i.e., based on Type III sum-of-squares as specified in Peres-Neto et al. 2006) using the implementation suggested by Tjur (2009) as being more appropriate for presence-absence data (i.e., logit link) than the traditional variation partitioning based on an identity link. To adjust for the number of variables used to quantify each fraction of the variation partitioning analysis, we applied the adjustment to the coefficient of determination proposed by Gelman and Pardoe (2006) in the variation partitioning analysis, which is designed for hierarchical models. As shown in Figure A1, the different partitions were combined so that a unique value was associated to environment (fractions [a], [d]/2, [f] and [g]/2), co-distribution (fraction [c]), space (fractions [b], [d]/2, [e] and [g]/2) and the unexplained portion of the variation (fraction [h]). Latent variables are quite powerful to isolate structure in the data. As such, in the calculation of the variation partitioning, latent variables will capture almost all variation associated to the environment and space, giving an artificial inflation of the overlapping partitions between co-distribution and environment and co-distribution and space. For this reason, all partitions overlapping with co-distribution (fractions [e], [f] and [g]) were associated to either environment (fractions [f] and [g]) or space (factions [e] and [g]). In this calculation, a unique measure of explained variation (in adjusted R2) is associated to co-distribution (fraction [c]) but this is not the case for environment and space. To associate a unique value to environment and space, and represent the results as we did in Figure 2 and 3 we divided the fractions overlapping environment and space between these two components. As such, the sum of fractions [a], [f], half of fraction [d] and half of fraction [g] were used to measure the effect of the environment while fractions was considered [b], [e], half of fraction [d] and half of fraction [g] were used to measure the effect of space.

Supplementary Figure 1.

knitr::include_graphics(path = "figA1.png")

Calculation of the coefficient of determination

Classic coefficient of determination

The coefficient of determination, $R^2$, that was partitioned in the variation partitioning analysis (Appendix XX) is calculated for species $j$ as

$$R_j^2= 1- \frac{\sum_{i=1}^{n}(y_{ij}-\hat{y}{ij})^2}{\sum{i=1}^{n}(y_{ij}-\bar{y}{ij})^2)}$$ where $y_ij$ is the data associated with species $j$ (out of $p$ species) at site $i$ (out of $n$ sites), $\hat{y}{ij}$ is the model associated to species $j$ at site $i$ and $\bar{y}_j$ is the average of the data for species $j$ across all sites.

Community-level coefficient of determination

Although having an $R_j^2$ for each species $j$ can be highly informative, it is sometimes useful to also have a single $R^2$ for the entire community. This is obtained by averaging all $R_j^2$

$${}^CR^2 = \frac{\sum_{j=1}^{p} R_j^2}{p}$$ where the ${}^CR^2$ is the community-level $R^2$.

Site contribution to the coefficient of determination

In the paper, we use the contribution of each site to $R_j^2$ to present how each site contributes differently to the environment, space and co-distribution for the community. The calculation of the site $i$ contribution to the $R_j^2$, is calculated as $$R_{ij}^2=\frac{1}{n}-\frac{(y_{ij}-\hat{y}{ij})^2}{(y{ij}-\bar{y}{ij})^2}$$ When studying equation A12, a few considerations can be made. The first part of the equation where the 1 of the classic $R_j^2$ is divided by $n$ is included to make sure that if we sum all $R{ij}^2$ across all sites for species $j$, the resulting value is $R_j^2$ and not some other value.

More importantly, what can be noticed is that by calculating $R_{ij}^2$, the contribution of sites to each species $R_j^2$, we obtain a matrix that has the same dimension as the site by species matrix (an $n×p$ matrix). Using this matrix, if we sum across all sites, we obtain the $R_j^2$. However, if we average across the species for site $i$ we obtain the site’s contribution to the community-level ${}^CR^2$ or ${}^CR_i^2$.

The amount of variation expressed by the $R_j^2$, the ${}^CR^2$ , the $R_{ij}^2$ or the ${}^CR_i^2$ can all be partitioned in its environmental, spatial and co-distribution component following the procedure presented in (Appendix XX).

Pseudo coefficient of determination

There exist different variants of $\text{pseudo-}R^2$ that can be applied to specific types of data (e.g. Tjur’s D (Tjur 2009) is designed to presence-absence data) or more broadly (e.g. Efron’s $\text{pseudo-}R^2$ (Efron 1978)). Because all the different variants of $\text{pseudo-}R^2$ can be applied to a model with a single response variable (i.e. a single species), community-level $\text{pseudo-}R^2$ can be obtained following the approach mentioned above. However, this is not always possible when the interest is to calculate the sites’ contribution to a $\text{pseudo-}R^2$, especially for variants of $\text{pseudo-}R^2$ that do not have a mathematical structure similar to the classical $R^2$.

Example of metacommunity simulation functions

The following example of code shows the overall processes involved in the metacommunity simulation for our model. This example shows the scenario for 20 patches and one environmental variable. The model gives the option for a random or spatially aggregated structure for the patches. In the aggregated case, we determined four clusters, denoted by Nclusters in the code below. The value of the environmental variable for each patch is shown with the color hue. In this case, the environmental variable is randomly distributed.

set.seed(227)
# Random XY coordinates
# Each coordinate is drawn from a random uniform distribution
get_XY = function(N) cbind(runif(N),runif(N))

# Aggregation of XY coordinates
get_XY_agg = function(N, Nclusters, sd_xy) {

  Xclust = runif(Nclusters)
  Yclust = runif(Nclusters)

  X = rnorm(N, rep(Xclust,N/Nclusters), sd_xy)
  Y = rnorm(N, rep(Yclust,N/Nclusters), sd_xy)

  cbind(X,Y)
}

# Random uniform environmental values
get_E = function(D, N) matrix(runif(D*N), nr = N, nc = D)

Here, we set N, the number of patches to 20, and D, the number of environmental variables to one.

N <- 20
D <- 1

rXY <- get_XY(N)
agXY <- get_XY_agg(N, 4, 0.02)
E <- get_E(D = D, N = N)

The following figure shows a side by side comparisson between random and aggregated patches obtained from our functions.

plotData <- cbind.data.frame(rXY, agXY, E)
names(plotData) <- c("rX","rY", "agX", "agY", "E")

plot_rXY <- ggplot(plotData) +
  geom_point(aes(x = rX, y = rY, color = E), cex = 3) +
  labs(title = "Random patch locations",
       y = "Y coord",
       x = "X coord") +
  theme_classic()

plot_agXY <- ggplot(plotData) +
  geom_point(aes(x = agX, y = agY, color = E), cex = 3) +
  labs(title = "Aggregated patch locations",
       y = "Y coord",
       x = "X coord") +
  theme_classic()

ggarrange(plot_rXY, plot_agXY, legend = "right", common.legend = TRUE)

Initial Occupancy

The toy model allows for setitng the initial occupancy in the metacommunity. For example, to create the initial conditions, $t=0$, of presence absence, species occupancy is drawn from a random uniform distribution, and values smaller than $0.8$ are considered as species presence. Patches or locations $z$ are represented by rows in the matrix, whereas each species is a column. Each cell in the matrix is $X_{i,z,t}$ and each row is $\mathbf{Y}_{z,t}$ for $t=0$. The following figure shows three different iterations of this process, where areas in black represent occupancy = 1, and white denotes an absence.

#Get your initial conditions:
R <- 8
Y0 = matrix(0, nrow = N, ncol = R)
rand = matrix(runif(N*R), nr = N, nc = R)
Y0[rand < 0.8] = 1

ggplot(melt(Y0), aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "black")+
  labs(x = "Plot number", y = "Species", subtitle = "Iteration 1") +
  theme_bw() -> A

# Second iteration for the same initial conditions
Y0 = matrix(0, nrow = N, ncol = R)
rand = matrix(runif(N*R), nr = N, nc = R)
Y0[rand < 0.8] = 1

ggplot(melt(Y0), aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "black")+
  labs(x = "Plot number", y = "" , subtitle = "Iteration 2") +
  theme_bw() -> B

# Third iteration
Y0 = matrix(0, nrow = N, ncol = R)
rand = matrix(runif(N*R), nr = N, nc = R)
Y0[rand < 0.8] = 1

ggplot(melt(Y0), aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "black")+
  labs(x = "Plot number", y = "", subtitle = "Iteration 3") +
  theme_bw() -> C

# Set the three iterations to be displayed side by side
ggarrange(A, B, C, ncol = 3, common.legend = TRUE, legend = "none")

When we consider the immigration component, we need to also consider the connectivity matrix. In the code below, I_f calculates the probabilty of immigration for each species, based on the occupancy matrix and the dispersal kernel, K. The argument K is the connectivity matrix. The argument XY corresponds to the patch coordinates, whereas alpha is the dispersal parameter associated to the exponential distribution used for dispersal. It can be computed with the following function:

# Compute the propagule pressure
I_f = function(Y, K, m) I = (1-m)*(K%*%Y)/(K%*%matrix(1,nr=N,nc=R)) + m

The arguments for this function are: Y, K, m. We calculated $\mathbb{Y}$, species presence or absence, in the previous section with the case for initial conditions. Argument m is set in the parameters as a value m = 0.001 and the connectivity matrix K is calculated below.

# Compute the connectivity matrix
get_K = function(XY, alpha) {
    N = nrow(XY)
    distMat = as.matrix(dist(XY, method = "euclidean", upper = T, diag = T))
    ConMat = exp(-1/alpha*distMat)
    diag(ConMat) = 0
    return(ConMat)
}

As an example, using the aggregated XY coordinates for r N patches and our initial occupancy matrix with r R species ,we can see the connectivity between patches, and can calculate the contribution of immigration from each species to each patch.

# We can use the aggregated XY coordinates for this example:
XY <- agXY

# Connectivity matrix
alpha <- 0.005
K <- get_K(XY, alpha)
plot_K <- cbind(rownames(K), stack(as.data.frame(K)))
names(plot_K) <- c("X", "Fill", "Y")
plot_K %>% 
  as_data_frame() %>% 
  mutate(X = factor(X, levels = c(1:20)),
         Y = factor(Y, levels = c(1:20))) -> plot_K
ggplot(plot_K, aes(x = X, y = Y, fill = Fill)) +
  geom_raster() +
  scale_fill_gradient(low = "black", high = "white") +
  #scale_fill_viridis_c() + 
  labs(title = "Connectivity among patches") +
  theme(legend.title = element_blank(),
        axis.text.x = element_text(angle = 90)) -> A
# Immigration
m <- 0.001
Y <- Y0
I <- I_f(Y, K, m)
rastPlot(I, title = "I - Immigration to each plot", x = "Patches", y = "Species") +
  theme(legend.title = element_blank()) -> plot_I
ggarrange(A, plot_I, ncol = 2)

The effect of the environment on each species, depending on each species niche optima, is computed in the following code section. The argument E corresponds to the vector of values for the environmental variable in each patch. The other two arguments in this function are the niche optima (u_s) for each species and niche breadth(s_c).

# Compute the local performance of propagules
S_f_quadratic <- function(E, u_c, s_c) {
  R <- ncol(u_c)
  N <- nrow(E)
  D <- ncol(E)
  S <- matrix(1, nr = N, nc = R)
  for(i in 1:D){
    optima <- matrix(u_c[i,],nrow = N,ncol = R,byrow = TRUE)
    breadth <- matrix(s_c[i,],nrow = N,ncol = R,byrow = TRUE)
    S <- S * ((-1 / (breadth/2)^2) * (E[,i] - optima)^2 + 1)
    S <- ifelse(S < 0, 0 , S)

  }
  return(S)
}

# Understood as niche optima for each species, for each environmental variable
  u_c = matrix(nr = D, nc = R)
  u_c[1,] = seq(0.1,0.9, length=R)
# Understood as niche breadth
  s_c = matrix(0.2, nr = D, nc = R)

# Local performance, colonization
S <- S_f_quadratic(E, u_c, s_c)
plot_S <- rastPlot(S, title = "S - Local performance colonization", x = "Patches", y = "Species") +
  guides(fill = guide_colorbar(title = ""))
plot_S

When incorporating species interactions into the toy model, we use the following interaction matrix $\mathbf{A}$, where the colored black sections show species with potential of interacting:

  # # Interaction matrix
  A = matrix(0,nr=R,nc=R)
  d = as.matrix(dist(c(1:R),upper=TRUE,diag=T))
  A[d<=1] = -1
  diag(A) = 0

  plot_A <- rastPlot(A, title = "A - Interaction matrix", x = "species", y = "species") +
    scale_fill_gradient(low = "black", high = "grey") +
  theme(legend.position = "none")
# Compute the sum of ecological interactions for every location and every species
sum_interactions = function (A, Y) t(A%*%t(Y))
# this is considered to be "v"
v <- sum_interactions(A, Y)

plot_v <- rastPlot(v, title = "v - Sum of interactions", x = "patches", y = "species") +
  guides(fill = guide_colorbar(title = "")) +
  scale_fill_gradient()

ggarrange(plot_A, plot_v, ncol = 2)

We compute the effect of ecological interactions on colonization probability with the function below.The arguments for this function are v as the resulting matrix from the sum of interactions, d_c as the sensitivity to interactions, c_0 and c_max as the colonization parameters:

C_f = function(v, d_c, c_0, c_max) c_max*(1 +(1/c_0 - 1)*exp(-v*d_c))^-1

  # # Colonization function
  c_0 = rep(0.4, R) # Colonization at 0 interactions
  c_max = rep(1, R) # Colonization at max interactions

  # # Sensitivity to interactions
  d_c = 0.2

C <- C_f(v, d_c, c_0, c_max)
plot_C <- rastPlot(C, title = "C - Interactions on colonization", x = "Patches", y = "Species") +
  guides(fill = guide_colorbar(title = "")) +
  scale_fill_gradient()
plot_C

With all the components calculated, we can now compute the colonization probability $P\left(X_{i,z,t+\Delta t}=1│X_{i,z,t}=0\right)=I_{i,z,t} S_{i,z,t} C_{i,z,t}$

P_col = I*S*C
plot_Pcol <- rastPlot(P_col, title = "",
                      x = "Patches", y = "Species") +
  labs(subtitle = "Probability of colonization for each species on each patch")
plot_Pcol

The following function calculates the effect of the environment on the extinction, with E being the environmental variable, and u_e and u_s being species level effect and the assymptote.

M_f = function(E, u_e, s_e) {
    R = ncol(u_e)
    N = nrow(E)
    D = ncol(E)
    M = matrix(1, nr = N, nc = R)
    for(i in 1:D){
      M = M*(1-exp(-(E[,i]-matrix(u_e[i,],nr=N,nc=R,byrow=TRUE))^2 / matrix(s_e[i,],nr=N,nc=R,byrow=TRUE)^2))
      }
    return(M)   
}

# Set the function arguments
  # # Effect of the environment on extinction
  u_e = matrix(nr = D, nc = R)
  u_e[1,] = c(rep(0.5, R-1), 0.05) # One species having a lower level of extinction from environmental effect
  s_e = matrix(Inf, nr = D, nc = R)

  #u_e
  #head(s_e)
M <- M_f(E, u_e, s_e)

plot_M <- rastPlot(M, title = "Environment on extinction", x = "Patches", y = "Species")

The following shows the effect of ecological interactions on extinction, using the same v matrix calculated above.

E_f = function(v, d_e, e_0, e_min) {

    e_min_mat = matrix(e_min, nr = N, nc = R, byrow=TRUE)

    e_min_mat+(1/(1-e_min_mat)+(1/(e_0-e_min_mat)-1/(1-e_min_mat))*exp(d_e*v))^-1

}


#With the arguments computed as:


# # Extinction function
  #e_0 = c(rep(0.025, R-1), 0.5) # Extinction at 0 interactions, with one species having a higher value.
e_0 = rep(0.025, R)  
e_min = rep(0, R) # Exinction at max interactions

  # # Sensitivity to interactions

  d_e = 0
Ex <- E_f(v, d_e, e_0, e_min)

plot_Ex <- rastPlot(Ex, title = "Interactions on extinction", x = "Patches", y = "Species")

We can now compute the probability of extinction $P\left(X_{i,z,{t+\Delta t}}=1│X_{i,z,t}=0\right)=M_{i,z,t} E_{i,z,t}$

P_ext <- M * (1-Ex) + Ex
plot_Pext <- rastPlot(P_ext, title = "Probability of extinction", x = "Patches", y = "Species")

The figure shows these as identical graphs, since we have made all species have the same probability of extinction and the environment not having an effect on extinction.

ggarrange(plot_M, plot_Ex, plot_Pext, ncol = 2, nrow = 2)

The way parameters are set, the extinction component is the same for all species in every patch.

Testing and changes

# Perform the test
delta <- matrix(0, nr = N, nc = R)
rand <- matrix(runif(N*R), nr = N, nc = R)
delta[Y == 0 & rand < P_col] <- 1

# Perform the test
rand = matrix(runif(N*R), nr = N, nc = R)
delta[Y == 1 & rand < P_ext] = - 1
### Apply changes ###
Y_post = Y + delta

plot_Y <- rastPlot(Y, title = "Y0 - initial conditions matrix") +
  scale_fill_gradient(low = "white", high = "black")
plot_delta <- rastPlot(delta, title = "Change after col/ext") +
  scale_fill_gradient2()
plot_Ypost <- rastPlot(Y_post, title = "Y1 - occupancy at time 1") +
  scale_fill_gradient(low = "white", high = "black")


ggarrange(plot_Y, plot_delta, plot_Ypost)

References

Dray, S., P. Legendre, and P.R. Peres-Neto. (2006). Spatial modeling: a comprehensive framework for principal coordinate analysis of neighbor matrices (PCNM). Ecological Modelling, 196, 483-493.
Efron, B. 1978. Regression and ANOVA with zero-one data: Measures of residual Variation. Journal of the American Statistical Association 73:113–121.
Gelman, A., and I. Pardoe. (2006). Bayesian Measures of Explained Variance and Pooling in Multilevel (Hierarchical) Models. Technometrics 48, 241–251.
Gilarranz, L. J., Sabatino, M., Aizen, M. A., & Bascompte, J. (2015). Hot spots of mutualistic networks. Journal of Animal Ecology, 84(2), 407-413.
Gravel, D., Massol, F., Canard, E., Mouillot, D. & Mouquet, N. (2011). Trophic theory of island biogeography. Ecology Letters, 14, 1010–1016.
Gelman, A., and I. Pardoe. 2006. Bayesian Measures of Explained Variance and Pooling in Multilevel (Hierarchical) Models. Technometrics 48:241–251.
Ovaskainen, O., Abrego, N., Halme, P. & Dunson, D. (2016a). Using latent variable models to identify large networks of species-to-species associations at different spatial scales. Methods in Ecology and Evolution, 7, 549–555.
Ovaskainen, O., Hottola, J. & Siitonen, J. (2010). Modeling species co-occurrence by multivariate logistic regression generates new hypotheses on fungal interactions. Ecology, 91, 2514–2521.
Ovaskainen, O., Roy, D.B., Fox, R. & Anderson, B.J. (2016b). Uncovering hidden spatial structure in species communities with spatially explicit joint species distribution models. Methods in Ecology and Evolution, 7, 428–436.
Ovaskainen, O. & Soininen, J. (2011). Making more out of sparse data: hierarchical modeling of species communities. Ecology, 92, 289–295.
Ovaskainen, O., Tikhonov, G., Norberg, A., Guillaume Blanchet, F., Duan, L., Dunson, D., et al. (2017). How to make more out of community data? A conceptual framework and its implementation as models and software. Ecology Letters, 20, 561–576.
Peres-Neto, P.R., Legendre, P., Dray, S. & Borcard, D. (2006). Variation partitioning of species data matrices: Estimation and comparison of fractions. Ecology, 87, 2614–2625.
Shurin, J.B., Amarasekare, P., Chase, J.M., Holt, R.D., Hoopes, M.F. & Leibold, M.A. (2004). Alternative stable states and regional community structure. Journal of Theoretical Biology, 227, 359–368.
Tjur, T. (2009). Coefficients of determination in logistic regression models— A new proposal: the coefficient of discrimination. American Naturalist, 63, 366–372.



javirudolph/iStructureMetaco documentation built on Dec. 20, 2021, 9:09 p.m.