Multiple species modelling independently and concurrently

An example of a logistic regression being used to estimate the probability of multiple species' presences along a number of environmental gradients. Although modelled concurrently, the random variables for each species are independent. We first simulate some data to model followed by the greta code.

Where a single observation per species and location would have a bernoulli error distribution, multiple observations for each species and location have a binomial distribution.

When modelling multiple species (or other grouping factor), we need an extra step in constructing the linear predictor. In order to add multiple greta arrays together for each species we can use the sweep() function.

data

# make fake data
n_species <- 5
n_env <- 3
n_sites <- 20

env <- matrix(rnorm(n_sites * n_env), nrow = n_sites)
occupancy <- matrix(rbinom(n_species * n_sites, 1, 0.5), nrow = n_sites)

greta code

alpha <- normal(0, 10, dim = n_species)
beta <- normal(0, 10, dim = c(n_env, n_species))

env_effect <- env %*% beta

# add intercepts for all species
linear_predictor <- sweep(env_effect, 2, alpha, FUN = '+')

# ilogit of linear predictor
p <- ilogit(linear_predictor)

# a single observation means our data are bernoulli distributed
distribution(occupancy) <- bernoulli(p)


greta-dev/greta documentation built on Dec. 21, 2024, 5:03 a.m.