In diazrenata/BBSstories: What the Package Does (One Line, Title Case)

knitr::opts_chunk$set(echo = FALSE)
library(bbstrends)
library(dplyr)
library(ggplot2)

Load route

The New Hartford route goes up and down Riverton Road and was started in 1994. It is route 102, region 18.

ibd <- readRDS("C:\\Users\\diaz.renata\\Documents\\GitHub\\BBSsize\\analysis\\isd_data\\ibd_isd_bbs_rtrg_102_18.Rds")

Here are the species present in this route over the past 25 years:

ibd_species <- ibd %>%
  select(id, ind_size) %>%
  group_by(id) %>%
  summarize(mean_size = mean(ind_size)) %>%
  ungroup() %>%
  arrange(mean_size)

species_list <- read.csv("C:\\Users\\diaz.renata\\Documents\\GitHub\\BBSsize\\analysis\\species_data\\species_list_working.csv", stringsAsFactors = F)

species_list <- species_list %>%
  select(id, english_common_name)

ibd_species <- left_join(ibd_species, species_list) %>%
  distinct()

head(ibd_species)

print("...93 species total")

ggplot(ibd_species, aes(x = log(mean_size))) +
  geom_histogram() +
  theme_bw()

Here is how species richness, abundance, biomass, and energy have changed over those years:

sv <- ibd %>%
  group_by(year) %>%
  summarize(richness = length(unique(id)),
            abundance = dplyr::n(),
            biomass = sum(ind_size),
            energy = sum(ind_b)) %>%
  ungroup() %>%
  mutate(mean_energy = energy / abundance,
         mean_mass = biomass/abundance)


sv_long <- sv %>%
  tidyr::pivot_longer(-year, names_to = "currency", values_to = "value")

ggplot(filter(sv_long), aes(x = year, y = value, color = currency)) +
  geom_point() +
  geom_line() +
  theme_bw() +
  scale_color_viridis_d(end = .8) +
  theme(legend.position = "none") +
  facet_wrap(vars(currency), scales = "free_y")

Trends/tradeoffs in E and N

We can do some (crude) linear model fitting. I've generally been finding that lms are OK, with caveats:

you do want to check for autocorrelation
the normal q-q plots are often kind of wonky

sv_long <- sv_long %>%
  group_by(currency) %>%
  mutate(scaled_value = scale(value)) %>%
  ungroup()

abund_lm <- lm(scaled_value ~ year, data = filter(sv_long, currency == "abundance"))

summary(abund_lm)

confint(abund_lm)

sv$abund_prediction = predict(abund_lm)

ggplot(sv, aes(x = year, y = scale(abundance))) +
  geom_point() +
  geom_line(aes(x = year, y = abund_prediction)) + 
  theme_bw()

hist(resid(abund_lm))

energy_lm <- lm(scaled_value ~ year, data = filter(sv_long, currency == "energy"))

summary(energy_lm)

confint(energy_lm)

sv$energy_prediction <- predict(energy_lm)


ggplot(sv, aes(x = year, y = scale(energy))) +
  geom_point() +
  geom_line(aes(x = year, y = energy_prediction)) + 
  theme_bw()

hist(resid(energy_lm))

Important points:

p = .04 for abundance, but .55 (!) for energy
confidence interval for slope for abundance gets close to 0; estimate is -.055 (this is on scaled)
r2 for abundance is around .13-.15 (adj v nonadj).
contrasted to energy, where the slope estimate is right on 0 and the r2 is .01 to -.02 (adj v nonadj)!

I've played a bit with autocorrelation and transformations, but trying not to get hung up on p values. Basically, the lms are generally aligning with what looks intuitive from the plots: a gentle decline in individual abundance, but no signal in energy.

Importantly, even though the average slope for energy is pretty close to 0, I would not call this compensation in the sense that the size distribution is shifting such that the same amount of energy gets divided among varying numbers of individiuals. Under that scenario, we'd expect per capita energy use to trade off with abundance: when abundance is low, per capita energy use should be high to compensate. That is not what we see:

ggplot(sv, aes(x = abundance, y = mean_energy)) +
  geom_point() +
  theme_bw()

 summary(lm(scale(mean_energy) ~ scale(abundance), sv))

There's pretty much no support for the notion that mean energy use declines with increasing abundance.

It looks to me a lot more like energy varies a lot but not systematically with time. Energy is also much more variable than abundance:

print("energy sd/mean")
sd(sv$energy) / mean(sv$energy)

print("abundance sd/mean")
sd(sv$abundance) / mean(sv$abundance)

Another notion is that energy should maybe track abundance? We have already seen that energy is more variable than abundance and that the overall trend for energy is not the same as the one for abundance. Here is the extent to which abundance predicts energy:

energy_abund_lm <- (lm(scale(energy) ~ scale(abundance), data = sv))

sv$e_from_n <- predict(energy_abund_lm)

ggplot(sv, aes(x = scale(abundance), y = scale(energy))) +
  geom_point() +
  geom_line(aes(x = scale(abundance), y = e_from_n)) +
  theme_bw()

summary(energy_abund_lm)
hist(resid(energy_abund_lm))

So....decent, but a fair amount of error remaining. It's difficult to know how much error is a lot of error for an ecological community a priori. We can say provisionally that this is probably not the reflection of an extremely high-fidelity, controlled, idealized amplification/dampening of a fixed energy distribution: there's clearly play. But then again, we don't expect communities to behave in highly controlled, high-fidelity, idealized ways! There are all kinds of sources of measurement error etc, plus stochasticity, plus small N sampling issues.

Fixed or variable ISDs

One interesting way energy could track abundance would be if there were essentially one ISD for a community that endured over time, and the ISD we observe at each time point is just the result of drawing $N_t$ individuals from that distribution. In that scenario we would expect higher $N_t$ to give us more energy, but there is still room for error around that relationship.

There are a variety of ways we could construct our ur-ISD:

All of the individuals we have ever observed.
- This is straightforward but will allow us to put more weight into time points with higher N.
Take a mean of the ISDs from every time step.
- This can become distorted by sharp distributions from low N.

all_individuals_isd <- density(log(ibd$ind_size), from = 0, to = 1.2 * max(log(ibd$ind_size)), n = 8192)

all_individuals_isd <- data.frame(
  size = all_individuals_isd$x,
  density = all_individuals_isd$y / sum(all_individuals_isd$y)
)

make_year_isd <- function(thisyear, ibd) {
  this_ibd <- filter(ibd, year == thisyear)
  this_isd <- density(log(this_ibd$ind_size), from = 0, to = 1.2 * max(log(ibd$ind_size)), n = 8192)

  return(data.frame(x =this_isd$x,
                    y = this_isd$y / sum(this_isd$y),
                    year = thisyear))
}


year_isds <- lapply(unique(ibd$year), FUN = make_year_isd, ibd = ibd)


year_isds <- bind_rows(year_isds)

ggplot(ibd, aes(log(ind_size))) +
  geom_histogram() +
  theme_bw() +
  facet_wrap(vars(year), scales = "free_y")


mean_year_isds <- year_isds %>%
  group_by(x) %>%
  summarize(mean_density = mean(y)) %>%
  ungroup() %>%
  rename(size = x)

gridExtra::grid.arrange(
  grobs = list(ggplot(mean_year_isds, aes(size, mean_density)) + 
  geom_line() +
  theme_bw() +
  ggtitle("Mean of annual ISDs"),

ggplot(all_individuals_isd, aes(size, density)) +
  geom_line() +
  theme_bw() +
  ggtitle("All individuals")
), nrow = 1)

ggplot(all_individuals_isd, aes(size, density)) +
  geom_line() +
  geom_line(data = mean_year_isds, aes(size, mean_density), color = "blue") +
  theme_bw() +
  ggtitle("Both at once for scale")

The plots above are both constructed based on KDEs. There are assumptions and artefacts in there I'm not quite 100% on: the weighting of samples from different time steps. I also have a well-worn mental module that likes to construct ISDs via kdes or GMMs and then get started comparing them via overlap, etc. I'd like to explore something a little different at the moment, so I'm going to actually put a pin in things derived from KDEs and work instead off of pools of individuals.

ggplot(ibd, aes(x = log(ind_size), y = ..ncount..)) +
  geom_histogram(bins = 100) +
  geom_density(aes(x = log(ind_size), y = ..ndensity..)) +
  theme_bw()

This is the distribution of sizes of all the individuals we've ever seen on this route.

One possibility is that we're equally likely to draw any of these individuals at any time step, and so we expect the ISD we observe at any time step to be a random sample of $N_t$ of all of these individuals. Alternatively, there could be substantively different underlying ISDs we are drawing our $N_t$ individuals from, and the underlying ISDs could vary systematically over time or orthogonal/not-detectably-parallel-to time.

These possibilities go fairly deep in their implications. We can start by thinking about it in terms of energy?

Energy could vary without a systematic trend if the ISD at each time step is generated via the random draws from an ur-ISD. But that variation might or might not rise to the magnitude we observe here.

Energy variability from a randomized isd

randomized_isd_e <- function(thisyear, ibd) {

  this_abund <- nrow(ibd [ which(ibd$year == thisyear), ])

  sampled_ind_b <- ibd$ind_b[ sample.int(nrow(ibd), size = this_abund, replace = F)]

  return(data.frame(
    year = thisyear,
    energy = sum(sampled_ind_b)))

}



randomized_es_list <- lapply(unique(sv$year), FUN = function(thisyear, ibd, times) return(bind_rows(replicate(n = times, expr = randomized_isd_e(thisyear = thisyear, ibd = ibd), simplify = F), .id = "rep")), ibd = ibd, times = 1000)


randomized_es <- bind_rows(randomized_es_list) %>%
  left_join(select(sv, year, abundance))


ggplot(randomized_es, aes(year, energy, group = rep)) +
  geom_line(alpha = .01) +
 geom_point(data = sv, aes(year, energy, group = year), color = "red") +
  theme_bw() 

ggplot(randomized_es, aes(abundance, energy)) +
  geom_jitter(height = 0, alpha = .01) +
  theme_bw() +
  geom_point(data = sv, aes(x = abundance, y = energy), color = "red")

randomized_es_cv <- randomized_es %>%
  group_by(rep) %>%
  summarize(energy_sd = sd(energy),
            energy_mean = mean(energy)) %>%
  mutate(energy_cv = energy_sd / energy_mean) %>%
  ungroup()

ggplot(randomized_es_cv, aes(x = energy_cv)) +
  geom_histogram() +
  theme_bw() +
  geom_vline(xintercept = sd(sv$energy) / mean(sv$energy), color = "red")

The cloud of lines represent the distribution of total energy values for 1000 bootstrap sampled ISDs. The histogram is the coefficient of variation (sd/mean) for E over every simulation; the red line is the observed. The observed E time series is way more variable than any of the ones that we generated by resampling from a stable ISD.

Similarly, the cloud of dots visualizes the variation in energy-abundance relation obtained via sampling. Compared to the red dots, which are observed.

diazrenata/BBSstories documentation built on July 6, 2020, 11:59 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com