In diazrenata/BBSstories: What the Package Does (One Line, Title Case)

knitr::opts_chunk$set(echo = FALSE)
library(bbstrends)
library(dplyr)
library(ggplot2)

Load specific route

The New Hartford route goes up and down Riverton Road and was started in 1994. It feels pretty auspicious. It is route 102, region 18.

ibd <- readRDS("C:\\Users\\diaz.renata\\Documents\\GitHub\\BBSsize\\analysis\\isd_data\\ibd_isd_bbs_rtrg_102_18.Rds")

Here are the species present in this route over the past 25 years:

ibd_species <- ibd %>%
  select(id, ind_size) %>%
  group_by(id) %>%
  summarize(mean_size = mean(ind_size)) %>%
  ungroup() %>%
  arrange(mean_size)

species_list <- read.csv("C:\\Users\\diaz.renata\\Documents\\GitHub\\BBSsize\\analysis\\species_data\\species_list_working.csv", stringsAsFactors = F)

species_list <- species_list %>%
  select(id, english_common_name)

ibd_species <- left_join(ibd_species, species_list) %>%
  distinct()

ibd_species

ggplot(ibd_species, aes(x = log(mean_size))) +
  geom_histogram() +
  theme_bw()

Here is how species richness, abundance, biomass, and energy have changed over those years:

sv <- ibd %>%
  group_by(year) %>%
  summarize(richness = length(unique(id)),
            abundance = dplyr::n(),
            biomass = sum(ind_size),
            energy = sum(ind_b)) %>%
  ungroup() %>%
  mutate(mean_energy = energy / abundance,
         mean_mass = biomass/abundance)


sv_long <- sv %>%
  tidyr::pivot_longer(-year, names_to = "currency", values_to = "value")

ggplot(filter(sv_long, !grepl("mean", currency)), aes(x = year, y = value, color = currency)) +
  geom_point() +
  geom_line() +
  theme_bw() +
  scale_color_viridis_d(end = .8) +
  theme(legend.position = "none") +
  facet_wrap(vars(currency), scales = "free_y")

Here is population abundances:

pops <- ibd %>%
  select(year, id) %>%
  group_by(year, id) %>%
  summarize(abund = dplyr::n()) %>%
  ungroup() %>%
  group_by(year) %>%
  mutate(annual_abund = sum(abund)) %>%
  ungroup() %>%
  group_by_all() %>%
  mutate(prop_abund = abund / annual_abund) %>%
  ungroup()

ggplot(pops, aes(year, (prop_abund), color = id)) +
  geom_line() +
  theme_bw() +
  scale_color_viridis_d() +
  theme(legend.position = "none")

Trends/tradeoffs in E and N

We can do some (crude) linear model fitting. I've generally been finding that lms are OK, with caveats:

you do want to check for autocorrelation
the normal q-q plots are often kind of wonky

sv_long <- sv_long %>%
  group_by(currency) %>%
  mutate(scaled_value = scale(value)) %>%
  ungroup()

abund_lm <- lm(scaled_value ~ year, data = filter(sv_long, currency == "abundance"))

summary(abund_lm)

confint(abund_lm)

sv$abund_prediction = predict(abund_lm)

ggplot(sv, aes(x = year, y = scale(abundance))) +
  geom_point() +
  geom_line(aes(x = year, y = abund_prediction)) + 
  theme_bw()

hist(resid(abund_lm))

energy_lm <- lm(scaled_value ~ year, data = filter(sv_long, currency == "energy"))

summary(energy_lm)

confint(energy_lm)

sv$energy_prediction <- predict(energy_lm)


ggplot(sv, aes(x = year, y = scale(energy))) +
  geom_point() +
  geom_line(aes(x = year, y = energy_prediction)) + 
  theme_bw()

hist(resid(energy_lm))

Important points:

p = .04 for abundance, but .55 (!) for energy
confidence interval for slope for abundance gets close to 0; estimate is -.055 (this is on scaled)
r2 for abundance is around .13-.15 (adj v nonadj).
contrasted to energy, where the slope estimate is right on 0 and the r2 is .01 to -.02 (adj v nonadj)!

I've played a bit with autocorrelation and transformations, but trying not to get hung up on p values. Basically, the lms are generally aligning with what looks intuitive from the plots: a gentle decline in individual abundance, but no signal in energy.

Importantly, even though the average slope for energy is pretty close to 0, I would not call this compensation in the sense that the size distribution is shifting such that the same amount of energy gets divided among varying numbers of individiuals. Under that scenario, we'd expect per capita energy use to trade off with abundance: when abundance is low, per capita energy use should be high to compensate. That is not what we see:

ggplot(sv, aes(x = abundance, y = mean_energy)) +
  geom_point() +
  theme_bw()

summary(lm(scale(mean_energy) ~ scale(abundance), sv))

There's pretty much no support for the notion that mean energy use declines with increasing abundance.

It looks to me a lot more like energy varies a lot but not systematically with time. Energy is also much more variable than abundance:

print("energy sd/mean")
sd(sv$energy) / mean(sv$energy)

print("abundance sd/mean")
sd(sv$abundance) / mean(sv$abundance)

Another notion is that energy should maybe track abundance? We have already seen that energy is more variable than abundance and that the overall trend for energy is not the same as the one for abundance. Here is the extent to which abundance predicts energy:

energy_abund_lm <- (lm(scale(energy) ~ scale(abundance), data = sv))

sv$e_from_n <- predict(energy_abund_lm)

ggplot(sv, aes(x = scale(abundance), y = scale(energy))) +
  geom_point() +
  geom_line(aes(x = scale(abundance), y = e_from_n)) +
  theme_bw()

summary(energy_abund_lm)
hist(resid(energy_abund_lm))

So....decent, but a fair amount of error remaining. It's difficult to know how much error is a lot of error for an ecological community a priori. We can say provisionally that this is probably not the reflection of an extremely high-fidelity, controlled, idealized amplification/dampening of a fixed energy distribution: there's clearly play. But then again, we don't expect communities to behave in highly controlled, high-fidelity, idealized ways! There are all kinds of sources of measurement error etc, plus stochasticity, plus small N sampling issues.

Fixed or variable ISDs

One interesting way energy could track abundance would be if there were essentially one ISD for a community that endured over time, and the ISD we observe at each time point is just the result of drawing $N_t$ individuals from that distribution. In that scenario we would expect higher $N_t$ to give us more energy, but there is still room for error around that relationship.

There are a variety of ways we could construct our ur-ISD:

All of the individuals we have ever observed.
- This is straightforward but will allow us to put more weight into time points with higher N.
Take a mean of the ISDs from every time step.
- This can become distorted by sharp distributions from low N.

all_individuals_isd <- density(log(ibd$ind_size), from = 0, to = 1.2 * max(log(ibd$ind_size)), n = 8192)

all_individuals_isd <- data.frame(
  size = all_individuals_isd$x,
  density = all_individuals_isd$y / sum(all_individuals_isd$y)
)

make_year_isd <- function(thisyear, ibd) {
  this_ibd <- filter(ibd, year == thisyear)
  this_isd <- density(log(this_ibd$ind_size), from = 0, to = 1.2 * max(log(ibd$ind_size)), n = 8192)

  return(data.frame(x =this_isd$x,
                    y = this_isd$y / sum(this_isd$y),
                    year = thisyear))
}


year_isds <- lapply(unique(ibd$year), FUN = make_year_isd, ibd = ibd)


year_isds <- bind_rows(year_isds)

mean_year_isds <- year_isds %>%
  group_by(x) %>%
  summarize(mean_density = mean(y)) %>%
  ungroup() %>%
  rename(size = x)

gridExtra::grid.arrange(
  grobs = list(ggplot(mean_year_isds, aes(size, mean_density)) + 
  geom_line() +
  theme_bw() +
  ggtitle("Mean of annual ISDs"),

ggplot(all_individuals_isd, aes(size, density)) +
  geom_line() +
  theme_bw() +
  ggtitle("All individuals")
), nrow = 1)

ggplot(all_individuals_isd, aes(size, density)) +
  geom_line() +
  geom_line(data = mean_year_isds, aes(size, mean_density), color = "blue") +
  theme_bw() +
  ggtitle("Both at once for scale")

The plots above are both constructed based on KDEs. There are assumptions and artefacts in there I'm not quite 100% on: the weighting of samples from different time steps. I also have a well-worn mental module that likes to construct ISDs via kdes or GMMs and then get started comparing them via overlap, etc. I'd like to explore something a little different at the moment, so I'm going to actually put a pin in things derived from KDEs and work instead off of pools of individuals.

ggplot(ibd, aes(x = log(ind_size), y = ..ncount..)) +
  geom_histogram(bins = 100) +
  geom_density(aes(x = log(ind_size), y = ..ndensity..)) +
  theme_bw()

This is the distribution of sizes of all the individuals we've ever seen on this route.

One possibility is that we're equally likely to draw any of these individuals at any time step, and so we expect the ISD we observe at any time step to be a random sample of $N_t$ of all of these individuals. Alternatively, there could be substantively different underlying ISDs we are drawing our $N_t$ individuals from, and the underlying ISDs could vary systematically over time or orthogonal/not-detectably-parallel-to time.

These possibilities go fairly deep in their implications. We can start by thinking about it in terms of energy?

Energy could vary without a systematic trend if the ISD at each time step is generated via the random draws from an ur-ISD. But that variation might or might not rise to the magnitude we observe here.

Energy variability from a randomized isd

randomized_isd_e <- function(thisyear, ibd) {

  this_abund <- nrow(ibd [ which(ibd$year == thisyear), ])

  sampled_ind_b <- ibd$ind_b[ sample.int(nrow(ibd), size = this_abund, replace = F)]

  return(data.frame(
    year = thisyear,
    energy = sum(sampled_ind_b)))

}



randomized_es_list <- lapply(unique(sv$year), FUN = function(thisyear, ibd, times) return(bind_rows(replicate(n = times, expr = randomized_isd_e(thisyear = thisyear, ibd = ibd), simplify = F), .id = "rep")), ibd = ibd, times = 1000)


randomized_es <- bind_rows(randomized_es_list) %>%
  left_join(select(sv, year, abundance))


ggplot(randomized_es, aes(year, energy, group = rep)) +
  geom_line(alpha = .01) +
 geom_point(data = sv, aes(year, energy, group = year), color = "red") +
  theme_bw() 

ggplot(randomized_es, aes(abundance, energy)) +
  geom_jitter(height = 0, alpha = .01) +
  theme_bw() +
  geom_point(data = sv, aes(x = abundance, y = energy), color = "red")

randomized_es_cv <- randomized_es %>%
  group_by(rep) %>%
  summarize(energy_sd = sd(energy),
            energy_mean = mean(energy)) %>%
  mutate(energy_cv = energy_sd / energy_mean) %>%
  ungroup()

ggplot(randomized_es_cv, aes(x = energy_cv)) +
  geom_histogram() +
  theme_bw() +
  geom_vline(xintercept = sd(sv$energy) / mean(sv$energy), color = "red")

The cloud of lines represent the distribution of total energy values for 1000 bootstrap sampled ISDs. The histogram is the coefficient of variation (sd/mean) for E over every simulation; the red line is the observed. The observed E time series is way more variable than any of the ones that we generated by resampling from a stable ISD.

Similarly, the cloud of dots visualizes the variation in energy-abundance relation obtained via sampling. Compared to the red dots, which are observed.

percentiles <- list()

for(i in 1:length(unique(sv$year))) {
  this_year <- unique(sv$year)[i]
  this_val <- filter(sv, year == this_year)$energy

  this_vector <- filter(randomized_es, year == this_year)$energy

  percentiles[[i]] <- data.frame(
    year = this_year,
    percentile = scadsanalysis::get_percentile(this_val, this_vector)
  )
}

percentiles <- bind_rows(percentiles) 

ggplot(percentiles, aes(year, percentile)) +
  geom_point() +
  theme_bw() +
  geom_hline(yintercept = c(2.5, 97.5), color = "red")

mean(percentiles$low_p)
mean(percentiles$high_p) 
mean(percentiles$outside_95)

percentiles <- percentiles %>%
  mutate(low_p = percentile <2.5,
         high_p = percentile >97.5) %>%
  mutate(outside_95 = (low_p + high_p) > 0)

print("Energy too low:")
mean(percentiles$low_p)

print("Energy too high:")
mean(percentiles$high_p)

print("Energy outside 95% interval (2 sided):")
mean(percentiles$outside_95)

percentiles <- left_join(percentiles, sv)


ggplot(percentiles, aes(mean_energy, percentile)) +
  geom_point() +
  geom_vline(xintercept = mean(ibd$ind_b))

ggplot(percentiles, aes(year, percentile)) +
  geom_point()

ggplot(percentiles, aes(year, mean_energy)) +
  geom_point()

So - I could be wrong about this - but I think we're getting way too many years with suspiciousy high/low total energy, than we would expect if every year were a samples of $N_t$ individuals from the overall ISD. Using 2.5 and 97.5 as cutoffs, half the years are outside that interval.

I think the way for a year to be outside that interval is for its ISD to be relatively very different from the mean/overall ISD. For the difference to propagate all the way up to energy, it's the per capita energy that is different. You can have a different ISD and not have per capita energy change, but you can't have per capita energy change and not have a different ISD.

Years with unusual energy are not necessarily similar to each other in their ISDs. But they might be.

year_isds <- left_join(year_isds, percentiles) %>%
  mutate(description = ifelse(
    outside_95, ifelse(low_p, "low", "high"), "inside"
  ))

year_isds <- year_isds %>%
  mutate(percentile_binned = as.character(ceiling(percentile / 10)))

ggplot(year_isds, aes(x, y, color = description)) +
  geom_line() +
  theme_bw() +
  facet_wrap(vars(year), scales = "free_y")


ggplot(year_isds, aes(x, y, color = year)) +
  geom_line() +
  theme_bw() +
  facet_wrap(vars(description), scales = "free_y")

Time ideas

One thing I find interesting about this exercise is to ask whether the position of the observed E shifts systematically over time.

It might be interesting to look at comparisons to the overall ISD, vs a pool from a local short time period, vs a pool from the same number of time steps randomly distributed. So, how does E in 1990 compare to a random draw from a pool of r c(1988, 1989, 1991, 1992), vs a random draw from a pool of r sample(c(1994:1989, 1991:2019), size = 4, replace = F)? If observations are closer to a local moving average ISD, than to one randomly dispersed in time, you have some kind of temporal structure in the variability. But if not, there's variability but it's not structured temporally.

A related-but-slightly-sideways question is, how many distinct ur-ISDs do we think there are?

diazrenata/BBSstories documentation built on July 6, 2020, 11:59 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com