In mikeroswell/MeanRarity: Hill Diversity Estimation and Visualization

The MeanRarity package contains functions to compute, estimate, and visualize Hill numbers, using the framework that Hill numbers simply compute the abundance-weighted [generalized] mean species rarity of a community (rarity is the reciprocal of relative abundance). Additionally, the package contains functions for generating and sampling abundance vectors and generating plots. This vignette is not a comprehensive guide to the package. However, it includes data and code to replicate the quantitative main-text figures presented in @Roswell2021. Thus, the examples in the vignette are ordered according to the figures in the article.

The goal of [Hill] diversity metrics is to summarize the distribution of species' [relative] abundances. The first figure in the "Conceptual guide" is a visualization of the observed bee abundances from the four communities in the example dataset beeObs, analyzed in the guide. The package also includes a summary of those data, beeAbunds. The function radplot() takes those data and returns a rank-abundance distribution plot. We love the new R package patchwork for assembling plots, and call that package here.

knitr::opts_chunk$set(collapse = TRUE, comment = "#>")#, dev = "pdf")
library(MeanRarity)
library(patchwork) # amazing ggplot2 add-on for multi-panel plots
library(dplyr) # part of the tidyverse
library(ggplot2) # part of the tidyverse
library(tidyr) # part of the tidyverse
library(purrr) # part of the tidyverse
#library(tictoc) # lightweight package for convenient timing
library(furrr) # `tidyverse` and `parallel` combined
library(iNEXT) # implements diversity CI recommended by Chao et al. 2014, Chao and Jost 2015

# to make RAD plot for communities comparable, use one maxab and one maxrich for all plots
maxab <- max(sapply(beeAbunds[2:5], max)) # the maximum observed species abundance in any community

maxrich <- max(sapply(beeAbunds[2:5], function(x){sum(x > 0)})) # maximum observed richness


myplots <- purrr::map(2:5, function(makedatind){
    radplot(beeAbunds[,makedatind], maxab = maxab, maxrich = maxrich
            , shape = 13 + makedatind # solid filled shapes
            , size = 1.5
            , fill = c("#1b9e77", "#d95f02", "#7570b3", "#e7298a")[makedatind - 1]) +
    ggplot2::labs(title = ggplot2::element_blank()) +
    ggplot2::theme(text = ggplot2::element_text(size = 14)
                   , axis.text = ggplot2::element_text(colour = "black"))
})


myplots[[1]] + # first community
  ggplot2::labs(x = "") +# syling
white_y(myplots[[3]]) + # second community 
  ggplot2::labs(x = "") + # styling
myplots[[2]] +  # third community
white_y(myplots[[4]]) # fourth community

The next figure in the guide shows that the expected value for both observed and estimated asymptotic richness increase with sample size. The code to generate expected diversity estimates is computationally expensive and we recommend not to run the code in this vignette. It is documented in the package in the file "data-raw/DATASET.R", and the result is an exported dataset mean_ests. This script uses the function subsam(), which samples without replacement, and the function obs_est(), which computes Hill diversity estimates, including asymptotic estimates [@Chao2015] using functions modified from the R package SpadeR

# this takes a long time to run so we're going to stash it but export the data it creates

sz<-1:745 #sample sizes from 1 to maximum observed total abundance at a site
nrep<-200 #Large enough because estimating means

#tictoc::tic() #time this
nc <- parallel::detectCores() - 1 #nc is number of cores to use in parallel processing
future::plan(strategy = future::multiprocess, workers = nc) #this sets up the cluster


div_ests <- purrr::map_dfr(1:nrep, function(k){
 furrr::future_map_dfr(sz, function(x){
    map(2:length(beeAbunds), function(sitio){
      f<-beeAbunds %>% dplyr::select(all_of(sitio))
      if(sum(f) > x){
        comm = dplyr::pull(f) %>% subsam(size = x)
        return(data.frame(obs_est(comm)
                     , site = names(f)))
      }
    })
  })
})

#tictoc::toc() #507 seconds with quad-core i7 processor, not bad.
mean_narm <- function(x){mean(x, na.rm = T)}
mean_ests <- div_ests %>%
  dplyr::group_by(site, individuals) %>%
  dplyr::summarize_all(mean_narm)

``` {r Species Accumulation Curves, fig.width = 5, fig.height = 5}

site_ab <- purrr::map_dfr(names(beeAbunds[,2:5]), function(site){ data.frame(site, abun = sum(beeAbunds[,site])) })

mean_ests %>% dplyr::left_join(site_ab) %>% dplyr::mutate(cutrows = individuals > 0.8 * abun) %>% #truncate the dataset because literal bootstrapping gets funny and the Chao1 #levels off near observed abundance, didn't want to discuss this in guide #since it's obvious but not interesting and possibly very confusing. dplyr::filter(!cutrows) %>% ggplot2::ggplot(aes(individuals, obsrich, color = site))+ ggplot2::geom_point(alpha = 0.3, size = 0.5) + ggplot2::geom_point(aes(y = chaorich, color = site, shape = site) , size=0.8, alpha=0.5) + ggplot2::theme_classic() + ggplot2::labs(x = "\nIndividuals", y = "Species richness\n") + ggplot2::xlim(c(0, 400)) + ggplot2::ylim(c(0, 80)) + ggplot2::theme(text = ggplot2::element_text(size=14) , axis.text = ggplot2::element_text(colour = "black")) + ggplot2::theme(legend.position = "none") + ggplot2::scale_shape_manual(values = c(15:18)) + ggplot2::scale_color_brewer(palette = "Dark2")

Figure 3 in the guide shows an empirical Hill diversity profile. The
`MeanRarity` package has a function to produce such profiles, `divpro()`.
This lightweight function does not produce asymptotic estimates. Because the
guide focused on the mean rarity formulation for Hill diversities,
parameterizing the scaling exponent with $\ell$, the profile is plotted from
$\ell = -1$ to $\ell =1$, whereas other software packages such as `iNEXT` and
`SpadeR` plot the profile from $q = 1-\ell =0$ to $q = 1-\ell =2$.

```r

site_pro <- purrr::map_dfr(names(beeAbunds[2:5]), function(x){
  data.frame(site = x, divpro(beeAbunds[, x]))}) #make profile for each community

site_pro %>%  ggplot2::ggplot(aes(ell, d, color = site)) +
    ggplot2::geom_line() +
    ggplot2::geom_point(data = site_pro %>% 
                          dplyr::filter(ell %in% c(-1,1,0))
               , aes(ell, d, color = site, shape = site)
               , size = 3, alpha = 0.5) +
    ggplot2::theme_classic() +
    ggplot2::theme(text = ggplot2::element_text(size = 14)
          , axis.text = ggplot2::element_text(colour = "black")
          , legend.position = "none") +
    #this is the unicode for the curly l
    ggplot2::labs(x = "Exponent \U2113", y = "Diversity") +
    ggplot2::scale_shape_manual(values = c(15:18)) +
    ggplot2::scale_color_brewer(palette = "Dark2")

A central idea in the guide is that researcher decisions can drive the results of biodiversity comparisons: how much to sample, how to standardize data, which diversity metric to use, and how to express uncertainty. Figure 4 illustrates how some of these choices could alter interpretation of the example dataset.

The code below does not highlight any of the novel tools in the MeanRarity package. However, it does use Chao_Hill_abu(), a copy of an internal function from the R package SpadeR. To produce fig 4 in the MS, we used the R package iNEXT, which is not imported by MeanRarity. It also uses our simplified version of the code for computing approximate CI from iNEXT, an internal function MeanRarity:::Bootstrap.CI_Df(). This computes the same intervals as iNEXT. Although elsewhere in the Oikos forum piece, we critique these intervals, we included them in this figure to demonstrate how the choice of tools would or would not drive results.

This code chunk will likely take almost a minute to run.

#tictoc::tic() #time this

iN_form <- as.list(beeAbunds[, -1])
by_size <- iNEXT::estimateD(iN_form
                                , base = "size")

by_coverage <- iNEXT::estimateD(iN_form
                                , base = "coverage")

by_effort <- purrr::map_dfr(lapply(iN_form, sum), function(size){
    iNEXT::estimateD(iN_form
                    , base = "size"
                    , level = size) %>% 
    dplyr::filter(Method == "Observed") 

})

iseveral <- dplyr::bind_rows(size = by_size
                             , coverage = by_coverage
                             , effort = by_effort
                             , .id = "standard") %>%
  dplyr::mutate(order = 1 - Order.q)

#tictoc::toc()  #43 sec this time!?!?!

#tictoc::tic()
asyseveral <- purrr::map_dfr(2:5, function(sitio){
  purrr::map_dfr(-1:1, function(l){
    theboots <- MeanRarity:::Bootstrap.CI_df(pull(beeAbunds[, sitio]), l)
    est <- Chao_Hill_abu(dplyr::pull(beeAbunds[, sitio]), l)
    data.frame(Assemblage = names(beeAbunds[, sitio])
               , order = l
               , qD = est
               , qD.LCL = est - theboots[2]
               , qD.UCL = est + theboots[4])
  })
})
#tictoc::toc() # <2 seconds.

asyseveral$Method <- "asymptotic"
asyseveral$standard <- "asymptotic"
allseveral <- dplyr::bind_rows(iseveral, asyseveral)
#diversities, standardized by effort

#get some parameters of data
#individuals in sample with fewest
minm <- min(by_effort %>% 
              dplyr::filter(m == min(m)) %>% 
              dplyr::pull(m)) 

minc <- min(by_effort %>% 
              dplyr::group_by(Assemblage) %>% 
              dplyr::summarize(SC = max(SC)) %>% 
              dplyr::pull(SC)) # coverage of sample with lowest


tog <- allseveral %>% 
  dplyr::mutate(sm = paste(standard, Assemblage, sep=""))

inds <- data.frame("order" = c(1, 0, -1), divind = factor(c("richness"
                                                           , "Hill-Shannon"
                                                           , "Hill-Simpson")
                                                 , levels = c("richness"
                                                              , "Hill-Shannon"
                                                              , "Hill-Simpson")))

togn <- tog %>% 
  dplyr::left_join(inds) %>%  
  dplyr::ungroup()

togn <- togn %>%     
  dplyr::mutate(divind = factor(divind
                         , levels = c("richness"
                                      , "Hill-Shannon"
                                      , "Hill-Simpson"))
          , site = factor(Assemblage
                          , levels = c("IAS_3"
                                     , "Cold Soil_5"
                                     , "Lord Stirling_4"
                                     , "Fox Hill_5")))
togn <- togn %>% 
  dplyr::mutate(methlab = factor(standard
                          , levels = c("effort"
                                       , "size"
                                       , "coverage"
                                       , "asymptotic")
                           , labels = c("effort (14 transects)"
                                        , "size (255 individuals)"
                                        , "coverage (95.7%)"
                                        , "asymptotic")))
ebsize<-0.6
ebw<-0.4
psize<-2.5

############################
# make figures, which get stuck together.
leftPlot <- togn %>% 
    dplyr::filter(methlab != "asymptotic") %>% 
    ggplot2::ggplot(aes(site, qD, color = site, shape = site)) +
    ggplot2::geom_errorbar(aes(ymin = qD.LCL, ymax = qD.UCL)
                  , linewidth = ebsize, width = ebw) +
    ggplot2::geom_point(size = psize) +
    ggplot2::facet_grid(divind ~ methlab, switch = "y", scales = "free_y") +
    ggplot2::theme_classic() +
    ggplot2::theme(legend.position = "none", axis.text.x = ggplot2::element_blank()
          , axis.ticks.x = ggplot2::element_blank()) +
    ggplot2::labs(y = NULL) +
    ggplot2::theme(panel.spacing = unit(0, "lines")
          , panel.border = ggplot2::element_rect(fill = NA)
          , strip.background = ggplot2::element_blank()
          , strip.text = ggplot2::element_text(face = "bold", size = 10)
          , axis.title.x = ggplot2::element_blank()
          , legend.position = "none"
          , strip.placement = "outside") +
    ggplot2::scale_y_continuous(limits = c(0,NA)
                       , expand = ggplot2::expansion(mult = c(0, .05))) +
    ggplot2::scale_shape_manual(values = c(17, 15, 18, 16)) +
    ggplot2::scale_color_manual(values = c("#7570b3"
                                , "#1b9e77"
                                , "#e7298a"
                                , "#d95f02")) 
rightPlot <- togn %>% 
  dplyr::filter(methlab == "asymptotic") %>% 
  ggplot2::ggplot(ggplot2::aes(site, qD, color = site, shape = site)) +
  ggplot2::geom_errorbar(ggplot2::aes(ymin = qD.LCL, ymax = qD.UCL)
                         , linewidth = ebsize, width = ebw) +
  ggplot2::geom_point(size = psize) +
  ggplot2::facet_grid(divind~methlab, switch = "y", scales = "free_y") +
  ggplot2::theme_classic() +
  ggplot2::theme(legend.position = "none"
                 , axis.text.x = ggplot2::element_blank()
                 , axis.ticks.x = ggplot2::element_blank()) +
  ggplot2::labs(y = NULL, x = NULL) +
  ggplot2::theme(panel.spacing = unit(0, "lines")
                 , panel.border = ggplot2::element_rect(fill = NA)
                 , strip.background = ggplot2::element_blank()
                 , strip.text.x = ggplot2::element_text(face = "bold"
                                                        , size = 10)
                 , strip.text.y = ggplot2::element_blank()
                 , axis.title = ggplot2::element_blank()
                 , legend.position = "none"
                 , strip.placement = "outside") +
  ggplot2::scale_y_continuous(limits = c(0, NA)
                     , expand = ggplot2::expansion(mult = c(0, 0.05))) +
  ggplot2::scale_shape_manual(values = c(17, 15, 18, 16)) +
  ggplot2::scale_color_manual(values = c("#7570b3"
                              , "#1b9e77"
                              , "#e7298a"
                              , "#d95f02" ))

# throwing errors 20230420
#leftPlot + rightPlot + plot_layout(widths = c(3, 1))

A key idea is that the amount of sampling itself (and not simply standardization) itself affects relative biodiversity estimates. We demonstrated this in the MS with Figure 5. Although the MeanRarity package exports sample_finite(), sample_finite(), and subcom(), all convenience functions for sampling community data, to produce Fig. 5 no tools from MeanRarity were used.

# this takes a long time to run so we're going to stash it but export the data
# in the package

# the idea here is to sample from 14 30-minute sampling bouts from each site and 
# compare coverage. While we resample 30-minute transects, we compute the
# estimated sample coverage and count individuals in each subsample. We use this
# to compare standardization methods. 

maxeff <- 14
reps <- 9999
#tictoc::tic() #time this
nc <- parallel::detectCores() - 1 #nc is  cores to use in parallel processing
future::plan(strategy = future::multiprocess, workers = nc) # sets up cluster

show_ests <- furrr::future_map_dfr(1:reps, function(x){
  purrr::map_dfr(1:maxeff, function(i){
    sam <- beeObs %>%
      dplyr::group_by(sr) %>%
      dplyr::do(dplyr::filter(., start %in%
                         sample(unique(.$start)
                                , size = maxeff - i + 1
                                , replace = F)))
    subcom <- sam %>% dplyr::group_by(bee, sr) %>%
      dplyr::summarize(abund = n()) %>%
      tidyr::spread(key = sr
                    , value = abund
                    , fill = 0) %>%
      as.data.frame()
    basicdat <- purrr::map_dfr(apply(subcom[, -1], MARGIN = 2, obs_est), rbind)
    return(data.frame(basicdat
                      , site = names(subcom[, -1])
                      , transects = maxeff - i + 1))
  })
})

#tictoc::toc() #Under 2 hours with only 4 cores

effort_means <- show_ests %>%
  dplyr::group_by(site, transects) %>%
  dplyr::summarize_at(.vars = c("n"
                              , "coverage"
                              , "obsrich"
                              , "chaorich"
                              , "obsshan"
                              , "chaoshan"
                              , "obssimp"
                              , "chaosimp"), .funs = mean)


#gather to make a prettier plot, will need to get fancy now that there's asy also
effort_rare <- effort_means %>% dplyr::rename(
  "richness_observed" = obsrich
  , "richness_asymptotic" = chaorich
  , "Hill-Shannon_observed" = obsshan
  , "Hill-Shannon_asymptotic" = chaoshan
  , "Hill-Simpson_observed" = obssimp
  , "Hill-Simpson_asymptotic" = chaosimp
  , size = n
  , effort = transects) %>%
  tidyr::gather(key = "ell"
                , value = "diversity"
                , "richness_observed"
                , "richness_asymptotic"
                , "Hill-Shannon_observed"
                , "Hill-Shannon_asymptotic"
                , "Hill-Simpson_observed"
                , "Hill-Simpson_asymptotic") %>%
  tidyr::gather(key = method
                , value =xax
                , effort
                , size
                , coverage) %>%
  tidyr::separate(ell, c("divind", "etype"), sep = "_")

effort_rare$divind <- factor(effort_rare$divind
                          , levels = c("richness"
                                       , "Hill-Shannon"
                                       , "Hill-Simpson")
                          , labels = c("richness"
                                     , "Hill-Shannon"
                                     , "Hill-Simpson"))
effort_rare$method <- factor(effort_rare$method
                          , levels = c("effort"
                                     , "size"
                                     , "coverage"))
effort_rare$etype <- factor(effort_rare$etype
                         , levels = c("observed"
                                      , "asymptotic"))

``` {r Fig 5 rarefaction, fig.width =7, fig.height = 7}

effort_rare %>% dplyr::filter(etype == "observed") %>% ggplot2::ggplot(aes(xax , diversity , color = site , shape = site)) + ggplot2::geom_point(size = 2) + ggplot2::geom_line() + ggplot2::theme_classic() + ggplot2::scale_alpha(range=c(0.4, 0.8)) + ggplot2::theme( legend.position = "none" , axis.text = ggplot2::element_text(colour = "black") , axis.title.y = ggplot2::element_text(size = 14) , panel.spacing = unit(1.3, "lines") , strip.placement.y = "outside") + ggplot2::labs(x = " transects individuals % coverage " , y = "diversity") + ggplot2::facet_grid(divind ~ method, scales = "free", switch = "y") + ggplot2::expand_limits(y = 0) + ggplot2::scale_shape_manual(values = c(15:18)) + ggplot2::scale_color_brewer(palette = "Dark2") + ggplot2::scale_y_log10()

We examine whether the same sensitivities apply to asymptotic diversity
estimates, which present an alternative to the data standardization approaches,
in Appendix C. In fact, within the range of sampling completeness and 
unevennesses in our dataset, the asymptotic diversity estimators seem as 
unreliable as sample diversity in demonstrating relative differences in 
diversity between communities.

```r
effort_rare %>% 
  ggplot2::ggplot(ggplot2::aes(xax, diversity, color = site, shape = site)) +
  ggplot2::geom_line(ggplot2::aes(alpha = etype), size = 1) +
  ggplot2::theme_classic() +
  ggplot2::scale_alpha(range = c(0.4, 0.8)) +
  ggplot2::theme( legend.position = "none"
         , axis.text = ggplot2::element_text(colour = "black")
         , axis.title.y = ggplot2::element_text(size = 14)
         , panel.spacing = unit(0.3, "lines")
         , strip.placement.y = "outside") +
  # shoudl fix the spacing between x-axis labels
  ggplot2::labs(x = "    transects                                     individuals                                     % coverage    "
       , y = "diversity") +
  ggplot2::facet_grid(divind ~ method + etype, scales = "free", switch = "y") +
  ggplot2::expand_limits(y = 0) +
  ggplot2::scale_color_brewer(palette = "Dark2") + 
  ggplot2::scale_y_log10() +
  ggplot2::scale_alpha_manual(values = c(1, 0.6))

In Box 5, we visualize the mean rarity by finding a physical balance point. We hope this helps clarify how rarity is scaled by different means, but always weighted by the abundance of individuals. The function rarity_plot() generates these figures. They can also be generated interactively in the shiny app \url{https://mean-rarity.shinyapps.io/rshiny_app}. The key thing to notice is that while the scaling of rarity changes with the exponent $\ell$, the weighting of each species stays the same. This is the simplest way to see how Hill diversities compute mean rarity from relative abundances, and we find the mean rarity framework simpler than imagining that the scaling parameter somehow adjusts species weighting.

The balance plots are fairly complex, and use several exported functions from the MeanRarity package. They rely, of course, on Hill diversity computations, with the function rarity(), which in turn calls pfun() and ipfun() to compute power transformations and their inverse. Additionally, the plots are built with ggplot, and the curious reader may wish to explore the code (and even suggest improvements!).

# plot some rarity "balance plots" for guide to measuring diversity (Roswell et al. 2021 http://onlinelibrary.wiley.com/doi/10.1111/oik.07202/abstract)
ab <- c(20,8,5,4,2,1)

rich_bal <- rarity_plot(ab, 1, base_size = 12) +
  ggplot2::labs(x = "") +
  scale_color_brewer(type = "qual", palette = "Dark2") 

#do not return all the text this time
shan_bal <- white_y(rarity_plot(ab, 0, base_size = 12, verbose = F)) + 
  scale_color_brewer(type = "qual", palette = "Dark2")

simp_bal <- white_y(rarity_plot(ab, -1, base_size = 12, verbose = F)) + 
  ggplot2::labs(x = "") +
  scale_color_brewer(type = "qual", palette = "Dark2")

#assemble triptych with patchwork
rich_bal +
  shan_bal +
  simp_bal

In the appendices of the guide we explored confidence interval performance for diversity estimates via simulation. The code we used to do this requires substantial computational resources and we conducted much of it on a university high performance computing cluster. Thus, we did not include the code and data in this R package (it can be found at (GITHUB LINK), and do not recommend running on a home/office computer with the sample sizes we used. However, to explore CI performance, we designed a way to simulate species abundance distributions (SADs) directly based on their Hill diversity, demonstrated below with the function fit_SAD(). In the github repository, you may see how we took the simulated abundance distributions, took random samples (with replacement), and for the random samples we computed the estimated confidence intervals (CI) given by @Chao2015. We assessed what proportion of empirical CI contained the target value: the expected sample diversity or the true (simulated) diversity of the SAD.

First we generate SADs with the function fit_SAD(). fit_SAD() uses a target Simpson diversity and distributional family to set the relative abundance for a fixed number of species, where species relative abundances are given by evenly-spaced quantiles of the continuous [gamma or log-normal] distribution. The function uses three parameters: richness, evenness, and the parametric distribution family. Richness can be any integer >0. To parameterize evenness, we used \eq{(Hill-Simpson diversity -1)/(richness-1)}, which is equal to 1 with a perfectly even species-abundance distribution, and 0 when Hill-Simpson diversity reaches its minimum, 1, for any richness value [@Chao2019a]. Hill-Simpson diversity is a positive number between 1 and richness. We used the log-normal distribution because it fits observed sample species abundance distributions well [@Enquist2019; @Baldridge2016; @Matthews2019]. We used the gamma distribution because it fits observed species abundance distributions as well [@Matthews2014; @Matthews2019], but makes different assumptions about rare species. Generating many SADs with different diversities and distributional assumptions provides scope to test proposed confidence intervals [@Beguinot2018a].

# simulate SADs for analysis
SADs_list<-map(c("lnorm", "gamma"), function(distr){
  map(c(100, 200), function(rich){
    map(c(0.05, .15,.25,.5,.75,.85), function(simp_Prop){ #simp_Prop is evenness
      fit_SAD(rich = rich, simpson = simp_Prop*(rich-1)+1, distr = distr)
    })
  })
})

The function fit_SAD() provides useful way to simulate species abundance distributions. Unlike many methods to simulate SADs that rely on finite communities, fit_SAD provides only relative abundances. This is particularly desirable for testing the statistical behavior of Hill diversities, as these also consider only relative abundances. Another feature of fit_SAD() is that the true diversity (richness and Hill-Simpson diversity), along with a parametric distributional assumption, are the inputs to the simulation, and not downstream effects of other, ancillary parameters.

fit_SAD(), like other methods, does require a sort of discrete approximation to continuous distributions. However, rather than rounding, this is simply achieved by taking the probability density of the selected distribution at evenly-spaced intervals as proportional to the relative abundances of each species.

To generate finite data for our research, samples from the simulated SADs are taken directly, without first simulating a finite pool or "community," in contrast to other commonly used methods to simulate SADs [@May2018a]. Here's a quick visualization of the SADs we just generated (this is not a figure in the MS). Rather than showing all 24, we'll look at a sample of 6: The most and least even ones, and one of the intermediate ones as well, for each of the two distributional families.

``` {r visualize SADs, fig.width = 7, fig.height = 5}

function to remove subscripts; apparently this way of making a labeller is now

deprecated

remsub <- function(variable, value){ return(gsub("_"," ",value))}

function to summarize frequencies

asab<-function(namevec){as.numeric(table(namevec))}

make rank abundance distributions

myabs<-map_dfr(flatten(flatten(SADs_list)) #, function(x) data.frame(names(x))) , function(x){data.frame(ab=x$rel_abundances)} , .id="SAD")

myabs %>% dplyr::left_join(data.frame( SAD=as.character(1:24) , skew=factor(c("uneven", "int","int", "int","int", "even"), levels=c("uneven", "int", "even")) , dist=factor(c(rep("lognormal", 12), rep("gamma", 12)), levels=c("lognormal", "gamma")) )) %>% dplyr::mutate(abD=paste(dist, skew)) %>% dplyr::group_by(SAD, abD, dist, skew) %>% dplyr::mutate(abrank = dplyr::min_rank(desc(ab)) , log_relative_abundance = log(ab) , relative_abundance = ab) %>% #for plotting, gather the two scales to use facet_grid tidyr::gather(scl, rel_abund, relative_abundance, log_relative_abundance )%>% # grab the extreme SADs and one intermediate for each distributional # assumption, just the 200 spp versions (they look the same though) dplyr::filter(SAD %in% c("7","10","12","19","22","24")) %>% # plot the RADs to show difference between gamma and lognormal, more extreme # with skew ggplot2::ggplot(aes(abrank, rel_abund, color=dist))+ ggplot2::geom_point(alpha=0.3, size=1)+ ggplot2::geom_line(size=.4, alpha=0.3)+ ggplot2::theme_classic()+ # theme(text=element_text(size=16))+ ggplot2::labs(x="abundance rank", y="", color="", shape="")+ ggplot2::facet_grid(forcats::fct_rev(scl)~skew, scales="free", switch="y", labeller=remsub) ```

In the supplementary materials for [@Roswell2021], we demonstrate that the CI for diversity estimates based on the approach articulated in [@Chao2015] tend to be anti-conservative. The code and data for this are not exported by this package; the scripts to generate all relevant data and figures are available at [github link].