# Species richness estimation and modelling in the high-diversity setting

### Description

Species richness estimation is an important problem in biodiversity analysis. This package provides methods for estimating total species richness (observed plus unobserved) and a method for modelling total diversity with covariates. `breakaway` estimates total species richness. Microbial diversity datasets are characterized by a large number of rare species and a small number of highly abundant species, and the class of models implemented by `breakaway` is flexible enough to model both of these features. `breakaway_nof1` implements a similar procedure but does not require a singleton count. `betta` models total diversity with covariates in a way that accounts for its estimated nature, and thus for unobserved taxa; `betta_random` additionally permits random effects modelling.
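The data shape described above can be made concrete with a frequency count table, which records how many species were observed exactly once, twice, and so on. The following is a minimal base R sketch; the abundances and the column names `index` and `frequency` are illustrative assumptions, not output from the package.

```r
# Hypothetical abundances: number of times each of 12 species was observed.
# Most species are rare; a few are highly abundant.
abundances <- c(1, 1, 1, 1, 1, 1, 2, 2, 3, 5, 40, 250)

# Count how many species were observed exactly j times.
tab <- table(abundances)
freq_counts <- data.frame(
  index = as.integer(names(tab)),   # j: number of times a species was seen
  frequency = as.integer(tab)       # f_j: number of species seen j times
)

print(freq_counts)

# The singleton count is the frequency at index 1.
singletons <- freq_counts$frequency[freq_counts$index == 1]
```

The first row of such a table is the singleton count, which is the quantity that `breakaway_nof1` does without.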

### Details

| Field | Value |
|---|---|
| Package | breakaway |
| Type | Package |
| Version | 3.0 |
| Date | 2016-03-29 |
| License | GPL-2 |

The function `breakaway` estimates the total (observed plus unobserved) number of classes (usually, distinct species) based on a sample of the frequency counts. Standard errors and model fits are also given. The algorithm is based on the theory of characterization of distributions by ratios of their probabilities, and parameter estimation is done via nonlinear regression. The class of models available is usually broad enough to account for the high-diversity case, which is often observed in microbial diversity datasets. Many classical estimation procedures either fail to provide an estimate or provide poor fits in the microbial setting; `breakaway` is designed to address this data structure.

Since sequencing errors may result in an inflated singleton count, `breakaway_nof1` performs a similar procedure but does not require a singleton count. It can be used as an exploratory tool for investigating the plausibility of the given singleton count.

`betta` runs a regression-type analysis of estimated total diversity, thereby accounting for unobserved taxa. It does not require that `breakaway` be used for diversity estimation. A mixed-model approach accounts for the differing levels of confidence in the diversity estimates, and covariates constitute the fixed effects.

Support of this work from Cornell University's Department of Statistical Sciences is gratefully acknowledged.
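To illustrate why the singleton count matters, the sketch below computes ratios of successive frequency counts and the classical Chao (1984) lower bound from a hypothetical frequency count vector. It is not the `breakaway` algorithm: the data are invented, and `chao1` is a helper name introduced here only for illustration.

```r
# Hypothetical frequency counts: f[j] = number of species seen exactly j times.
f <- c(60, 25, 14, 9, 6, 4)

# Ratios of successive frequency counts, f[j + 1] / f[j], for j = 1, ..., 5.
ratios <- f[-1] / f[-length(f)]

# Classical Chao (1984) lower bound (illustrative helper, not a package function):
# observed richness plus f1^2 / (2 * f2).
chao1 <- function(f) sum(f) + f[1]^2 / (2 * f[2])

estimate_full <- chao1(f)

# Suppose half of the singletons were sequencing errors and are removed.
f_halved <- f
f_halved[1] <- f[1] / 2
estimate_halved <- chao1(f_halved)

print(ratios)
print(c(full = estimate_full, halved_singletons = estimate_halved))
```

In this toy example, halving the singleton count changes the estimate substantially, which is the kind of sensitivity that motivates checking the plausibility of a given singleton count.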

### Author(s)

Amy Willis & John Bunge

Maintainer: Amy Willis <adw96@cornell.edu>

### References

Willis, A. and Bunge, J. (2015). Estimating diversity via frequency ratios. *Biometrics.*

Willis, A. (2015). Species richness estimation with high diversity but spurious singletons. *Under review.*

Willis, A., Bunge, J., and Whitman, T. (2015). Inference for changes in biodiversity. *arXiv preprint.*

Rocchetti, I., Bunge, J. and Böhning, D. (2011). Population size estimation based upon ratios of recapture probabilities. *Annals of Applied Statistics*, **5**.

Chao, A. and Bunge, J. (2002). Estimating the number of species in a stochastic abundance model. *Biometrics*, **58**.

Chao, A. (1984). Nonparametric estimation of the number of classes in a population. *Scandinavian Journal of Statistics*, **11**.

### Examples
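As a rough illustration of regressing estimated diversity on a covariate while giving more weight to more precise estimates, the base R sketch below fits an inverse-variance weighted linear model. This is a deliberate simplification and not the `betta` procedure: it omits the random-effect variance component that a mixed model would include, and the estimates, standard errors, and the `treated` covariate are all hypothetical.

```r
# Hypothetical richness estimates and their standard errors for six samples.
estimates  <- c(2100, 1950, 2300, 3050, 2900, 3200)
std_errors <- c(150, 400, 200, 250, 500, 180)

# Hypothetical binary covariate: 0 = control, 1 = treated.
treated <- c(0, 0, 0, 1, 1, 1)

# Inverse-variance weights: estimates with small standard errors count for more.
w <- 1 / std_errors^2

# Weighted least squares with the covariate as a fixed effect.
fit <- lm(estimates ~ treated, weights = w)
print(coef(fit))

# With a single binary covariate, the slope equals the difference
# between the weighted group means.
weighted_mean <- function(x, w) sum(w * x) / sum(w)
diff_means <- weighted_mean(estimates[treated == 1], w[treated == 1]) -
  weighted_mean(estimates[treated == 0], w[treated == 0])
```

The point of the weighting is that a sample whose richness estimate is very uncertain should have less influence on the estimated covariate effect than a sample whose estimate is precise.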
