
Species richness estimation is an important problem in biodiversity analysis. This package provides methods for estimating total species richness (observed plus unobserved) and a method for modelling total diversity with covariates. `breakaway` estimates total species richness. Microbial diversity datasets are characterized by a large number of rare species and a small number of highly abundant species, and the class of models implemented by `breakaway` is flexible enough to model both features. `breakaway_nof1` implements a similar procedure but does not require a singleton count. `betta` models total diversity with covariates in a way that accounts for its estimated nature, and thus for unobserved taxa; `betta_random` permits random effects modelling.
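The term "frequency counts" can be illustrated with a short sketch. The example below is a minimal illustration in base R with made-up taxon labels; the two-column layout (index, frequency) is an assumption about a convenient format and is not specified in this section.

```r
# Hypothetical sample: one taxon label per observed individual (made-up data).
labels <- c("a", "a", "a", "a", "a", "b", "b", "c", "d", "e", "e", "f")

# Abundance of each taxon, then the number of taxa seen at each abundance.
abundance <- table(labels)
freq <- table(abundance)

# Two-column frequency count table: index x and frequency f_x.
freq_counts <- data.frame(
  index = as.integer(names(freq)),
  frequency = as.integer(freq)
)
print(freq_counts)

# Observed richness is the sum of the frequencies; f_1 is the singleton count.
observed_richness <- sum(freq_counts$frequency)
singletons <- freq_counts$frequency[freq_counts$index == 1]
```

Here three taxa are seen once, two are seen twice and one is seen five times, so the observed richness is 6 and the singleton count is 3.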

| Field | Value |
| --- | --- |
| Package | breakaway |
| Type | Package |
| Version | 3.0 |
| Date | 2016-03-29 |
| License | GPL-2 |

The function `breakaway` estimates the total (observed plus unobserved) number of classes (usually, distinct species) based on a sample of the frequency counts. Standard errors and model fits are also given. The algorithm is based on the theory of characterizing distributions by ratios of their probabilities, and parameter estimation is done via nonlinear regression. The class of models available is usually broad enough to account for the high-diversity case, which is often observed in microbial diversity datasets. Many classical estimation procedures either fail to provide an estimate or provide poor fits in the microbial setting; `breakaway` is designed for this data structure.

Since sequencing errors may result in an inflated singleton count, `breakaway_nof1` performs a similar procedure but does not require a singleton count. It can be used as an exploratory tool for investigating the plausibility of the given singleton count.

`betta` runs a regression-type analysis of estimated total diversity, which makes it possible to account for unobserved taxa. It does not require that `breakaway` be used for diversity estimation. A mixed-model approach accounts for the differing levels of confidence in the diversity estimates, and covariates constitute the fixed effects.

Support of this work from Cornell University's Department of Statistical Sciences is gratefully acknowledged.
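To make the ratio-based idea behind `breakaway` concrete, the following base R sketch evaluates the ratio function reported in this page's example output, using the coefficient values printed there. It only illustrates how a fitted ratio curve can be extrapolated to x = 0; it is not the package's implementation, and the helper `implied_f0` is a hypothetical name introduced here.

```r
# Coefficients copied from the example output on this page (model_1_1).
beta0  <- 1.20345571
beta1  <- 0.05765149
alpha1 <- 0.03012304
xbar   <- 16.5

# Ratio function as printed in the output:
# f_{x+1}/f_{x} ~ (beta0 + beta1*(x - xbar)) / (1 + alpha1*(x - xbar))
ratio <- function(x) {
  (beta0 + beta1 * (x - xbar)) / (1 + alpha1 * (x - xbar))
}

# Extrapolating the curve to x = 0 gives the modelled value of f_1 / f_0.
ratio0 <- ratio(0)

# Hypothetical helper (not a breakaway function): the unobserved count
# implied by a singleton count f1, from f_0 = f_1 / ratio(0).
implied_f0 <- function(f1) f1 / ratio0

print(ratio0)
```

The extrapolated ratio is close to 0.5, so under this fitted curve roughly two unobserved classes are implied for every singleton. This also shows why the estimate is sensitive to the singleton count, which is the motivation given for `breakaway_nof1`.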

Amy Willis & John Bunge

Maintainer: Amy Willis <[email protected]>

Willis, A. and Bunge, J. (2015). Estimating diversity via frequency ratios. *Biometrics.*

Willis, A. (2015). Species richness estimation with high diversity but spurious singletons. *Under review.*

Willis, A., Bunge, J., and Whitman, T. (2015). Inference for changes in biodiversity. *arXiv preprint.*

Rocchetti, I., Bunge, J. and Böhning, D. (2011). Population size estimation based upon ratios of recapture probabilities. *Annals of Applied Statistics*, **5**.

Chao, A. and Bunge, J. (2002). Estimating the number of species in a stochastic abundance model. *Biometrics*, **58**.

Chao, A. (1984). Nonparametric estimation of the number of classes in a population. *Scandinavian Journal of Statistics*, **11**.


```
Iterative reweighting didn't produce any outcomes after the first iteration, so we use 1/x
################## breakaway ##################
The best estimate of total diversity is 1552
with std error 305
The model employed was model_1_1
The function selected was
f_{x+1}/f_{x} ~ (beta0+beta1*(x-xbar))/(1+alpha1*(x-xbar))
Coef estimates Coef std errors
beta0 1.20345571 0.16807523
beta1 0.05765149 0.02962841
alpha1 0.03012304 0.03782164
xbar 16.5
$code
[1] 3
$name
[1] "model_1_1"
$para
Coef estimates Coef std errors
beta0 1.20345571 0.16807523
beta1 0.05765149 0.02962841
alpha1 0.03012304 0.03782164
$est
[1] 1552.416
$seest
[1] 304.7069
$full
Nonlinear regression model
model: lhs$y ~ structure_1_1(x, beta0, beta1, alpha1)
data: lhs
beta0 beta1 alpha1
1.20346 0.05765 0.03012
weighted residual sum-of-squares: 1.274
Number of iterations to convergence: 8
Achieved convergence tolerance: 8.57e-06
$ci
[1] 1006.52 47805.09
Iterative reweighting didn't produce any outcomes after the first iteration, so we use 1/x
################## breakaway ##################
The best estimate of total diversity is 1500
with std error 1341
The model employed was model_1_1
The function selected was
f_{x+1}/f_{x} ~ (beta0+beta1*(x-xbar))/(1+alpha1*(x-xbar))
Coef estimates Coef std errors
beta0 1.20078846 0.18102488
beta1 0.05614294 0.04800125
alpha1 0.02874889 0.05381204
xbar 16.5
$table
Estimates Standard Errors p-values
[1,] 1212.358 271.4992 0
$cov
[,1]
[1,] 73711.81
$ssq_u
[1] 105923.2
$homogeneity
[1] 3.9872480 0.1362009
$global
[1] 19.93999 0.00000
$blups
[1] 1393.1890 1266.6155 977.2709
$blupses
[1] 256.2108 366.7010 189.8294
```
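The `$table`, `$cov` and `$blups` entries at the end of the output above can be related to a standard random-effects weighted mean. The base R sketch below covers the intercept-only case under stated assumptions: the first estimate and standard error are taken from the first `breakaway` fit printed above, the other two pairs are assumed inputs chosen for illustration, and the between-sample variance is copied from `$ssq_u` rather than estimated. It is not a description of how `betta` is implemented.

```r
# Diversity estimates and their standard errors.
# The first pair matches the first breakaway fit printed above; the other
# two pairs are assumed inputs chosen for illustration.
chats <- c(1552, 1500, 884)
ses   <- c(305, 675, 205)

# Between-sample variance, copied from $ssq_u in the output above
# (taken as given here, not estimated).
ssq_u <- 105923.2

# Each estimate is weighted by the inverse of its total variance.
w <- 1 / (ses^2 + ssq_u)

mu_hat     <- sum(w * chats) / sum(w)  # weighted mean of the estimates
var_mu_hat <- 1 / sum(w)               # variance of the weighted mean

# Shrink each estimate towards the mean; noisier estimates shrink more.
blups <- mu_hat + ssq_u / (ssq_u + ses^2) * (chats - mu_hat)

print(c(mean = mu_hat, se = sqrt(var_mu_hat)))
print(blups)
```

With these inputs the sketch gives a mean of about 1212 with a standard error of about 271, and shrunken estimates of about 1393, 1267 and 977, in line with the `$table`, `$cov` and `$blups` entries printed above. The second estimate has the largest standard error and is therefore pulled furthest towards the mean in relative terms.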
