Function for species richness estimation

Description

This function implements the species richness estimation procedure outlined in Willis & Bunge (2015). It returns the diversity estimate, its standard error, the estimated model coefficients and details of the selected model, and it plots the fitted model.

Usage

breakaway(data, print = TRUE, plot = TRUE, answers = FALSE, force = FALSE)

Arguments

data: The sample frequency count table for the population of interest. Acceptable formats include a matrix, a data frame, or a file path (csv or txt). The standard frequency count table format is used: two columns, the first containing the frequency of interest (e.g. 1 for singletons, species observed once; 2 for doubletons, species observed twice; and so on) and the second containing the number of species observed that many times. The first row must correspond to the singletons, and the frequencies (first column) should be ordered from least to greatest. At least 6 contiguous frequencies are necessary. Do not concatenate large frequencies (that is, do not lump several upper frequencies into a single row). See the apples dataset for sample formatting.


print: Logical: whether the results should be printed to the screen. If FALSE, answers should be set to TRUE so that the results are returned.


plot: Logical: whether the data and the model fit should be plotted.


answers: Logical: whether the function should return its results. If FALSE, print should be set to TRUE.


force: Logical: whether to force breakaway to run in the presence of frequency count concatenation. breakaway checks that the user has not concatenated multiple upper frequencies; force = TRUE makes breakaway fit models even when such concatenation is detected. In that case, breakaway's diversity estimates cannot be considered reliable.
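The formatting rules for data can be checked before calling breakaway. The following is a minimal base-R sketch under stated assumptions: make_freq_table and check_freq_table are hypothetical helpers (they are not part of breakaway), the counts are made up for illustration, and "contiguous" is read as an unbroken run of frequencies starting at 1, which may differ from breakaway's own check.

```r
# Hypothetical helpers (not part of breakaway) for preparing `data`.

# Build a two-column frequency count table ordered by frequency.
make_freq_table <- function(freq, count) {
  tab <- data.frame(frequency = as.integer(freq), count = as.integer(count))
  tab[order(tab$frequency), , drop = FALSE]
}

# Return a character vector describing formatting problems (empty if none).
# Works for a data frame or a matrix, since only the first column is read.
check_freq_table <- function(tab, min_contiguous = 6) {
  freq <- tab[, 1]
  problems <- character(0)
  if (freq[1] != 1) {
    problems <- c(problems, "first row must be the singletons (frequency 1)")
  }
  if (is.unsorted(freq, strictly = TRUE)) {
    problems <- c(problems, "frequencies must be strictly increasing")
  }
  # Length of the unbroken run 1, 2, 3, ... at the top of the table
  # (our reading of "contiguous"; breakaway's own check may differ).
  run <- 0
  while (run < length(freq) && freq[run + 1] == run + 1) run <- run + 1
  if (run < min_contiguous) {
    problems <- c(problems, sprintf(
      "only %d contiguous frequencies from 1; at least %d needed",
      run, min_contiguous))
  }
  problems
}

# Made-up counts, for illustration only.
tab <- make_freq_table(1:8, c(120, 45, 22, 14, 9, 6, 4, 3))
check_freq_table(tab)  # character(0): no problems found

gappy <- make_freq_table(c(1, 2, 3, 5, 8), c(50, 20, 9, 4, 1))
check_freq_table(gappy)  # one problem: the run of frequencies stops at 3
```

A table that passes these checks could then be supplied as breakaway(tab, print = FALSE, answers = TRUE), which, per the argument descriptions above, returns the results instead of printing them (this requires the breakaway package).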

Value

A category representing algorithm behaviour. code = 1 indicates that no nonlinear models converged and the transformed WLRM diversity estimate of Rocchetti et al. (2011) is returned. code = 2 indicates that the iteratively reweighted model converged and was returned. code = 3 indicates that iterative reweighting did not converge but a model based on a simplified variance structure was returned (in this case, the variance of the frequency ratios is assumed to be proportional to the denominator frequency index). Please inspect your fitted model before using your diversity estimate.


The “name” of the selected model. The first integer is the degree of the numerator polynomial and the second integer is the degree of the denominator polynomial in the model for the frequency ratios. See Willis & Bunge (2015) for details.


Estimated model parameters and standard errors.


The estimate of total (observed plus unobserved) diversity.


The standard error of the diversity estimate.


The chosen nonlinear model for frequency ratios.


An asymmetric 95% confidence interval for diversity.
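To make the relationship between these returned quantities concrete, the sketch below computes observed frequency ratios f_{x+1}/f_x, evaluates a ratio model of the "(p, q)" type described above, turns the fitted ratio at x = 0 into an estimate of the unobserved species, and forms an asymmetric interval. It is a sketch under assumptions, not breakaway's code: all function names are hypothetical, the denominator polynomial is assumed to have constant term 1, the unobserved count is taken as f_1 divided by the fitted ratio at x = 0, and the interval uses a log-normal construction common in capture-recapture studies, which we have not confirmed matches breakaway exactly. The parameter values are made up, not fitted.

```r
# Hypothetical helpers (not breakaway's API) illustrating the returned quantities.

# Observed frequency ratios f_{x+1} / f_x for x = 1, ..., K - 1,
# where counts[x] is the number of species observed x times.
freq_ratios <- function(counts) {
  k <- length(counts)
  data.frame(x = seq_len(k - 1), ratio = counts[-1] / counts[-k])
}

# A "(p, q)" rational model for the ratios. beta holds the p + 1 numerator
# coefficients (constant term first); alpha holds the q denominator
# coefficients, with the denominator's constant term assumed to be 1.
ratio_model <- function(x, beta, alpha = numeric(0)) {
  num <- vapply(x, function(xi) sum(beta * xi^(seq_along(beta) - 1)), numeric(1))
  den <- vapply(x, function(xi) 1 + sum(alpha * xi^seq_along(alpha)), numeric(1))
  num / den
}

# The fitted ratio at x = 0 plays the role of f_1 / f_0, so the estimated
# number of unobserved species is f_1 divided by it (assumed relationship).
richness_estimate <- function(counts, beta, alpha = numeric(0)) {
  f0_hat <- counts[1] / ratio_model(0, beta, alpha)
  sum(counts) + f0_hat  # observed plus estimated unobserved species
}

# Asymmetric interval from a log-normal model for the unobserved count.
# When f0_hat > 0, the lower bound stays above the observed species count.
asymmetric_ci <- function(est, se, observed, level = 0.95) {
  f0_hat <- est - observed
  z <- qnorm(1 - (1 - level) / 2)
  d <- exp(z * sqrt(log(1 + se^2 / f0_hat^2)))
  c(lower = observed + f0_hat / d, upper = observed + f0_hat * d)
}

# Made-up counts and parameter values, for illustration only.
counts <- c(120, 45, 22, 14, 9, 6, 4, 3)
freq_ratios(counts)
est <- richness_estimate(counts, beta = c(0.3, 0.1), alpha = 0.1)  # 223 + 400
asymmetric_ci(est, se = 50, observed = sum(counts))
```

Because the interval is built on the log scale for the unobserved count, its upper arm is longer than its lower arm, which is why such intervals are described as asymmetric.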

Details

breakaway presents an estimator of species richness that is well suited to the high-diversity (microbial) setting. However, many microbial datasets display more diversity than the Kemp-type models can accommodate; in this case, the log-transformed WLRM diversity estimator of Rocchetti et al. (2011) is returned. The authors' experience suggests that some datasets requiring the log-transformed WLRM contain “false” diversity, that is, diversity attributable to sequencing errors (via an inflated singleton count). The authors encourage judicious use of diversity estimators when a dataset may contain such errors, and recommend breakaway_nof1 as an exploratory tool in this case.
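The log-transformed WLRM fallback can be pictured as a weighted linear regression of log frequency ratios on the frequency index. This is a minimal sketch in the spirit of Rocchetti et al. (2011), not breakaway's implementation. It assumes ratios of the form (x + 1) f_{x+1} / f_x, weights equal to the inverse of the approximate variance 1/f_x + 1/f_{x+1} of each log ratio, and it uses every supplied frequency rather than selecting a cutoff; wlrm_log_sketch is a hypothetical name.

```r
# Hypothetical sketch (not breakaway's code) of a log-transformed weighted
# linear regression model (WLRM) on frequency ratios.
# counts[x] = number of species observed exactly x times, x = 1, ..., K.
wlrm_log_sketch <- function(counts) {
  k <- length(counts)
  x <- seq_len(k - 1)
  f_x <- counts[-k]
  f_next <- counts[-1]
  # Assumed response: log of (x + 1) f_{x+1} / f_x.
  y <- log((x + 1) * f_next / f_x)
  # Assumed weights: inverse delta-method variance of each log ratio.
  w <- 1 / (1 / f_x + 1 / f_next)
  fit <- lm(y ~ x, weights = w)
  b0 <- unname(coef(fit)[1])
  # At x = 0 the ratio is f_1 / f_0, so f_0 is estimated by f_1 * exp(-b0).
  f0_hat <- counts[1] * exp(-b0)
  list(f0 = f0_hat, estimate = sum(counts) + f0_hat)
}

# Idealised, non-integer counts proportional to a Poisson(2) distribution
# for 1000 hypothetical species, truncated at frequency 10.
counts <- 1000 * dpois(1:10, lambda = 2)
res <- wlrm_log_sketch(counts)
res$f0  # equals 1000 * dpois(0, 2), about 135.3
```

For Poisson-shaped counts the ratio (x + 1) f_{x+1} / f_x is constant, so the fitted line is flat at log(lambda) and the sketch recovers the unobserved Poisson zero class exactly; real microbial data, with their heavy tails, are precisely the case where this fallback is reached.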

Author(s)

Amy Willis

References

Willis, A. and Bunge, J. (2015). Estimating diversity via frequency ratios. Biometrics.

Rocchetti, I., Bunge, J. and Böhning, D. (2011). Population size estimation based upon ratios of recapture probabilities. Annals of Applied Statistics, 5.

See Also

breakaway_nof1; apples