View source: R/sample_coverage.R
| preseqR.sample.cov | R Documentation |
preseqR.sample.cov predicts the probability of observing a species
represented at least r times in a random sample.
preseqR.sample.cov(n, r=1, mt=20)
| n | A two-column matrix. The first column is the frequency j = 1, 2, …; the second column is N_j, the number of species represented exactly j times in the initial sample. The first column must be sorted in ascending order. |
| r | A positive integer, the minimum number of times a species must be represented. Default is 1. |
| mt | A positive integer constraining possible rational function approximations. Default is 20. |
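As an illustration of the layout described for n, the following minimal sketch builds such a two-column matrix from raw per-species abundances using base R only. The vector species_counts is made-up data, and the snippet is an assumption about how one might prepare the input, not code taken from preseqR.

```r
# Hypothetical abundances: number of individuals observed for each species.
species_counts <- c(1, 1, 1, 2, 2, 3, 5, 5, 8)

# Count how many species share each abundance j.
# table() orders the distinct values of a numeric vector ascending.
freq_table <- table(species_counts)

# Two-column matrix: column 1 is j (ascending), column 2 is N_j.
n <- cbind(as.numeric(names(freq_table)),
           as.vector(freq_table))
n
```

Frequencies with no species (here j = 4, 6, 7) are simply left out of the matrix in this sketch.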
Suppose a sample is given and one more individual is randomly drawn from the
population. preseqR.sample.cov estimates the probability that the
species to which this individual belongs has already been observed at least
r times in the
sample. When r = 1, this probability is called the sample coverage.
Let N_j be the number of species represented exactly j times in a sample. The probability of observing a species represented at least r times in the sample is estimated as ∑_{j=r+1}^∞ jN_j / ∑_{j=1}^∞ jN_j. The theory is described by Mao and Lindsay (2002). For a random sample whose N_j are unknown, a modified rational function approximation is first used to predict each N_j. These predictions are then substituted into the formula above to obtain an estimator for the probability of observing a species represented at least r times in the sample.
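To make the ratio above concrete, here is a minimal sketch that evaluates it directly on an observed sample, where the N_j are known. The function coverage_plugin and the matrix n are hypothetical names and data introduced for illustration; they are not part of preseqR, and the sketch does not perform the rational function approximation used for samples of other sizes.

```r
# Hypothetical helper (not part of preseqR): evaluates
# sum_{j >= r+1} j * N_j / sum_{j >= 1} j * N_j on a frequency matrix.
coverage_plugin <- function(n, r = 1) {
  j  <- n[, 1]   # frequencies
  Nj <- n[, 2]   # number of species observed exactly j times
  sum((j * Nj)[j >= r + 1]) / sum(j * Nj)
}

# Made-up counts: 3 singletons, 2 doubletons, and so on (28 individuals).
n <- cbind(c(1, 2, 3, 5, 8), c(3, 2, 1, 2, 1))

coverage_plugin(n, r = 1)  # (28 - 3) / 28
coverage_plugin(n, r = 2)  # (28 - 3 - 4) / 28
```

For r = 1 the ratio reduces to 1 - N_1 / ∑ jN_j, which has the same form as the coverage estimate of Good (1953).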
This function is the fast version of preseqR.sample.cov.bootstrap.
It does not provide a confidence interval. To obtain a
confidence interval along with the estimates, use the function
preseqR.sample.cov.bootstrap.
The estimator for the probability of observing a species represented at least r times in a random sample. The estimator is a function whose input is a vector of sampling efforts t, i.e., sample sizes relative to the initial sample. For example, t = 2 means a random sample that is twice the size of the initial sample.
Chao Deng
Good, I. J. (1953). The population frequencies of species and the estimation of population parameters. Biometrika, 40(3-4), 237-264.
Mao, C. X. and Lindsay, B. G. (2002). A Poisson model for the coverage problem with a genomic application. Biometrika, 89(3), 669-682.
Deng, C., Daley, T., Calabrese, P., Ren, J. and Smith, A. D. (2016). Estimating the number of species to attain sufficient representation in a random sample. arXiv preprint arXiv:1607.02804v3.
## load library
library(preseqR)

## import data
data(FisherButterfly)

## construct the estimator for the sample coverage
estimator1 <- preseqR.sample.cov(FisherButterfly, r=1)

## Given a sample that is 10 times or 20 times the size of the initial sample,
## suppose one randomly draws one more individual from the population. The
## value of the function is the probability that the species of this
## individual has already been observed in the sample
estimator1(c(10, 20))

## construct the estimator for r = 2
estimator2 <- preseqR.sample.cov(FisherButterfly, r=2)

## the probability of observing a species represented at least twice when
## the sample size is 50 times or 100 times that of the initial sample
estimator2(c(50, 100))