View source: R/sample_coverage.R
preseqR.sample.cov predicts the probability of observing a species represented at least r times in a random sample.
preseqR.sample.cov(n, r=1, mt=20)
n | A two-column matrix. The first column is the frequency j = 1,2,…; the second column is N_j, the number of species represented exactly j times in the initial sample. The first column must be sorted in ascending order.
r | A positive integer. Default is 1.
mt | A positive integer constraining possible rational function approximations. Default is 20.
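As an illustration of the input format described for n, the following minimal sketch builds such a two-column matrix from raw observations using base R only. The vector `labels` and the variable names are hypothetical and are not part of preseqR.

```r
## Hypothetical raw data: one species label per observed individual
labels <- c("a", "a", "b", "c", "c", "c", "d")

## number of times each species was observed
counts <- table(labels)

## number of species observed exactly j times, for each observed j
freq <- table(counts)

## two-column matrix: frequency j (ascending) and N_j
n <- cbind(as.numeric(names(freq)), as.vector(freq))
n
```

Because `table` orders the distinct frequencies numerically, the first column comes out in ascending order, as the argument requires.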
Suppose a sample is given and one more individual is randomly drawn from the population. preseqR.sample.cov estimates the probability that the species to which this individual belongs has already been observed at least r times in the sample. When r = 1, this probability is called the sample coverage.
Let N_j be the number of species represented exactly j times in a sample. The probability of observing a species represented at least r times in the sample is estimated as ∑_{j=r+1}^∞ jN_j / ∑_{j=1}^∞ jN_j. The theory is described by Mao and Lindsay (2002). For a random sample where N_j is unknown, a modified rational function approximation is first used to predict the value of N_j. Then the estimates are substituted to obtain an estimator for the probability of observing a species represented at least r times in the sample.
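For the initial sample itself, where the N_j are known, the ratio above can be evaluated directly. The sketch below does this in base R on a toy histogram. The helper `plugin_cov` is hypothetical and not part of preseqR, and it does not perform the rational function approximation that the package uses for other sample sizes.

```r
## Hypothetical helper (not part of preseqR): evaluates
## sum_{j >= r+1} j * N_j / sum_{j >= 1} j * N_j from a counts matrix
plugin_cov <- function(n, r = 1) {
  j  <- n[, 1]   # frequencies
  Nj <- n[, 2]   # number of species with each frequency
  sum(j[j > r] * Nj[j > r]) / sum(j * Nj)
}

## toy histogram: 5 species seen once, 3 seen twice, 1 seen four times
toy <- cbind(c(1, 2, 4), c(5, 3, 1))

cov1 <- plugin_cov(toy, r = 1)  # (2*3 + 4*1) / (1*5 + 2*3 + 4*1) = 10/15
cov2 <- plugin_cov(toy, r = 2)  # (4*1) / 15 = 4/15
```

With r = 1 the numerator drops only the singletons, so the result equals 1 - N_1 / ∑ jN_j, the familiar form of the sample coverage estimate from Good (1953).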
This function is the fast version of preseqR.sample.cov.bootstrap. It does not provide a confidence interval. To obtain a confidence interval along with the estimates, use preseqR.sample.cov.bootstrap.
A function that estimates the probability of observing a species represented at least r times in a random sample. Its argument is a vector of sampling efforts t, i.e., sample sizes relative to the initial sample. For example, t = 2 means a random sample that is twice the size of the initial sample.
Chao Deng
Good, I. J. (1953). The population frequencies of species and the estimation of population parameters. Biometrika, 40(3-4), 237-264.
Mao, C. X. and Lindsay, B. G. (2002). A Poisson model for the coverage problem with a genomic application. Biometrika, 89(3), 669-682.
Deng, C., Daley, T., Calabrese, P., Ren, J., & Smith, A.D. (2016). Estimating the number of species to attain sufficient representation in a random sample. arXiv preprint arXiv:1607.02804v3.
## load library
library(preseqR)
## import data
data(FisherButterfly)
## construct the estimator for the sample coverage
estimator1 <- preseqR.sample.cov(FisherButterfly, r=1)
## Given a sample that is 10 times or 20 times the size of the initial sample,
## suppose one randomly draws one more individual from the population. The
## value of the function is the probability that the species of this
## individual has already been observed in the sample
estimator1(c(10, 20))
## construct the estimator for species represented at least twice
estimator2 <- preseqR.sample.cov(FisherButterfly, r=2)
## the same probability for a species represented at least twice, when the
## sample size is 50 times or 100 times the size of the initial sample
estimator2(c(50, 100))
Warning messages:
1: In polynomial(p) : imaginary parts discarded in coercion
2: In polynomial(p) : imaginary parts discarded in coercion
3: In polynomial(p) : imaginary parts discarded in coercion
4: In polynomial(p) : imaginary parts discarded in coercion
[1] 0.9996047 0.9998965
Warning messages:
1: In polynomial(p) : imaginary parts discarded in coercion
2: In polynomial(p) : imaginary parts discarded in coercion
3: In polynomial(p) : imaginary parts discarded in coercion
4: In polynomial(p) : imaginary parts discarded in coercion
[1] 0.9999492 0.9999871