# preseqR.sample.cov: Predicting generalized sample coverage

In preseqR: Predicting Species Accumulation Curves

## Description

`preseqR.sample.cov` predicts the probability of observing a species represented at least r times in a random sample.

## Usage

```
preseqR.sample.cov(n, r=1, mt=20)
```

## Arguments

`n` A two-column matrix. The first column is the frequency j = 1,2,…; and the second column is N_j, the number of species represented exactly j times in the initial sample. The first column must be sorted in ascending order.

`r` A positive integer. Default is 1.

`mt` A positive integer constraining possible rational function approximations. Default is 20.
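As an illustration of the layout expected for `n`, the following sketch builds such a matrix from a vector of species labels, one label per observed individual, using base R only. The data and the names `labels`, `counts`, and `hist_tab` are made up for this example and are not part of preseqR.

```r
## hypothetical sample: one species label per observed individual
labels <- c("a", "a", "a", "b", "b", "c", "d", "d", "e")

## number of individuals observed for each species
counts <- table(labels)

## N_j: number of species observed exactly j times
hist_tab <- table(as.integer(counts))

## two-column matrix: frequency j (ascending) and N_j
n <- cbind(j = as.integer(names(hist_tab)),
           N_j = as.integer(hist_tab))
n
```

Because `table` orders the distinct frequencies numerically, the first column comes out in ascending order, as the argument description requires.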

## Details

Suppose a sample is given and one more individual is randomly drawn from the population. `preseqR.sample.cov` estimates the probability that the species to which this individual belongs has already been observed at least r times in the sample. When r = 1, this probability is called the sample coverage.

Let N_j be the number of species represented exactly j times in a sample. The probability of observing a species represented at least r times in the sample is estimated as ∑_{j=r+1}^∞ jN_j / ∑_{j=1}^∞ jN_j. The numerator starts at j = r+1 rather than j = r because, following Good (1953), the probability that the next individual belongs to a species seen exactly j times is estimated by (j+1)N_{j+1} / ∑_{i=1}^∞ iN_i. The theory is described by Mao and Lindsay (2002). For a random sample of a different size, N_j is unknown, so a modified rational function approximation is first used to predict the value of N_j. These predictions are then substituted into the expression above to obtain an estimator for the probability of observing a species represented at least r times in the sample.
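The ratio above can be evaluated directly when the N_j are the observed counts of the initial sample. The sketch below does this in base R on a made-up histogram. The helper `coverage_from_counts` is hypothetical and not part of preseqR, and it does not reproduce the rational function approximation the package uses to predict N_j for samples of other sizes.

```r
## hypothetical helper, not part of preseqR: plug-in estimate of the
## probability that the next individual belongs to a species already
## seen at least r times, given the observed counts N_j
coverage_from_counts <- function(n, r = 1) {
  j   <- n[, 1]
  N_j <- n[, 2]
  keep <- j >= r + 1
  sum(j[keep] * N_j[keep]) / sum(j * N_j)
}

## toy histogram: 5 species seen once, 3 seen twice, 2 seen three times
n <- cbind(c(1, 2, 3), c(5, 3, 2))

coverage_from_counts(n, r = 1)   # (2*3 + 3*2) / 17 = 12/17
coverage_from_counts(n, r = 2)   # (3*2) / 17 = 6/17

## for r = 1 the ratio reduces to 1 - N_1 / (total number of individuals)
1 - 5 / 17
```

For r = 1 the result coincides with the classical coverage estimate 1 - N_1/N from Good (1953), where N is the total number of individuals in the sample.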

This function is the fast version of `preseqR.sample.cov.bootstrap`. It does not provide a confidence interval. To obtain a confidence interval along with the estimates, use `preseqR.sample.cov.bootstrap`.

## Value

The estimator for the probability of observing a species represented at least r times in a random sample. The estimator is a function whose input is a vector of sampling efforts t, i.e., sample sizes relative to the initial sample. For example, t = 2 means a random sample that is twice the size of the initial sample.
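Since t is relative rather than absolute, a target sample size given in individuals has to be divided by the size of the initial sample first. The following base R sketch shows the conversion on a made-up histogram. It assumes the size of the initial sample is the total number of individuals, ∑ jN_j, and the names `initial_size`, `target_sizes`, and `efforts` are illustrative only.

```r
## toy histogram in the two-column format described under Arguments
n <- cbind(c(1, 2, 3), c(5, 3, 2))

## size of the initial sample: total number of individuals, sum of j * N_j
initial_size <- sum(n[, 1] * n[, 2])

## hypothetical target sample sizes, in individuals
target_sizes <- c(34, 170)

## sampling efforts relative to the initial sample
efforts <- target_sizes / initial_size
efforts
```

A vector such as `efforts` is what one would pass to the function returned by `preseqR.sample.cov`.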

## Author(s)

Chao Deng

## References

Good, I. J. (1953). The population frequencies of species and the estimation of population parameters. Biometrika, 40(3-4), 237-264.

Mao, C. X. and Lindsay, B. G. (2002). A Poisson model for the coverage problem with a genomic application. Biometrika, 89(3), 669-682.

Deng, C., Daley, T., Calabrese, P., Ren, J., and Smith, A. D. (2016). Estimating the number of species to attain sufficient representation in a random sample. arXiv preprint arXiv:1607.02804v3.

## Examples

```
## load library
library(preseqR)

## import data
data(FisherButterfly)

## construct the estimator for the sample coverage
estimator1 <- preseqR.sample.cov(FisherButterfly, r=1)
## Given a sample that is 10 times or 20 times the size of the initial sample,
## suppose one randomly draws one more individual from the population. The
## value of the function is the probability that the species of that
## individual has already been observed in the sample
estimator1(c(10, 20))

## construct the estimator for r = 2
estimator2 <- preseqR.sample.cov(FisherButterfly, r=2)
## the probability that the species has been observed at least twice when
## the sample is 50 times or 100 times the size of the initial sample
estimator2(c(50, 100))
```

### Example output

```Warning messages:
1: In polynomial(p) : imaginary parts discarded in coercion
2: In polynomial(p) : imaginary parts discarded in coercion
3: In polynomial(p) : imaginary parts discarded in coercion
4: In polynomial(p) : imaginary parts discarded in coercion
 0.9996047 0.9998965
Warning messages:
1: In polynomial(p) : imaginary parts discarded in coercion
2: In polynomial(p) : imaginary parts discarded in coercion
3: In polynomial(p) : imaginary parts discarded in coercion
4: In polynomial(p) : imaginary parts discarded in coercion
 0.9999492 0.9999871
```

preseqR documentation built on May 2, 2019, 6:39 a.m.