preseqR.sample.cov.bootstrap: Predicting generalized sample coverage with bootstrap

View source: R/sample_coverage.R

Description

preseqR.sample.cov.bootstrap predicts the probability of observing a species represented at least r times in a random sample.
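To make the predicted quantity concrete, the following sketch simulates it for a population whose abundances are known. The population, the sample size, and the helper name coverage are hypothetical and are not part of preseqR; the sketch assumes the quantity is the total relative abundance of the species seen at least r times in the sample.

```r
# Hypothetical population: five species with known relative abundances.
abundance <- c(0.5, 0.25, 0.15, 0.07, 0.03)

# Draw a random sample of 40 individuals (species labels 1..5).
set.seed(1)
sample_labels <- sample(seq_along(abundance), size = 40, replace = TRUE,
                        prob = abundance)

# Number of times each species appears in the sample.
counts <- tabulate(sample_labels, nbins = length(abundance))

# Probability that one more individual drawn from the population belongs
# to a species already represented at least r times in the sample.
coverage <- function(counts, abundance, r = 1) {
  sum(abundance[counts >= r])
}

coverage(counts, abundance, r = 1)
coverage(counts, abundance, r = 2)
```

In practice the abundances are unknown, which is why the quantity has to be estimated from the counts alone.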

Usage

  preseqR.sample.cov.bootstrap(n, r=1, mt=20, times=30, conf=0.95)

Arguments

n

A two-column matrix. The first column is the frequency j = 1, 2, …, and the second column is N_j, the number of species represented exactly j times in the initial sample. The first column must be sorted in ascending order.

r

A positive integer: the minimum number of times a species must be represented in the sample. Default is 1.

mt

A positive integer constraining possible rational function approximations. Default is 20.

times

The number of bootstrap samples. Default is 30.

conf

The confidence level. Default is 0.95.
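The two-column matrix expected by n can be tabulated from per-species counts with base R. The sketch below is illustrative only; species_counts is hypothetical data, not a preseqR dataset.

```r
# Hypothetical data: number of individuals observed for each species.
species_counts <- c(1, 1, 1, 2, 2, 3, 5, 5, 8)

# Tabulate how many species were observed exactly j times.
freq_table <- table(species_counts)

# Column 1: frequency j (ascending). Column 2: N_j.
n <- matrix(c(as.integer(names(freq_table)), as.integer(freq_table)),
            ncol = 2)
n
```

The total number of individuals in the initial sample can be recovered as sum(n[, 1] * n[, 2]).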

Details

This is the bootstrap version of preseqR.sample.cov. Each bootstrap sample is generated by randomly sampling the initial sample with replacement, and an estimator is constructed for each bootstrap sample. The median of the estimates is used as the prediction for the probability of observing a species represented at least r times in a random sample.

The confidence interval is constructed based on a lognormal distribution.
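The procedure can be sketched in base R under stated assumptions. This is not the package's implementation: the Good-Turing coverage estimate 1 - N_1/N stands in for the rational function estimator, resampling is done over the observed species, and the interval uses one common lognormal construction. The function names and the example histogram are hypothetical.

```r
# Hypothetical histogram: column 1 = frequency j, column 2 = N_j.
n <- matrix(c(1, 2, 3, 5, 8,
              30, 12, 6, 3, 1), ncol = 2)

# Stand-in estimator (an assumption, not preseqR's): Good-Turing
# coverage for r = 1 at the initial sample size, 1 - N_1 / N.
good_turing_coverage <- function(h) {
  total <- sum(h[, 1] * h[, 2])
  n1 <- sum(h[h[, 1] == 1, 2])
  1 - n1 / total
}

bootstrap_coverage <- function(h, times = 30, conf = 0.95) {
  estimates <- replicate(times, {
    # Resampling the observed species with replacement makes the new
    # N_j counts multinomial with probabilities proportional to N_j.
    resampled <- rmultinom(1, size = sum(h[, 2]), prob = h[, 2])
    good_turing_coverage(cbind(h[, 1], resampled))
  })
  med <- median(estimates)
  se <- sd(estimates)
  # Lognormal interval: divide and multiply the median by a factor C
  # derived from the coefficient of variation of the estimates.
  C <- exp(qnorm((1 + conf) / 2) * sqrt(log(1 + (se / med)^2)))
  list(estimate = med, se = se, lb = med / C, ub = med * C)
}

set.seed(42)
result <- bootstrap_coverage(n, times = 200)
result
```

A lognormal interval is multiplicative, so its lower bound stays positive, although the upper bound is not automatically capped at 1.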

Value

f

The estimator for the probability of observing a species represented at least r times in a sample, as a function of the sample size. The input of the estimator is a vector of sampling efforts t, i.e., sample sizes relative to the initial sample. For example, t = 2 means a random sample that is twice the size of the initial sample.

se

The standard error for the estimator. The input is a vector of sampling efforts t.

lb

The lower bound of the confidence interval. The input is a vector of sampling efforts t.

ub

The upper bound of the confidence interval. The input is a vector of sampling efforts t.
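Because all four functions take relative sampling efforts rather than absolute sample sizes, a target sample size has to be divided by the size of the initial sample first. A minimal sketch, using a hypothetical histogram and hypothetical target sizes:

```r
# Hypothetical histogram: column 1 = frequency j, column 2 = N_j.
n <- matrix(c(1, 2, 3,
              10, 4, 1), ncol = 2)

# Size of the initial sample: 1*10 + 2*4 + 3*1 = 21 individuals.
initial_size <- sum(n[, 1] * n[, 2])

# Target sample sizes expressed as sampling efforts t.
target_sizes <- c(42, 210)
effort <- target_sizes / initial_size
effort
```

The resulting vector is what would be passed to f, se, lb, and ub.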

Author(s)

Chao Deng

References

Efron, B., & Tibshirani, R. J. (1994). An introduction to the bootstrap. CRC Press.

Deng, C., Daley, T., Calabrese, P., Ren, J., & Smith, A.D. (2016). Estimating the number of species to attain sufficient representation in a random sample. arXiv preprint arXiv:1607.02804v3.

Examples

## load library
library(preseqR)

## import data
data(FisherButterfly)

## construct the estimator for the sample coverage
estimator1 <- preseqR.sample.cov.bootstrap(FisherButterfly, r=1)

## Given a sample that is 10 or 20 times the size of the initial
## sample, suppose one randomly draws one more individual from the
## population. The value of the function is the probability that the
## species of that individual has already been observed in the sample
estimator1$f(c(10, 20))

## The standard error of the estimates
estimator1$se(c(10, 20))

## The confidence interval of the estimates
lb <- estimator1$lb(c(10, 20))
ub <- estimator1$ub(c(10, 20))
matrix(c(lb, ub), byrow=FALSE, ncol=2)

## construct the estimator for species represented at least twice
estimator2 <- preseqR.sample.cov.bootstrap(FisherButterfly, r=2)

## the probability when the sample size is 50 or 100 times that of the
## initial sample
estimator2$f(c(50, 100))

## The standard error of the estimates
estimator2$se(c(50, 100))

## The confidence interval of the estimates
lb <- estimator2$lb(c(50, 100))
ub <- estimator2$ub(c(50, 100))
matrix(c(lb, ub), byrow=FALSE, ncol=2)

smithlabcode/preseqR documentation built on Sept. 13, 2022, 6:29 p.m.