ztnb.rSAC: ZTNB estimator


View source: R/ztnb.R

Description

ztnb.rSAC predicts the expected number of species represented at least r times in a random sample, based on the initial sample.

Usage

ztnb.rSAC(n, r=1, size=SIZE.INIT, mu=MU.INIT)

Arguments

n

A two-column matrix. The first column is the frequency j = 1, 2, …; the second column is N_j, the number of species represented exactly j times in the initial sample. The first column must be sorted in ascending order.

r

A positive integer, the minimum number of times a species must be represented in the sample to be counted. Default is 1.

size

A positive double, the initial value of the parameter size in the negative binomial distribution for the EM algorithm. Default value is 1.

mu

A positive double, the initial value of the parameter mu in the negative binomial distribution for the EM algorithm. Default value is 0.5.
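
The two-column matrix expected by n can be assembled from per-species counts with base R. The following is a minimal sketch; the abundance vector is made-up data for illustration and is not part of preseqR.

```r
# Hypothetical per-species abundances: how many individuals of each
# species were observed in the initial sample (illustrative values only).
abundance <- c(1, 1, 1, 2, 2, 3, 5, 5, 8)

# For each frequency j, count the species observed exactly j times.
freq <- table(abundance)

# Two-column matrix: column 1 is j (ascending), column 2 is N_j.
n <- cbind(as.numeric(names(freq)), as.numeric(freq))
n
```

Because table() orders numeric values in ascending order, the first column already satisfies the sorting requirement.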

Details

The statistical assumption is that, for each species, the number of individuals in a sample follows a Poisson distribution whose rate lambda is drawn from a gamma distribution. Consequently, the number of individuals X of a species observed in the sample (X > 0) follows a zero-truncated negative binomial (ZTNB) distribution. The unknown parameters are estimated from the initial sample by the function preseqR.ztnb.em. Using the estimated distribution, we calculate the expected number of species represented at least r times in a random sample. Details of the estimation procedure can be found in the supplement of Daley and Smith (2013).
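
The last step of this calculation can be sketched in base R. This is a sketch derived from the model described here, not a copy of the package source: the function name ztnb_rsac_sketch is hypothetical, and size and mu are assumed to be already fitted (the values below are placeholders, not estimates from real data). Under the gamma-Poisson model, scaling the sample size by t scales the negative binomial mean to mu * t and leaves size unchanged.

```r
# Sketch of an r-SAC estimator under the ZTNB model, assuming size and mu
# have already been estimated (ztnb.rSAC obtains them via preseqR.ztnb.em).
ztnb_rsac_sketch <- function(n, r = 1, size, mu) {
  S <- sum(n[, 2])                               # observed species
  p_obs <- 1 - dnbinom(0, size = size, mu = mu)  # P(species seen at least once)
  L <- S / p_obs                                 # estimated total species
  function(t) {
    # Expected species represented at least r times at relative size t.
    L * pnbinom(r - 1, size = size, mu = mu * t, lower.tail = FALSE)
  }
}

# Illustrative histogram and placeholder parameter values.
n <- cbind(c(1, 2, 3), c(50, 20, 10))
f1 <- ztnb_rsac_sketch(n, r = 1, size = 1, mu = 0.5)
f1(1)         # recovers the observed species count, up to rounding error
f1(c(2, 10))  # predictions for samples 2 and 10 times the initial size
```

With r = 1 and t = 1 the sketch returns the number of species observed in the initial sample, which is a convenient sanity check on any fitted parameters.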

Value

The estimator for the r-SAC, returned as a function. Its input is a vector of sampling efforts t, i.e., sample sizes relative to the initial sample. For example, t = 2 means a random sample that is twice the size of the initial sample.

Author(s)

Chao Deng

References

Daley, T., & Smith, A. D. (2013). Predicting the molecular complexity of sequencing libraries. Nature methods, 10(4), 325-327.

Deng, C., Daley, T., & Smith, A. D. (2015). Applications of species accumulation curves in large-scale biological data analysis. Quantitative Biology, 3(3), 135-144.

See Also

preseqR.ztnb.em

Examples

## load library
library(preseqR)

## import data
data(FisherButterfly)

## construct the estimator for SAC
ztnb1 <- ztnb.rSAC(FisherButterfly, r=1)
## The number of species represented at least once in a sample
## that is 10 or 20 times the size of the initial sample
ztnb1(c(10, 20))

## construct the estimator for r-SAC
ztnb2 <- ztnb.rSAC(FisherButterfly, r=2)
## The number of species represented at least twice in a sample
## that is 50 or 100 times the size of the initial sample
ztnb2(c(50, 100))

preseqR documentation built on May 2, 2019, 6:39 a.m.