preseqR.rSAC.sequencing.rmdup | R Documentation |
preseqR.rSAC.sequencing.rmdup
predicts the expected number of
nucleotides in the genome sequenced at least r times in a sequencing
experiment, based on a shallow sequencing experiment.
preseqR.rSAC.sequencing.rmdup(n_base, n_read, r=1, mt=20, times=30, conf=0.95)
n_base |
A two-column matrix. The first column is the frequency j = 1,2,…; and the second column is N_j, the number of nucleotides in the genome sequenced exactly j times in the initial experiment. The first column must be sorted in an ascending order. |
n_read |
A two-column matrix. The first column is the frequency j = 1,2,…; and the second column is N'_j, the number of distinct reads with exactly j duplicates in the initial experiment. The first column must be sorted in an ascending order. |
r |
A positive integer. Default is 1. |
mt |
A positive integer constraining possible rational function approximations. Default is 20. |
times |
The number of bootstrap samples. Default is 30. |
conf |
The confidence level. Default is 0.95. |
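The layout required for n_base and n_read can be checked before calling the function. The following is a minimal sketch in base R. The helper check_hist is hypothetical and not part of preseqR, and it assumes that "ascending" means strictly ascending, with no repeated frequencies.

```r
## Hypothetical helper (not part of preseqR): check that a count
## histogram has the two-column layout described above.
check_hist <- function(h) {
  is.matrix(h) && ncol(h) == 2 &&
    all(h[, 1] >= 1) &&                       # frequencies j start at 1
    all(h[, 1] == round(h[, 1])) &&           # frequencies are whole numbers
    !is.unsorted(h[, 1], strictly = TRUE) &&  # assumed: strictly ascending
    all(h[, 2] >= 0)                          # counts are non-negative
}

## Toy histograms with made-up numbers
good <- cbind(j = c(1, 2, 3, 5), N_j = c(120, 40, 9, 1))
bad  <- cbind(j = c(2, 1, 3),    N_j = c(40, 120, 9))

check_hist(good)
check_hist(bad)
```

Here the second matrix fails the check only because its first column is out of order.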
preseqR.rSAC.sequencing.rmdup
is designed for sequencing experiments
in which duplicate reads are removed. This procedure is commonly used in
whole-exome sequencing experiments and sometimes in whole-genome sequencing (WGS) as well.
To use the function, one must have two histograms. The first histogram
is the coverage histogram, which is based on distinct reads.
The second histogram gives the number of distinct reads with exactly j duplicates.
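To illustrate what the two histograms contain, here is a sketch that builds them from raw counts in base R. The vectors depth and copies are invented toy data, and counts_to_hist is a hypothetical helper, not a preseqR function. In practice these counts would come from whatever coverage and duplicate-marking tools are in use. The sketch assumes j is the total number of copies observed for a read.

```r
## Hypothetical helper: turn a vector of counts into a two-column
## histogram (j, number of items observed exactly j times).
counts_to_hist <- function(x) {
  x <- x[x > 0]        # j = 0 is not part of the histogram
  tab <- table(x)      # levels come out in ascending numeric order
  cbind(j = as.numeric(names(tab)), n = as.vector(tab))
}

## Toy data: coverage depth at each position, from distinct reads only
depth <- c(1, 1, 2, 0, 3, 1, 2, 0, 1)
## Toy data: number of copies observed for each distinct read
copies <- c(1, 1, 1, 2, 1, 4, 2)

n_base <- counts_to_hist(depth)   # coverage histogram
n_read <- counts_to_hist(copies)  # duplicate-count histogram
n_base
n_read
```

Frequencies that never occur, such as j = 3 in the toy n_read, are simply absent from the matrix.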
f |
The estimator for the expected number of nucleotides in the genome sequenced at least r times given the amount of sequencing. The input of the estimator is a vector of sequencing efforts t, i.e. the amount of sequencing relative to the amount in the initial experiment. For example, t = 2 means sequencing twice the amount of the initial experiment. |
se |
The standard error for the estimator. The input is a vector of sequencing efforts t. |
lb |
The lower bound of the confidence interval. The input is a vector of sequencing efforts t. |
ub |
The upper bound of the confidence interval. The input is a vector of sequencing efforts t. |
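Because t is a ratio rather than a read count, planned sequencing depths have to be converted before they are passed to f, se, lb or ub. The sketch below shows one way to do this, under the assumption that j in n_read is the total number of copies of a distinct read, so that the sum of j times N'_j is the total number of reads in the initial experiment. The histogram and the target depths are made-up numbers.

```r
## Toy duplicate-count histogram (made-up numbers)
n_read <- cbind(j = c(1, 2, 3), n = c(700000, 120000, 20000))

## Total reads in the initial experiment, assuming j is the total
## number of copies of each distinct read
initial_reads <- sum(n_read[, 1] * n_read[, 2])

## Planned sequencing depths, in reads (hypothetical targets)
target_reads <- c(5e6, 1e7)

## Relative sequencing effort to pass to f, se, lb and ub
t <- target_reads / initial_reads
t
```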
Chao Deng
Deng, C., Daley, T., Calabrese, P., Ren, J., & Smith, A.D. (2016). Estimating the number of species to attain sufficient representation in a random sample. arXiv preprint arXiv:1607.02804v3.
## load library
library(preseqR)

## import data
data(SRR1301329_1M_base)
data(SRR1301329_1M_read)

## construct the estimator
estimator1 <- preseqR.rSAC.sequencing.rmdup(
    n_base=SRR1301329_1M_base, n_read=SRR1301329_1M_read,
    r=4, mt=20, times=100, conf=0.95)
## The number of nucleotides in the genome covered at least 4 times,
## when the amount of sequencing is 10 or 20 times that of the
## initial experiment
estimator1$f(c(10, 20))
## The standard error of the estimates
estimator1$se(c(10, 20))
## The confidence interval of the estimates
lb <- estimator1$lb(c(10, 20))
ub <- estimator1$ub(c(10, 20))
matrix(c(lb, ub), byrow=FALSE, ncol=2)

## construct the estimator
estimator2 <- preseqR.rSAC.sequencing.rmdup(
    n_base=SRR1301329_1M_base, n_read=SRR1301329_1M_read,
    r=10, mt=20, times=100, conf=0.95)
## The number of nucleotides in the genome covered at least 10 times,
## when the amount of sequencing is 10 or 20 times that of the
## initial experiment
estimator2$f(c(10, 20))
## The standard error of the estimates
estimator2$se(c(10, 20))
## The confidence interval of the estimates
lb <- estimator2$lb(c(10, 20))
ub <- estimator2$ub(c(10, 20))
matrix(c(lb, ub), byrow=FALSE, ncol=2)