kmer.frac.curve | R Documentation |
kmer.frac.curve predicts the expected fraction of k-mers observed at least r times in a high-throughput sequencing experiment, given the amount of sequencing.
kmer.frac.curve(n, k, read.len, seq, r=2, mt=20)
n |
A two-column matrix. The first column is the frequency j = 1,2,…; the second column is N_j, the number of k-mers observed exactly j times in the initial experiment. The first column must be sorted in ascending order. |
k |
The number of nucleotides in a k-mer. |
read.len |
The average length of a read. |
seq |
The number of nucleotides sequenced. |
r |
A positive integer: the minimum number of times a k-mer must be observed to be counted. Default is 2. |
mt |
A positive integer constraining possible rational function approximations. Default is 20. |
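The layout expected for n can be illustrated with a small sketch in base R. The vector kmer.counts below is a hypothetical toy input (one entry per distinct k-mer, giving how often it was seen); it is not part of preseqR, and real histograms usually come from a k-mer counting tool.

```r
# Hypothetical toy data: number of occurrences of each distinct k-mer.
kmer.counts <- c(1, 1, 1, 2, 2, 3, 5, 5, 1, 2)

# For each frequency j, count how many distinct k-mers occur exactly j times.
# table() orders the frequencies in ascending numeric order for a numeric vector.
tab <- table(kmer.counts)

# Two-column matrix: column 1 is j (ascending), column 2 is N_j.
n <- cbind(j = as.integer(names(tab)), N_j = as.integer(tab))
n
```

Frequencies with no k-mers (here j = 4) are simply absent from the matrix, and the sum of j * N_j equals the total number of k-mer occurrences in the toy input.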
kmer.frac.curve is mainly designed for metagenomics, to evaluate how saturated a metagenomic dataset is.
kmer.frac.curve is the fast version of kmer.frac.curve.bootstrap. The function does not provide a confidence interval. To obtain a confidence interval along with the estimates, use the function kmer.frac.curve.bootstrap.
A two-column matrix. The first column is the amount of sequencing in an experiment. The second column is the estimate of the fraction of k-mers observed at least r times in the experiment.
Chao Deng
Deng, C and Smith, AD (2016). Estimating the number of species to attain sufficient representation in a random sample. arXiv preprint arXiv:1607.02804
## load library
library(preseqR)

## import data
data(SRR061157_k31)

## the fraction of 31-mers represented at least 10 times in an experiment when
## sequencing 1M, 10M, 100M, 1G, 10G, 100G, 1T nucleotides
kmer.frac.curve(n=SRR061157_k31, k=31, read.len=100, seq=10^(6:12), r=10, mt=20)
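To get a feel for the scale of the seq values in the example above, the following base R sketch converts sequenced nucleotides into approximate read and k-mer counts. It relies only on the fact that a read of length L contains L - k + 1 overlapping k-mers; whether kmer.frac.curve performs this exact conversion internally is an assumption, and the variable names reads, kmers.per.read and total.kmers are illustrative only.

```r
# Values taken from the example call above.
k <- 31
read.len <- 100
seq <- 10^(6:12)

# Approximate number of reads for each sequencing amount.
reads <- seq / read.len

# A read of length read.len contains read.len - k + 1 overlapping k-mers.
kmers.per.read <- read.len - k + 1

# Approximate total number of k-mer occurrences for each sequencing amount.
total.kmers <- reads * kmers.per.read

cbind(seq, reads, total.kmers)
```

With 100-nucleotide reads and k = 31, each read contributes 70 k-mers, so 1M sequenced nucleotides correspond to roughly 10,000 reads and 700,000 k-mer occurrences.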