View source: R/GenomeEstimate.R
Estimated genome size is an important statistic in genomic studies, especially for evaluating de novo whole-genome assemblies. Besides flow cytometric analysis, genome size can also be estimated with comparable accuracy from k-mer analysis of NGS short-read sequencing data.
GenomeEstimate(file, kmer_len)
file        Counted kmer frequency file from
kmer_len    The length of kmers
This function takes the output from the jellyfish count function. The columns of the data must be renamed to "frequency" and "counts", respectively.
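The renaming step can be sketched as follows. This is a minimal illustration, assuming the counted k-mer data is a two-column, whitespace-separated table; the file is a temporary one filled with made-up toy values, and the object name `kmer_hist` is hypothetical.

```r
# Toy two-column histogram (hypothetical values), written to a temporary
# file so the example is self-contained.
histo_path <- tempfile(fileext = ".histo")
writeLines(c("1 1000", "2 200", "3 50", "4 120", "5 300"), histo_path)

# Read the table: no header row, whitespace-separated columns.
kmer_hist <- read.table(histo_path, header = FALSE)

# Rename the columns to the names the function expects.
colnames(kmer_hist) <- c("frequency", "counts")
```

The resulting data frame could then be passed as the `file` argument, together with the k-mer length used for counting.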
The function first detects the starting point of trusted kmers, then identifies the mean kmer coverage beyond that point. It then estimates the genome size from the number of trusted kmers counted and their mean coverage.
The end point of the single-copy region is detected at the proximal end of the bell-shaped distribution of kmer frequency.
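The calculation described above can be sketched roughly as below. This is not the package's implementation: it assumes the trusted k-mer start is the first valley in the histogram and that the mean coverage is the position of the main peak after that valley. The function names `sketch_trust_start` and `sketch_genome_size` and the toy histogram are hypothetical.

```r
# Toy k-mer histogram (hypothetical values): an error peak at low frequency,
# a valley at frequency 3, and a main peak at frequency 6.
kmer_hist <- data.frame(
  frequency = 1:10,
  counts    = c(5000, 800, 100, 300, 900, 1500, 900, 300, 100, 20)
)

# Assumption: the trusted k-mer start is the first point after which the
# counts begin to rise again (the valley after the error peak).
sketch_trust_start <- function(h) {
  rising <- which(diff(h$counts) > 0)
  h$frequency[rising[1]]
}

# Assumption: mean coverage is the frequency with the highest count at or
# after the trusted start; genome size is total trusted k-mers / coverage.
sketch_genome_size <- function(h) {
  start       <- sketch_trust_start(h)
  trusted     <- h[h$frequency >= start, ]
  mean_cov    <- trusted$frequency[which.max(trusted$counts)]
  total_kmers <- sum(as.numeric(trusted$frequency) * trusted$counts)
  list(trust_start   = start,
       mean_coverage = mean_cov,
       genome_size   = total_kmers / mean_cov)
}

est <- sketch_genome_size(kmer_hist)
```

Using the peak position as the coverage keeps the sketch short; a weighted mean over the trusted region would be another reasonable choice and would give a slightly different estimate.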
This function returns the mean coverage of kmers, the estimated genome size based on kmer analysis, the end point of the single-copy region, the estimated size of the single-copy region, and the percentage of the genome that is single copy. Of these, the mean kmer coverage and the single-copy region end point are required by the PlotKmerFrequency function.
N/A
Qiong Liu
More details about the calculation can be found at:
http://koke.asrc.kanazawa-u.ac.jp/HOWTO/kmer-genomesize.html
https://bioinformatics.uconn.edu/genome-size-estimation-tutorial/
Functions FindTrustKmer and PlotKmerFrequency
# load an example data set called a, with kmer length 19 bp
data(a)
GenomeEstimate(a, 19)
# load an example data set called b, with kmer length 30 bp
data(b)
GenomeEstimate(b, 30)