GenomeEstimate: Estimate genome size and single copy region in genome


View source: R/GenomeEstimate.R

Description

Genome size is an important statistic in genomic studies, especially for evaluating de novo whole-genome assemblies. Besides flow cytometry, genome size can be estimated with comparable accuracy by k-mer analysis of NGS short-read sequencing data.

Usage

GenomeEstimate(file, kmer_len)

Arguments

file

K-mer frequency data counted with the jellyfish count function. The first and second columns must be named "frequency" and "counts", respectively.

kmer_len

The k-mer length (in bp) used for counting

Details

This function takes the output of the jellyfish count function. The two columns of the data must be renamed to "frequency" and "counts", respectively.
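A minimal sketch of the renaming step, assuming the k-mer frequency table is a two-column, whitespace-separated text table without a header. The values and the object names histo_text and kmer_hist are hypothetical; with real data, a file path would replace the text argument.

```r
# Hypothetical two-column k-mer frequency table, standing in for a file on disk.
histo_text <- "1 500000\n2 20000\n3 3000\n"

# Read the table and assign the column names the function expects.
kmer_hist <- read.table(text = histo_text, header = FALSE,
                        col.names = c("frequency", "counts"))

print(names(kmer_hist))
```

The resulting data frame can then be passed as the file argument, in the same way as the example data sets in the Examples section.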

The function first detects the starting point of trusted k-mers and then identifies the mean k-mer coverage beyond that point. It then estimates the genome size from the number of trusted k-mers and their mean coverage.
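The calculation described above can be sketched as follows. This is an illustration of the general approach in the tutorials listed under References, not the package's actual implementation. It assumes that trusted k-mers start at the first valley of the histogram and that coverage is taken as the position of the main peak; the toy counts are invented.

```r
# Toy histogram: an error peak at low frequency, then a bell-shaped main peak.
# All values are invented for illustration only.
kmer_hist <- data.frame(
  frequency = 1:12,
  counts    = c(900, 200, 50, 80, 150, 300, 500, 300, 150, 80, 40, 20)
)

# Assumption: trusted k-mers start at the first local minimum, i.e. the
# valley between the error peak and the main peak.
first_rise  <- which(diff(kmer_hist$counts) > 0)[1]
trust_start <- kmer_hist$frequency[first_rise]

# Keep only the trusted part of the histogram.
trusted <- kmer_hist[kmer_hist$frequency >= trust_start, ]

# Assumption: coverage is approximated by the position of the main peak.
coverage <- trusted$frequency[which.max(trusted$counts)]

# Total number of trusted k-mers divided by coverage gives the size estimate.
total_kmers <- sum(trusted$frequency * trusted$counts)
genome_size <- total_kmers / coverage

print(c(trust_start = trust_start, coverage = coverage,
        genome_size = genome_size))
```

Using the peak position is a simplification; a weighted mean over the main peak is another common choice and may be closer to what the function reports as mean coverage.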

The end point of single copy region is detected at the proximal end of bell shape distribution of kmer frequency.
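One possible reading of this step, sketched under stated assumptions: the end point is taken as the first valley after the main peak, and the single copy region size is the k-mer total between the trusted start and that end point divided by coverage. The toy counts, the fixed trust_start value, and all variable names are hypothetical, and the package may locate the end point differently.

```r
# Toy histogram with a main peak near 7 and a smaller repeat peak near 14.
# All values are invented for illustration only.
kmer_hist <- data.frame(
  frequency = 1:16,
  counts    = c(900, 200, 50, 80, 150, 300, 500, 300,
                150, 80, 40, 60, 90, 120, 70, 30)
)

trust_start <- 3  # assumed to have been detected beforehand
trusted  <- kmer_hist[kmer_hist$frequency >= trust_start, ]
peak_row <- which.max(trusted$counts)
coverage <- trusted$frequency[peak_row]

# Assumption: the single copy region ends at the first valley after the
# main peak, where the counts start rising again. If the counts never rise,
# rise is NA and no end point is found.
after_peak <- trusted[peak_row:nrow(trusted), ]
rise <- which(diff(after_peak$counts) > 0)[1]
single_copy_end <- after_peak$frequency[rise]

# Size estimates for the whole genome and for the single copy region.
genome_size <- sum(trusted$frequency * trusted$counts) / coverage
in_region   <- trusted$frequency <= single_copy_end
single_copy_size <- sum(trusted$frequency[in_region] *
                        trusted$counts[in_region]) / coverage
single_copy_pct  <- 100 * single_copy_size / genome_size

print(c(single_copy_end = single_copy_end, single_copy_pct = single_copy_pct))
```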

Value

This function returns the mean k-mer coverage, the estimated genome size based on k-mer analysis, the end point of the single copy region, the estimated size of the single copy region, and the percentage of the genome that is single copy. Of these, the mean k-mer coverage and the single copy region end point are required by the PlotKmerFrequency function.

Note

N/A

Author(s)

Qiong Liu

References

More details about the calculation can be found at:

http://koke.asrc.kanazawa-u.ac.jp/HOWTO/kmer-genomesize.html

https://bioinformatics.uconn.edu/genome-size-estimation-tutorial/

See Also

Functions FindTrustKmer and PlotKmerFrequency

Examples

# Load the example data set a (k-mer length 19 bp)
data(a)
GenomeEstimate(a, 19)

# Load the example data set b (k-mer length 30 bp)
data(b)
GenomeEstimate(b, 30)

qiongliu1023/GenomeSizeEstimate documentation built on May 14, 2019, 3 a.m.