FindTrustKmer: Calculate trusted and untrusted kmer from counted kmer...


View source: R/FindTrustKmer.R

Description

The counted kmer frequency file calculated from whole genome sequencing data with the jellyfish count function contains a large number of low-frequency kmers. These kmers are considered untrusted because they are likely to come from sequencing errors. FindTrustKmer detects the starting point of the trusted kmers and calculates the total number of untrusted and trusted kmers in the data set.

Usage

FindTrustKmer(file, kmer_len)

Arguments

file

Counted kmer frequency file from the jellyfish count function. The first and second columns must be named "frequency" and "counts", respectively.

kmer_len

The length of kmers.

Details

This function takes the output of the jellyfish count function. The two columns of the data must first be renamed to "frequency" and "counts", respectively.
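The renaming step described above can be sketched as follows. This is a minimal illustration, assuming the jellyfish output is a whitespace-separated, two-column text file without a header; the path `histo_path` and the toy values written to it are hypothetical stand-ins for real output. Judging by the Examples, the function is called on a data frame, so the file is read into one here.

```r
# Hypothetical path: a temporary file stands in for real jellyfish output.
histo_path <- tempfile(fileext = ".histo")
writeLines(c("1 1000", "2 300", "3 80", "4 120", "5 200"), histo_path)

# Read the two columns and give them the names FindTrustKmer expects.
kmer_hist <- read.table(histo_path, header = FALSE,
                        col.names = c("frequency", "counts"))

# Inspect the result: a data frame with columns "frequency" and "counts".
str(kmer_hist)
```

The resulting data frame could then be passed as the file argument, together with the kmer length used when counting.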

The function first detects the starting point of the trusted kmers. It then calculates the total number of trusted and untrusted kmers across the frequencies and, from these totals, the percentage of trusted and untrusted kmers in the data set.
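The two steps can be illustrated with a small sketch. This is not the package's implementation: it assumes that the starting point is the first valley of the histogram (the first frequency whose count is lower than the next one) and that the number of kmers at a frequency is the frequency multiplied by its count. The helper name `sketch_trust_kmer` and the toy histogram are hypothetical.

```r
# Illustrative only; the actual detection rule in FindTrustKmer may differ.
sketch_trust_kmer <- function(hist_df) {
  counts <- hist_df$counts

  # Assumed rule: the first position where the counts start rising again
  # (the valley after the error peak) marks the start of trusted kmers.
  rising <- which(diff(counts) > 0)
  if (length(rising) == 0) stop("no valley found in the histogram")
  start_freq <- hist_df$frequency[rising[1]]

  # Assumed definition: kmers at a frequency = frequency * distinct kmers.
  totals <- as.numeric(hist_df$frequency) * as.numeric(hist_df$counts)
  trusted <- sum(totals[hist_df$frequency >= start_freq])
  untrusted <- sum(totals[hist_df$frequency < start_freq])
  all_kmers <- trusted + untrusted

  list(
    start = start_freq,
    summary = c(total = all_kmers,
                trusted = trusted,
                untrusted = untrusted,
                trusted_pct = 100 * trusted / all_kmers,
                untrusted_pct = 100 * untrusted / all_kmers)
  )
}

# Toy histogram (made-up values) with an error peak at frequency 1.
toy <- data.frame(frequency = 1:8,
                  counts = c(1000, 300, 80, 120, 200, 260, 180, 90))
res <- sketch_trust_kmer(toy)
print(res$start)
print(res$summary)
```

Treating the valley itself as trusted is a design choice made for this sketch; a stricter rule could start one frequency later.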

Value

This function returns two values. The first value is the starting point of the trusted kmers, which is useful as input to the PlotKmerFrequency function. The second value is a vector containing the total number of all kmers, trusted kmers, and untrusted kmers, as well as the percentages of trusted and untrusted kmers.

Note

N/A

Author(s)

Qiong Liu

References

More details about the calculation can be found at:

http://koke.asrc.kanazawa-u.ac.jp/HOWTO/kmer-genomesize.html

https://bioinformatics.uconn.edu/genome-size-estimation-tutorial/

See Also

Functions GenomeEstimate and PlotKmerFrequency.

Examples

# Load the example data. This provides an example data set called a with a kmer length of 19 bp.
data(a)

FindTrustKmer(a,19)

# Load the example data. This provides an example data set called b with a kmer length of 30 bp.
data(b)

FindTrustKmer(b,30)

qiongliu1023/GenomeSizeEstimate documentation built on May 14, 2019, 3 a.m.