FindTrustKmer: Calculate trusted and untrusted kmers from counted kmer...
In qiongliu1023/GenomeSizeEstimate: Genome Size Estimation Through Kmer Analysis


View source: R/FindTrustKmer.R

Description

The kmer frequency table computed from whole-genome sequencing data with jellyfish (`jellyfish count` followed by `jellyfish histo`) contains a large number of low-frequency kmers. These are considered untrusted kmers arising from sequencing errors. FindTrustKmer detects the frequency at which trusted kmers start and calculates the total numbers of untrusted and trusted kmers in the dataset.
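This page does not describe the exact rule FindTrustKmer uses to find the starting point. A common heuristic in kmer-based genome size estimation places the cut-off at the first valley of the histogram, between the error peak at low frequency and the coverage peak. The sketch below implements that heuristic. It is not the package's code: `find_trust_start` is a hypothetical helper and the counts are made-up toy values.

```r
# Hypothetical helper (not part of GenomeSizeEstimate).
# Returns the frequency at which trusted kmers are assumed to start, taken here
# as the first local minimum ("valley") of the kmer histogram.
# Returns NA if no valley is found.
find_trust_start <- function(kmer_hist) {
  # Walk the histogram in frequency order
  kmer_hist <- kmer_hist[order(kmer_hist$frequency), ]
  counts <- kmer_hist$counts
  n <- length(counts)

  # Visit the interior rows 2..(n - 1)
  for (i in seq_len(max(n - 2, 0)) + 1) {
    # The error peak has stopped falling and the counts start rising again
    if (counts[i] <= counts[i - 1] && counts[i] < counts[i + 1]) {
      return(kmer_hist$frequency[i])
    }
  }
  NA
}

# Toy histogram: error peak at frequency 1, valley at 4, coverage peak at 7
toy <- data.frame(frequency = 1:10,
                  counts = c(900, 300, 120, 60, 80, 150, 200, 150, 70, 20))
find_trust_start(toy)  # returns 4
```

Real histograms are noisy, so a practical implementation may smooth the counts or ignore very small dips before taking the first minimum.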

Usage

FindTrustKmer(file, kmer_len)

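Arguments

`file`	Kmer frequency table produced by jellyfish (`jellyfish count` followed by `jellyfish histo`) and loaded into R. Despite the argument name, the examples pass a data object rather than a file path. The first and second columns must be named "frequency" and "counts", respectively.
`kmer_len`	The kmer length (k) used for counting, e.g. 19 or 30 in the examples.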

Details

This function takes the jellyfish kmer frequency output as input. Its columns must first be renamed "frequency" and "counts", respectively.
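The sketch below shows one way to prepare that input in R. It assumes the header-less, two-column text that `jellyfish histo` writes, where each line holds a frequency followed by the number of distinct kmers seen at that frequency. The values are inlined toy numbers rather than a real file, so the snippet runs on its own.

```r
# Toy stand-in for `jellyfish histo` output: "<frequency> <number of kmers>"
histo_text <- "1 900
2 300
3 120
4 60
5 80"

# Whitespace-separated, no header; R names the columns V1 and V2
kmer_hist <- read.table(text = histo_text, header = FALSE)

# FindTrustKmer expects exactly these names on the first two columns
names(kmer_hist)[1:2] <- c("frequency", "counts")

str(kmer_hist)
```

With a real file, `read.table("your_histo_file", header = FALSE, col.names = c("frequency", "counts"))` reads the data and renames the columns in one step. The file name here is a placeholder.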

The function first detects the starting point of trusted kmers. It then sums the numbers of trusted and untrusted kmers over all frequencies and from these derives the percentages of trusted and untrusted kmers in the dataset.
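The page does not specify how "number of kmers" is counted, so the sketch below makes two assumptions. First, kmers with a frequency below the starting point are treated as untrusted and the rest as trusted. Second, each histogram row contributes `frequency * counts` kmer occurrences. If you instead sum `counts` alone, you get the number of distinct kmers. `summarise_kmers` is a hypothetical helper, not the package's implementation.

```r
# Hypothetical helper (not part of GenomeSizeEstimate).
# Assumptions: frequency < start means untrusted, otherwise trusted, and
# "number of kmers" means total occurrences (frequency * counts).
summarise_kmers <- function(kmer_hist, start) {
  # Convert to double so large genomes do not overflow 32-bit integers
  occurrences <- as.numeric(kmer_hist$frequency) * as.numeric(kmer_hist$counts)

  untrusted <- sum(occurrences[kmer_hist$frequency < start])
  trusted   <- sum(occurrences[kmer_hist$frequency >= start])
  total     <- untrusted + trusted

  c(total         = total,
    trusted       = trusted,
    untrusted     = untrusted,
    trusted_pct   = 100 * trusted / total,
    untrusted_pct = 100 * untrusted / total)
}

toy <- data.frame(frequency = 1:6, counts = c(900, 300, 120, 60, 80, 150))
s <- summarise_kmers(toy, start = 4)
s  # total = 3400, trusted = 1540, untrusted = 1860
```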

Value

The function returns two values. The first is the starting point of trusted kmers, which is useful as input to the PlotKmerFrequency function. The second is a vector containing the total number of kmers, the numbers of trusted and untrusted kmers, and the percentages of trusted and untrusted kmers.

Note

N/A

Author(s)

Qiong Liu

References

More details about the calculation can be found at:

http://koke.asrc.kanazawa-u.ac.jp/HOWTO/kmer-genomesize.html

https://bioinformatics.uconn.edu/genome-size-estimation-tutorial/

See Also

The GenomeEstimate and PlotKmerFrequency functions.

Examples

library(GenomeSizeEstimate)

# Load the example dataset `a` (kmer length 19 bp)
data(a)

FindTrustKmer(a, 19)

# Load the example dataset `b` (kmer length 30 bp)
data(b)

FindTrustKmer(b, 30)