estGcDistn | R Documentation |
Generate a GC content distribution from sequences for a given read length and fragment length
estGcDistn(x, n = 1e+06, rl = 100, fl = 200, fragSd = 30, bins = 101, ...)
## S4 method for signature 'ANY'
estGcDistn(x, n = 1e+06, rl = 100, fl = 200, fragSd = 30, bins = 101, ...)
## S4 method for signature 'character'
estGcDistn(x, n = 1e+06, rl = 100, fl = 200, fragSd = 30, bins = 101, ...)
## S4 method for signature 'DNAStringSet'
estGcDistn(x, n = 1e+06, rl = 100, fl = 200, fragSd = 30, bins = 101, ...)
x |
A DNAStringSet, or the path to a FASTA file |
n |
The number of reads to sample |
rl |
The read length to sample |
fl |
The mean of the fragment lengths sequenced |
fragSd |
The standard deviation of the fragment lengths being sequenced |
bins |
The number of bins to estimate |
... |
Not used |
The function takes the supplied object and returns the theoretical GC content distribution. Using a fixed read length essentially leads to a discrete distribution, so the bins argument is used to define the number of bins returned. This defaults to 101, giving one bin for each value from 0 to 100% inclusive.
The returned values are obtained by interpolating the values obtained during sampling. This avoids returning distributions with gaps and jumps, as would occur when the read length (rl) is set to a value that is not a multiple of 100.
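The effect of a fixed read length, and of interpolating onto a regular grid, can be illustrated with a minimal base R sketch. This is not the package's internal code: the toy random reference sequence, the variable names and the use of linear interpolation via approx() are all assumptions made for illustration, and fragment lengths are ignored entirely.

```r
# Illustrative sketch only; not how estGcDistn() is implemented.
set.seed(42)

rl   <- 75    # read length, deliberately not a multiple of 100
bins <- 101   # number of output bins, 0 to 100% inclusive
n    <- 2000  # number of reads to simulate

# Toy "reference": a random sequence with roughly 45% GC (an assumption)
ref <- sample(c("A", "C", "G", "T"), 5000, replace = TRUE,
              prob = c(0.275, 0.225, 0.225, 0.275))
isGC <- ref %in% c("C", "G")

# Sample read start positions and count the GC bases in each read
starts  <- sample.int(length(ref) - rl + 1, n, replace = TRUE)
cumGC   <- c(0, cumsum(isGC))
gcCount <- cumGC[starts + rl] - cumGC[starts]

# With a fixed read length only rl + 1 distinct GC proportions can occur
rawGC   <- (0:rl) / rl
rawFreq <- tabulate(gcCount + 1, nbins = rl + 1) / n

# Interpolate onto an evenly spaced grid of `bins` points, then renormalise
grid   <- seq(0, 1, length.out = bins)
interp <- approx(rawGC, rawFreq, xout = grid, rule = 2)$y
out    <- data.frame(GC_Content = grid, Freq = interp / sum(interp))
```

Binning the 76 raw proportions directly into 101 bins would leave some bins empty and others inflated; interpolating first spreads the sampled frequencies smoothly across the grid.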
Based heavily on https://github.com/mikelove/fastqcTheoreticalGC
A tibble with two columns, GC_Content and Freq, denoting the proportion of GC and the frequency of occurrence respectively.
faDir <- system.file("extdata", package = "ngsReports")
faFile <- list.files(faDir, pattern = "fasta", full.names = TRUE)
df <- estGcDistn(faFile, n = 200)