knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width=4.5, fig.height=3.5 )
We provide some example usage of the oracle calculations available in updog
. These are particularly useful for read-depth determination. These calculations are described in detail in Gerard et al. (2018).
Suppose we have a sample of tetraploid individuals derived from an S1 cross (a single generation of selfing). Using domain expertise (either from previous studies or a pilot analysis), we've determined that our sequencing technology will produce relatively clean data. That is, the sequencing error rate will not be too large (say, ~0.001), the bias will be moderate (say, ~0.7 at the most extreme), and the majority of SNPs will have reasonable levels of overdispersion (say, less than 0.01). We want to know how deep we need to sequence.
Using oracle_mis
, we can see how deep we need to sequence under the worst-case scenario we want to control (sequencing error rate = 0.001, bias = 0.7, overdispersion = 0.01) in order to obtain a misclassification error rate of at most, say, 0.05.
bias <- 0.7 od <- 0.01 seq <- 0.001 maxerr <- 0.05
Before we do this, we also need the distribution of the offspring genotypes. We can get this distribution assuming various parental genotypes using the get_q_array
function. Typically, error rates will be larger when the allele-frequency is closer to 0.5. So we'll start in the worst-case scenario of assuming that the parent has 2 copies of the reference allele.
library(updog) ploidy <- 4 pgeno <- 2 gene_dist <- get_q_array(ploidy = ploidy)[pgeno + 1, pgeno + 1, ]
This is what the genotype distribution for the offspring looks like:
library(ggplot2) distdf <- data.frame(x = 0:ploidy, y = 0, yend = gene_dist) ggplot(distdf, mapping = aes(x = x, y = y, xend = x, yend = yend)) + geom_segment(lineend = "round", lwd = 2) + theme_bw() + xlab("Allele Dosage") + ylab("Probability")
Now, we are ready to iterate through read-depth's until we reach one with an error rate less than 0.05.
err <- Inf depth <- 0 while(err > maxerr) { depth <- depth + 1 err <- oracle_mis(n = depth, ploidy = ploidy, seq = seq, bias = bias, od = od, dist = gene_dist) } depth
Looks like we need a depth of r depth
in order to get a misclassification error rate under r maxerr
.
Note that oracle_mis
returns the best misclassification error rate possible under these conditions (ploidy
= r ploidy
, bias
= r bias
, seq
= r seq
, od
= r od
, and pgeno
= r pgeno
). In your actual analysis, you will have a worse misclassification error rate than that returned by oracle_mis
. However, if you have a lot of individuals in your sample, then this will act as a reasonable approximation to the error rate. In general though, you should sequence a little deeper than suggested by oracle_mis
.
Suppose we only have a budget to sequence to a depth of 30. Then what errors can we expect? We can use oracle_joint
and oracle_plot
to visualize the errors we can expect.
depth <- 30 jd <- oracle_joint(n = depth, ploidy = ploidy, seq = seq, bias = bias, od = od, dist = gene_dist) oracle_plot(jd)
Most of the errors will be mistakes between genotypes 2/3 and mistakes between genotypes 1/2.
omiss <- oracle_mis(n = depth, ploidy = ploidy, seq = seq, bias = bias, od = od, dist = gene_dist) ocorr <- oracle_cor(n = depth, ploidy = ploidy, seq = seq, bias = bias, od = od, dist = gene_dist)
Even though the misclassification error rate is pretty high (r round(omiss, digits = 2)
), the correlation of the oracle estimator with the true genotype is pretty reasonable (r round(ocorr, digits = 2)
). You can obtain this using the oracle_cor
function.
ocorr <- oracle_cor(n = depth, ploidy = ploidy, seq = seq, bias = bias, od = od, dist = gene_dist) ocorr
Gerard, David, Luís Felipe Ventorim Ferrão, Antonio Augusto Franco Garcia, and Matthew Stephens. 2018. "Genotyping Polyploids from Messy Sequencing Data." Genetics 210 (3). Genetics: 789–807. https://doi.org/10.1534/genetics.118.301468.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.