knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width=4.5, fig.height=3.5 )

We provide some example usage of the oracle calculations available in `updog`

. These are particularly useful for read-depth determination. These calculations are described in detail in Gerard et al. (2018).

Suppose we have a sample of tetraploid individuals derived from an S1 cross (a single generation of selfing). Using domain expertise (either from previous studies or a pilot analysis), we've determined that our sequencing technology will produce relatively clean data. That is, the sequencing error rate will not be too large (say, ~0.001), the bias will be moderate (say, ~0.7 at the most extreme), and the majority of SNPs will have reasonable levels of overdispersion (say, less than 0.01). We want to know how deep we need to sequence.

Using `oracle_mis`

, we can see how deep we need to sequence under the worst-case scenario we want to control (sequencing error rate = 0.001, bias = 0.7, overdispersion = 0.01) in order to obtain a misclassification error rate of at most, say, 0.05.

bias <- 0.7 od <- 0.01 seq <- 0.001 maxerr <- 0.05

Before we do this, we also need the distribution of the offspring genotypes. We can get this distribution assuming various parental genotypes using the `get_q_array`

function. Typically, error rates will be larger when the allele-frequency is closer to 0.5. So we'll start in the worst-case scenario of assuming that the parent has 2 copies of the reference allele.

library(updog) ploidy <- 4 pgeno <- 2 gene_dist <- get_q_array(ploidy = ploidy)[pgeno + 1, pgeno + 1, ]

This is what the genotype distribution for the offspring looks like:

library(ggplot2) distdf <- data.frame(x = 0:ploidy, y = 0, yend = gene_dist) ggplot(distdf, mapping = aes(x = x, y = y, xend = x, yend = yend)) + geom_segment(lineend = "round", lwd = 2) + theme_bw() + xlab("Allele Dosage") + ylab("Probability")

Now, we are ready to iterate through read-depth's until we reach one with an error rate less than 0.05.

err <- Inf depth <- 0 while(err > maxerr) { depth <- depth + 1 err <- oracle_mis(n = depth, ploidy = ploidy, seq = seq, bias = bias, od = od, dist = gene_dist) } depth

Looks like we need a depth of `r depth`

in order to get a misclassification error rate under `r maxerr`

.

Note that `oracle_mis`

returns the **best misclassification error rate possible** under these conditions (`ploidy`

= `r ploidy`

, `bias`

= `r bias`

, `seq`

= `r seq`

, `od`

= `r od`

, and `pgeno`

= `r pgeno`

). In your actual analysis, you will have a worse misclassification error rate than that returned by `oracle_mis`

. However, if you have a lot of individuals in your sample, then this will act as a reasonable approximation to the error rate. In general though, you should sequence a little deeper than suggested by `oracle_mis`

.

Suppose we only have a budget to sequence to a depth of 30. Then what errors can we expect? We can use `oracle_joint`

and `oracle_plot`

to visualize the errors we can expect.

depth <- 30 jd <- oracle_joint(n = depth, ploidy = ploidy, seq = seq, bias = bias, od = od, dist = gene_dist) oracle_plot(jd)

Most of the errors will be mistakes between genotypes 2/3 and mistakes between genotypes 1/2.

omiss <- oracle_mis(n = depth, ploidy = ploidy, seq = seq, bias = bias, od = od, dist = gene_dist) ocorr <- oracle_cor(n = depth, ploidy = ploidy, seq = seq, bias = bias, od = od, dist = gene_dist)

Even though the misclassification error rate is pretty high (`r round(omiss, digits = 2)`

), the correlation of the oracle estimator with the true genotype is pretty reasonable (`r round(ocorr, digits = 2)`

). You can obtain this using the `oracle_cor`

function.

ocorr <- oracle_cor(n = depth, ploidy = ploidy, seq = seq, bias = bias, od = od, dist = gene_dist) ocorr

Gerard, David, Luís Felipe Ventorim Ferrão, Antonio Augusto Franco Garcia, and Matthew Stephens. 2018. "Genotyping Polyploids from Messy Sequencing Data." *Genetics* 210 (3). Genetics: 789–807. https://doi.org/10.1534/genetics.118.301468.

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.