knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width=4.5, fig.height=3.5 )
multidog()
provides support for genotyping many SNPs by iterating flexdog()
over the SNPs. Support is provided for parallel computing through the future
package. The genotyping method is described in Gerard et al. (2018) and Gerard and Ferrão (2020).
multidog()
Let's load updog
, future
, and the data from Uitdewilligen et al. (2013).
library(future) library(updog) data("uitdewilligen")
uitdewilligen$refmat
is a matrix of reference counts while
uitdewilligen$sizemat
is a matrix of total read counts. In these data,
the rows index the individuals and the columns index the
loci. But for insertion into multidog()
we need it the other way around
(individuals in the columns and loci in the rows).
So we will transpose these matrices.
refmat <- t(uitdewilligen$refmat) sizemat <- t(uitdewilligen$sizemat) ploidy <- uitdewilligen$ploidy
sizemat
and refmat
should have the same row and column names. These names
identify the loci and the individuals.
setdiff(colnames(sizemat), colnames(refmat)) setdiff(rownames(sizemat), rownames(refmat))
If we want to do parallel computing, we should check that we have the proper number of cores:
future::availableCores()
Now let's run multidog()
:
mout <- multidog(refmat = refmat, sizemat = sizemat, ploidy = ploidy, model = "norm", nc = 2)
By default, parallelization is run using
future::plan(future::multisession, workers = nc)
if nc
is greater than 1. You can choose your own evaluation strategy by running future::plan()
prior to running multidog()
, and then setting nc = NA
. This should be particularly useful in higher performance computing environments that use schedulers, where you can control the evaluation strategy through the future.batchtools
package. For example, the following will run multidog()
using forked R processes:
future::plan(future::multicore, workers = 2) mout <- multidog(refmat = refmat, sizemat = sizemat, ploidy = ploidy, model = "norm", nc = NA) ## Shut down parallel workers future::plan(future::sequential)
multidog()
OutputThere is a plot method for the output of multidog()
.
plot(mout, indices = c(1, 5, 100))
The output of multidog contains two data frame. The first contains properties of the SNPs, such as estimated allele bias and estimated sequencing error rate.
str(mout$snpdf)
The second data frame contains properties of each individual at each SNP, such as the
estimated genotypes (geno
) and the posterior probability of being genotyping correctly (maxpostprob
).
str(mout$inddf)
You can obtain the columns in inddf
in matrix form with format_multidog()
.
genomat <- format_multidog(mout, varname = "geno") head(genomat)
To filter SNPs based on quality metrics (bias, sequencing error rate,
overdispersion, etc), you can use filter_snp()
, which uses the same
non-standard evaluation you are used to from dplyr::filter()
.
That is, you can define predicates in terms of the variable
names in the snpdf
data frame from the output of mupdog()
. It then
keeps rows in both snpdf
and inddf
where the predicate for a SNP
evaluates to TRUE
.
dim(mout$snpdf) dim(mout$inddf) mout_cleaned <- filter_snp(mout, prop_mis < 0.05 & bias > exp(-1) & bias < exp(1)) dim(mout_cleaned$snpdf) dim(mout_cleaned$inddf)
Gerard, David, and Luís Felipe Ventorim Ferrão. "Priors for genotyping polyploids." Bioinformatics 36, no. 6 (2020): 1795-1800. https://doi.org/10.1093/bioinformatics/btz852.
Gerard, David, Luís Felipe Ventorim Ferrão, Antonio Augusto Franco Garcia, and Matthew Stephens. 2018. "Genotyping Polyploids from Messy Sequencing Data." Genetics 210 (3). Genetics: 789–807. https://doi.org/10.1534/genetics.118.301468.
Uitdewilligen, Anne-Marie A. AND D’hoop, Jan G. A. M. L. AND Wolters. 2013. "A Next-Generation Sequencing Method for Genotyping-by-Sequencing of Highly Heterozygous Autotetraploid Potato." PLOS ONE 8 (5). Public Library of Science: 1–14. https://doi.org/10.1371/journal.pone.0062355.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.