knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) set.seed(1)
library(segtest)
We provide a walk-through of how to run tests for segregation distortion at many loci in parallel. Details of the methods may be found in Gerard et al. (2025a) and Gerard et al. (2025b).
Data sets ufit
, ufit2
, and ufit3
contain the genotyping output of updog::multidog()
using three different models.
ufit
: Uses the norm
model without specifying who the parents are.ufit2
: Uses the f1pp
model, specifying the parents.ufit3
: Uses the f1
model, specifying the parents.You can convert this genotyping output to what seg_multi()
expects using multidog_to_g()
.
If you did not use the f1pp
or f1
models, use ether the all_gl
(to run tests using genotype log-likelihoods) or all_g
(to run tests assuming genotypes are known) options, and you must specify the ID's of the parents.
o1 <- multidog_to_g(ufit, ploidy = 4, type = "all_g", p1 = "indigocrisp", p2 = "sweetcrisp") o2 <- multidog_to_g(ufit, ploidy = 4, type = "all_gl", p1 = "indigocrisp", p2 = "sweetcrisp")
If you did use the f1pp
or f1
models, use either the off_gl
(to run tests using genotype log-likelihoods) or off_g
(to run tests assuming genotypes are known) options.
o3 <- multidog_to_g(ufit2, ploidy = 4, type = "off_g") o4 <- multidog_to_g(ufit2, ploidy = 4, type = "off_gl") o5 <- multidog_to_g(ufit3, ploidy = 4, type = "off_g") o6 <- multidog_to_g(ufit3, ploidy = 4, type = "off_gl")
We would recommend always using genotype log-likelihoods. But the option is there for known genotypes.
Parallelization support is provided through the future package. You specify how to implement parallelization using future::plan()
. Then you run seg_multi()
. Then you shut down the parallelization with future::plan()
. The most common plan is future::multisession()
, where you specify the number of parallel processes with the workers
argument. You can get the maximum number of workers via future::availableCores()
future::availableCores()
So a typically workload looks something like this:
future::plan(future::multisession(workers = 2)) mout1 <- seg_multi( g = o2$g, p1 = o2$p1, p2 = o2$p2, p1_ploidy = 4, p2_ploidy = 4, ret_out = TRUE) future::plan(future::sequential())
future::plan(future::sequential()) mout1 <- seg_multi( g = o2$g, p1 = o2$p1, p2 = o2$p2, p1_ploidy = 4, p2_ploidy = 4, ret_out = TRUE)
The output is a data frame. The most important parts of this are the snp
and p_value
columns.
mout1[c("snp", "p_value")]
It looks like only SNP 12_31029646
is in possible segregation distortion. Let's compare the expected frequencies (from the q0
column) against the observed modes.
round(mout1$q0[mout1$snp == "12_31029646"][[1]], digits = 4) o1$g["12_31029646", ] / sum(o1$g["12_31029646", ])
So SNP 12_31029646
likely got flagged because there are too many individuals with a genotype of 2, and not enough with a genotype of 0.
If we would have run outlier = FALSE
, then SNP 2_20070837
would also have been flagged for possible segregation distortion
future::plan(future::multisession(workers = 2)) mout2 <- seg_multi( g = o2$g, p1 = o2$p1, p2 = o2$p2, p1_ploidy = 4, p2_ploidy = 4, outlier = FALSE) future::plan(future::sequential())
future::plan(future::sequential()) mout2 <- seg_multi( g = o2$g, p1 = o2$p1, p2 = o2$p2, p1_ploidy = 4, p2_ploidy = 4, outlier = FALSE)
mout2[, c("snp", "p_value")]
Let's look at the posterior mode genotypes of SNP 2_20070837
:
o1$g["2_20070837", ] o1$p1["2_20070837"] o1$p2["2_20070837"]
So SNP 2_20070837
likely got flagged because of two individuals that are very likely a genotype of 3, which is impossible if the parents have genotypes 0 and 2. Including outlier = TRUE
previously made it so that we would ignore this discrepancy (up to a point). You can get the posterior probability that an individual is an outlier if you set ret_out = TRUE
(which I previously did).
graphics::plot( mout1$outprob[mout1$snp == "2_20070837"][[1]], ylim = c(0, 1), ylab = "Outlier Probability")
multi_lrt()
The older function multi_lrt()
, only provides support for tetraploids. You can still use it instead of seg_multi()
for tetraploids.
future::plan(future::multisession(workers = 2)) mout3 <- multi_lrt( g = o2$g, p1 = o2$p1, p2 = o2$p2 ) future::plan(future::sequential())
future::plan(future::sequential()) mout3 <- multi_lrt( g = o2$g, p1 = o2$p1, p2 = o2$p2 )
It's a little faster, but it cannot account for outliers, so it will mostly be the same as setting outlier = FALSE
.
plot( mout2$p_value, mout3$p_value, xlab = "seg_multi(outlier = FALSE)", ylab = "multi_lrt()") abline(0, 1, lty = 2, col = 2)
The one discrepancy above is caused by the new way we calculate the degrees of freedom:
i <- which.max(abs(mout2$p_value - mout3$p_value)) mout2$df[[i]] mout3$df[[i]]
The two approaches should give almost equivalent results most of the time, but the new way is a little better.
Gerard D, Thakkar M, & Ferrão LFV (2025a). "Tests for segregation distortion in tetraploid F1 populations." Theoretical and Applied Genetics, 138(30), p. 1--13. doi:10.1007/s00122-025-04816-z.
Gerard, D, Ambrosano, GB, Pereira, GdS, & Garcia, AAF (2025b). "Tests for segregation distortion in higher ploidy F1 populations." bioRxiv, p. 1--20. bioRxiv:2025.06.23.661114
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.