Meiosis is a lean R package for the simulation of meiosis events in diploid
(or allopolyploid) plant species for genetic research in plant breeding.
Meiosis offers two different but complementary approaches for the simulation
of meiosis events that are based on different representations of the genomic
data. The first representation, which we named genotypic, simply uses integer
coding for the alleles at various loci of the genomic data. For instance, the alleles of a bi-alleleic marker are often encoded with 0 and 1.
Meiosis is then simulated with the function meiosis_geno
and sequences of
integers representing parental gametes are recombined into a new gamete.
The seconds representation, which we herein call segmental, encodes genomic
data in terms of founder alleles and segment borders (see Cheng et al.
(2015)^[XSim]). A founder allele can be
thought of as a label that identifies a chunk of DNA in terms of its
origin from so-called founder individuals. Segment borders define
the beginning and the end of such a chunk. Founder individuals are created
with to the function create_xo_founder
and are characterized by two unique
integers that code for the founder alleles they carry. Meiosis is then simulated by the
function meiosis_xo
, where the founder alleles and segment data of two gametes are
suitably combined to form a new gamete. In order to convert genomic data from
the segmental to the genotypic representation, it is first necessary to create
a object of type Converter
(reference class) and assign to it the genotypes of all
founder individuals via the method insert_founder
. Thereafter, the conversion
is performed by the method convert
. All this is illustrated in examples below.
For simulating meiosis, we first need to simulate crossovers. This is
done internally, but requires a list of parameters. This list can be created
with the function create_xoparam
and should not be modified by the user.
Meiosis is generally simulated by the functions meiosis_geno
and meiosis_xo
.
However, shallow wrappers were added from simulating a cross (cross_geno
,
cross_xo
), a selfing (self_geno
, self_xo
) and a doubled haploid
(dh_geno
, dh_xo
) for convenience and code-expressiveness. However, in
general the philosophy of Meiosis is to be as low-level a possible and to
provide basic functionality with little overhead.
A list with crossover parameters is needed for the simulation of crossover
locations, which is done internally. This list should not be produced by
hand, but rather by the function create_xoparam
. It contains parameters
related to chromosome lengths and (possible) crossover interference.
An individual is represented by a nested list, where only its lowest level differs between the genotypic and the segmental representation. In general, an individual is a list containing exactly two gametes, the paternal and the maternal gamete. A gamete is itself a list of chromosomes. Only at this level, the two representations diverge. For the genotypic representation, a chromosome is simply an integer vector. For the segmental representation, it is two vectors of equal length, where the first vector is an integer vector containing founder alleles and the second vector is a numeric (double) vector containing the end points of segments in centiMorgan (cM).
A founder individual is only important for simulations using
the segmental representation. Here, a founder individual has the same
structure as a conventional individual, but there is only one founder allele on
each chromosome that is the same for all chromosomes on a gamete. The data on
segments are only the end points of the chromosomes. Founder individuals are the
starting point for simulations using the function meiosis_xo
.
The "data structure" positions is simply a list that contains vectors with the
genetic map, indicating the positions of the respective locus. There is one
such vector for each chromosome. Map positions are given in centiMorgan, must be
strictly increasing and the first position must be non-negative. Positons are
required when simulating meiosis with meiosis_geno
(genotypic
representation) or when converting from segmental to genotypic.
A converter is a reference class for converting data from the
segmental to the genotypic representation. It has only two methods, namely
insert_founder
for inserting founder genotypes and convert
for actually
converting the data.
Meiosis provides some convenience functions for checking the integrity of
the data structures, like check_positions
, check_xo_individual
and
check_geno_individual
, but the functions, in general, do not perform any checks
themselves and will end up with undefined behaviour if called with invalid
arguments.
The critical user might ask the legitimate question why there are two different
representations of genomic data. The reason is two-fold. First, different
representations have different strengths in terms of computation time and memory
requirements. In general, the genotypic representation will be advantageous
compared to the segmental one when (i) you don't want to bother with the more
complicated segmental representation, (ii) you only simulate few generations
in a breeding program or you always need genotypic data even for intermediate
generations, (iii) you are only dealing with low-density genomic data. On the
other hand, the segmental representation will be (possibly much) more
efficient when using high-density genomic data and when doing simulations where
genotypes of (intermediate) individuals are not always needed. An example would
be the simulation of self-fertilization for a couple of generations, were you
only care for the genotypes of the inbred lines at the very end. Another
advantage of the segmental representation is that it allows you to compute a
"realized" coefficient of co-ancestry, which is defined here as the probability
that at a random position on the genome, an allele drawn from one individual is
identical by descent (i.e., shows the same founder allele) to a randomly drawn
allele from a second individual. This can be done by the function
realized_coancestry
and allows for computing relationship coefficients that
capture deviations from additive genetic relationships due to Mendelian
sampling.
Meiosis does not currently consider the possibility of different sexes in terms of gonosomes and, hence, is not suited for simulations in animal breeding. In case of dioecious plants, it is the responsibility of the user to manage the matings.
Meiosis does not take into account mutation events. There are various reasons for this. First, Meiosis is mainly inteded for the simulation of plant breeding programs, where the number of generations is limited and, hence, mutations will play a minor role. Second, considering simulation events is only easy in case of point-mutations, where an allele is replace by another and all allelic states are known beforehand. Third, if the user wishes to incorporate mutations, it is easy to do so after the simulation of meiosis.
library(knitr) opts_chunk$set(tidy.opts = list(width.cutoff = 80), tidy = TRUE)
The C++ routines use an independent random number generator. For seeding it, do e.g.
library('Meiosis') set.seed(123L) ## Seed R's rng Meiosis::seed_rng(seed = 123L) ## Seed rng used by Meiosis
For (internally) simulating crossover events, the functions that are used to simulate meiosis
events accept a special list as parameter. This list is created by the function
create_xoparam
, which accepts as obligatory parameter a vector that contains the lengths
of the chromosomes. Below, we randomly sample the lengths of three chromosomes and create the
list containing the crossover parameters. See the documentation of create_xoparam
for further
parameters.
n_chr <- 3L ## number of chromosomes L <- runif(n = n_chr, min = 100, max = 300) ## sample length of chromosomes in cM xoparam <- create_xoparam(L) ## no interference, no obligate chiasma str(xoparam)
For the examples below, we need to simulate some genomic data. We sample the number of loci per chromosome as well as the positions of these loci on the respective chromosome.
n_loci <- round(runif(n = n_chr, min = 5L, max = 10L)) ## sample number of loci ## sample positions of loci on the chromosome positions <- lapply(seq_len(n_chr), function(i) sort(runif(n_loci[i], min = 0, max = L[i])))
This example shows how to simulate meiosis with data in the "genotypic" representation. We first
simulate some genotypic data of an individual and then call the function cross_geno
for
simulating meiosis. In this case, the output is a selfing progeny of ind
.
ind <- replicate(2L, lapply(n_loci, function(n) sample(c(0L, 1L), n, replace = TRUE)), simplify = FALSE) ## simulate some genotypic data str(ind) p_geno <- Meiosis::cross_geno(father = ind, mother = ind, positions = positions, xoparam = xoparam) str(p_geno)
Here, I show how to do the same as in Example 1, but with data in the segmental
representation. We first have to create one or multiple founder
individuals. Each founder individual has two (distinct!) founder alleles. These
founder alleles are represented by arbitrary, but unique integers. Think about
these founder alleles as "tags" that are attached to chromosomal segments.
As these segments are "dropped down" the pedigree and sometimes altered by a
recombination event, the "tags" always guarantee that each chunk of the genome can
be unambiguously assigned to one of the founder individuals. A founder
individual is created with the function create_xo_founder
, which accepts as
parameters the integer labels and a vector that specifies the length of each
chromosome in cM. A selfing is then simulated by a call to cross_xo
.
Because data in the "segmental" representation are not useful for many purposes,
we usually will convert them (back) to the "genotypic" representation. To do so,
we first have to create a Converter
object. This is a special data structures
(reference class), which has two methods: insert_founder
and convert
. Before
starting with the conversion, it is necessary to tell the Converter
object the
genotypes of the founder individuals. This is done via the method
insert_founder
. The genotypic data of each founder individual that was
involved in the generation of progenty has to be added with a call to this
method, otherwise an error will be thrown later.
f_alleles <- c(21L, 65L) ## 21 and 65 are arbitrary integers f <- Meiosis::create_xo_founder(alleles = f_alleles, L = L) p_xo <- Meiosis::cross_xo(father = f, mother = f, xoparam = xoparam) str(p_xo)
Create a converter to convert from the segmental to the genotypic representation.
conv <- new(Meiosis::Converter, positions) ## create a new converter object conv$insert_founder(f_alleles, ind) ## insert the (one and only) founder str(conv$convert(p_xo)) ## convert the progeny
In this example, I show how te derive a number of inbred lines from a cross between two parents. The first part of the examples shows how to achieve this using the "genotypic" representation and the second part shows the "segmental" case.
n_self <- 10L ## number of generations of selfing n <- 30L ## number of progeny ## Genotypic representation ind2 <- replicate(2L, lapply(n_loci, function(n) sample(c(0L, 1L), n, replace = TRUE)), simplify = FALSE) # Second individual as parent. pop <- replicate(n, Meiosis::cross_geno(ind, ind2, positions, xoparam), simplify = FALSE) for (i in seq_len(n_self)) { for (j in seq_len(n)) { pop[[i]] <- Meiosis::cross_geno(pop[[i]], pop[[i]], positions, xoparam) } } ## Segmental representation f2 <- create_xo_founder(alleles = c(55L, 77L), L = L) pop_xo <- replicate(n, Meiosis::cross_xo(f, f2, xoparam), simplify = FALSE) for (i in seq_len(n_self)) { for (j in seq_len(n)) { pop_xo[[i]] <- Meiosis::cross_xo(pop_xo[[i]], pop_xo[[i]], xoparam) } } # conv$convert(pop[[1]]) ## error, because genotypic data of second founder not present conv$insert_founder(c(55L, 77L), ind2) ## insert second founder first pop_geno <- lapply(pop_xo, conv$convert) ## convert whole population
Alternatively, we could have used the functions Meiosis::self_geno
and Meiosis::self_xo
,
which are wrappers around Meiosis::cross_geno
and Meiosis::cross_xo
, for producing the
selfings.
In this example, we create a synthetic population (segmental representation) by crossing a number of founder individuals and subsequent random mating for several generations.
make_synthetic <- function(founder, n_ind, n_gen) { ## Cross parents n_founder <- length(founder) tmp <- combn(x = seq_len(n_founder), m = 2L) combinations <- split(tmp, col(tmp)) pop_xo <- replicate(n = n_ind, simplify = FALSE, { pair <- unlist(sample(combinations, size = 1L)) cross_xo(founder[[pair[1L]]], founder[[pair[2L]]], xoparam) }) ## Random mating for (i in seq_len(n_gen)) { pop_xo_new <- pop_xo ## copy for (j in seq_len(n_ind)) { pair <- sample(n_ind, size = 2L, replace = TRUE) ## selfing possible pop_xo_new[[j]] <- cross_xo(pop_xo[[pair[1L]]], pop_xo[[pair[2L]]], xoparam) } pop_xo <- pop_xo_new ## swap } pop_xo } n_founder <- 5L ## number of founders n_ind <- 100L ## size of synthetic n_gen <- 10L # generations of random mating alleles <- lapply(seq_len(n_founder), function(i) c(2L * i - 1L, 2L * i)) founder <- lapply(alleles, create_xo_founder, L = L) ## Create synthetic system.time(syn <- make_synthetic(founder, n_ind, n_gen))
Here, we calculate realized coefficient of co-ancestry. These are analogous to the classical co-ancestry coefficients but take into account variation due to Mendelian sampling.
## Calculate realized coefficients of co-ancestry. Meiosis::realized_coancestry(f) Meiosis::realized_coancestry(p_xo) ## selfing progeny, expected coefficient is 0.75.
Here, we verify that the realized coefficients of co-ancestry correspond, on average, to expected coefficients, which are half of the additive genetic relationships between individuals. We therefore simulate a small pedigree and used the package pedigree to compute additive genetic relationships.
library('pedigree') ## Create a simple pedigree id <- 1:6 dam <- c(0, 0, 1, 1, 4, 4) sire <- c(0, 0, 2, 2, 3, 5) ped <- data.frame(id, dam, sire) ## Compute the additive genetic relationship matrix and coefficients of co-ancestry cwd <- getwd() tpdir <- tempdir() setwd(tpdir) invisible(makeA(ped, which = rep(TRUE, length(id)))) coanc <- read.table("A.txt") setwd(cwd) A <- matrix(NA_real_, nrow = length(id), ncol = length(id)) A[as.matrix(coanc[1:2])] <- A[as.matrix(coanc[2:1])] <- coanc[[3]] eCoc <- A / 2 ## expected coefficients of co-ancestry ## Helper function for simulating pedigree and computing realized coefficients of co-ancestry. sim_ped <- function() { f1 <- create_xo_founder(c(1L, 2L), L) f2 <- create_xo_founder(c(3L, 4L), L) i1 <- cross_xo(f1, f2, xoparam) i2 <- cross_xo(f1, f2, xoparam) i3 <- cross_xo(i1, i2, xoparam) i4 <- cross_xo(i2, i3, xoparam) tmp <- list(f1, f2, i1, i2, i3, i4) C <- matrix(data = NA_real_, nrow = length(id), ncol = length(id)) for (i in seq_along(id)) for (j in i:length(id)) C[i, j] <- C[j, i] <- realized_coancestry(tmp[[i]], tmp[[j]]) C } ## Verify that, on average, the realized coefficients are equal to the expected coefficients. n <- 1000L ## number of replicates rCoc_avg <- Reduce(f = `+`, x = replicate(n, sim_ped(), simplify = FALSE)) / n ## take average plot(as.vector(rCoc_avg), as.vector(eCoc)); abline(0, 1)
We can also produce a doubled haploid progeny, both from the "segmental" and the "genotypic" representation.
## Simulate a doubled haploid individual. str(Meiosis::dh_geno(ind, positions, xoparam)) str(conv$convert(Meiosis::dh_xo(f, xoparam)))
We can also compute the realized heterozygosity of an individual, i.e., the proportion of the genome that is heterozygous with respect to the founder alleles present.
## Calculate realized heterozygosity. Meiosis::realized_heter(p_xo)
Parts of the core functionality and documentation of Meiosis was inspired and adapted, respectively, from the package simcross of Karl Broman.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.