Introduction

The pbsim package includes a suite of functions for simulating plant breeding programs. The motivation behind this package was to develop a set of tools that would make it simple to simulate plent breeding programs in order to test various methods or procedures. The package is on the simpler side of population genetics simulation tools, however it is nonetheless useful for quick experimentation.

The first step of using the package is to load it using the library() function:

library(pbsim)

Making a Genome

The first step is to make a geneome object. This is essentially a list of class genome. The object contains information such as number and length ofchromosomes, marker positions, and genetic architecture.

Simulate a genome with three chromosomes, each with 505 SNPs.

n.mar  <- rep(505, 3)
len <- c(120, 130, 140)

genome <- sim_genome(len = len, n.mar = n.mar)

The output is a genome object. A generic summary function can be used to summarize the genome object.

summary(genome)

Genomes can also be simulated to

Define the Genetic Model

The genetic model of a quantitative trait can be specified using the sim_gen_model() function. The function is fairly flexible, allowing for custom QTL models including genome location, additive effects, and dominance effects. Alternatively, the location and effects of QTL can be drawn from random distributions.

For instance, you can customize the QTL model by providing a matrix of that information.

chromosome <- c(1, 1, 2, 2, 3, 3)
pos <- as.numeric(sapply(X = genome$len, FUN = runif, n = 2, min = 0))
a <- c(1, 0.25, 0.5, 0.25, 0.25, 0.5)
d <- 0

qtl_model <- cbind(chromosome, pos, a, d)

genome <- sim_gen_model(genome, qtl_model)

The returned value from sim_gen_model() is a genome object with information on the genetic model. You can used the same summary() function to view a summary of the genetic model

summary(genome)

The genetic model can be randomly generated by passing an empty matrix. Here we simulate $L = 15$ QTL.

We pass the add.dist = "geometric" argument to generate additive allelic effects of QTL from a geometric series. As suggested by @Lande1990, the additive effect of QTL $k$ in $k = 1, 2, ..., L$ is $a^k$, where $a = \frac{1 - L}{1 + L}$.

When generating QTL randomly, we have the option of drawing SNPs in the genome to become QTL or generating new SNPs to become QTL. This is determined by the de.novo argument.

A Note on the de.novo argument: If de.novo = FALSE, an additional argument (max.qtl) must also be passed. Often in simulations, one is interested in different genetic architectures (i.e. different numbers of QTL). When using SNPs already in the genome to simulate QTL, this will result in different numbers of markers leftover. For example, if you are testing $L = 50$ QTL and $L = 100$ QTL, drawing from SNPs already in the genome, with $m = 500$ SNPs total, you will have $m - L = 450$ markers leftover in the first case and $m - L = 400$ markers leftover in the second case. If your downstream simulation involves, say, QTL mapping or genomic prediction, this uneven number of markers could confound the results. To avoid this, the max.qtl arguement sets $max.qtl - L$ QTL to have zero additive or dominance effects, resulting in the same number of markers leftover.

# Simulate 15 QTL drawn from SNPs already in the genome
n_qtl <- 15

# Empty matrix
qtl_model <- matrix(nrow = n_qtl, ncol = 4)

# Set a total of max.qtl = 20 QTL in the genome
genome <- sim_gen_model(genome = genome, qtl.model = qtl_model, add.dist = "geometric",
                        de.novo = FALSE, max.qtl = 20)

summary(genome)


# Simulate 15 QTL that are newly generated in the genome
genome <- sim_gen_model(genome = genome, qtl.model = qtl_model, add.dist = "geometric",
                        de.novo = TRUE)

summary(genome)

Notice that the number of markers is reduced in the first case, but not in the second.

Alternatively, the argument add.dist = "normal" can be passed to draw additive effects from a normal distribution.

By default, dominance is assumed absent. However, dominance effects can be specified in the same way that additive effects were specified above. Additionally dominance effects can be generated from a normal distribution by passing the dom.dist = "normal" argument

n_qtl <- 15

# Empty matrix
qtl_model <- matrix(nrow = n_qtl, ncol = 4)

genome <- sim_gen_model(genome = genome, qtl.model = qtl_model, add.dist = "geometric",
                        dom.dist = "normal", de.novo = TRUE)

The genetic architecture of multiple traits can be simulated by

Generate a pedigree

ped <- sim_pedigree(n.ind = 30, n.bcgen = 0, n.selfgen = 2, n.par = 2)

Simulate a family

# First generate founders for a bi-parental population
founder_geno <- sim_founders(genome)

fam <- sim_family(genome, ped, founder_geno)

Simulate a population of many families

# First create a crossing block

References



neyhartj/pbsim documentation built on Nov. 11, 2023, 4:07 p.m.