Basics in Quantitative Genetics {#quan-gen}

s_this_rmd_file <- rmdhelp::get_this_rmd_file()
mrmt$set_this_rmd_file(ps_this_rmd_file = s_this_rmd_file)

As already mentioned in section \@ref(geno-pheno), the central dogma of molecular biology tells us that the genotype is the basics of any phenotypic expression. The genotype of an individual is composed of a number of genes which are also called loci. In this section, we start with the simplest possible genetic architecture where the genotype is composed by just one locus. The connection between the genotype and the phenotype is modeled according to equation \@ref(eq:phengenenv). The phenotype is assumed to be a quantitative trait. That means we are not looking at binary or categorical traits. Categorical traits can just take a limited number of different levels. Examples of categorical traits are the horn status in cattle or certain color characteristics. Quantitative traits do not take discrete levels but they show specific distributions.

Single Locus - Quantitative Trait {#single-locus-quant-trait}

In Livestock there are not many examples where a quantitative trait is influenced by just one locus. But this case helps in understanding the foundation of more complex genetic architectures. We start by looking at the following idealized population (Figure \@ref(fig:idealpopsingletrait)).

#rmddochelper::use_odg_graphic(ps_path = "odg/idealpopsingletrait.odg")
knitr::include_graphics(path = "odg/idealpopsingletrait.png")

Terminology {#qg-terminology}

The different genetic variants that are present at our Locus $G$ are called alleles. When looking at all individuals in the population for our locus, we have two different alleles $G_1$ and $G_2$. Hence, we call the locus $G$ to be a bi-allelic locus. In any given individual of the population, the two alleles of the locus $G$ together are called the individuals genotype. All possible combinations of the two alleles at the locus $G$ leads to a total number of three genotypes. It is important to mention that the order of the alleles in a given genotype is not important. Hence, $G_1G_2$ and $G_2G_1$ are the same genotype. The two genotypes $G_1G_1$ and $G_2G_2$ are called homozygous and the genotype $G_1G_2$ is called heterozygous.

Frequencies {#qg-frequency}

To be able to characterize our population with respect to the locus of interest, we are first looking at some frequencies. These are measures of how often a certain allele or genotype does occur in our population. For our example population shown in Figure \@ref(fig:idealpopsingletrait), the genotype frequencies are

\begin{align} f(G_1G_1) &= \frac{4}{10} = 0.4 \notag \ f(G_1G_2) &= \frac{3}{10} = 0.3 \notag \ f(G_2G_2) &= \frac{3}{10} = 0.3 (#eq:genotypefreq) \end{align}

The allele frequencies can be determined either by counting or they can be computed from the genotype frequencies.

\begin{align} f(G_1) &= f(G_1G_1) + {1\over 2}f(G_1G_2) = 0.55 \notag \ f(G_2) &= f(G_2G_2) + {1\over 2}f(G_1G_2) = 0.45 (#eq:allelefreq) \end{align}

Hardy-Weinberg Equilibrium {#hw-eq}

The Hardy-Weinberg equilibrium is the central law of how allele frequencies and genotype frequencies are related in an idealized population. Given the allele frequencies

\begin{align} f(G_1) &= p \notag \ f(G_2) &= q = 1-p (#eq:allelefreq) \end{align}

During mating, we assume that in an idealized population alleles are combined independently. This leads to the genotype frequencies shown in Table \@ref(tab:tabgenfreq).

df_genfreq <- data.frame(Alleles = c("$G_1$", "$G_2$"),
                         G1 = c("$f(G_1G_1) = p^2$", "$f(G_1G_2) = p*q$"),
                         G2 = c("$f(G_1G_2) = p*q$", "$f(G_2G_2) = q^2$")) 
names(df_genfreq) <- c("Alleles", "$G_1$", "$G_2$")
knitr::kable(df_genfreq,
             format   = ifelse(knitr::is_latex_output(), 'latex', 'html'),
             booktabs = TRUE,
             caption  = "Genotype Frequencies under Hardy-Weinberg equilibrium",
             align    = "c",
             escape   = FALSE
)

Summing up the heterozygous frequencies leads to

\begin{align} f(G_1G_1) &= p^2 \notag \ f(G_1G_2) &= 2pq \notag \ f(G_2G_2) &= q^2 (#eq:genofreq) \end{align}

Comparing these expected genotype frequencies in a idealized population under the Hardy-Weinberg equilibrium to what we found for the small example population in Figure \@ref(fig:idealpopsingletrait), we can clearly say that the small example population is not in Hardy-Weinberg equilibrium.

Value and Mean {#value-mean}

Our goal is still to improve our population at the genetic level. The term improvement implies the need for a quantitative assessment of our trait of interest. Furthermore, we have to be able to associate the genotypes in the population to the quantitative values of our trait.

Genotypic Values {#geno-value}

The values $V_{ij}$ to each genotype $G_iG_j$ are assigned as shown in Figure \@ref(fig:genotypicvalue).

#rmddochelper::use_odg_graphic(ps_path = "odg/genotypicvalue.odg")
knitr::include_graphics(path = "odg/genotypicvalue.png")

The origin of the genotypic values is placed in the middle between the two homozygous genotypes $G_2G_2$ and $G_1G_1$. Here we are assuming that $G_1$ is the favorable allele. This leads to values of $+a$ for genotype $G_1G_1$ and of $-a$ for genotype $G_2G_2$. The value of genotype $G_1G_2$ is set to $d$ and is called dominance deviation. Table \@ref(tab:tabsumgenvalue) summarizes the values for all genotypes.

knitr::kable(
  data.frame(Variable = c("$V_{11}$", "$V_{12}$", "$V_{22}$"),
             Genotype = c("$G_1G_1$", "$G_1G_2$", "$G_2G_2$"),
             Values   = c("a", "d", "-a")),
  format   = ifelse(knitr::is_latex_output(), 'latex', 'html'),
  booktabs = TRUE,
  caption = "Values for all Genotypes",
  align = "c",
  escape = FALSE
)

Population Mean {#pop-mean}

For the complete population, we can compute the population mean ($\mu$) of all values at the locus $G$. This mean corresponds to the expected value and is computed as

\begin{align} \mu &= V_{11} * f(G_1G_1) + V_{12} * f(G_1G_2) + V_{22} * f(G_2G_2) \notag \ &= a * p^2 + d *2pq + (-a) * q^2 \notag \ &= (p-q)a + 2pqd (#eq:popmean) \end{align}

The population mean depends on the values $a$ and $d$ and on the allele frequencies $p$ and $q$. The larger the difference between $p$ and $q$ the more influence the value $a$ has in $\mu$, because for very different $p$ and $q$ the product $2pq$ is very small. On the other hand, if $p=q=0.5$, then $\mu = 0.5d$. For loci with $d=0$, the population mean $\mu = (p-q)a$ and hence, if in addition we have $p=q$, then $\mu=0$.

Breeding Values {#breed-value}

The term breeding value is defined as shown in Definition \@ref(def:defbreedingvalue).

```{definition, name = "Breeding Value", label="defbreedingvalue"} The breeding value of an animal $i$ is defined as two times the difference between the mean value of offsprings of animal $i$ and the population mean.

Applying this definition and using the parameters that we have computed so far leads to the following formulas for the breeding value of an animal with a certain genotype. 


#### Breeding value for $G_1G_1$
Assume that we have a given parent $S$ with a genotype $G_1G_1$ and we want to compute its breeding value. Let us further suppose that our single parent $S$ is mated to a potentially infinite number of animals from the idealized population, then we can deduce the following mean genotypic value for the offspring of parent $S$. 

\vspace{5ex}

\begin{tabular}{|c|c|c|}
\hline
& \multicolumn{2}{|c|}{Mates of $S$} \\
\hline
& $f(G_1) = p$       &  $f(G_2) = q$   \\
\hline
Parent $S$       &                    &                 \\
\hline
$f(G_1) = 1$ &  $f(G_1G_1) = p$   &  $f(G_1G_2) = q$\\
\hline
\end{tabular}

\vspace{5ex}

Because parent $S$ has genotype $G_1G_1$, the frequency $f(G_1)$ of a $G_1$ allele coming from $S$ is $1$ and the frequency $f(G_2)$ of a $G_2$ allele is 0. The expected genetic value ($\mu_{11}$) of the offspring of animal $S$ can be computed as

\begin{equation}
\mu_{11} = p*a + q*d
(\#eq:MeanOffGen11)
\end{equation}

Applying definition \@ref(def:defbreedingvalue), we can compute the breeding value ($BV_{11}$) for animal $S$ as shown in equation \@ref(eq:BVGen11) while using the results given by equations \@ref(eq:MeanOffGen11) and \@ref(eq:popmean).

\begin{align}
BV_{11} &=  2*(\mu_{11} - \mu)  \notag \\
        &=  2\left(pa + qd - \left[(p - q)a + 2pqd \right] \right) \notag\\
        &=  2\left(pa + qd - (p - q)a - 2pqd \right) \notag\\
        &=  2\left(qd + qa - 2pqd\right) \notag \\
        &=  2\left(qa + qd(1 - 2p)\right) \notag \\
        &=  2q\left(a + d(1 - 2p)\right) \notag \\
        &=  2q\left(a + (q-p)d\right)
(\#eq:BVGen11)
\end{align}


Breeding values for parents with genotypes $G_2G_2$ and $G_1G_2$ are derived analogously.

#### Breeding value for $G_2G_2$
First, we determine the expected genotypic value for offsprings of a parent $S$ with genotype $G_2G_2$

\vspace{5ex}

\begin{tabular}{|c|c|c|}
\hline
& \multicolumn{2}{|c|}{Mates of parent $S$} \\
\hline
& $f(G_1) = p$       &  $f(G_2) = q$   \\
\hline
Parent $S$       &                    &                 \\
\hline
$f(G_2) = 1$ &  $f(G_1G_2) = p$   &  $f(G_2G_2) = q$\\
\hline
\end{tabular}

\vspace{5ex}

The expected genetic value ($\mu_{22}$) of the offspring of animal $S$ can be computed as

\begin{equation}
\mu_{22} = pd - qa
(\#eq:MeanOffGen22)
\end{equation}

The breeding value $BV_{22}$ corresponds to

\begin{align}
BV_{22} &=   2*(\mu_{22} - \mu)  \notag \\
        &=   2\left(pd - qa - \left[(p - q)a + 2pqd \right] \right) \notag \\
        &=   2\left(pd - qa - (p - q)a - 2pqd \right) \notag \\
        &=   2\left(pd - pa - 2pqd\right) \notag \\
        &=   2\left(-pa + p(1-2q)d\right) \notag \\
        &=  -2p\left(a + (q - p)d\right)
(\#eq:BVGen22)
\end{align}


#### Breeding value for $G_1G_2$
The genotype frequencies of the offsprings of a parent $S$ with a genotype $G_1G_2$ is determined in the following table.

\vspace{5ex}

\begin{tabular}{|c|c|c|}
\hline
& \multicolumn{2}{|c|}{Mates of parent $S$} \\
\hline
& $f(G_1) = p$       &  $f(G_2) = q$   \\
\hline
Parent $S$       &                    &                 \\
\hline
$f(G_1) = 0.5$ &  $f(G_1G_1) = 0.5p$   &  $f(G_1G_2) = 0.5q$\\
\hline
$f(G_2) = 0.5$ &  $f(G_1G_2) = 0.5p$   &  $f(G_2G_2) = 0.5q$\\
\hline
\end{tabular}


\vspace{5ex}

The expected mean genotypic value of the offsprings of parent $S$ with genotype $G_1G_2$ is computed as

\begin{equation}
\mu_{12} = 0.5pa + 0.5d - 0.5qa = 0.5\left[(p-q)a + d \right]
(\#eq:MeanOffGen12)
\end{equation}

The breeding value $BV_{12}$ corresponds to 

\begin{align}
ZW_{12} &=   2*(\mu_{12} - \mu) \notag \\
        &=   2\left(0.5(p-q)a + 0.5d - \left[(p - q)a + 2pqd \right] \right) \notag \\
        &=   2\left(0.5pa - 0.5qa + 0.5d - pa + qa - 2pqd \right) \notag \\
        &=   2\left(0.5(q-p)a + (0.5 - 2pq)d \right) \notag \\
        &=   (q-p)a + (1-4pq)d  \notag \\
        &=   (q-p)a + (p^2 + 2pq + q^2 -4pq)d  \notag \\
        &=   (q-p)a + (p^2 - 2pq + q^2)d  \notag \\
        &=   (q-p)a + (q - p)^2d   \notag \\
        &=   (q-p)\left[a + (q-p)d \right]
(\#eq:BVGen12)
\end{align}

#### Summary of Breeding Values
The term $a + (q-p)d$ appears in all three breeding values. We replace this term by $\alpha$ and summarize the results in the following table.


\begin{tabular}{|c|c|}
  \hline
  Genotype  &  Breeding Value\\
  \hline
  $G_1G_1$  &  $2q\alpha$    \\
  \hline
  $G_1G_2$  &  $(q-p)\alpha$ \\
  \hline
  $G_2G_2$  &  $-2p\alpha$   \\
  \hline
  \end{tabular}


### Allele Substitution {#allele-substitution}
Comparing the genotype $G_2G_2$ with the genotype $G_1G2$, one of the differences is in the number of $G_1$-alleles. $G_2G_2$ has zero $G_1$-alleles and $G_1G_2$ has one $G_1$-allele. They also have different breeding values. Because the breeding values are to be used to assess the value of a given genotype, any difference between the breeding values $BV_{12}$ and $B_{22}$ can be associated to that additional $G_1$ allele in $G_1G_2$ compared to $G_2G_2$. 

The computation of the difference between the breeding value $BV_{12}$ and $B_{22}$ results in

\begin{align}
    BV{12} - BV_{22} &=   (q-p)\alpha - \left( -2p\alpha \right)  \notag \\
                      &=   (q-p)\alpha + 2p\alpha \notag \\
                      &=   (q-p+2p)\alpha \notag \\
                      &=   (q+p)\alpha \notag \\
                      &=   \alpha
  (\#eq:AdditiveBv1)
\end{align}

The analogous computation can be done by comparing the breeding values $BV_{11}$ and $BV_{12}$.

\begin{align}
    BV_{11} - BV_{12} & = & 2q\alpha - (q-p)\alpha \notag \\
                      & = & \left(2q - (q-p)\right)\alpha \notag\\
                      & = & \alpha 
  (\#eq:AdditiveBv2)
\end{align}

The breeding values themselves depend on the allele frequencies. But the differences between breeding values are linear while replacing $G_2$ alleles by $G_1$ alleles. This replacement is also called allele-substitution and the term $alpha$ is called __allele-substitution effect__.


### Dominance Deviation
When looking at the difference between the genotypic value $V_{ij}$ and the breeding value $BV_{ij}$ for each of the three genotypes, we get the following results.

  \begin{align}
  V_{11} - BV_{11} &=   a - 2q \alpha \notag \\
                   &=   a - 2q \left[ a + (q-p)d \right] \notag \\
                   &=   a - 2qa -2q(q-p)d \notag \\
                   &=   a(1-2q) - 2q^2d + 2pqd \notag \\
                   &=   \left[(p - q)a + 2pqd\right] - 2q^2d \notag \\
                   &=   \mu + D_{11} 
  \end{align}

  \begin{align}
  V_{12} - BV{12} &=   d - (q-p)\alpha \notag \\
                   &=   d - (q-p)\left[ a + (q-p)d \right] \notag \\
                   &=   \left[(p-q)a + 2pqd\right] + 2pqd \notag \\
                   &=   \mu + D_{12}
  \end{align}

  \begin{align}
  V_{22} - BV_{22} &=   -a - (-2p\alpha) \notag \\
                   &=   -a + 2p\left[ a + (q-p)d \right] \notag \\
                   &=   \left[(p-q)a + 2pqd\right] - 2p^2d \notag \\
                   &=   \mu + D_{22} \notag
  \end{align}

The difference all contain the population mean $\mu$ plus a certain deviation. This deviation term is called __dominance deviation__.


### Summary of Values
The following table summarizes all genotypic values all breeding values and the dominance deviations. 


\begin{tabular}{|c|c|c|c|}
   \hline
   Genotyp  &  genotypic value     &  Breeding Value    &  Dominance Deviation \\
   $G_iG_j$ &  $V_{ij}$            &  $BV_{ij}$         &  $D_{ij}$           \\
   \hline
   $G_1G_1$ &  $a$                 &  $2q\alpha$        &  $-2q^2d$          \\
   \hline
   $G_1G_2$ &  $d$                 &  $(q-p)\alpha$     & $2pqd$             \\
   \hline
   $G_2G_2$ &  $-a$                &  $-2p\alpha$       & $-2p^2d$           \\
   \hline
\end{tabular}



The formulas in the above shown table assume that $G_1$ is the favorable allele with frequency $f(G_1) = p$. The allele frequency of $G_2$ is $f(G_2) = q$. Since we have a bi-allelic locus $p+q=1$.

Based on the definition of dominance deviation, the genotypic values $V_{ij}$ can be decomposed into the components population mean ($\mu$), breeding value ($BV_{ij}$) and dominance deviation ($D_{ij}$) according to equation \@ref(eq:SeparationGenoValue).

\begin{align}
V_{ij} &=   \mu + BV_{ij} + D_{ij}
(\#eq:SeparationGenoValue)
\end{align}


Taking expected values on both sides of equation \@ref(eq:SeparationGenoValue) and knowing that the population mean $\mu$ was defined as the expected value of the genotypic values in the population, i.e. $E\left[ V \right] = \mu$, it follows that the expected values of both the breeding values and the dominance deviations must be $0$. More formally, we have 

\begin{align}
E\left[ V \right] &=  E\left[ \mu + BV + D \right] \notag \\
                  &=  E\left[ \mu \right]  + E\left[ BV \right] + E\left[ D \right] \notag \\
                  &=  \mu
(\#eq:ExpValueGenBvDom)
\end{align}

From the last line in equation \@ref(eq:ExpValueGenBvDom), it follows that $E\left[ BV \right] = E\left[ D \right] = 0$. This also shows that both breeding values and dominance deviations are defined as deviation from a given mean.


## Variances {#variances}
The population mean $\mu$ and derived from that the breeding values were defined as expected values. Their main purpose is to assess the state of a given population with respect to a certain genetic locus and its effect on a phenotypic trait of interest. One of our primary goals in livestock breeding is to improve the populations at the genetic level through the means of selection and mating. Selection of potential parents that produce offspring that are closer to our breeding goals is only possible, if the selection candidates show a certain level of variation in the traits that we are interested in. In populations where there is no variation which means that all individuals are exactly at the same level, it is not possible to select potential parents for the next generation. 

In statistics the measure that is most often used to assess variation in a certain population is called __variance__. For any given discrete random variable $X$ the variance is defined as the second central moment of $X$ which is computed as shown in equation \@ref(eq:VarianceDiscreteRV).

\begin{equation}
Var\left[X\right] = \sum_{x_i \in \mathcal{X}} (x_i - \mu_X)^2 * f(x_i)
(\#eq:VarianceDiscreteRV)
\end{equation}

 \vspace*{1ex}
  \begin{tabular}{p{1cm}p{1cm}p{6cm}}
  where & $\mathcal{X}$: &  set of all possible $x$-values\\
        & $f(x_i)$       &  probability that $x$ assumes the value of $x_i$ \\
        & $\mu_X $       &  expected value $E\left[X\right]$ of $X$
  \end{tabular}


\vspace*{2ex}
In this section we will be focusing on separating the obtained variances into different components according to their causative sources. Applying the definition of variance given in equation  \@ref(eq:VarianceDiscreteRV) to the genotypic values $V_{ij}$, we obtain the following expression.

\begin{align}
\sigma_G^2 = Var\left[V\right] &=   (V_{11} - \mu)^2 * f(G_1G_1) \notag \\
                               &  +\  (V_{12} - \mu)^2 * f(G_1G_2) \notag \\
                               &  +\  (V_{22} - \mu)^2 * f(G_2G_2)
(\#eq:VarianceGenotypicValue)
\end{align}

where $\mu = (p - q)a + 2pqd$ the population mean.

Based on the decomposition of the genotypic value $V_{ij}$ given in \@ref(eq:SeparationGenoValue), the difference between $V_{ij}$ and $\mu$ can be written as the sum of the breeding value and the dominance deviation. Then $\sigma_G^2$ can be written as

\begin{align}
\sigma_G^2 = Var\left[V\right] &=   (BV_{11} + D_{11})^2 * f(G_1G_1) \notag \\
                               &  +\  (BV_{12} + D_{12})^2 * f(G_1G_2) \notag \\
                               &  +\  (BV_{22} + D_{22})^2 * f(G_2G_2)
(\#eq:GeneticVarianceBVDom)
\end{align}

Inserting the expressions for the breeding values $BV_{ij}$ and for the dominance deviation $D_{ij}$ found earlier and simplifying the equation leads to the result in \@ref(eq:FinalGeneticVariance). A more detailed derivation of $\sigma_G^2$ is given in the appendix (\@ref(appendix-derivations)) of this chapter.

\begin{align}
  \sigma_G^2 &=  2pq\alpha^2 + \left(2pqd \right)^2 \notag\\
             &=  \sigma_A^2 + \sigma_D^2
(\#eq:FinalGeneticVariance)             
\end{align}

The formula in equation \@ref(eq:FinalGeneticVariance) shows that $\sigma_G^2$ consists of two components. The first component $\sigma_A^2$ is called the __genetic additive variance__ and the second component $\sigma_D^2$ is termed __dominance variance__. As shown in equation \@ref(eq:VarBV) $\sigma_A^2$ corresponds to the variance of the breeding values. Because we have already seen that the breeding values are additive in the number of favorable alleles, $\sigma_A^2$ is called genetic additive variance. Because $\sigma_D^2$ corresponds to the variance of the dominance deviation effects (see equation \@ref(eq:VarDom)) it is called dominance variance.


## Extension To More Loci {#extension-to-more-loci}
When only a single locus is considered, the genotypic values ($V_{ij}$) can be decomposed according to equation \@ref(eq:SeparationGenoValue) into population mean, breeding value and dominance deviation. When a genotype refers to more than one locus, the genotypic value may contain an additional deviation caused by non-additive combination effects. 


### Epistatic Interaction {#epistatic-interaction}
Let $V_A$ be the genotypic value of locus $A$ and $V_B$ denote the genotypic value of a second locus $B$, then the total aggregate genotypic value $V$ attributed to both loci $A$ and $B$ can be written as 

\begin{equation}
V = V_A + V_B + I_{AB}
(\#eq:AggregateGenotypicValueTwoLoci)
\end{equation}

where $I_{AB}$ is the deviation from additive combination of these genotypic values. When computing the population mean earlier in this chapter, we assumed that $I$ was zero for all combinations of genotypes. If $I$ is not zero for any combination of genes at different loci, those genes are said to __interact__ with each other or to exhibit __epistasis__. The deviation $I$ is called interaction deviation or epistatic deviation. If $I$ is zero, the genes are called to act additively between loci. Hence _additive action_ may mean different things. When referring to one locus, it means absence of dominance. When referring to different loci, it means absence of epistasis.

Interaction between loci may occur between pairs or between higher numbers of different loci. The complex nature of higher order interactions, i.e., interactions between higher number of loci does not need to concern us. Because in the aggregate genotypic value $V$, interaction deviations of all sorts are treated together in an overall interaction deviation $I$. This leads to the following generalized form of decomposing the overall aggregate genotype $V$ for the case of multiple loci affecting a certain trait of interest.

\begin{equation}
V = \mu + U + D + I
(\#eq:AggregateGenotypicValueMultipleLoci)
\end{equation}

where $U$ is the sum of the breeding values attributable to the separate loci and $D$ is the sum of all dominance deviations. For our purposes in livestock breeding where we want to assess the genetic potential of a selection candidate to be a parent of offspring forming the next generation, the __breeding value__ is the most important quantity. The breeding value is of primary importance because a given parent passes a random sample of its alleles to its offspring. We have seen in section \@ref(allele-substitution) that breeding values are additive in the number of favorable alleles. Hence the more favorable alleles a given parent passes to its offspring the higher the breeding value of this parent. 

On the other hand, the dominance deviation measures the effect of a certain genotype occurring in an individual and the interaction deviation estimates the effects of combining different genotypes at different loci in the genome. But because parents do not pass complete genotypes nor do they pass stretches of DNA with several loci, but only a random collection of its alleles, it is really the breeding value that is of primary importance in assessing the genetic potential of a given selection candidate. 


### Interaction Variance {#interaction-variance}
If genotypes at different loci show epistatic interaction effects as described in section \@ref(epistatic-interaction), the interactions give rise to an additional variance component called $V_I$, which is the variance of interaction deviations. This new variance component $V_I$ can be further decomposed into sub-components. The first level of sub-components is according to the number of loci that are considered. Two-way interactions involve two loci, three-way interactions consider three loci and in general $n$-way interactions arise from $n$ different loci. The next level of subdivision is according to whether they include additive effects, dominance deviations or both. 

In general it can be said that for practical purposes, interaction effects explain only a very small amount of the overall variation. As already mentioned in section \@ref(#epistatic-interaction) for livestock breeding, we are mostly interested in the additive effects. This is also true when looking at the variance components. Hence dominance variance and variances of interaction deviations are not used very often in practical livestock breeding application. 


## Genetic Models {#genetic-models}
In this chapter, we have seen how to model the genetic basis of a quantitative trait when a single locus affects the trait of interest. We call this a single-locus model. When several loci have an effect on a certain trait, then we talk about a __polygenic model__. Letting the number of loci affecting a certain phenotype tend to infinity, the resulting model is called __infinitesimal model__. 

```r
# fixing some constants
set.seed(9876)
n_nr_sample <- 10000
n_nr_comp <- 10^(1:3)

# define a function to compute sample means
sample_comp_sum <- function(pn_nr_comp, pn_nr_sample){
  return(sapply(1:pn_nr_comp, 
                function(x, y) sum(runif(y))/y,
                pn_nr_sample))
}
# vectors with component sums
vec_sum_10 <- sample_comp_sum(pn_nr_comp = n_nr_comp[1], pn_nr_sample = n_nr_sample)
vec_sum_100 <- sample_comp_sum(pn_nr_comp = n_nr_comp[2], pn_nr_sample = n_nr_sample)
vec_sum_1000 <- sample_comp_sum(pn_nr_comp = n_nr_comp[3], pn_nr_sample = n_nr_sample)

From a statistical point of view, the breeding values in an infinitesimal model are considered as a random effect with a known distribution. Due to the central limit theorem, this distribution is assumed to be a normal distribution. The central limit theorem says that the distribution of any sum of a large number of very small effects converges to a normal distribution. For our case where a given trait of interest is thought to be influenced by a large number of genetic loci each having a small effect, the sum of the breeding values of all loci together can be approximated by a normal distribution. Figure (\@ref(fig:hist-clt)) shows the distribution for a sum of r n_nr_comp[1], r n_nr_comp[2] and r n_nr_comp[3] components each. The histograms show a better approximation to the normal distribution the larger the number of components considered in the sum.

# plots
opar <- par()
par(mfrow=c(1,3))
hist(vec_sum_10)
hist(vec_sum_100)
hist(vec_sum_1000)

Model Usage In Routine Evaluations

Traditional prediction of breeding values before the introduction of genomic selection is based on the infinitesimal model. When genomic selection was introduced which takes into account the information from a large number of loci, genomic breeding values are estimated using a polygenic model.

Appendix: Derivations {#appendix-derivations}

This section shows how the genetic variance in equation \@ref(eq:FinalGeneticVariance) is computed.

\begin{align} \sigma_G^2 &= (BV_{11} + D_{11})^2 * p^2 \notag \ & +\ (BV_{12} + D_{12})^2 * 2pq \notag \ & +\ (BV_{22} + D_{22})^2 * q^2 \notag \ &= \left(2q\alpha - 2q^2d \right)^2 * p^2 \notag \ & +\ \left((q-p)\alpha + 2pqd \right)^2 * 2pq \notag \ & +\ \left(-2p\alpha - 2p^2d \right)^2 * q^2 \notag \ &= \left(4q^2\alpha^2 - 8q^3d\alpha + 4q^4d^2 \right) * p^2 \notag \ & +\ \left(q^2\alpha^2 - 2pq\alpha^2 + p^2\alpha^2 - 4(q-p)pqd\alpha + 4p^2q^2d^2\right) * 2pq \notag \ & +\ \left(4p^2\alpha^2 + 8p^3d\alpha + 4p^4\alpha^2 \right) * q^2 \notag \ &= 4p^2q^2\alpha^2 - 8p^2q^3d\alpha + 4p^2q^4d^2 \notag \ & +\ 2pq^3\alpha^2 - 4p^2q^2\alpha^2+ 2p^3q\alpha^2 \notag \ & -\ 8p^3q^2d\alpha + 8p^2q^3d\alpha + 8p^3q^3d^2 \notag \ & +\ 4p^2q^2\alpha^2 + 8p^3q^2d\alpha + 4p^4q^2d^2 \notag \ &= 4p^2q^2\alpha^2 + 4p^2q^4d^2 \notag \ & +\ 2pq^3\alpha^2 + 2p^3q\alpha^2 \notag \ & +\ 8p^3q^3d^2 \notag \ & +\ 4p^4q^2d^2 \notag \ &= 2pq\alpha^2 \left(p^2 + 2pq + q^2 \right) \notag \ & +\ \left(2pqd \right)^2 \left(p^2 + 2pq + q^2 \right) \notag \ &= 2pq\alpha^2 + \left(2pqd \right)^2 \notag \ &= \sigma_A^2 + \sigma_D^2 (#eq:GenVarZWD) \end{align}

From the last two lines of \@ref(eq:GenVarZWD) it follows that $\sigma_A^2 = 2pq\alpha^2$ and $\sigma_D^2 = \left(2pqd \right)^2$. It can be shown that $\sigma_A^2$ corresponds to the squared breeding values times the associated genotype frequencies. Because the expected values of the breeding values is zero, $\sigma_A^2$ is equivalent to the variance of the breeding values.

\begin{align} \sigma_A^2 &= Var\left[ BV \right] = (BV_{11} - E\left[ BV \right])^2 * f(G_1G_1) \notag \ & + (BV_{12} - E\left[ BV \right])^2 * f(G_1G_2) \notag \ & + (BV_{22} - E\left[ BV \right])^2 * f(G_2G_2) \notag \ &= BV_{11}^2 * f(G_1G_1) + BV_{12}^2 * f(G_1G_2) + BV_{22}^2 * f(G_2G_2) \notag \ &= \left(2q \alpha \right)^2 * p^2 + \left((q-p) \alpha \right)^2 * 2pq + \left(-2p \alpha \right)^2 * q^2 \notag \ &= 4p^2 q^2 \alpha^2 + \left(q^2 \alpha^2 -2pq\alpha^2 + p^2 \alpha^2 \right) * 2pq + 4p^2q^2\alpha^2 \notag \ &= 8p^2 q^2 \alpha^2 + 2pq^3\alpha^2 -4p^2q^2\alpha^2 + 2p^3q\alpha^2 \notag \ &= 4p^2 q^2 \alpha^2 + 2pq^3\alpha^2 + 2p^3q\alpha^2 \notag \ &= 2pq\alpha^2 \left(2pq + q^2 + p^2 \right) \notag \ &= 2pq\alpha^2 (#eq:VarBV) \end{align}

In the above derivation in \@ref(eq:VarBV) of the variance of the breeding values, we were using the fact that the expected value $E\left[ BV \right]=0$. This can be shown more formally as follows

\begin{align} E\left[ BV \right] &= BV_{11} * f(G_1G_1) + BV_{12} * f(G_1G_2) + BV_{22} * f(G_2G_2) \notag \ &= 2q \alpha * p^2 + (q-p) \alpha * 2pq + (-2p \alpha) * q^2 \notag \ &= 2p^2q \alpha + 2pq^2 \alpha - 2p^2q\alpha - 2pq^2\alpha \notag \ &= 0 (#eq:ExpectedValueBV) \end{align}

Similarly to \@ref(eq:VarBV) we can show that $\sigma_D^2$ corresponds to the squared dominance deviations times the frequencies of the corresponding genotypes. That is the reason why $\sigma_D^2$ is called dominance variance.

\begin{align} \sigma_D^2 &= D_{11}^2 * f(G_1G_1) + D_{12}^2 * f(G_1G_2) + D_{22}^2 * f(G_2G_2) \notag \ &= (- 2q^2d)^2 * p^2 + (2pqd)^2 * 2pq + (- 2p^2d)^2 * q^2 \notag \ &= 4p^2q^4d^2 + 8p^3q^3d^2 + 4p^4q^2d^2 \notag \ &= 4p^2q^2d^2 \left( q^2 + 2pq + p^2 \right) \notag \ &= 4p^2q^2d^2 (#eq:VarDom) \end{align}



charlotte-ngs/lbgfs2020 documentation built on Dec. 20, 2020, 5:39 p.m.