Multiple Traits {#multipletraits}

s_this_rmd_file <- rmdhelp::get_this_rmd_file()
mrmt$set_this_rmd_file(ps_this_rmd_file = s_this_rmd_file)

So far we have seen how to predict breeding values using the BLUP animal model. These breeding values were always only for one trait. From a statistical point of view, experts are calling such analyses univariate evaluations. In reality, livestock breeders want to improve their animals in a population with respect to several traits simultaneously. If there are genetic relationships (measured by genetic correlations) between traits, univariate predictions of breeding values do not make optimal use of the available information. This effect is stronger, if certain traits can only be observed in animals of one sex. According to r mrmt$add('Mrode2005') who cites r mrmt$add('Schaeffer1984') and r mrmt$add('TM1986'), the increased accuracy of the evaluations is one of the main advantage of multivariate BLUP analyses. Predictions of breeding values for several traits in a single evaluation is called multivariate prediction of breeding values. Such multivariate analyses can be implemented in different ways such as

Before the introduction of the BLUP animal model, breeding values were estimated using a method that is called selection index method. A brief review about selection index theory is given in section \@ref(review-selection-index-theory). While selection indices are no longer used to estimate breeding values, selection index theory is still used to predict the aggregate genotype, as will be shown later in this chapter. Before that, we start with an introduction to multivariate BLUP methods.

Multivariate Predictions Of Breeding Values Using BLUP

The prediction of breeding values using a multivariate BLUP model is the optimal prediction procedure. It has advantages, if

In principle, a multivariate analysis can be thought of as several univariate analyses which are stacked one on top of the other. Let us assume that we have two traits. For the first trait, we define the model

$$y_1 = X_1 \beta_1 + Z_1u_1 + e_1$$

Similarly for the second trait, we define the model

$$y_2 = X_2 \beta_2 + Z_2u_2 + e_2$$

If we group the data by traits, then we can write the multivariate model as

$$\left[ \begin{array}{c} y_1 \ y_2 \end{array} \right] = \left[ \begin{array}{lr} X_1 & 0 \ 0 & X_2 \end{array} \right] \left[ \begin{array}{c} \beta_1 \ \beta_2 \end{array} \right] + \left[ \begin{array}{lr} Z_1 & 0 \ 0 & Z_2 \end{array} \right] \left[ \begin{array}{c} u_1 \ u_2 \end{array} \right] + \left[ \begin{array}{c} e_1 \ e_2 \end{array} \right] $$

The genetic variance-covariance matrix $G_0$ for the two traits has the following structure.

$$G_0 = \left[ \begin{array}{lr} \sigma_{g_{1}}^2 & \sigma_{g1,g2} \ \sigma_{g1,g2} & \sigma_{g_{2}}^2 \end{array} \right] = \left[ \begin{array}{lr} g_{11} & g_{12} \ g_{21} & g_{22} \end{array} \right] $$

The inverse $G_0^{-1}$ of $G_0$ can be formulated as

$$G_0^{-1} = \left[ \begin{array}{lr} g^{11} & g^{12} \ g^{21} & g^{22} \end{array} \right] $$

For the random residual effects, the variance-covariance matrix $R_0$ for the two traits can be written as

$$R_0 = \left[ \begin{array}{lr} r_{11} & r_{12} \ r_{21} & r_{22} \end{array} \right]$$

Also the inverse $R_0^{-1}$ can be written as

$$R_0^{-1} = \left[ \begin{array}{lr} r^{11} & r^{12} \ r^{21} & r^{22} \end{array} \right]$$

The variance-covariance matrix for the complete set of true breeding values and for all random residuals can be written as

$$var(u) = var\left[\begin{array}{c} u_1 \ u_2 \end{array} \right]

\left[ \begin{array}{lr} g_{11}A & g_{12}A \ g_{21}A & g_{22}A \end{array} \right] = G_0 \otimes A = G $$

where $A$ is the numerator relationship matrix and $\otimes$ denotes the so-called Kroneckerproduct of two matrices. The variance-covariance matrix $R$ for the random residuals is given by

$$ R = var(e) = var\left[\begin{array}{c} e_1 \ e_2 \end{array} \right] =
\left[ \begin{array}{lr} r_{11}I_n & r_{12}I_n \ r_{21}I_n & r_{22}I_n \end{array} \right] = R_0 \otimes I_n $$

where $n$ corresponds to the number of animals in the pedigree. The covariances between all elements of $a$ and $e$ are $0$. This is denoted by combining both random vectors into a single vector and writing down the variance-covariance matrix of the combined vector as

$$ var\left[\begin{array}{c} u \ e \end{array} \right] =
\left[ \begin{array}{lr} G & 0 \ 0 & R \end{array} \right] = var\left[\begin{array}{c} u_1 \ u_2 \ e_1 \ e_2 \end{array} \right] =
\left[ \begin{array}{lccr} g_{11}A & g_{12}A & 0 & 0 \ g_{21}A & g_{22}A & 0 & 0 \ 0 & 0 & r_{11}I_n & r_{12}I_n \ 0 & 0 & r_{21}I_n & r_{22}I_n \end{array} \right] $$

The solutions to get estimates of fixed effects and to get predictions for breeding values are obtained from the solutions of mixed model equations. These have exactly the same structure as in the univariate case.

$$ \left[ \begin{array}{lr} X^TR^{-1}X & X^TR^{-1}Z \ Z^TR^{-1}X & Z^TR^{-1}Z + G^{-1} \end{array} \right] \left[ \begin{array}{c} \hat{\beta} \ \hat{u} \end{array} \right] = \left[ \begin{array}{c} X^TR^{-1}y \ Z^TR^{-1}y \end{array} \right] $$

where

$$ y = \left[ \begin{array}{c} y_1 \ y_2 \end{array} \right] \text{, } X = \left[ \begin{array}{lr} X_1 & 0 \ 0 & X_2 \end{array} \right] \text{, } \hat{\beta} = \left[ \begin{array}{c} \hat{\beta}_1 \ \hat{\beta}_2 \end{array} \right] \text{, } Z = \left[ \begin{array}{lr} Z_1 & 0 \ 0 & Z_2 \end{array} \right] \text{, } \hat{u} = \left[ \begin{array}{c} \hat{u}_1 \ \hat{u}_2 \end{array} \right] $$

Based on the specification of the variance-covariance matrices described earlier, we get $R^{-1} = R_0^{-1} \otimes I_n$ and $G^{-1} = G_0^{-1} \otimes A^{-1}$.

Multitrait Selection {#multitrait-selection}

Now that we have predicted breeding values for a collection of traits available, the question is how to integrate these predicted breeding values into a consistent selection criterion. Selection index theory provides a tool to optimally combine different sources of information in order to approximate the aggregate genotype $H$. In section \@ref(selindex-approx-agg-genotype), we will return to this topic once again. Although the selection index provides an ideal framework to combine estimated breeding values into an optimal selection criterion, we are going to describe to other selection procedures that are commonly used in practical livestock breeding. The two procedures are

  1. Tandem-Selection and
  2. Selection based on independent selection thresholds.

The description of these procedures aims at showing the negative consequences that results from the use of these sub-optimal selection strategies.

Tandem-Selection

The term Tandem-Selection stands for the strategy of always improving the population with respect to just one trait. Once the breeding goal for this trait is reached the population is improved with respect to a different trait. This sequence of single-trait improvements is continued until the breeding goal for all traits is reached.

The problem with Tandem-Selection is that while improving the population for a given trait, the population can only realize correlated selection responses for all other traits. These correlated selection responses might be very small or can even be negative which causes the time that it takes to reach the breeding goal for all relevant traits to be very long.

Selection Based On Independent Selection Thresholds

This method was used before the selection index was discovered. This selection procedure is very easy to apply. First selection thresholds are defined for all traits. In the next steps, all animals which are above the thresholds for all traits are selected as parents of the next generation. With this method, selection responses for all traits can be obtained in the early generations after the implementation of this selection strategy.

We are using the following example to show how selection based on independent selection thresholds is used. For reasons of simplicity, we restrict ourselves on two traits. But the results can be generalized without any problems.

Example: Selection On Independent Thresholds

### # selection thresolods
n_milk_sel_thr <- 6900
n_prot_sel_thr <- 3.5
### # mean and sd of traits
n_nr_trait <- 2
n_milk_mean <- 6800
n_milk_sd <- 600
n_prot_mean <- 3.4
n_prot_sd <- .2
n_corr <- -.4
### # variance - covariance matrix
n_cov <- n_corr * n_milk_sd * n_prot_sd
mat_varcov <- matrix(c(n_milk_sd^2, n_cov,
                       n_cov,       n_prot_sd^2), 
                     nrow = n_nr_trait, 
                     ncol = n_nr_trait, 
                     byrow = TRUE)

### # generate observations
n_nr_obs <- 50
### # cholesky decomposition of varcov
mat_varcov_chol <- chol(mat_varcov)
### # generate independent observations
set.seed(5432)
mat_unrel_obs <- matrix(c(rnorm(n_nr_obs),rnorm(n_nr_obs)), nrow = n_nr_trait, byrow = TRUE)
mat_obs <- crossprod(mat_unrel_obs, mat_varcov_chol) + matrix(c(n_milk_mean, n_prot_mean), nrow=n_nr_obs, ncol=n_nr_trait, byrow = TRUE)
### # some checks
# head(mat_obs)
# var(mat_obs)
# sd(mat_obs[,1])
# sd(mat_obs[,2])
# cor(mat_obs)

In a dairy cattle population, breeders want to improve milk yield and protein content. We assume the following selection thresholds for the two traits of interest during first lactation

### # convert data matrix into a tibble that is later used for plotting
tbl_milk_prot <- tibble::as_tibble(mat_obs)
colnames(tbl_milk_prot) <- c("Milk", "Protein")
### # define the colours of the threshold lines
s_col_milk_thr <- "red"
s_col_prot_thr <- "blue"
### # use ggplot2 to do the plot
library(ggplot2)
milk_prot_plot <- qplot(Milk, Protein, data=tbl_milk_prot, geom="point", 
                         xlab = "Milk Yield", ylab = "Protein Content")
milk_prot_plot <- milk_prot_plot + 
  geom_hline(yintercept = n_prot_sel_thr, colour = s_col_prot_thr) +
  geom_vline(xintercept = n_milk_sel_thr, colour = s_col_milk_thr)
print(milk_prot_plot)

Figure \@ref(fig:milkproteinperformanceplot) shows the performance data for a herd of dairy cows. The selection thresholds (r s_col_milk_thr line for milk yield and r s_col_prot_thr line for protein content) divide the diagram into four quadrants. None of the cows in the lower right quadrant does meet any of the selection criterion imposed by the thresholds. The cows in the upper left quadrant fulfill the requirements for protein content and the cows in the lower right quadrant fulfill the requirements for milk yield. Only the cows in the upper right quadrant fulfill the requirements for both traits.

The disadvantage of this selection strategy becomes apparent with the cows in the upper left and in the lower right quadrant. They are culled and thereby not considered as parents of the next generation even though, they have good performances in one of the traits. From a statistical genetics point of view there are three problems associated with a selection strategy that is based on independent selection thresholds

  1. livestock breeders tend to put the thresholds for all traits in the range of positive predicted breeding values. This leads to an exclusion of very many animals and a dramatic reduction in genetic variability
  2. genetic relationships between traits are completely ignored. These relationships must be considered when defining selection thresholds. Otherwise the expected genetic gain will not be as expected.
  3. differences in the economic relevance of the different traits are completely ignored. Putting the threshold in all traits into the range of positive predicted breeding values leads to a high emphasis on traits with a high heritability. Traits with lower heritability will have only very small selection responses.

Selection Index {#selindex-approx-agg-genotype}

In section \@ref(multitrait-selection), we have already briefly described how we can use selection index theory to approximate the aggregate genotype $H$ in an optimal way. Just as a reminder, the aggregate genotype $H$ combines all economically relevant traits into a single value using a linear function of the true breeding values $u$ and taking the economic values $w$ as weighting factors. Given that $H$ corresponds to the linear function

$$H = w^Tu$$

and we want to approximate $H$ by an Index $I$ which is a linear function of all predicted breeding values $\hat{u}$, we can write

$$I = b^T\hat{u}$$

where $b$ is a vector of unknown index weights. The vector $b$ is determined using the optimality condition of minimum prediction error variance which results in

\begin{equation} b = P^{-1}Gw (#eq:vectorindexweight) \end{equation}

where $P$ is the variance-covariance matrix between all information sources and $G$ is the covariance matrix between the information sources and the traits in the aggregate genotype. In case where the traits in the aggregate genotype $H$ and in the index $I$ are the same, the matrices $P$ and $G$ are defined as

$$P = var(\hat{u})$$

and

$$G = cov(u, \hat{u})$$

For predicted breeding values using BLUP, it can be shown that $cov(u, \hat{u}) = var(\hat{u})$ and therefore $P=G$. Using that equality in equation \@ref(eq:vectorindexweight), we get

$$b = w$$ which means that the vector $b$ of index weights corresponds to the vector of economic values $w$.

The use of the selection index theory to combine predicted breeding values as information sources to approximate the aggregate genotype has the following advantages

Despite all these advantages, index selection alone is very rarely used in practical livestock breeding. The reason for this is that every population has a few traits that are difficult to associate with an economic value or for some traits it is difficult to come up with genetic parameters. As a consequence of that, in practical livestock breeding we will always find a mix of index selection and a variety of independent selection thresholds.

\pagebreak

Review On Selection Index Theory {#review-selection-index-theory}

Before the introduction of the BLUP animal model (r mrmt$add('Henderson1973') and r mrmt$add('Henderson1975')), breeding values were estimated using Selection Index Theory (r mrmt$add('Hazel1943') and r mrmt$add('HL1942')). Both methods - selection index and BLUP - are based on the same genetic model. The main difference between the two methods consists in the way how they correct for identifiable systematic environmental effects. We start with a treatment of selection index theory.

Introduction {#sel-index-intro}

In principle, prediction of breeding values aims at assessing the genetic potential of a selection candidate that is due to additive gene effects based on all available information, such that the correlation between true and predicted breeding value is maximal. Because, we want to do this for a large number of selection candidates, we can formulate our aim in a more general way. For a given population, we want to predict breeding values for all animals in the population using all available information, such that the correlation between true and predicted breeding values are maximized. An alternative objective for the prediction to the maximization of the correlation between true and predicted breeding values is the minimization of the mean squared error of the prediction. The description of the aims of our procedure to predict breeding values shows that we are dealing with two different concepts of breeding value.

  1. True breeding value which corresponds to the sum of all additive gene-effects
  2. Predicted breeding value which is a function of the phenotypic observations ($y$) that is determined by statistical methods. As a prediction it is always associated with a certain error which we want to be minimal.

The prediction of breeding values has three different objectives.

  1. Selection candidates are ranked according to the predicted breeding values. Hence, it provides a criterion for selecting parents out of a pool of selection candidates
  2. Predicted breeding values are used to assess the response to selection and is important for planning a breeding program
  3. Predicted breeding values are one criterion that affect the price of breeding animals and the price of seamen.

The definition \@ref(def:defbreedingvalue) of the term breeding value has several problems when it comes to its potential usefulness for prediction.

To address these issues, the above mentioned methods were developed. We start with the method of the selection index.

Selection Index Method {#sel-ind-method}

The selection index is a method to predict the breeding value of an animal ($i$) by using all available information on the animal and on its relatives. The result of the selection index method is an assignment of a numerical value ($I$) to each animal. All animals in the population can then be ranked according to their index value. The ranking according to the index value can be used as selection criterion. In principle the index $I$ is defined as linear combination of all available information. This can be written as

\begin{equation} I = \hat{u_i} = b_1 y_1 + b_2 y_2 + \cdots + b_n y_n = b^Ty (#eq:selindexdef) \end{equation}

where $b$ is a vector of index weights and $y$ is a vector of information sources. Here we assume that all values in $y$ are corrected for appropriate mean levels. The resulting index value $I$ in \@ref(eq:selindexdef) is used as the predicted breeding value $\hat{u_i}$. From a statistical point of view equation \@ref(eq:selindexdef) corresponds to a multiple linear regression. The vector of index weights $b$ are understood as partial regression coefficients.

Aggregate Genotype {#aggregate-genotype}

In most practical livestock breeding scenarios, we want to improve a population at the genetic level with respect to more than one trait or characteristic, simultaneously. This requires a procedure that enables us to combine the breeding values of several trait into one selection criterion. This criterion is called the aggregate genotype $H$. It is defined as

\begin{equation} H = w_1 u_1 + w_2 u_2 + \cdots + w_m u_m = w^T u (#eq:defaggregategenotype) \end{equation}

where $u$ corresponds to the vector of true breeding values and $w$ is a vector of economic values. The economic value $w_k$ for a given trait $k$ is defined as the marginal change in profit caused by a small change in the population mean ($\mu_k$) of the trait $k$. At this point, we are not describing how the economic values $w_k$ are derived, but we consider them to be known. For the construction of the selection index, we are using the general form of the aggregate genotype $H$. Once the selection index is constructed, we can go back to the simple scenario of considering just one trait which reduces the aggregate genotype $H$ to the true breeding value $u$ of the single trait.

Theory of Index Construction {#theory-index-construction}

The term index construction stands for the computation of the vector of index weights $b$ for a given set of information sources and a given aggregate genotype. Independently from the available information sources, the following parameters must be known

The objective of the index construction is to maximize the correlation $r_{HI}$ between the index $I$ and the aggregate genotype $H$. Because the index $I$ corresponds to a multiple linear regression, the mean squared error between aggregate genotype and index is to be minimized. From this it follows that

\begin{equation} E(H-I)^2 \rightarrow \text{ min} (#eq:indexconstructionobjective) \end{equation}

The solution to the index construction objective in equation \@ref(eq:indexconstructionobjective) leads to the so-called index normal equations which have the following form.

\begin{equation} Pb = Gw (#eq:indexnormalequation) \end{equation}

where $P$ is the variance-covariance matrix between all information sources in the index, $G$ is the genetic variance-covariance matrix between the traits in the aggregate genotype and in the index and $w$ is a vector of known economic values. Solving for the vector of unknown index weights $b$ leads to

\begin{equation} b = P^{-1}Gw (#eq:indexweight) \end{equation}

The accuracy of the index is assessed by the correlation $r_{HI}$ between the index $I$ and the aggregate genotype $H$. The higher this correlation, the better the approximation of $H$ by $I$. The correlation $r_{HI}$ can be computed as shown in \@ref(eq:indexaccuracy). The terms for $cov(H,I)$, $\sigma_H$ and $\sigma_I$ are taken from \@ref(eq:appvarHcovHIvarI) and for $b$ we insert the solution taken from \@ref(eq:indexweight).

\begin{align} r_{HI} &= \frac{cov(H,I)}{\sigma_H \sigma_I} \notag \ &= \frac{w^T * G^T * b}{\sqrt{(w^T * C * w) * (b^T * P * b)}} \notag \ &= \frac{w^T * G^T * P^{-1}Gw}{\sqrt{(w^T * C * w) * ((P^{-1}Gw)^T * P * P^{-1}Gw)}} \notag \ &= \frac{w^T * G^T * P^{-1}Gw}{\sqrt{(w^T * C * w) * (w^T * G^T * P^{-1} * P * P^{-1}Gw)}} \notag \ &= \frac{w^T * G^T * P^{-1}Gw}{\sqrt{(w^T * C * w) * (w^T * G^T * P^{-1}Gw)}} \notag \ &= \sqrt{\frac{w^T * G^T * P^{-1}Gw}{w^T * C * w}} \notag \ &= \frac{\sigma_I}{\sigma_H} (#eq:indexaccuracy) \end{align}

The response to selection $R$ which results from applying a selection scheme according to the index $I$ per generation is computed as

\begin{align} R &= i * r_{HI} * \sigma_H \notag \ &= i * \frac{\sigma_I}{\sigma_H}* \sigma_H \notag \ &= i * \sigma_I \end{align}

where $i$ is the selection intensity.

Example of Index with Own Performance

The simplest case of an index $I$ is the one where the aggregate genotype $H$ consists of one trait and the index $I$ contains a single own performance record of the same trait. This is equivalent to using the index $I$ to predicting the breeding value $u$ of an animal based on own phenotypic own performance record $y$. Hence we can set

$$H = u \qquad \text{ and } \qquad I = by^*$$

During the index construction, we have assumed the information in the index to be corrected for the appropriate population mean $\mu$. For our example here, we can set $y^* = y-\mu$. To determine the unknown index weight $b$ which is on our example just a single number, we have to specify $P$, $G$ and $w$. Because, we are looking at just one trait, the vector of economic values $w$ is set to one. The matrix $P$ was defined to be the variance-covariance matrix between the traits in the index. As the index $I$ contains just one phenotypic record, then $P$ corresponds to the phenotypic variance $\sigma_y^2$ of our trait of interest. The matrix $G$ was defined to be the genetic variance-covariance matrix between the traits in the aggregate genotype and the traits in the index. In our example we have just one trait which is the same in $H$ and in $I$, hence $G$ corresponds to the additive genetic variance $\sigma_u^2$. In summary, we have found that

\begin{align} P &= \sigma_y^2 \notag \ G &= \sigma_u^2 \notag \ w &= 1 (#eq:inputindexownperformance) \end{align}

Inserting the terms of \@ref(eq:inputindexownperformance) into equation \@ref(eq:indexweight) to compute the index weight $b$ results in

\begin{align} b &= P^{-1} * G * w \notag \ &= \sigma_y^{-2} * \sigma_u^2 * 1 \notag \ &= \frac{\sigma_u^2}{\sigma_y^2} = h^2 (#eq:resultindexownperformance) \end{align}

Using the index weight $b$ found in \@ref(eq:resultindexownperformance) to compute the index $I$, we get

\begin{align} I &= by^* \notag \ &= h^2(y - \mu) \notag \ &= \hat{u_i} (#eq:predictedbreedingvalueownperformanceindex) \end{align}

The index value $I$ that we obtained in \@ref(eq:predictedbreedingvalueownperformanceindex) corresponds to the predicted breeding value for a given trait of an animal $i$ based on an own performance phenotypic record of animal $i$ in the respective trait. Comparing the predicted breeding value obtained in \@ref(eq:predictedbreedingvalueownperformanceindex) using selection index theory to the result obtained from the regression approach in \@ref(eq:predbreedvalueownsinglerecord) shows that they are identical.

The accuracy $r_{HI}$ of the predicted breeding value ($\hat{u_i}$) using selection index theory is computed as shown in \@ref(eq:indexaccuracy)

\begin{align} r_{HI} &= \frac{\sigma_I}{\sigma_H} \notag \ &= \frac{b \sigma_y}{\sigma_u} \notag \ &= \frac{h^2 \sigma_y}{\sigma_u} \notag \ &= h (#eq:accuracyvalueownperformanceindex)
\end{align}

Similarly to the predicted breeding value, the accuracy $r_{HI}$ that results from selection index theory is identical to what was found using the regression approach.

Example with Progeny Records {#example-progeny-record}

The prediction of breeding values for a given animal $i$ based on progeny records is very common in livestock breeding. Examples are dairy cattle where bulls are evaluated based on lactation records of daughters. Similarly for beef cattle or pigs where sires are evaluated based on carcass performance of their progeny. For a very long time this has been the standard method to predict breeding values to select parents in a breeding program. First we assume that the progeny of animal $i$ are all half-sibs. Before, we can use the performance records of the progeny to predict breeding values for the parents, we have to correct them with the appropriate mean performance. After the correction the progeny performance values are averaged for a given parent. These mean performance values for a given parent $i$ are called $\bar{y_i}$ and are used to predict the breeding values. Hence our index $I$ for a given animal $i$ is defined as

\begin{equation} I = b \bar{y_i} (#eq:indexbary) \end{equation}

Because, we are only looking at a single trait, the aggregate genotype $H$ corresponds to the single true breeding value $u$ of this trait and the economic weight $w$ is $1$. Now we are ready to set up the index normal equations. In general these equations have the form

\begin{equation} Pb = Gw (#eq:generalindexnormalequation) \end{equation}

where $P$ corresponds to the variance-covariance matrix of the information sources in the index. Our index $I$ as defined in \@ref(eq:indexbary) contains just one source of information, namely the average $\bar{y_i}$ of the progeny performance values of animal $i$. In general the phenotypic variance of the mean $\bar{y}$ of $n$ progeny performance values corresponds to

\begin{equation} \sigma_{\bar{y}}^2 = \frac{1 + (n-1)t}{n} \sigma_y^2 (#eq:phenotypicvariancemean) \end{equation}

For our case with the progeny records, $t$ takes the value of ${1\over 4}h^2$. For more details on how to compute $\sigma_{\bar{y}}^2$, see section \@ref(app-deriv-variance-mean-progeny-performance). Hence the matrix $P$ reduces to a single number

\begin{equation} P = \sigma_{\bar{y}}^2 = \frac{1 + (n-1)h^2/4}{n} \sigma_y^2 (#eq:matrixpprogeny) \end{equation}

The matrix $G$ in \@ref(eq:generalindexnormalequation) is the genetic covariance matrix between the traits in $H$ and the information sources in $I$. In our current example $G = cov(u_i, \bar{y_i}) = {1\over 2} \sigma_u^2$. For more details on how to compute $G$, see section \@ref(cov-breedingvalue-meanprogeny). Now that we have all the components of \@ref(eq:generalindexnormalequation), we can insert them and solve for $b$.

\begin{align} \frac{1 + (n-1)h^2/4}{n} \sigma_y^2 * b &= {1\over 2} \sigma_a^2 \notag \ b &= \frac{2nh^2}{4 + (n-1)h^2} \notag \ &= \frac{2n}{n + k} (#eq:indexweightprogenyaverage)
\end{align}

where $k = \frac{4-h^2}{h^2}$.

With this the predicted breeding value $\hat{u_i}$ for animal $i$ based on the average progeny performance values using the index approach corresponds to

\begin{equation} \hat{u_i} = I = b * (\bar{y_i} - \mu) = \frac{2n}{n + k} * (\bar{y_i} - \mu) (#eq:predbreedingvalueindexprogenyaverage) \end{equation}

The accuracy for the predicted breeding value in \@ref(eq:predbreedingvalueindexprogenyaverage) is

\begin{equation} r_{HI} = \sqrt{\frac{n}{n+k}} (#eq:accuracyindexprogenyaverage) \end{equation}

Appendix: Derivation of Index Normal Equations {#derivation-index-normal-equation}

In this section we want to show how to derive the index normal equations from the objective criterion in the index construction procedure. The objective criterion was formulated in equation \@ref(eq:indexconstructionobjective) as

\begin{equation} \Psi = E(H-I)^2 \rightarrow \text{ min} (#eq:appindexconstructionobjective) \end{equation}

The derivation starts by inserting the definitions of $H$ and $I$ into \@ref(eq:appindexconstructionobjective).

\begin{align} \Psi = E(H-I)^2 &= E(H^2 - 2HI + I^2) \notag \ &= E(H^2) - 2 * E(H*I) + E(I^2) (#eq:appexpectedsquarederror)
\end{align}

Both the expected value $E(H)$ of the aggregate genotype $H$ and the expected value $E(I)$ of the index are both $0$. This can be seen by the following expansion

\begin{equation} E(H) = E(w^Ta) = w^T * E(u) = w^T * 0 = 0 \end{equation}

because the breeding values $u$ are defined as deviations, there expected value $E(u)$ is always $0$. Similarly for the index $I$, we mentioned that the components in the vector $y$ denoting the information sources that enter the index $I$ are corrected by suitable population means. Due to this correction, we can state that $E(y) = 0$ and thereby $E(I) = 0$. Using these results on the expected values of $H$ and $I$, we can further develop \@ref(eq:appexpectedsquarederror)

\begin{align} \Psi &= var(H) - 2 * cov(H,I) + var(I) \notag \ &= var(w^Tu) - 2 * cov(w^Tu, b^Ty) + var(b^Ty) \notag \ &= w^T var(u) w - 2 * w^T cov(u,y^T) b + b^T var(y) b \notag \ &= w^T C w - 2 * w^T G^T b + b^T P b (#eq:appvarianceerror)
\end{align}

where $C$ is the variance-covariance matrix of the true breeding values of the traits in the aggregated genotype, $G^T$ is the genetic variance-covariance matrix between the traits in the aggregate genotype and the traits in the index and $P$ is the phenotypic variance-covariance matrix between the traits in the index. Hence we can state

\begin{align} var(H) &= w^T * C * w \notag \ cov(H,I) &= w^T * G^T * b \notag \ var(I) &= b^T * P * b (#eq:appvarHcovHIvarI) \end{align}

In the objective criterion in \@ref(eq:appindexconstructionobjective), we stated that $\Psi$ should be minimized. This is done by computing the derivative of $\Psi$ with respect to the vector $b$. The solution vector $b$ that sets that derivative to $0$ corresponds to the solution that we are looking for. The derivative of $\Psi$ with respect to the vector $b$ is also called the gradient and can be computed as

\begin{align} \frac{\partial \Psi}{\partial b} &= 0 - 2 * w^T * G^T + 2 b^T P (#eq:appgradientpsi) \end{align}

Setting \@ref(eq:appgradientpsi) to $0$ leads to

\begin{align} 0 &= - 2 * w^T * G^T + 2 b^T P \notag \ w^T G^T &= b^T P \notag \ Pb &= Gw (#eq:appminsystemeq) \end{align}

The last line in \@ref(eq:appminsystemeq) follows by transposing both sides of the second last line and because $P$ is symmetric, $P^T = P$. As a result we obtain the index normal equations which can be solved for the unknown vector $b$ by pre-multiplying both sides with the inversion matrix $P^{-1}$ of $P$.

\begin{equation} b = P^{-1}Gw (#eq:appindexweight) \end{equation}

Because $P$ is a variance-covariance matrix, it is guaranteed to be positive definite and its inverse $P^{-1}$ does exist.

Appendix: Derivation of the Index Components for the Example of the Mean Progeny Performance {#app-deriv-variance-mean-progeny-performance}

Variance of Mean Progeny Performance

The mean performance values of a group of progeny for a given parent has the following structure

\begin{equation} \bar{y_i} = {1 \over n} \sum_{k=1}^n y_{i,k} (#eq:appstructureybar) \end{equation}

where $y_k$ is the corrected performance value of progeny $k$ of animal $i$. Each $y_k$ can be decomposed into

\begin{align} y_{i,k} &= u_k + e_k \notag \ &= {1\over 2} u_i + {1\over 2} u_{d,k} + m_k + e_k (#eq:appdecomposeyk) \end{align}

The variance ($\sigma_y^2$) of a single phenotypic observation ($y_{i,k}$) of progeny $k$ of parent $i$ can be computed as

\begin{align} \sigma_y^2 = var(y_{i,k}) &= var({1\over 2} u_i + {1\over 2} u_{d,k} + m_k + e_k) \notag \ &= var({1\over 2} u_i) + var({1\over 2} u_{d,k}) + var(m_k) + var(e_k) \notag \ &= {1\over 4} var(u) + {1\over 4} var(u_{d,k}) + var(m_k) + var(e_k) \notag \ &= {1\over 4} \sigma_u^2 + {1\over 4} var(u_{d,k}) + var(m_k) + var(e_k) (#eq:appsigmay2) \end{align}

In \@ref(eq:appsigmay2) we have assumed that all the pairwise covariances between the terms are $0$. We define the intra-class correlation $t$ which is the part of the total variance which is attributed to the permanent effect in the single performance records.

\begin{equation} t = \frac{1/4 \sigma_u^2}{\sigma_y^2} = {1\over 4}h^2 \end{equation}

Inserting the decomposition of \@ref(eq:appdecomposeyk) into \@ref(eq:appstructureybar) leads to

\begin{align} \bar{y_i} &= {1 \over n} \sum_{k=1}^n y_{i,k} \notag \ &= {1 \over n} \sum_{k=1}^n ({1\over 2} u_i + {1\over 2} u_{d,k} + m_k + e_k) \notag \ &= {1\over 2} u_i + {1 \over n} \sum_{k=1}^n {1\over 2} u_{d,k} + {1 \over n} \sum_{k=1}^n m_k + {1 \over n} \sum_{k=1}^n e_k (#eq:appstructureybarinserted) \end{align}

Taking the variance on both sides of \@ref(eq:appstructureybarinserted) leads to our final result the variance ($\sigma_{\bar{y}}^2$) of the mean progeny performance.

\begin{align} \sigma_{\bar{y}}^2 = var(\bar{y_i}) &= var({1\over 2} u_i + {1 \over n} \sum_{k=1}^n {1\over 2} u_{d,k} + {1 \over n} \sum_{k=1}^n m_k + {1 \over n} \sum_{k=1}^n e_k) \notag \ &= var({1\over 2} u_i) + var({1 \over n} \sum_{k=1}^n {1\over 2} u_{d,k}) + var({1 \over n} \sum_{k=1}^n m_k) + var({1 \over n} \sum_{k=1}^n e_k) \notag \ &= {1\over 4} \sigma_u^2 + \frac{1}{4n} var(u_{d,k}) + {1 \over n} var(m_k) + {1 \over n} var(e_k) \notag \ &= {1\over 4} \sigma_u^2 + {1\over n} \left( {1\over 4}var(u_{d,k}) + var(m_k) + var(e_k) \right) \notag \ &= t * \sigma_y^2 + {1\over n}(1-t) * \sigma_y^2 \notag \ &= \frac{n*t + 1-t}{n} * \sigma_y^2 \notag \ &= \frac{1 + (n-1)t}{n} * \sigma_y^2 (#eq:appsigmaybarresult)
\end{align}

Because, we saw earlier that $t = h^2 / 4$, we can insert that into \@ref(eq:appsigmaybarresult) which brings us to the final result

\begin{equation} \sigma_{\bar{y}}^2 = \frac{1 + (n-1)h^2 / 4}{n} * \sigma_y^2 (#eq:appsigmaybarfinalresult)
\end{equation}

Covariance between True Breeding Value and Mean Progeny Performance {#cov-breedingvalue-meanprogeny}

The set-up of the index normal equations requires the matrix $G$ which corresponds to the genetic covariance between the trait in the aggregate genotype and the information sources in the index. For the example with the mean progeny performance values, the matrix $G$ is defined as

\begin{align} G = cov(u_i, \bar{y_i}) &= cov(u_i, {1\over n}\sum_{k=1}^n y_{i,k}) \notag \ &= cov \left( u_i, {1 \over 2} u_i + {1\over n}\sum_{k=1}^n \left[ {1 \over 2} u_{d,k} + m_k + e_k \right] \right) \notag \ &= cov(u_i, {1 \over 2} u_i) \notag \ &= {1 \over 2} \sigma_u^2 (#eq:appcovaybar)
\end{align}

In \@ref(eq:appcovaybar), we have used that the covariance between $u_i$ and all other components of $y_{i,k}$, except $u_i$ is $0$.



charlotte-ngs/lbgfs2020 documentation built on Dec. 20, 2020, 5:39 p.m.