In charlotte-ngs/lbgfs2020: Livestock Breeding and Genomics (Fall 2020)

Genetic Evaluations {#gen-eval}

s_this_rmd_file <- rmdhelp::get_this_rmd_file()
mrmt$set_this_rmd_file(ps_this_rmd_file = s_this_rmd_file)

In chapter \@ref(quan-gen), we have already seen that the breeding value is a really important concept. The Definition \@ref(def:defbreedingvalue) of the term breeding value has some important consequences.

The breeding value is based on the average of a large number of offspring. This is necessary, because offspring inherit a random sample of a parents alleles. But not all offspring receive the same sample of alleles. Taking the average of a large number of offspring reduces the effect of sampling and thereby lets the breeding value converge to a stable value.
The breeding value is defined as a deviation from the population mean. This population mean depends on allele frequencies which are specific for each population. Therefore breeding values can only be compared within one population.
Because the breeding value is defined as a deviation, the expected value of the breeding values and the mean of all breeding values are $0$ by definition.

Introduction {#intro-predict-breeding-value}

Because, in the more traditional setting^[That means, at this moment, we are ignoring all recent developments made such as genomic selection.] of livestock breeding, we do not have information about allele frequencies and about genotypic values, we have to predict breeding values. For this prediction we can use different sources of information. Currently, we are assuming that this information is all based on records of phenotypic observations.

The Basic Model {#basic-model}

Although, the phenotypic observation might originate from different sources, we can use one basic model for all of the breeding value predictions. We have already seen a different form of this model in equation \@ref(eq:phengenenv) in section \@ref(geno-pheno). The original model from equation \@ref(eq:phengenenv) is modified and extended to the model shown below.

\begin{equation} y_{ij} = \mu_i + g_i + e_{ij} (#eq:breedvalpredbasicmodel) \end{equation}

\begin{tabular}{llp{9cm}} where & & \ & $y_{ij}$ & $j^{th}$ record of animal $i$ \ & $\mu_i$ & identifiable fixed environmental effect \ & $g_i$ & sum of all additive ($u$), dominance ($d$) and epistatic effects of the genotype of animal $i$ \ & $e_{ij}$ & random environmental effects of animal $i$ \end{tabular}

Livestock species are mostly diploid and hence from a given parent only one allele of a given locus is passed to a gamete which can later be found in the parents offspring. Any interaction effects caused by dominance or epistasis are not preserved from parent to offspring. Only the additive effect of a given allele is passed from parent to offspring. The additive genetic part ($u_i$) of $g_i$ in equation \@ref(eq:breedvalpredbasicmodel) represents the average genetic effect that animal $i$ receives from its parents. It is therefore called the breeding value. Because the additive genetic effect is a function of the alleles passed from the parents to the progeny, it is the only component that can be selected for and is therefore the main component of interest from a livestock breeding perspective. Due to the major interest in the genetic additive component, the terms in the basic model in \@ref(eq:breedvalpredbasicmodel) are re-arranged as follows.

\begin{equation} y_{ij} = \mu_i + u_i + e_{ij}^* (#eq:breedvalpredmodifiedmodel) \end{equation}

\begin{tabular}{llp{9cm}} where & & \ & $y_{ij}$ & $j^{th}$ record of animal $i$ \ & $\mu_i$ & identifiable fixed environmental effect \ & $u_i$ & sum of all additive ($u$) genetic effects of the genotype of animal $i$ \ & $e_{ij}^*$ & dominance, epistatic and random environmental effects of the $j^{th}$ record of animal $i$ \end{tabular}

The same re-arrangement of terms in the basic model is illustrated by Figure \@ref(fig:basicmodelrearrterm)

#rmddochelper::use_odg_graphic(ps_path = "odg/basicmodelrearrterm.odg")
knitr::include_graphics(path = "odg/basicmodelrearrterm.png")

Equation \@ref(eq:breedvalpredmodifiedmodel) constitutes the linear model that forms the basis for most problems of breeding value prediction in livestock breeding. Usually it is assumed that the phenotypic observations $y_{ij}$ follow a multivariate normal distribution. We have already seen in section \@ref(genetic-models) that the additive genetic effect ($u_i$) is thought to be the sum of a large number of unlinked loci that all contribute a very small amount to the total breeding value. Then by the central limit theorem it follows that $u_i$ converges to a normal distribution. By the same reasoning that the environmental effect $e_{ij}^$ is composed of very many small contributions, also $e_{ij}^$ converges to a normal distribution. From distribution theory it is known that the sum of two normally distributed random variables (like $u_i$ and $e_{ij}^$) plus a fixed term (like $\mu$) is again a random variable that follows a normal distribution. We can conclude that the assumption that all the random effects ($y_{ij}$, $u_i$ and $e_{ij}^$) in model \@ref(eq:breedvalpredmodifiedmodel) is consistent with distribution theory. Furthermore the central limit theorem implies that in principle the number of breeding values from single loci tends to infinity. That means the total breeding value $u_i$ corresponds to a sum of infinitely many contributions. Based on the fact that in theory $u_i$ is composed of an infinite number of infinitely small components, the model in \@ref(eq:breedvalpredmodifiedmodel) is called the infinitesimal model.

Concerning the variances, it is assumed that $var(y_{ij})$, $var(u_i)$ and $var(e_{ij})$ are all known. Covariances ($cov(u_i, e_{ij})$) between genetic and environmental effects and covariances ($cov(e_{ij}^, e_{kl}^)$) between environmental effects of mates $i$ and $k$ are assumed to be zero, respectively.

Also $\mu_i$ which is used to represent the mean performance of animals in the same identifiable environment such as herd or management group or have the same sex or age, is assumed to be known.

Decomposition of Breeding Value

As already mentioned earlier, the breeding value $u_i$ of an individual $i$ represents the average additive genetic effect that animal $i$ receives from its parents $s$ and $d$. Hence $u_i$ can be decomposed into

\begin{equation} u_i = {1\over 2} u_s + {1\over 2} u_d + m_i (#eq:breedingvalueanimalparent) \end{equation}

where $u_s$ and $u_d$ correspond to the breeding values of parents $s$ and $d$, respectively and $m_i$ is the deviation of $u_i$ from the average breeding values of the parents and is called Mendelian sampling. The term $m_i$ is necessary, because two fullsibs $i$ and $k$ both having parents $s$ and $d$ receive different random samples of the set of parental alleles. Hence the breeding values $u_i$ and $u_k$ of fullsibs $i$ and $k$ are not going to be the same. The difference between breeding values $u_i$ and $u_k$ is reflected in the different Mendelian sampling terms $m_i$ and $m_k$ for fullsibs $i$ and $k$.

Basic Principle of Predicting Breeding Values {#principle-predic-breedingvalue}

The prediction of breeding values mostly follows the same principles. From the point of view of statistics, estimations or predictions are always a function of the observed data. When looking at the model in \@ref(eq:breedvalpredmodifiedmodel), we can probably guess that the observed phenotypic records ($y_{ij}$) must be corrected somehow for the identifiable environmental effects represented by $\mu_i$. The second influence that we want to consider when predicting breeding values is how "closely related" the observed record $y_{ij}$ is to the breeding value. For traits where the influence of the genetic component is not very strong, it is probably a good idea to down-weigh the information from $y_{ij}$.

The two principles just described can be generalized as follows. Breeding values are predicted according to the following two steps.

Observations are corrected for the mean performance values of animals under the same environmental conditions. The conditions are described by the effects captured in $\mu_i$.
The corrected observations are weighted by a factor that reflects the amount of information that is available for the prediction of an animals breeding value.

In what follows, we have a look at how breeding values are predicted from different sources of information.

Animal's Own Performance {#own-performance}

Single Record {#single-record}

When one phenotypic observation per animal is the only information we have available, the predictor $\hat{u_i}$ of the breeding value $u_i$ of animal $i$ can be derived according to the following line of argument. Let us assume for a moment that we know the true breeding value $u_i$ for a population of animals. In addition to that each animal $i$ has one observation $y_i$ available. Then we plot the values of $u_i$ against the values of $y_i$ for the complete population.

#rmddochelper::use_odg_graphic(ps_path = "odg/regbreedingvaluesinglerecord.odg")
knitr::include_graphics(path = "odg/regbreedingvaluesinglerecord.png")

The plot in Figure \@ref(fig:regbreedingvaluesinglerecord) suggests that we fit a regression of the breeding values onto the phenotypic records. The fitted regression is represented by the red line. Hence as soon as we can draw the regression line, we can predict breeding values based on the phenotypic observations. The predicted breeding value $\hat{u_i}$ for a given $y_i$ corresponds to the value on the red line corresponding to the value of $y_i$. The slope of the regression line corresponds to the regression coefficient $b$. From regression theory, the coefficient $b$ is computed as

\begin{align} b &= \frac{cov(u,y)}{var(y)} \notag \ &= \frac{cov(u,\mu + u + e)}{var(y)} \notag \ &= \frac{cov(u,u)}{var(y)} \notag \ &= \frac{var(u)}{var(y)} \notag \ &= h^2 (#eq:regcoeffaony) \end{align}

where $h^2$ corresponds to the ratio between the genetic additive and the phenotypic variance and is called heritability. We are using the regression coefficient to predict the breeding value for animal $i$ based on a single record $y_i$.

\begin{align} \hat{u_i} &= b * (y_i - \mu) \notag \ &= h^2 * (y_i - \mu) (#eq:predbreedvalueownsinglerecord) \end{align}

From that it follows that the predicted breeding value for an animal based on a single own performance record corresponds to the observation corrected for the general mean $\mu$ times the heritability. The correlation between the selection criterion, in our case the phenotypic record and the true breeding value is known as the accuracy of the prediction. It provides a means of evaluating the different selection criteria. The higher the correlation between selection criterion and breeding value, the better is the prediction. Sometimes the accuracy of evaluation is reported in terms of reliability ($r^2$) which corresponds to the squared correlation between selection criterion and true breeding value. With a single own performance record per animal, the correlation is

\begin{align} r_{u,y} &= \frac{cov(u, y)}{\sigma_u \ \sigma_y} \notag \ &= \frac{\sigma_u^2}{\sigma_u \ \sigma_y} \notag \ &= \frac{\sigma_u}{\sigma_y} \notag \ &= h (#eq:corownperftruebreedingvalue) \end{align}

An alternative way to assess the quality of the breeding value prediction is to compute the variance of the predicted breeding values.

\begin{align} var(\hat{u_i}) &= var(by) = var(h^2y) \notag \ &= h^4 var(y) \notag \ &= r_{u,y}^2 h^2 \sigma_y^2 \notag \ &= r_{u,y}^2 \sigma_a^2 (#eq:varpredictedbreedingvalue) \end{align}

Hence the variance of the predicted breeding values corresponds to the product of the reliability times the genetic additive variance. The expected response ($R$) to selection on the basis of one record per animal is

\begin{equation} R = i * r_{u,y}^2 * \sigma_y = i * h^2 * \sigma_y (#eq:selectionresponseownperformance) \end{equation}

where $i$, the selection intensity refers to the superiority of selected individuals above population mean expressed in phenotypic standard deviation.

Repeated Records {#repeatedrecords}

When animals get older, it is likely that we can observe multiple measurements for the same trait. An example is milk yield in dairy cows where a cow might have repeated lactation records. The breeding value of an animal may be predicted based on the mean of the repeated records. With repeated records, an additional resemblance between the records of an animal due to permanent environmental factors occurs. The between-animal variance is partly genetic and partly caused by permanent environmental effects. The within-animal variance is attributed to differences between the successive measurements of the animal arising from temporary environmental variations, i.e., from environmental factors that change over time. The variance of observations ($var(y)$) can therefore be partitioned as

\begin{equation} var(y) = var(u) + var(pe) + var(te) (#eq:varrepeatedrecords) \end{equation}

where $var(u)$ is the genetic additive variance, $var(pe)$ the variance due to permanent environmental effects and $var(te)$ the variance due to temporary environmental effects.

The intra-class correlation $t$ is defined as the ratio of the genetic plus the permanent environmental variance divided by the phenotypic variance.

\begin{equation} t = \frac{var(u) + var(pe)}{var(y)} (#eq:intraclasscorrelation) \end{equation}

The term $t$ is also called repeatability and it measures the correlation between the records of an individual. From \@ref(eq:intraclasscorrelation) it follows that

\begin{equation} 1-t = \frac{var(te)}{var(y)} (#eq:oneminusintraclasscorrelation) \end{equation}

With this model, it is assumed that the repeated records on the animal are the same trait. Therefore the genetic correlation between all pairs of records is one. We also assume that all records have equal variance and that the environmental correlations between all pairs of records are equal. Let $\tilde{y}$ represent the mean of $n$ records on animal $i$ which means

\begin{align} \tilde{y_i} &= {1\over n} \sum_{k=1}^n y_{ik} \notag \ &= {1\over n} \sum_{k=1}^n ( \mu + u_i + pe_i + te_{ik}) \notag \ &= \mu + u_i + pe_i + \sum_{k=1}^n te_{ik} (#eq:meanrepeatedrecords) \end{align}

In this case, we use the mean ($\tilde{y_i}$) to predict the breeding value ($\hat{u_i}$)

\begin{equation} \hat{u_i} = b(\tilde{y_i} - \mu) (#eq:predbreedingvaluerepeatedrecords) \end{equation}

where

\begin{equation} b = \frac{cov(u,\tilde{y})}{var(\tilde{y})} (#eq:regcoeffrepeatedrecords) \end{equation}

The single elements are computed as

\begin{equation} cov(u,\tilde{y}) = cov(u, \mu + u + pe + {1\over n} \sum_{k=1}^n te_k) = var(u) = \sigma_u^2 (#eq:covayrepeatedrecords) \end{equation}

and

\begin{equation} var(\tilde{y}) = var(u) + var(pe) + {1\over n} var(te) (#eq:varyrepeatedrecords) \end{equation}

Expressing \@ref(eq:varyrepeatedrecords) in terms of \@ref(eq:intraclasscorrelation) and \@ref(eq:oneminusintraclasscorrelation) leads to

\begin{align} var(\tilde{y}) &= t * \sigma_y^2 + {1\over n} (1-t) * \sigma_y^2 \notag \ &= {1\over n}\left( n*t + (1-t) \right) \sigma_y^2 \notag \ &= \frac{1 + (n-1)t}{n} \sigma_y^2 (#eq:varyastrepeatedrecords) \end{align}

Inserting this into \@ref(eq:regcoeffrepeatedrecords) results in

\begin{align} b &= \frac{cov(u,\tilde{y})}{var(\tilde{y})} \notag \ &= \frac{n \sigma_u^2}{(1 + (n-1)t) \sigma_y^2} \notag \ &= \frac{nh^2}{1 + (n-1)t} (#eq:regcoeffrepeatedrecordsresult) \end{align}

When we predict the breeding value $u_i$ of animal $i$ using repeated records, the regression coefficient $b$ depends on

the heritability ($h^2$)
the repeatability ($t$) and
the number ($n$) of repeated records per animal

The difference between repeated records of an animal is assumed to be due to temporary environmental differences between successive performances. However, if successive records are known to be affected by factors which influence performance, these must be corrected for. For instance, differences in age at calving in first and second lactations may influence milk yield in first and second lactation. Such age differences should be adjusted for before using the means of both lactations for breeding value prediction.

The accuracies of the predicted breeding value using repeated records is

\begin{align} r_{u,\tilde{y}} &= \frac{cov(u,\tilde{y}) }{\sigma_u \sigma_y} \notag \ &= \frac{\sigma_u^2}{\sigma_u \sqrt{(1 + (n-1)t)/n \sigma_y^2}} \notag \ &= h \sqrt{n/(1 + (n-1)t)} \notag \ &= \sqrt{nh^2/(1 + (n-1)t)} \notag \ &= \sqrt{b} (#eq:accrepeatedrecords) \end{align}

The expected response to selection using repeated records will be covered in an exercise.

Progeny Records {#progenyrecords}

For traits that are recorded only on female animals, the prediction of breeding values for male animals (sires) is usually based on the mean of their female progeny. This is typical in dairy cattle, where bulls are evaluated on the basis of their daughters. Let $\bar{y_i}$ be the mean of single records of $n$ daughters of sire $i$ with the assumption that the daughters are only related through the sire (paternal half-sibs), the predicted breeding value of sire i can then be computed as

\begin{equation} \hat{u_i} = b * (\bar{y_i} - \mu) (#eq:predictedbreedingvalueprogeny) \end{equation}

where

\begin{equation} b = \frac{cov(u, \bar{y})}{var(\bar{y})} (#eq:regressioncoefficientprogeny) \end{equation}

and

\begin{align} \bar{y} &= {1 \over n} \sum_{k=1}^n y_k \notag \ &= {1 \over n} \sum_{k=1}^n (\mu + u_k + e_k) \notag \ &= \mu + {1 \over n} \sum_{k=1}^n (u_k + e_k) \notag \ &= \mu + {1 \over n} \sum_{k=1}^n (1/2 u_s + 1/2 u_{dk} + m_k + e_k) \notag \ &= \mu + 1/2 u_s + {1 \over n} \sum_{k=1}^n (1/2 u_{dk} + m_k + e_k) \notag \ &= \mu + 1/2 u_s + {1 \over n} \sum_{k=1}^n 1/2 u_{dk} + {1 \over n} \sum_{k=1}^n e_k \end{align}

In the current case of using progeny records to predict a breeding value, we have

\begin{align} cov(u, \bar{y}) &= cov(u, {1\over 2}u_s + {1\over 2}{1\over n}\sum_{k=1}^n u_{d,k} + {1\over n}\sum_{k=1}^n m_k + {1\over n}\sum_{k=1}^n e_k) \notag \ &= cov(u, {1\over 2}u_s) \notag \ &= {1\over 2} cov(u, u_s) \notag \ &= {1\over 2} \sigma_u^2 \end{align}

where $u_s$ and $u_{d,k}$ denote the breeding values of sire $s$ and dam $d$ of offspring $k$, respectively and $m_k$ and $e_k$ stand for the mendelian sampling and the environmental effect of daughter $k$. Using the same principles as in section \@ref(repeatedrecords), we get

\begin{equation} var(\bar{y}) = (t + (1-t)/n) \sigma_y^2 (#eq:variancedaughtermean) \end{equation}

where $\sigma_y^2 = var(u) + var(e) = \sigma_u^2 + \sigma_e^2$.

Assuming there is no environmental covariance between half-sib records and the intra-class correlation $t$ is $\frac{1/4 \sigma_u^2}{\sigma_y^2}$. Then we can compute the regression coefficient as

\begin{align} b &= \frac{1/2 \sigma_u^2}{(t + (1-t)/n) \sigma_y^2} \notag \ &= \frac{1/2 h^2 \sigma_y^2}{({1\over 4}h^2 + (1 - {1\over 4}h^2)/n) \sigma_y^2} \notag \ &= \frac{2nh^2}{nh^2 + (4-h^2)} \notag \ &= \frac{2n}{n + (4-h^2)/h^2} \notag \ &= \frac{2n}{n+k} (#eq:regcoeffprogeny) \end{align}

with $k=\frac{4-h^2}{h^2}$.

The term $k$ is constant for any assumed heritability ($h^2$). The regression coefficient ($b$) depends on the heritability and number of progeny and converges towards a limit of $2$ as the number of daughters increases.

The accuracy of the estimated breeding value is

\begin{align} r_{u, \bar{y}} &= \frac{cov(u, \bar{y})}{\sqrt{var(u) var(\bar{y})}} \notag \ &= \frac{1/2 h^2 \sigma_y^2}{\sqrt{h^2 \sigma_y^2 ({1\over 4}h^2 + (1 - {1\over 4}h^2)/n) \sigma_y^2}} \notag \ &= \frac{1/2 h}{\sqrt{{1\over 4}h^2 + (1 - {1\over 4}h^2)/n}} \notag \ &= \sqrt{\frac{nh^2}{nh^2 + (4-h^2)}} \notag \ &= \sqrt{\frac{n}{n + k}} (#eq:accuracyprogeny) \end{align}

The term for $r_{u, \bar{y}}$ in \@ref(eq:accuracyprogeny) approaches $1$ as the number of progeny increases, assuming $k$ is constant. The reliability ($r_{u, \bar{y}}^2$) of the predicted breeding value is $n/(n+k)$ and corresponds to $1/2 * b$ computed in \@ref(eq:regcoeffprogeny).

charlotte-ngs/lbgfs2020 documentation built on Dec. 20, 2020, 5:39 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com