knitr::opts_chunk$set(echo = FALSE, results = "asis")
knitr::knit_hooks$set(hook_convert_odg = rmdhelp::hook_convert_odg)

Why

Sire Model

\begin{equation} y = X\beta + Z_ss + e \notag \end{equation}

with $var(e) = R$, $var(s) = A_s \sigma_s^2$ and $var(y) = Z_sA_sZ_s^T \sigma_s^2 + R$

$\rightarrow$ estimate $\sigma_s^2$ and $\sigma_e^2$ from data
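
As a minimal sketch of this variance structure, the following hypothetical example builds $var(y)$ for three observations on two unrelated sires ($A_s = I$, $R = I\sigma_e^2$); all values and object names are made up for illustration.

### # hypothetical incidence matrix: observations 1 and 3 from sire 1, observation 2 from sire 2
mat_zs_hyp <- matrix(c(1, 0,
                       0, 1,
                       1, 0), nrow = 3, byrow = TRUE)
mat_as_hyp <- diag(2)          ### # unrelated sires: A_s = I
sigma_s2_hyp <- 0.25           ### # hypothetical sire variance
sigma_e2_hyp <- 1.0            ### # hypothetical residual variance
mat_r_hyp <- diag(3) * sigma_e2_hyp
### # var(y) = Z_s A_s Z_s^T sigma_s^2 + R
mat_var_y_hyp <- mat_zs_hyp %*% mat_as_hyp %*% t(mat_zs_hyp) * sigma_s2_hyp + mat_r_hyp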

Analysis of Variance (ANOVA)

\tiny

\begin{tabular}{lll}
\hline
Source  &  Degrees of Freedom ($df$)  &  Sums of Squares ($SSQ$) \\
\hline
Overall ($\mu$)  &  $Rank(X)=1$  &  $y^TX(X^TX)^{-1}X^Ty = F$ \\
Sires ($s$)  &  $Rank(Z_s) - Rank(X) = q - 1$  &  $y^TZ_s(Z_s^TZ_s)^{-1}Z_s^Ty - y^TX(X^TX)^{-1}X^Ty = S$ \\
Residual ($e$)  &  $n - Rank(Z_s) = n - q$  &  $y^Ty - y^TZ_s(Z_s^TZ_s)^{-1}Z_s^Ty = R$ \\
\hline
Total  &  $n$  &  $y^Ty$ \\
\hline
\end{tabular}
\normalsize

Sums of Squares

$$F = y^TX(X^TX)^{-1}X^Ty = {1\over n} \left[\sum_{i=1}^n y_i \right]^2$$

$$S= y^TZ_s(Z_s^TZ_s)^{-1}Z_s^Ty - y^TX(X^TX)^{-1}X^Ty = \sum_{i=1}^{q} {1 \over n_i} \left[\sum_{j=1}^{n_i} y_{ij}\right]^2 - F $$

$$R = y^Ty - y^TZ_s(Z_s^TZ_s)^{-1}Z_s^Ty = \sum_{i=1}^n y_i^2 - S - F$$
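
The scalar short-cuts can be checked against the quadratic forms with a minimal sketch on a hypothetical data vector of five observations on two sires; all objects carrying the suffix `_toy` are made up for this illustration.

### # hypothetical toy data: five observations on two sires (q = 2)
y_toy <- c(4.5, 3.9, 5.1, 4.2, 4.8)
grp_toy <- c(1, 1, 2, 2, 2)
mat_x_toy <- matrix(1, nrow = 5, ncol = 1)
mat_z_toy <- matrix(c(1, 0,
                      1, 0,
                      0, 1,
                      0, 1,
                      0, 1), nrow = 5, byrow = TRUE)
### # quadratic forms
ssqf_toy <- t(y_toy) %*% mat_x_toy %*% solve(crossprod(mat_x_toy)) %*% t(mat_x_toy) %*% y_toy
ssqs_toy <- t(y_toy) %*% mat_z_toy %*% solve(crossprod(mat_z_toy)) %*% t(mat_z_toy) %*% y_toy - ssqf_toy
ssqr_toy <- sum(y_toy^2) - ssqs_toy - ssqf_toy
### # scalar short-cuts give the same values
ssqf_sc <- sum(y_toy)^2 / length(y_toy)
ssqs_sc <- sum(tapply(y_toy, grp_toy, sum)^2 / tapply(y_toy, grp_toy, length)) - ssqf_sc
ssqr_sc <- sum(y_toy^2) - ssqs_sc - ssqf_sc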

Estimates

$$E(R) = (n-q) \sigma_e^2$$

$$E(S) = (q-1) \sigma_e^2 + tr(Z_s^TMZ_s)\sigma_s^2$$ where $M = I - X(X^TX)^{-1}X^T$ and $q$ is the number of sires.

$\rightarrow$ $\widehat{\sigma_e^2} = \frac{R}{n-q}$ and $\widehat{\sigma_s^2} = \frac{S - (q-1)\widehat{\sigma_e^2}}{tr(Z_s^TMZ_s)}$

Numerical Example

tbl_num_ex_chp12 <- tibble::tibble( Animal = c(4, 5, 6, 7),
                                        Sire   = c(2, 1, 3, 2),
                                        WWG    =  c(2.9, 4.0, 3.5, 3.5) )

knitr::kable(tbl_num_ex_chp12,
             booktabs  = TRUE,
             longtable = TRUE,
             caption   = "Small Example Dataset for Variance Components Estimation Using a Sire Model")

$$y_{ij} = \mu + s_i + e_{ij}$$

where $y_{ij}$ is observation $j$ of sire $i$.

Design Matrices

n_nr_obs_p02 <- nrow(tbl_num_ex_chp12)
### # design matrix X
mat_x_p02 <- matrix(1, nrow = n_nr_obs_p02, ncol = 1)
### # design matrix Z
mat_z_p02 <- matrix(c(0, 1, 0,
                      1, 0, 0,
                      0, 0, 1,
                      0, 1, 0), nrow = n_nr_obs_p02, byrow = TRUE)
n_nr_sire <- ncol(mat_z_p02)
### # Observations
mat_obs <- matrix(tbl_num_ex_chp12$WWG, ncol = 1)

cat("$$\n")
cat(paste(rmdhelp::bmatrix(pmat = mat_x_p02, ps_name = 'X'), collapse = '\n'), '\n')
cat("\\text{, }")
cat(paste(rmdhelp::bmatrix(pmat = mat_z_p02, ps_name = 'Z_s'), collapse = '\n'), '\n')
cat("$$\n")

ANOVA

### # compute F
ytx <- crossprod(mat_obs,mat_x_p02)
xtx <- crossprod(mat_x_p02)
ssqf <- ytx %*% solve(xtx) %*% t(ytx)
### # compute S
ytz <- crossprod(mat_obs, mat_z_p02)
ztz <- crossprod(mat_z_p02)
ssqs <- ytz %*% solve(ztz) %*% t(ytz) - ssqf
### # compute R
yty <- crossprod(mat_obs)
ssqr <- yty - ssqs - ssqf

An analysis of variance can be constructed as

\begin{center}
\begin{tabular}{lll}
\hline
Source  &  Degrees of Freedom ($df$)  &  Sums of Squares ($SSQ$) \\
\hline
Overall ($\mu$)  &  $Rank(X)=1$  &  $F = `r ssqf`$ \\
Sires ($s$)  &  $Rank(Z_s) - Rank(X) = q - 1$  &  $S = `r ssqs`$ \\
Residual ($e$)  &  $n - Rank(Z_s) = n - q$  &  $R = `r ssqr`$ \\
\hline
\end{tabular}
\end{center}

Estimates

mat_m <- diag(n_nr_obs_p02) - mat_x_p02 %*% solve(xtx) %*% t(mat_x_p02)
ztmz <- t(mat_z_p02) %*% mat_m %*% mat_z_p02
tr_ztmz <- sum(diag(ztmz))
cat(paste(rmdhelp::bmatrix(pmat = mat_m, ps_name = 'M', ps_env = '$$'), collapse = '\n'), '\n')
cat(paste(rmdhelp::bmatrix(pmat = ztmz, ps_name = 'Z_s^TMZ_s', ps_env = '$$'), collapse = '\n'), '\n')

Results

### # residual variance estimate: R / (n - q)
hat_sigmae2 <- ssqr / (n_nr_obs_p02 - n_nr_sire)
### # sire variance estimate: (S - (q-1) * sigma_e^2) / tr(Z_s^T M Z_s)
hat_sigmas2 <- (ssqs - (n_nr_sire-1) * hat_sigmae2) / tr_ztmz

$$\widehat{\sigma_e^2} = \frac{R}{n-q} = `r hat_sigmae2`$$
$$\widehat{\sigma_s^2} = \frac{S - (q-1)\widehat{\sigma_e^2}}{tr(Z_s^TMZ_s)} = \frac{`r ssqs` - (`r n_nr_sire` - 1) \cdot `r hat_sigmae2`}{`r tr_ztmz`} = `r hat_sigmas2`$$

ANOVA in R

tbl_num_ex_chp12$Sire <- as.factor(tbl_num_ex_chp12$Sire)
aov_result <- aov(WWG ~ Sire, data = tbl_num_ex_chp12)
summary(aov_result)
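
The same estimates can be recovered from the `aov` output with a minimal sketch, re-using `tr_ztmz` and `n_nr_sire` from above; with $n - q = 1$ residual degree of freedom the residual mean square equals $R$.

### # extract the anova table; row 1 is the Sire line, row 2 the Residuals line
tbl_aov <- summary(aov_result)[[1]]
hat_sigmae2_aov <- tbl_aov[2, "Mean Sq"]
### # sire variance from the same moment equation as above
hat_sigmas2_aov <- (tbl_aov[1, "Sum Sq"] - (n_nr_sire - 1) * hat_sigmae2_aov) / tr_ztmz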

Likelihood

$$L(\mu, \Sigma) = f(y | \mu, \Sigma)$$

with

$$ f_Y(y | \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^n \det(\Sigma)}} \exp\left\{-{1\over 2}(y - \mu)^T \Sigma^{-1}(y - \mu) \right\} $$
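
A minimal sketch of this density on the log scale for the sire model of the numerical example, assuming unrelated sires ($A_s = I$) and $R = I\sigma_e^2$, so that $\Sigma = Z_sZ_s^T\sigma_s^2 + I\sigma_e^2$; the function name `log_lik_sire` and the trial values in the call are made up for illustration.

### # log of the multivariate normal density for given mu and variance components
log_lik_sire <- function(mu, sigmas2, sigmae2, y, X, Z) {
  n <- length(y)
  mat_sigma <- Z %*% t(Z) * sigmas2 + diag(n) * sigmae2
  vec_dev <- y - X %*% mu
  as.numeric(-0.5 * (n * log(2 * pi) + determinant(mat_sigma)$modulus +
                       t(vec_dev) %*% solve(mat_sigma) %*% vec_dev))
}
### # evaluate at arbitrary trial values for the variance components
log_lik_sire(mu = mean(tbl_num_ex_chp12$WWG), sigmas2 = 0.1, sigmae2 = 0.2,
             y = as.vector(mat_obs), X = mat_x_p02, Z = mat_z_p02)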

Maximum Likelihood

$$\hat{\Sigma} = argmax_{\Sigma} L(\mu, \Sigma)$$
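
A minimal sketch of this maximisation with `optim()`, re-using `log_lik_sire` from above; the variance components are parametrised on the log scale to keep them positive, and the starting values are arbitrary.

### # negative log-likelihood as a function of (mu, log sigma_s^2, log sigma_e^2)
neg_log_lik <- function(par) {
  -log_lik_sire(mu = par[1], sigmas2 = exp(par[2]), sigmae2 = exp(par[3]),
                y = as.vector(mat_obs), X = mat_x_p02, Z = mat_z_p02)
}
ml_fit <- optim(par = c(mean(tbl_num_ex_chp12$WWG), 0, 0), fn = neg_log_lik)
### # ML estimates of sigma_s^2 and sigma_e^2
exp(ml_fit$par[2:3])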

Bayesian Approach

\begin{align}
f(\Sigma | y) & = \frac{f(\Sigma, y)}{f(y)} \notag \\
              & = \frac{f(y | \Sigma)f(\Sigma)}{f(y)} \notag \\
              & \propto f(y | \Sigma)f(\Sigma) \notag
\end{align}

where $f(\Sigma)$ is the prior distribution and $f(y|\Sigma)$ is the likelihood.

Bayesian Estimates

$$\widehat{\sigma_s^2}_{Bayes} = {1 \over N}\sum_{t=1}^N \left(\sigma_s^2\right)^{(t)}$$

where $(\sigma_s^2)^{(t)}$ denotes the $t$-th of $N$ samples drawn from the posterior distribution $f(\sigma_s^2 | y)$, e.g. with the Gibbs sampler sketched below.
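
A minimal sketch of a single-site Gibbs sampler producing such samples for the sire model of the numerical example; unrelated sires are assumed, and the scaled inverse chi-square prior hyper-parameters `nu_prior` and `scale_prior` are made-up values chosen only for illustration.

set.seed(321)
vec_y <- as.vector(mat_obs)
n_obs <- length(vec_y)
n_sire <- ncol(mat_z_p02)
nu_prior <- 4                ### # hypothetical prior degrees of freedom
scale_prior <- 0.1           ### # hypothetical prior scale
n_iter <- 10000
### # starting values
mu <- mean(vec_y)
vec_s <- rep(0, n_sire)
sigmas2 <- sigmae2 <- 1
vec_sample_sigmas2 <- vec_sample_sigmae2 <- numeric(n_iter)
for (iter in 1:n_iter) {
  ### # sample the general mean from its full conditional
  mu <- rnorm(1, mean = mean(vec_y - mat_z_p02 %*% vec_s), sd = sqrt(sigmae2 / n_obs))
  ### # sample each sire effect from its full conditional
  lambda <- sigmae2 / sigmas2
  for (i in 1:n_sire) {
    n_i <- sum(mat_z_p02[, i])
    vec_resid <- vec_y - mu - mat_z_p02[, -i, drop = FALSE] %*% vec_s[-i]
    vec_s[i] <- rnorm(1, mean = sum(mat_z_p02[, i] * vec_resid) / (n_i + lambda),
                      sd = sqrt(sigmae2 / (n_i + lambda)))
  }
  ### # sample the variance components from their scaled inverse chi-square conditionals
  vec_e <- vec_y - mu - as.vector(mat_z_p02 %*% vec_s)
  sigmae2 <- (sum(vec_e^2) + nu_prior * scale_prior) / rchisq(1, df = n_obs + nu_prior)
  sigmas2 <- (sum(vec_s^2) + nu_prior * scale_prior) / rchisq(1, df = n_sire + nu_prior)
  vec_sample_sigmas2[iter] <- sigmas2
  vec_sample_sigmae2[iter] <- sigmae2
}
### # Bayesian estimates as posterior means over the samples
mean(vec_sample_sigmas2)
mean(vec_sample_sigmae2)

In practice the early part of the chain would be discarded as burn-in before taking these means.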


