knitr::opts_chunk$set(echo=TRUE)

Linear mixed models

GBLUP

Linear mixed model for breeding values:

\begin{align} \boldsymbol{y} = X \boldsymbol{\beta} + Z_a \boldsymbol{a} + \boldsymbol{\epsilon} \end{align}

where:

Distributions of the random effects:

SNPBLUP

Linear mixed model for SNP effects:

\begin{align} \boldsymbol{y} = X \boldsymbol{\beta} + Z_a Z \boldsymbol{u} + \boldsymbol{\epsilon} \end{align}

where:

Distributions of the random effects:

This model is also known as RRBLUP where "RR" stands for "ridge regression".

Conditions for equivalence

For GBLUP and SNPBLUP to be "equivalent", $\boldsymbol{a}$ and $Z \boldsymbol{u}$ must have the same probability distribution: $\boldsymbol{a} \overset{d}{=} Z \boldsymbol{u}$.

Let $M$ be the $n \times m$ matrix with $M_{ij} \in {0,1,2}$ so that it contains the number of copies of an arbitrary allele used as reference for SNP $j$ in individual $i$. More generally, if SNP genotypes are imputed, then $M_{ij} \in [0,2]$.

As a result: $\sigma_a^2 = \sigma_u^2 \, \times \, 2 \sum_j p_j(1-p_j)$

Backsolving

Useful formula

Woodbury matrix identity:

$(Q + B S C^T)^{-1} = Q^{-1} - Q^{-1} B (S^{-1} + C^T Q^{-1} B)^{-1} C^T Q^{-1}$

Computing BLUP(a) from BLUP(u)

Applying the formula for the BLUP:

$\hat{\boldsymbol{u}} := \text{BLUP}(\boldsymbol{u}) = (Z^T R^{-1} Z + \tau_u \text{I})^{-1} Z^T R^{-1} (\boldsymbol{y} - X \hat{\boldsymbol{\beta}})$

$\hat{\boldsymbol{a}} = Z \hat{\boldsymbol{u}} \, \Rightarrow \hat{\boldsymbol{a}} = Z (Z^T R^{-1} Z + \tau_u \text{I})^{-1} Z^T R^{-1} (\boldsymbol{y} - X \hat{\boldsymbol{\beta}})$

... see Strandén and Garrick (2009) ...

$\hat{\boldsymbol{a}} = (R^{-1} + \tau_a G^{-1})^{-1} R^{-1} (\boldsymbol{y} - X \hat{\boldsymbol{\beta}})$

Moving $R^{-1}$ inside the parentheses:

$\hat{\boldsymbol{a}} = (\text{I} + R \tau_a G^{-1})^{-1} (\boldsymbol{y} - X \hat{\boldsymbol{\beta}})$

And now moving $G^{-1}$ outside of the parentheses gives equation 5 of Strandén and Garrick (2009):

$\hat{\boldsymbol{a}} = G (G + R \tau_a)^{-1} (\boldsymbol{y} - X \hat{\boldsymbol{\beta}})$

Computing BLUP(u) from BLUP(a)

TODO

... see Strandén and Garrick (2009) ...

$\hat{\boldsymbol{u}} = \tau_a \sigma_u^2 Z^T (G + R \tau_a)^{-1} (\boldsymbol{y} - X \hat{\boldsymbol{\beta}})$

Hence: $\hat{\boldsymbol{u}} = \tau_a \sigma_u^2 Z^T G^{-1} \hat{\boldsymbol{a}}$. First, we fit the GBLUP model by ReML, then we compute $\hat{\boldsymbol{a}}$, and use it to compute $\hat{\boldsymbol{u}}$.

SNP-specific variances

Even though the SNPBLUP model assumes a single variance for the SNP effects, the contribution of the jth SNP to the overall genetic variance can be deduced as in Zhang et al (2010):

$\hat{\sigma}_{u,j}^2 = \hat{u}_j^2 \, 2 p_j (1 - p_j)$

GWAS

SNP effects as fixed

In a GWAS, it is more common to model the effect of each SNP as fixed, and to fit the model SNP by SNP.

For each SNP $j$: $\boldsymbol{y} = \boldsymbol{1} \mu + \boldsymbol{z}_j b_j + \boldsymbol{a} + \boldsymbol{e}$ with $\boldsymbol{a} \sim \mathcal{N}(\boldsymbol{0}, \sigma_a^2 G)$ and $\boldsymbol{e} \sim \mathcal{N}(\boldsymbol{0}, \sigma_e^2 Id)$, where $\boldsymbol{z}_j$ contains centered gene contents at SNP $j$.

Equivalently: $\boldsymbol{y} = \boldsymbol{z}_j b_j + \boldsymbol{\epsilon}$ with $\boldsymbol{\epsilon} \sim \mathcal{N}(\boldsymbol{0}, V)$ where $V = \sigma_a^2 G + \sigma_e^2 Id$.

The BLUE of $b_j$ is: $\hat{b}_j = (\boldsymbol{z}_j^T V^{-1} \boldsymbol{z}_j)^{-1} \boldsymbol{z}_j^T V^{-1} \boldsymbol{y}$ and $Var(\hat{b}_j) = (\boldsymbol{z}_j^T V^{-1} \boldsymbol{z}_j)^{-1}$.

The $t$-statistic for "$H_0: b_j = 0$" then is: $tb_j := \hat{b}_j / \sqrt{Var(\hat{b}_j)}$.

SNP effects as random

In the SNPBLUP model, the BLUPs of the SNP effects, $\hat{\boldsymbol{u}}$, as well as their variance, $Var(\hat{\boldsymbol{u}})$, can be obtained by inverting the coefficient matrix of the MMEs.

The $t$-statistic to test the null hypothesis for a given SNP, "$H_0: u_j = 0$", then is: $tu_j := \hat{u}_j / \sqrt{Var(\hat{u}_j)}$.

Equivalence

Quite remarkably, it can be shown that $tb_j = tu_j$.

TODO: suppl of Bernal Rubio et al 2016

Hence, the GBLUP-SNPBLUP model can be used to fit the data, and then a GWAS can be performed without having to fit the SNP-by-SNP model.

Caution however, it is not because the test statistics are equal that the estimates of the SNP effects are equal. Because GWAS is often used to prioritize which SNP(s) should be investigated further and functionally validated, the goal is not simply to detect them, but also to estimate their effects as accurately as possible.

Perspectives

See Maeir et al (2015) and Lu et al (2018) for the usage of backsolving in a multivariate setting.

References



timflutre/rutilstimflutre documentation built on Feb. 12, 2025, 11:35 p.m.