The package gwaR
is used to perform genome-wide association using a GBLUP animal model. It is a computationally efficient way of implementing the EMMAZ algorithm.
There are two basic functions in gwar
: gblup
fits a genomic blup model and gwas
performs the association testing. The main characteristic of this package are:
gpData
objects from package synbreed
as input data. It can also use input objects for rrblup
packageregress
to fit gblup modelsThe basic model used by gwar
is:
$$ \boldsymbol{y} = \boldsymbol{X}\cdot \boldsymbol{b}+\boldsymbol{u}g+\sum{i}{\boldsymbol{Z}_i \cdot \boldsymbol{u}_i} + \boldsymbol{e}, $$ where:
Notice that the variance-covariance matrix of the fixed effect, $\boldsymbol{A}$, can be any relationship matrix based on pedigrees or on genome-wide markers.
It can be shown that given a genotype dosage matrix $\boldsymbol{Z}g$, with animals in rows and markers in columns, the marker effects can be estimated with: $$ \boldsymbol{\widehat{g}}= \boldsymbol{Z}{g}^{'} \cdot \boldsymbol{A}^{-1} \boldsymbol{\widehat{u}_g} $$
Moreover, we have shown that the variances associated with the estimated SNP effects can be computed from the variance co-variance matrix of the BLUP of $\boldsymbol{u}_g$ :
$$ var(\widehat{g}j) = {\boldsymbol{Z_j}}{g}^{'} \cdot \boldsymbol{A}^{-1} \cdot var(\widehat{u}g) \cdot \boldsymbol{A}^{-1} \cdot {\boldsymbol{Z_j}}{g} $$
where,
$\boldsymbol{Z_j}$ is the jth column of the genotypic matrix, associated with the jth snp.
The package regress
uses the GLS equations, thus we can exploit the matrices already computed by regress
in estimating model parameters as follows. The GLS model is:
$$ \boldsymbol{y} = \boldsymbol{X}\cdot \boldsymbol{b}+ \boldsymbol{\epsilon} $$ with, $$ var(\boldsymbol{\epsilon}) = \boldsymbol{V}=\sigma_{g}^{2} \boldsymbol{A}+\sum_{i} { \sigma_{i}^{2} \cdot \boldsymbol{Z}i \cdot \boldsymbol{G_i} \cdot \boldsymbol{Z}_i^{'} } + \sigma{e}^{2} \boldsymbol{R} $$
Then, the estimated SNP effects are: $$ \boldsymbol{\widehat{g}}= \sigma_{g}^{2} \boldsymbol{Z}{g}^{'} \cdot \boldsymbol{V}^{-1} \boldsymbol{\widehat{\epsilon}} $$ and $$ \boldsymbol{\widehat{\epsilon}}=\boldsymbol{y}-\boldsymbol{X} \cdot \boldsymbol{\widehat{b}}=\boldsymbol{Q} \cdot \boldsymbol{y}=\left ( \boldsymbol{I}-\boldsymbol{X} \boldsymbol{(X^{'}V^{-1}X)^{-1})X^{'}V^{-1}} \right )\boldsymbol{y} $$ From these expression, it is easy to show: $$ var(\widehat{g}_j) ={\sigma{g}^{2}}^2 {\boldsymbol{Z_j}}{g}^{'} \cdot \boldsymbol{V}^{-1} \cdot \boldsymbol{Q} \cdot \boldsymbol{V} \cdot\boldsymbol{Q} \cdot \boldsymbol{V}^{-1} \cdot {\boldsymbol{Z_j}}{g} $$
We have shown that the resulting test statistic: $t_j = {\widehat{g_j} \over {\sqrt{var(\widehat{g_j})}}}$ is equivalent to a fixed SNP effect test under the model:
$$ \boldsymbol{y} = \boldsymbol{X}\cdot \boldsymbol{b}+{\boldsymbol{Z_j}}_{g} \cdot \widehat{g_j}+ \boldsymbol{\epsilon} $$
under the assumtion of known and fixed variance ratios estimated under the null model (without the SNP effect) $\lambda_i= {\sigma_{i}^{2} \over \sigma_{e}^{2}}, \lambda_g= {\sigma_{g}^{2} \over \sigma_{e}^{2}}$ This is: assuming that $\boldsymbol{V}$ is known up to a proportionality constant $\left ( \sigma_{e}^{2} \right )$, when it was estimated from the null model.
We will use a toy dataset from the MSUPRP population to illustrate the use of gwaR
library(gwaR) library(regress) #library(synbreed) data("MSUPRP_sample") data("Z_G") summary(MSUPRP_sample) dim(Z_thin) dim(A)
We assume a model with fixed effects: sex and carcass weight (covariate). The only random effect is the one represented by the polygenic effect with variance-covariance matrix A
design_G<-~sex+car_wt bl<-gblup(rsp="driploss",data=MSUPRP_sample,design=design_G,A,pos=c(T,T)) bl
We now perform GWA and obtain a quick summary of results. Notice we don't apply multiple test correction, but real data applications should use multiple test corrections such as bonferroni or FDR.
gw<-gwas(bl,t(Z_thin)) head(gw) summary(gw,correction = "none")
plot(gw,gpdata=MSUPRP_sample,plotlog10=T,q.qplot=T,pvalue=0.0001,correction="none") logpval<-getpvalue(gw) summary(logpval)
let's zoom into chromosomes 1 and 17
plot(gw,gpdata=MSUPRP_sample,plotlog10=T,chrom=c(1,17),pvalue=0.0001,correction="none")
gwad<-run.gwa("driploss", data=MSUPRP_sample, design=design_G, A, x=t(Z_thin),returnz = F, pos=c(T,T)) head(gwad)
This functions has many options, for example, it is possible to perform LRT to test significance of h^2 before performing the GWA scan and it allows returning the vector of SNP z-scores (instead of the estimate of SNP effects and variances). It allso can save a summary of the gblup fit to the hard drive.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.