The package gwaR is used to perform genome-wide association using a GBLUP animal model. It is a computationally efficient way of implementing the EMMAZ algorithm.

There are two basic functions in gwar: gblup fits a genomic blup model and gwas performs the association testing. The main characteristic of this package are:

Basic Model: GBLUP

The basic model used by gwar is:

$$ \boldsymbol{y} = \boldsymbol{X}\cdot \boldsymbol{b}+\boldsymbol{u}g+\sum{i}{\boldsymbol{Z}_i \cdot \boldsymbol{u}_i} + \boldsymbol{e}, $$ where:

Notice that the variance-covariance matrix of the fixed effect, $\boldsymbol{A}$, can be any relationship matrix based on pedigrees or on genome-wide markers.

GWA algorithm

It can be shown that given a genotype dosage matrix $\boldsymbol{Z}g$, with animals in rows and markers in columns, the marker effects can be estimated with: $$ \boldsymbol{\widehat{g}}= \boldsymbol{Z}{g}^{'} \cdot \boldsymbol{A}^{-1} \boldsymbol{\widehat{u}_g} $$

Moreover, we have shown that the variances associated with the estimated SNP effects can be computed from the variance co-variance matrix of the BLUP of $\boldsymbol{u}_g$ :

$$ var(\widehat{g}j) = {\boldsymbol{Z_j}}{g}^{'} \cdot \boldsymbol{A}^{-1} \cdot var(\widehat{u}g) \cdot \boldsymbol{A}^{-1} \cdot {\boldsymbol{Z_j}}{g} $$

where,

$\boldsymbol{Z_j}$ is the jth column of the genotypic matrix, associated with the jth snp.

Computation with GLS equations

The package regress uses the GLS equations, thus we can exploit the matrices already computed by regress in estimating model parameters as follows. The GLS model is:

$$ \boldsymbol{y} = \boldsymbol{X}\cdot \boldsymbol{b}+ \boldsymbol{\epsilon} $$ with, $$ var(\boldsymbol{\epsilon}) = \boldsymbol{V}=\sigma_{g}^{2} \boldsymbol{A}+\sum_{i} { \sigma_{i}^{2} \cdot \boldsymbol{Z}i \cdot \boldsymbol{G_i} \cdot \boldsymbol{Z}_i^{'} } + \sigma{e}^{2} \boldsymbol{R} $$

Then, the estimated SNP effects are: $$ \boldsymbol{\widehat{g}}= \sigma_{g}^{2} \boldsymbol{Z}{g}^{'} \cdot \boldsymbol{V}^{-1} \boldsymbol{\widehat{\epsilon}} $$ and $$ \boldsymbol{\widehat{\epsilon}}=\boldsymbol{y}-\boldsymbol{X} \cdot \boldsymbol{\widehat{b}}=\boldsymbol{Q} \cdot \boldsymbol{y}=\left ( \boldsymbol{I}-\boldsymbol{X} \boldsymbol{(X^{'}V^{-1}X)^{-1})X^{'}V^{-1}} \right )\boldsymbol{y} $$ From these expression, it is easy to show: $$ var(\widehat{g}_j) ={\sigma{g}^{2}}^2 {\boldsymbol{Z_j}}{g}^{'} \cdot \boldsymbol{V}^{-1} \cdot \boldsymbol{Q} \cdot \boldsymbol{V} \cdot\boldsymbol{Q} \cdot \boldsymbol{V}^{-1} \cdot {\boldsymbol{Z_j}}{g} $$

We have shown that the resulting test statistic: $t_j = {\widehat{g_j} \over {\sqrt{var(\widehat{g_j})}}}$ is equivalent to a fixed SNP effect test under the model:

$$ \boldsymbol{y} = \boldsymbol{X}\cdot \boldsymbol{b}+{\boldsymbol{Z_j}}_{g} \cdot \widehat{g_j}+ \boldsymbol{\epsilon} $$

under the assumtion of known and fixed variance ratios estimated under the null model (without the SNP effect) $\lambda_i= {\sigma_{i}^{2} \over \sigma_{e}^{2}}, \lambda_g= {\sigma_{g}^{2} \over \sigma_{e}^{2}}$ This is: assuming that $\boldsymbol{V}$ is known up to a proportionality constant $\left ( \sigma_{e}^{2} \right )$, when it was estimated from the null model.

Example

We will use a toy dataset from the MSUPRP population to illustrate the use of gwaR

Load packages and data

library(gwaR)
library(regress)
#library(synbreed)
data("MSUPRP_sample")
data("Z_G")

summary(MSUPRP_sample)
dim(Z_thin)
dim(A)

Fit GBLUP model

We assume a model with fixed effects: sex and carcass weight (covariate). The only random effect is the one represented by the polygenic effect with variance-covariance matrix A

design_G<-~sex+car_wt
bl<-gblup(rsp="driploss",data=MSUPRP_sample,design=design_G,A,pos=c(T,T))
bl

GWA from GBLUP

We now perform GWA and obtain a quick summary of results. Notice we don't apply multiple test correction, but real data applications should use multiple test corrections such as bonferroni or FDR.

gw<-gwas(bl,t(Z_thin))
head(gw)
summary(gw,correction = "none")

graphics and pvalues

plot(gw,gpdata=MSUPRP_sample,plotlog10=T,q.qplot=T,pvalue=0.0001,correction="none")
logpval<-getpvalue(gw)
summary(logpval)

let's zoom into chromosomes 1 and 17

plot(gw,gpdata=MSUPRP_sample,plotlog10=T,chrom=c(1,17),pvalue=0.0001,correction="none")

simultaneous GWA and GBLUP

gwad<-run.gwa("driploss", data=MSUPRP_sample, design=design_G, A, x=t(Z_thin),returnz = F, pos=c(T,T))
head(gwad)

This functions has many options, for example, it is possible to perform LRT to test significance of h^2 before performing the GWA scan and it allows returning the vector of SNP z-scores (instead of the estimate of SNP effects and variances). It allso can save a summary of the gblup fit to the hard drive.



steibelj/gwaR documentation built on May 30, 2019, 10:45 a.m.