In steibelj/gwaR: Functions for performing GWA from GBLUP

The package gwaR is used to perform genome-wide association using a GBLUP animal model. It is a computationally efficient way of implementing the EMMAZ algorithm.

There are two basic functions in gwar: gblup fits a genomic blup model and gwas performs the association testing. The main characteristic of this package are:

Accepts gpData objects from package synbreed as input data. It can also use input objects for rrblup package
Uses REML engine from package regress to fit gblup models
It accepts any general linear model specification for fixed and random effects
It implements proprietary code to estimate SNP effects and to perform tests for GWA
It is computationally efficient: it can solve systems of equations with 1000s or records and 1000000s of markers in a few minutes/hours

Basic Model: GBLUP

The basic model used by gwar is:

$$ \boldsymbol{y} = \boldsymbol{X}\cdot \boldsymbol{b}+\boldsymbol{u}g+\sum{i}{\boldsymbol{Z}_i \cdot \boldsymbol{u}_i} + \boldsymbol{e}, $$ where:

$\boldsymbol{y}$ is the vector of response variables
$\boldsymbol{X}\cdot \boldsymbol{b}$ models the fixed effects
$\boldsymbol{u}g$ is the "genetic" effect that will be used to perform GWA, assumed $\boldsymbol{u}_g \sim N\left ( \boldsymbol{0},\sigma{g}^{2} \boldsymbol{A} \right )$
$sum_{i}{\boldsymbol{Z}i \cdot \boldsymbol{u}_i}$ are other random effects, $\boldsymbol{u}_i \sim N\left ( \boldsymbol{0},\sigma{i}^{2} \boldsymbol{G_i} \right )$
$\boldsymbol{e}$ are the vector of effects residual, $\boldsymbol{e} \sim N\left ( \boldsymbol{0},\sigma_{e}^{2} \boldsymbol{R} \right )$

Notice that the variance-covariance matrix of the fixed effect, $\boldsymbol{A}$, can be any relationship matrix based on pedigrees or on genome-wide markers.

GWA algorithm

It can be shown that given a genotype dosage matrix $\boldsymbol{Z}g$, with animals in rows and markers in columns, the marker effects can be estimated with: $$ \boldsymbol{\widehat{g}}= \boldsymbol{Z}{g}^{'} \cdot \boldsymbol{A}^{-1} \boldsymbol{\widehat{u}_g} $$

Moreover, we have shown that the variances associated with the estimated SNP effects can be computed from the variance co-variance matrix of the BLUP of $\boldsymbol{u}_g$ :

$$ var(\widehat{g}j) = {\boldsymbol{Z_j}}{g}^{'} \cdot \boldsymbol{A}^{-1} \cdot var(\widehat{u}g) \cdot \boldsymbol{A}^{-1} \cdot {\boldsymbol{Z_j}}{g} $$

where,

$\boldsymbol{Z_j}$ is the jth column of the genotypic matrix, associated with the jth snp.

Computation with GLS equations

The package regress uses the GLS equations, thus we can exploit the matrices already computed by regress in estimating model parameters as follows. The GLS model is:

$$ \boldsymbol{y} = \boldsymbol{X}\cdot \boldsymbol{b}+ \boldsymbol{\epsilon} $$ with, $$ var(\boldsymbol{\epsilon}) = \boldsymbol{V}=\sigma_{g}^{2} \boldsymbol{A}+\sum_{i} { \sigma_{i}^{2} \cdot \boldsymbol{Z}i \cdot \boldsymbol{G_i} \cdot \boldsymbol{Z}_i^{'} } + \sigma{e}^{2} \boldsymbol{R} $$

Then, the estimated SNP effects are: $$ \boldsymbol{\widehat{g}}= \sigma_{g}^{2} \boldsymbol{Z}{g}^{'} \cdot \boldsymbol{V}^{-1} \boldsymbol{\widehat{\epsilon}} $$ and $$ \boldsymbol{\widehat{\epsilon}}=\boldsymbol{y}-\boldsymbol{X} \cdot \boldsymbol{\widehat{b}}=\boldsymbol{Q} \cdot \boldsymbol{y}=\left ( \boldsymbol{I}-\boldsymbol{X} \boldsymbol{(X^{'}V^{-1}X)^{-1})X^{'}V^{-1}} \right )\boldsymbol{y} $$ From these expression, it is easy to show: $$ var(\widehat{g}_j) ={\sigma{g}^{2}}^2 {\boldsymbol{Z_j}}{g}^{'} \cdot \boldsymbol{V}^{-1} \cdot \boldsymbol{Q} \cdot \boldsymbol{V} \cdot\boldsymbol{Q} \cdot \boldsymbol{V}^{-1} \cdot {\boldsymbol{Z_j}}{g} $$

We have shown that the resulting test statistic: $t_j = {\widehat{g_j} \over {\sqrt{var(\widehat{g_j})}}}$ is equivalent to a fixed SNP effect test under the model:

$$ \boldsymbol{y} = \boldsymbol{X}\cdot \boldsymbol{b}+{\boldsymbol{Z_j}}_{g} \cdot \widehat{g_j}+ \boldsymbol{\epsilon} $$

under the assumtion of known and fixed variance ratios estimated under the null model (without the SNP effect) $\lambda_i= {\sigma_{i}^{2} \over \sigma_{e}^{2}}, \lambda_g= {\sigma_{g}^{2} \over \sigma_{e}^{2}}$ This is: assuming that $\boldsymbol{V}$ is known up to a proportionality constant $\left ( \sigma_{e}^{2} \right )$, when it was estimated from the null model.

Example

We will use a toy dataset from the MSUPRP population to illustrate the use of gwaR

Load packages and data

library(gwaR)
library(regress)
#library(synbreed)
data("MSUPRP_sample")
data("Z_G")

summary(MSUPRP_sample)
dim(Z_thin)
dim(A)

Fit GBLUP model

We assume a model with fixed effects: sex and carcass weight (covariate). The only random effect is the one represented by the polygenic effect with variance-covariance matrix A

design_G<-~sex+car_wt
bl<-gblup(rsp="driploss",data=MSUPRP_sample,design=design_G,A,pos=c(T,T))
bl

GWA from GBLUP

We now perform GWA and obtain a quick summary of results. Notice we don't apply multiple test correction, but real data applications should use multiple test corrections such as bonferroni or FDR.

gw<-gwas(bl,t(Z_thin))
head(gw)
summary(gw,correction = "none")

graphics and pvalues

plot(gw,gpdata=MSUPRP_sample,plotlog10=T,q.qplot=T,pvalue=0.0001,correction="none")
logpval<-getpvalue(gw)
summary(logpval)

let's zoom into chromosomes 1 and 17

plot(gw,gpdata=MSUPRP_sample,plotlog10=T,chrom=c(1,17),pvalue=0.0001,correction="none")

simultaneous GWA and GBLUP

gwad<-run.gwa("driploss", data=MSUPRP_sample, design=design_G, A, x=t(Z_thin),returnz = F, pos=c(T,T))
head(gwad)

This functions has many options, for example, it is possible to perform LRT to test significance of h^2 before performing the GWA scan and it allows returning the vector of SNP z-scores (instead of the estimate of SNP effects and variances). It allso can save a summary of the gblup fit to the hard drive.

steibelj/gwaR documentation built on May 30, 2019, 10:45 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com