gwas: Empirical Bayes Genome Wide Association Mapping


View source: R/gwas.R


The gwas function calculates the likelihood ratio for each marker under the empirical Bayes framework. The method allows analysis with multiple populations. gwas2 is computationally optimized. gwas3 was designed for multiple random populations.





Numeric vector of observations (n) describing the trait to be analyzed. NA is allowed.


Numeric matrix containing the genotypic data, with n rows of observations and m columns of molecular markers. SNPs must be coded as 0, 1, 2 for founder homozygous, heterozygous and reference homozygous, respectively. NA is allowed.


Numeric vector of length n indicating a stratification factor, i.e. which subpopulation (e.g. family) each observation comes from. Default assumes that all observations are from the same population.


Numeric vector indicating the number of markers in each chromosome. The sum of chr must be equal to the number of columns in gen. Default assumes that all markers are from the same chromosome.


Numeric. If specified, the genetic distance between markers is used for a moving-window strategy. The window must be specified in Morgans (e.g. 0.05 represents 5 cM). Genetic distance is calculated assuming that individuals are RILs.


Logical. If TRUE, markers are treated as fixed effects and hence evaluated through Wald statistics. If markers are specified as fixed, the argument 'window' is not applicable.


Output of the R function 'eigen'. It is used for user-defined kinship matrix.


Numeric vector of length n to be used as covariate in the association analysis.


Numeric matrix of observations (n*e) where rows represent genotypes and columns represent environments. NA is allowed.


Logical. If TRUE, meta-analysis (function gwasGE) will be done for the GxE interactions term only. If FALSE, variance components will be computed for three terms: genotype, environment and interaction.


Integer. It indicates the number of principal components used to represent GxE interactions through additive main-effects and multiplicative interaction (AMMI).


List of objects output by gwas3, used to perform meta-analysis.
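The constraints stated in the argument descriptions above can be checked before running an analysis. The following is a minimal sketch in base R on simulated data. The names `gen` and `chr` appear in the text above; `y`, `fam` and the helper `check_inputs` are hypothetical names chosen for illustration and are not confirmed by this page.

```r
set.seed(42)
n <- 60   # observations
m <- 100  # markers

# Genotypes coded 0, 1, 2 with a few missing values (NA is allowed).
gen <- matrix(sample(0:2, n * m, replace = TRUE), nrow = n, ncol = m)
gen[sample(n * m, 20)] <- NA

# Trait vector with one missing observation (NA is allowed).
y <- rnorm(n)
y[3] <- NA

# Hypothetical stratification factor: three families of equal size.
fam <- rep(1:3, each = n / 3)

# Number of markers per chromosome; must sum to ncol(gen).
chr <- c(40, 35, 25)

# Hypothetical helper: stops with an error if any documented constraint fails.
check_inputs <- function(y, gen, fam, chr) {
  stopifnot(
    is.numeric(y), is.matrix(gen),
    length(y) == nrow(gen),        # one phenotype per genotype row
    all(gen %in% c(0, 1, 2, NA)),  # allowed SNP codes
    length(fam) == nrow(gen),      # one family label per observation
    sum(chr) == ncol(gen)          # chromosome sizes cover all markers
  )
  invisible(TRUE)
}

ok <- check_inputs(y, gen, fam, chr)
```

Running such a check first makes dimension mismatches (for example, a `chr` vector that does not sum to the number of markers) fail early with a clear message.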


The empirical Bayes model (Wang et al. 2016) is rebuilt with a special incidence matrix designed to exploit the information provided by the subpopulations. Each locus is recoded as a vector with length f equal to the number of subpopulations, or NAM families, representing the locus-by-family interaction. For example, a heterozygous locus from an individual from subpopulation 2 is coded as [ 1, 0, 1, ... , f ], a locus homozygous for the reference allele from any subpopulation is coded as [ 2, 0, 0, ... , f ], and a locus homozygous for the founder allele from an individual from subpopulation 1 is coded as [ 0, 2, 0, ... , f ]. The base model for genome scanning is described by:

y = Xb + Zu + g + e

The model includes the fixed effects (Xb), the marker (Zu), the polygene (g) and the residuals (e). If the window term is specified, the model for genome scanning is expanded as follows:

y = Xb + Zu[k-1] + Zu + Zu[k+1] + g - g[k] + e

This model includes three extra terms: the left side genome ( Zu[k-1] ), the right side genome ( Zu[k+1] ), and the subtraction of the window polygene ( -g[k] ). Windows are based on genetic distance, which is computed using the Kosambi map function. The recombination rate is estimated under the assumption that markers are ordered and that genotypes are recombinant inbred lines.
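Two pieces of the description above lend themselves to a short base R sketch: the locus-by-family coding and the Kosambi conversion from recombination fraction to map distance. This is an illustration, not the package's internal code. The function names are hypothetical. The coding follows the worked examples in the text, read as one leading column for the reference allele plus one column per family, which is an interpretive assumption. The RIL correction uses the classical selfing relationship R = 2r / (1 + 2r); whether the package uses exactly this form is also an assumption.

```r
# Locus-by-family coding, following the worked examples in the text.
# g:   genotypes coded 0 (founder homozygous), 1 (heterozygous),
#      2 (reference homozygous)
# fam: subpopulation label of each individual
recode_locus <- function(g, fam) {
  fam_factor <- factor(fam)
  fam_id <- as.integer(fam_factor)
  n_fam <- nlevels(fam_factor)
  Z <- matrix(0, nrow = length(g), ncol = n_fam + 1)
  Z[, 1] <- g                                  # reference allele count
  Z[cbind(seq_along(g), fam_id + 1)] <- 2 - g  # founder allele count, own family
  Z
}

# Kosambi map function: recombination fraction r (0 <= r < 0.5) to Morgans.
kosambi <- function(r) 0.25 * log((1 + 2 * r) / (1 - 2 * r))

# Assumption: for RILs obtained by selfing, the observed recombinant
# proportion R relates to the single-meiosis rate r by R = 2r / (1 + 2r).
ril_to_r <- function(R) R / (2 * (1 - R))

# Heterozygote in family 2, reference homozygote in family 1,
# founder homozygote in family 1.
Z <- recode_locus(g = c(1, 2, 0), fam = c(2, 1, 1))

# Map distance in Morgans for an observed RIL recombinant proportion of 0.10.
d <- kosambi(ril_to_r(0.10))
```

With distances in Morgans, a window of 0.05 as described in the arguments corresponds to markers within 5 cM of the locus being tested.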

The polygenic term is calculated only once (Zhang et al. 2010) using eigendecomposition with a GEMMA-like algorithm (Zhou and Stephens 2012). Efficient inversion of the capacitance matrix is obtained through the Woodbury matrix identity. Models and algorithms are described in more detail by Xavier et al. (2015) and Wei and Xu (2016).
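The two numerical devices named above can be illustrated generically in base R. The sketch below is not the package's implementation: the kinship matrix, the variance ratio `lambda` and the marker variance `g_var` are hypothetical values chosen only to show that (1) a single eigendecomposition of the kinship gives the inverse of the polygenic covariance for any variance ratio, and (2) the Woodbury identity adds a marker term by inverting only a small capacitance matrix.

```r
set.seed(1)
n <- 30  # individuals
m <- 50  # markers
gen <- matrix(sample(0:2, n * m, replace = TRUE), n, m)
W <- scale(gen, center = TRUE, scale = FALSE)
K <- tcrossprod(W) / m   # simple genomic relationship matrix (assumption)
lambda <- 0.7            # hypothetical polygenic-to-residual variance ratio

# 1) Eigendecomposition: decompose K once, reuse for any lambda.
#    (K * lambda + I)^-1 = U diag(1 / (d * lambda + 1)) U'
E <- eigen(K, symmetric = TRUE)
V_inv_eigen <- E$vectors %*% (t(E$vectors) / (E$values * lambda + 1))
V_inv_direct <- solve(K * lambda + diag(n))

# 2) Woodbury identity: add a low-rank marker term z G z' to V.
#    (V + z G z')^-1 = V^-1 - V^-1 z (G^-1 + z' V^-1 z)^-1 z' V^-1
z <- W[, 1:3]            # a few marker columns
g_var <- 0.5             # hypothetical marker variance, G = g_var * I
cap <- diag(3) / g_var + crossprod(z, V_inv_eigen %*% z)  # 3 x 3 capacitance matrix
H_inv_woodbury <- V_inv_eigen -
  V_inv_eigen %*% z %*% solve(cap, crossprod(z, V_inv_eigen))
H_inv_direct <- solve(K * lambda + diag(n) + g_var * tcrossprod(z))
```

The benefit is that the n x n inverse is formed once, while each marker tested requires only the inversion of a matrix whose size equals the number of marker columns.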

To analyze large datasets, one can avoid memory issues by using the function gwas2; note, however, that the argument 'window' is not implemented for gwas2. This function also allows a user-defined kinship through the argument EIG, and the use of a numeric covariate vector through the argument cov.

When multi-environment trials are the target of mapping, one may use the function gwasGE to perform the analysis by environment, followed by a "meta-analysis" that combines the results. This strategy gives an idea of the variation in QTL effect due to environment, genetic background (provided by the stratification factor) and the interaction between environment and genetics.

An alternative to this method is the mega-analysis, where one can provide the stratification factor as a combination of subpopulation and environment. Meta-analysis can be performed in a single step with the function gwasGE, or users can perform multiple association analyses using gwas3 and then perform the meta-analysis with meta3. In gwasGE, the same genotype will often appear more than once in the phenotypic and genotypic data, so phenotypes are provided as a matrix. The statistical details of the meta-analysis are available in the vignette "Background for Meta-analysis".
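The data layout for a mega-analysis can be sketched in base R as follows. This is a minimal illustration under stated assumptions: the object names (`Phe`, `gen`, `fam` and the `_long` variants) are hypothetical, and the toy values are made up. It shows how a phenotype matrix with one column per environment can be stacked into a single vector, with genotype rows repeated to match and a stratification factor built from the combination of subpopulation and environment.

```r
n <- 4  # genotypes
e <- 2  # environments

# Toy phenotype matrix: rows are genotypes, columns are environments.
Phe <- matrix(c(5.1, 4.8, 6.0, 5.5, 4.9, 4.7, 6.3, 5.2), nrow = n, ncol = e,
              dimnames = list(NULL, c("envA", "envB")))

# Toy genotype matrix (4 genotypes x 3 markers) and family labels.
gen <- matrix(c(0, 2, 1, 2,  2, 2, 0, 1,  1, 0, 0, 2), nrow = n)
fam <- c(1, 1, 2, 2)

# Stack phenotypes column by column and repeat genotypes accordingly.
y_long   <- as.vector(Phe)
gen_long <- gen[rep(seq_len(n), times = e), , drop = FALSE]
fam_long <- rep(fam, times = e)
env_long <- rep(colnames(Phe), each = n)

# Stratification factor: one level per subpopulation-by-environment combination.
strata <- as.integer(interaction(fam_long, env_long, drop = TRUE))
```

The stacked vector, the repeated genotype matrix and `strata` then play the roles of the trait, genotype and stratification inputs in a single analysis.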

The function gwas3 is an alternative for association analysis and meta-analysis, also solved in the empirical Bayes framework for multiple populations. Unlike gwas, gwas2 and gwasGE, this function does not set a reference allele and analyzes each marker as the interaction of allele by stratification factor (i.e. family or subpopulation). Therefore, gwas3 is compatible with any allele coding.

For further statistical background:

1) system(paste('open',system.file("doc","gwa_description.pdf",package="NAM")))

2) system(paste('open',system.file("doc","gwa_ge_interactions.pdf",package="NAM")))


The function gwas returns a list containing the method deployed (Method), a summary of predicted parameters and statistical tests (PolyTest), estimated genetic map for NAM panels (MAP) and the marker names (SNPs).


Alencar Xavier, Tiago Pimenta, Qishan Wang and Shizhong Xu


Wang, Q., Wei, J., Pan, Y., & Xu, S. (2016). An efficient empirical Bayes method for genomewide association studies. Journal of Animal Breeding and Genetics, 133(4), 253-263.

Wei, J., & Xu, S. (2016). A Random Model Approach to QTL Mapping in Multi-parent Advanced Generation Inter-cross (MAGIC) Populations. Genetics, 202(2), 471-486.

Xavier, A., Xu, S., Muir, W. M., & Rainey, K. M. (2015). NAM: Association Studies in Multiple Populations. Bioinformatics, 31(23), 3862-3864.

Zhang, Z., et al. (2010). Mixed linear model approach adapted for genome-wide association studies. Nature Genetics, 42(4), 355-360.

Zhou, X., & Stephens, M. (2012). Genome-wide efficient mixed-model analysis for association studies. Nature Genetics, 44(7), 821-824.



alenxav/NAM documentation built on Jan. 8, 2020, 9:21 p.m.