Home

/

GitHub

/

In Qiaolan/MTAFS: An Adaptive and Robust Method for Multi-trait Analysis of Genome-wide Association Studies Using Summary Statistics

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Overview

In genome-wide association studies (GWAS), the conventional single-trait analysis can lose statistical power given multiple correlated traits. Because it ignores the correlation among traits and the multiple testing correction is conservative. In contrast, multi-trait analysis methods can improve the power by exploiting the correlation among traits. Despite many existing multi-trait analysis methods using GWAS summary statistics, there are still several limitations. First, some methods cannot control type 1 error well at small significance levels. Second, many methods have inconsistent power performance under various scenarios. Third, methods relying on permutations can be time-consuming when extremely small p-values are required.

MTAFS is an efficient and robust method for multi-trait analysis of GWAS [@deng2022adaptive]. It uses GWAS summary statistics and construct its test statistic adaptively. Compared with many existing methods, MTAFS has three compelling features making it a useful method in practice. First, MTAFS can control type 1 error well even when there are hundreds of traits and significance levels are small. Second, MTAFS has robust power performance given various settings. Third, MTAFS is an efficient method by avoiding permutations.

This vignette provides an introduction to the multi-trait analysis of GWAS summary statistics using the 'MTAFS' package. R package 'MTAFS' implements the MTAFS method, an efficient and robust method for multi-trait analysis of GWAS. Users can find the most up-to-date versions of ‘MTAFS’ package in our GitHub webpage (https://github.com/Qiaolan/MTAFS).

The package can be installed and loaded with the command:

#remotes::install_github("Qiaolan/MTAFS")
#devtools::load_all("~/rPackage/MTAFS")
library(MTAFS)

In this package, we provide the following main functions:

MTAFS: the flagship function of the package, which requires z scores from GWAS summary statistics and returns p-values of MTAFS.
eigenDecomp: conducts eigendecomposition on the estimated variance-covariance matrix $\hat{\mathbf R}$ and the output is used by MTAFS.

We illustrate the usage of these functions in the following sections.

Introduction of Main Functions

Suppose there are $N$ SNPs and $T$ traits.

`MTAFS`

MTAFS is the function that applys the MTAFS method and produces a matrix of p-values, described in detail below. It requires z scores from GWAS summary statistics and eigendecomposition results on $\hat{\mathbf R}$, output of the eigenDecomp function. The output consists of the p-values of using a specific number of eigenvalues and MTAFS.

The following show all the parameters in MTAFS function:

MTAFS(Z, eigR, weights = NULL, snps = NULL, traits = NULL)

Z is a $N \times T$ matrix of z scores obtained from GWAS summary statistics, where its row and column correspond to SNP and trait, respectively. eigR is the output of eigenDecomp function with two options: the default setting and the user-defined setting, described in detail in the next section. weights is a vector of weights for the Cauchy combination [@liu2020cauchy]. The defaul is NULL assuming a uniform weight. Note that the type 1 error control and power performance of non-uniform weights were not evaluated by either the original paper of Cauchy combination or our paper. snps is a vector of indices of SNPs of interest and we include all SNPs in Z by default. traits is a vector of indices of traits of interest and we include all traits in Z by default.

The output of MTAFS is a $N \times (L+1)$ matrix of p-values, where $L$ denotes the number of levels of variance explained. The first $L$ columns show the p-values obtained by using a specific number of eigenvalues (the percentage of variance explained is in the bracket) and the last column shows the p-values of MTAFS which is obtained by the Cauchy combination of the p-values of the first $L$ columns.

Here is an example of a partial z score matrix from an example dataset with five traits:

data("gwasArea")
head(gwasArea[,1:5])

`eigenDecomp`

eigenDecomp conducts eigendecomposition on $\hat{\mathbf R}$. It calculates the levels of variance explained required by MTAFS [@deng2022adaptive]. In the default setting, we denote the proportion of variance explained by the first two eigenvalues as $v_0\%$. Then let $v_1\%$, $v_2\%$, and $v_3\%$ be the three percentages evenly distributed between $v_0\%$ and $100\%$, with $q_1$, $q_2$, and $q_3$ denoting the corresponding number of eigenvalues achieving the percent of variance explained for the first time. As a result, there are five levels of variance explained, with the corresponding number of eigenvalues being $2, q_1,q_2,q_3,T$.

The following show all the parameters in eigenDecomp function:

eigenDecomp(estR, varExp=NULL)

estR is the estimated variance-covariance matrix $\hat{\mathbf R}$. Linkage disequilibrium score regression [@bulik2015atlas] or sample covariance [@zhu2015meta] can be used to get $\hat{\mathbf R}$.

varExp is a vector of user-defined percentage of variance explained. The default is NULL, which implements the default setting described above.

The output of eigenDecomp is a list containing the following components: q the number of eigenvectors that are included in the MTAFS analysis; v the corresponding percentage of variance explained; eigenD a list containing all eigenvalues and eigenvectors.

An Example

In this vignette, we use an example GWAS summary statistics dataset consisting of $300$ SNPs and $212$ traits for the illustration purpose. The example dataset is a subset of the real dataset of 212 Area image-derived phenotypes in our paper [@deng2022adaptive].

data("gwasArea")
dim(gwasArea)

Estimate $\mathbf R$

Under the null hypothesis, we assume $\mathbf Z \sim N(\mathbf 0, \mathbf R)$. For illustration, we use sample covariance to estimate $\mathbf R$, denoted as Rhat:

Z <- gwasArea
Rhat <- cov(Z)

We show its correlation heatmap:

colnames(Rhat) <- c("1",rep("",99),"100",rep("",99),"200",rep("",10),"212")
rownames(Rhat) <-  c("1",rep("",99),"100",rep("",99),"200",rep("",10),"212")

corrplot::corrplot(cov2cor(Rhat), method = "color", tl.pos='lt',tl.cex = 0.8, tl.col = "black", mar = c(1,1,1,1), title = "")

Eigendecomposition

The default setting

We conduct eigendecompositions on $\hat {\mathbf R}$ using eigenDecomp.

eigR <- eigenDecomp(Rhat)
str(eigR)

According to the output, we include the first 2, 7, 19, 44 and 212 eigenvectors to construct five levels. Also, the five numbers of eigenvectors explain $18.5\%$, $38.9\%$, $59.3\%$, $79.6\%$, and $100 \%$ of total variance. eigR$eigenD$values include all the eigenvalues and eigR$eigenD$vectors include all the eigenvectors.

User-defined setting

Suppose we want to include the numbers of eigenvectors which explain $40\%$, $60\%$, and $80\%$ of the variance.

eigR_user <- eigenDecomp(Rhat, varExp = c(0.4,0.6,0.8))
str(eigR_user)

According to the output, we include the first $8$, $20$, and $45$ eigenvectors to construct three levels. Also, the three numbers of eigenvectors explain $40\%$, $60\%$, and $80 \%$ of total variance. eigR$eigenD$values include all the eigenvalues and eigR$eigenD$vectors include all the eigenvectors.

MTAFS

The default setting

Based on the default five levels, we construct the test statistic of MTAFS and calculate the final p-values for each SNPs by MTAFS:

time=proc.time()
pMTAFS <- MTAFS(Z, eigR)
proc.time()-time

It takes r (proc.time()-time)[3] seconds to analyze 300 SNPs and 212 traits.

head(pMTAFS)

The output of MTAFS is a matrix of p-values, where its row corresponds to SNP. The first five columns consists of p-values obtained by using specific numbers (2, 7, 19, 44, and 212) of eigenvalues (inside brackets are the corresponding percentage of variance explained) and the last column are the p-values of MTAFS, based on which we can make a conclusion about the associations of interest.

User-defined setting

We can also use the three levels in the user-defined setting.

pMTAFS_user <- MTAFS(Z, eigR_user)

head(pMTAFS_user)

Since the user-defined setting has only three levels, now the output p-value matrix has only four columns in total. The first three columns consists of p-values obtained by using specific numbers (8, 20, and 45) of eigenvalues and the last column are the p-values of MTAFS, based on which we can make a conclusion about the associations of interest.

Reference

Qiaolan/MTAFS documentation built on Feb. 26, 2023, 7:02 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Qiaolan/MTAFS
An Adaptive and Robust Method for Multi-trait Analysis of Genome-wide Association Studies Using Summary Statistics

In Qiaolan/MTAFS: An Adaptive and Robust Method for Multi-trait Analysis of Genome-wide Association Studies Using Summary Statistics

Overview

Introduction of Main Functions

`MTAFS`

`eigenDecomp`

An Example

Estimate $\mathbf R$

Eigendecomposition

The default setting

User-defined setting

MTAFS

The default setting

User-defined setting

Reference

R Package Documentation

Browse R Packages

We want your feedback!

Qiaolan/MTAFS An Adaptive and Robust Method for Multi-trait Analysis of Genome-wide Association Studies Using Summary Statistics

In Qiaolan/MTAFS: An Adaptive and Robust Method for Multi-trait Analysis of Genome-wide Association Studies Using Summary Statistics

Overview

Introduction of Main Functions

MTAFS

eigenDecomp

An Example

Estimate $\mathbf R$

Eigendecomposition

The default setting

User-defined setting

MTAFS

The default setting

User-defined setting

Reference

R Package Documentation

Browse R Packages

We want your feedback!

Qiaolan/MTAFS
An Adaptive and Robust Method for Multi-trait Analysis of Genome-wide Association Studies Using Summary Statistics

`MTAFS`

`eigenDecomp`