knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
In genome-wide association studies (GWAS), the conventional single-trait analysis can lose statistical power given multiple correlated traits. Because it ignores the correlation among traits and the multiple testing correction is conservative. In contrast, multi-trait analysis methods can improve the power by exploiting the correlation among traits. Despite many existing multi-trait analysis methods using GWAS summary statistics, there are still several limitations. First, some methods cannot control type 1 error well at small significance levels. Second, many methods have inconsistent power performance under various scenarios. Third, methods relying on permutations can be time-consuming when extremely small p-values are required.
MTAFS is an efficient and robust method for multi-trait analysis of GWAS [@deng2022adaptive]. It uses GWAS summary statistics and construct its test statistic adaptively. Compared with many existing methods, MTAFS has three compelling features making it a useful method in practice. First, MTAFS can control type 1 error well even when there are hundreds of traits and significance levels are small. Second, MTAFS has robust power performance given various settings. Third, MTAFS is an efficient method by avoiding permutations.
This vignette provides an introduction to the multi-trait analysis of GWAS summary statistics using the 'MTAFS' package. R package 'MTAFS' implements the MTAFS method, an efficient and robust method for multi-trait analysis of GWAS. Users can find the most up-to-date versions of ‘MTAFS’ package in our GitHub webpage (https://github.com/Qiaolan/MTAFS).
The package can be installed and loaded with the command:
#remotes::install_github("Qiaolan/MTAFS") #devtools::load_all("~/rPackage/MTAFS") library(MTAFS)
In this package, we provide the following main functions:
MTAFS
.We illustrate the usage of these functions in the following sections.
Suppose there are $N$ SNPs and $T$ traits.
MTAFS
MTAFS
is the function that applys the MTAFS method and produces a matrix of p-values, described in detail below. It requires z scores from GWAS summary statistics and eigendecomposition results on $\hat{\mathbf R}$, output of the eigenDecomp
function. The output consists of the p-values of using a specific number of eigenvalues and MTAFS.
The following show all the parameters in MTAFS
function:
MTAFS(Z, eigR, weights = NULL, snps = NULL, traits = NULL)
Z
is a $N \times T$ matrix of z scores obtained from GWAS summary statistics, where its row and column correspond to SNP and trait, respectively. eigR
is the output of eigenDecomp
function with two options: the default setting and the user-defined setting, described in detail in the next section. weights
is a vector of weights for the Cauchy combination [@liu2020cauchy]. The defaul is NULL
assuming a uniform weight. Note that the type 1 error control and power performance of non-uniform weights were not evaluated by either the original paper of Cauchy combination or our paper. snps
is a vector of indices of SNPs of interest and we include all SNPs in Z
by default. traits
is a vector of indices of traits of interest and we include all traits in Z
by default.
The output of MTAFS
is a $N \times (L+1)$ matrix of p-values, where $L$ denotes the number of levels of variance explained. The first $L$ columns show the p-values obtained by using a specific number of eigenvalues (the percentage of variance explained is in the bracket) and the last column shows the p-values of MTAFS which is obtained by the Cauchy combination of the p-values of the first $L$ columns.
Here is an example of a partial z score matrix from an example dataset with five traits:
data("gwasArea") head(gwasArea[,1:5])
eigenDecomp
eigenDecomp
conducts eigendecomposition on $\hat{\mathbf R}$. It calculates the levels of variance explained required by MTAFS [@deng2022adaptive]. In the default setting, we denote the proportion of variance explained by the first two eigenvalues as $v_0\%$. Then let $v_1\%$, $v_2\%$, and $v_3\%$ be the three percentages evenly distributed between $v_0\%$ and $100\%$, with $q_1$, $q_2$, and $q_3$ denoting the corresponding number of eigenvalues achieving the percent of variance explained for the first time. As a result, there are five levels of variance explained, with the corresponding number of eigenvalues being $2, q_1,q_2,q_3,T$.
The following show all the parameters in eigenDecomp
function:
eigenDecomp(estR, varExp=NULL)
estR
is the estimated variance-covariance matrix $\hat{\mathbf R}$. Linkage disequilibrium score regression [@bulik2015atlas] or sample covariance [@zhu2015meta] can be used to get $\hat{\mathbf R}$.
varExp
is a vector of user-defined percentage of variance explained. The default is NULL, which implements the default setting described above.
The output of eigenDecomp
is a list containing the following components: q
the number of eigenvectors that are included in the MTAFS analysis; v
the corresponding percentage of variance explained; eigenD
a list containing all eigenvalues and eigenvectors.
In this vignette, we use an example GWAS summary statistics dataset consisting of $300$ SNPs and $212$ traits for the illustration purpose. The example dataset is a subset of the real dataset of 212 Area image-derived phenotypes in our paper [@deng2022adaptive].
data("gwasArea") dim(gwasArea)
Under the null hypothesis, we assume $\mathbf Z \sim N(\mathbf 0, \mathbf R)$. For illustration, we use sample covariance to estimate $\mathbf R$, denoted as Rhat
:
Z <- gwasArea Rhat <- cov(Z)
We show its correlation heatmap:
colnames(Rhat) <- c("1",rep("",99),"100",rep("",99),"200",rep("",10),"212") rownames(Rhat) <- c("1",rep("",99),"100",rep("",99),"200",rep("",10),"212")
corrplot::corrplot(cov2cor(Rhat), method = "color", tl.pos='lt',tl.cex = 0.8, tl.col = "black", mar = c(1,1,1,1), title = "")
We conduct eigendecompositions on $\hat {\mathbf R}$ using eigenDecomp
.
eigR <- eigenDecomp(Rhat) str(eigR)
According to the output, we include the first 2, 7, 19, 44 and 212 eigenvectors to construct five levels. Also, the five numbers of eigenvectors explain $18.5\%$, $38.9\%$, $59.3\%$, $79.6\%$, and $100 \%$ of total variance. eigR$eigenD$values
include all the eigenvalues and eigR$eigenD$vectors
include all the eigenvectors.
Suppose we want to include the numbers of eigenvectors which explain $40\%$, $60\%$, and $80\%$ of the variance.
eigR_user <- eigenDecomp(Rhat, varExp = c(0.4,0.6,0.8)) str(eigR_user)
According to the output, we include the first $8$, $20$, and $45$ eigenvectors to construct three levels. Also, the three numbers of eigenvectors explain $40\%$, $60\%$, and $80 \%$ of total variance. eigR$eigenD$values
include all the eigenvalues and eigR$eigenD$vectors
include all the eigenvectors.
Based on the default five levels, we construct the test statistic of MTAFS and calculate the final p-values for each SNPs by MTAFS
:
time=proc.time() pMTAFS <- MTAFS(Z, eigR) proc.time()-time
It takes r (proc.time()-time)[3]
seconds to analyze 300 SNPs and 212 traits.
head(pMTAFS)
The output of MTAFS
is a matrix of p-values, where its row corresponds to SNP. The first five columns consists of p-values obtained by using specific numbers (2, 7, 19, 44, and 212) of eigenvalues (inside brackets are the corresponding percentage of variance explained) and the last column are the p-values of MTAFS, based on which we can make a conclusion about the associations of interest.
We can also use the three levels in the user-defined setting.
pMTAFS_user <- MTAFS(Z, eigR_user)
head(pMTAFS_user)
Since the user-defined setting has only three levels, now the output p-value matrix has only four columns in total. The first three columns consists of p-values obtained by using specific numbers (8, 20, and 45) of eigenvalues and the last column are the p-values of MTAFS, based on which we can make a conclusion about the associations of interest.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.