Adaptive Gene- and Pathway- Trait Association testing with GWAS Summary Statistics (aSPUs() and aSPUsPath())
In aSPU: Adaptive Sum of Powered Score Test

library(knitr)
opts_chunk$set(fig.width = 12)

This vignette illustrates the use of the aSPU package with GWAS Summary Statistics.

Data

We will consider the analysis of a coronary artery disease (CAD) data from the CARDIoGRAM and C4D consortium. The data set contains P value data for coronary artery disease (CAD). This study comprised 63,746 CAD cases and 130,681 controls. We mapped these SNPs to the 9th KEGG pathway (KEGG_STEROID_BIOSYNTHESIS) . Let's load this subset.

library(aSPU)
data(kegg9)

The 9th Kegg pathway contains 16 genes.

kegg9$gene.info

The PPs is a list object contains the SNP information for each genes.

## SNPs mapped on 3rd and 4th genes of 9th Kegg pathway
kegg9$PPs[3:4]

The Ps object contains p-value information for mapped SNPs. Total r length(kegg9$Ps)SNPs are mapped on 9th kegg pathway.

length(kegg9$Ps)
kegg9$Ps[1:10]

Using plink, we mapped our SNPs to the reference population ( Hapmap CEU phase 2 ). We dropped the SNPs of minor allele frequency (MAF) less then 5 percent. Among r length(kegg9$Ps) SNPs mapped on 9th Kegg pathway, r length(kegg9$nP) SNPs are mapped on the reference data. P-values of these SNPs and correlation matirx of SNPs using the reference population, saved on nP and ldmatrix object.

kegg9$nP[1:10]
kegg9$ldmatrix[1:10,1:10]

The snp.info data object have snp information for each mapped SNPs. The 1st column is rsID, 2nd column is Chr, 3rd column is location and 4th column is p-value.

kegg9$snp.info[1:10,]

Use of aSPUs function

Using the following code we can use aSPUs() and GATES2() function to get gene-wise aSPU and GATES p-value for each gene in 9th Kegg pathway.

Gps<-NULL;
gl <- NULL;
for( g in kegg9$gene.info[,1]) {
    snps <-
    which( ( kegg9$snp.info[,2] == kegg9$gene.info[kegg9$gene.info[,1]==g,2]) &
           (kegg9$snp.info[,3] > kegg9$gene.info[kegg9$gene.info[,1] == g, 3])&
           (kegg9$snp.info[,3] < kegg9$gene.info[kegg9$gene.info[,1] == g, 4]))

    newP <- kegg9$nP[snps] ;
    ldsub <- kegg9$ldmatrix[snps, snps];

    if( length(snps) > 1) {
        out <- aSPUs(newP, corSNP=ldsub , pow=c(1,2,4,8, Inf),
                     n.perm=10000, Ps=TRUE)

        o.pvec = order(newP)
        ldmat <- ldsub[o.pvec, o.pvec]
        gatesp <- GATES2(ldmat, sort(newP))[1]
        Gps <- rbind(Gps, c(length(newP),out$pvs, gatesp))
        gl <- c(gl, g)
    } else if (length(snps) == 1) {
        out <- newP
        gatesp <- newP
        Gps <- rbind(Gps, c(length(newP),rep(out,6), gatesp) )
        gl <- c(gl, g)
    }
}
colnames(Gps)[1] <- "nSNP"
rownames(Gps) <- gl
Gps

The row of Gps means each gene, 1st column indicate the number of SNPs for each gene. 2nd to 6th column indicate SPUs p-values for each power ( 1,2,4,8 and Inf), 7th column indicate aSPUs p-value and 8th column indicate p-value of GATES method. We can see that a gene LIPA is very significant.

g = "LIPA"
    snps <-
    which( ( kegg9$snp.info[,2] == kegg9$gene.info[kegg9$gene.info[,1]==g,2]) &
           (kegg9$snp.info[,3] > kegg9$gene.info[kegg9$gene.info[,1] == g, 3])&
           (kegg9$snp.info[,3] < kegg9$gene.info[kegg9$gene.info[,1] == g, 4]))

newP <- kegg9$nP[snps] ;
newP

LIPA have r length(newP) SNPs mapped and we can see that there are many significant SNPs. It makes sense that SPUs(1) have less p-value then SPUs(inf) since there are multiple significant SNPs. GATES have more power when there are small number of significant SNPs in the Gene (similar to minP test), so aSPUs have less p-value than GATES.

Use of aSPUsPath function

We can get p-value for pathways as follows. Let's perform pathway based analysis using aSPUsPath, Gates-Simes and HYST.

out.g <- GatesSimes(pvec = kegg9$nP, ldmatrix = kegg9$ldmatrix,
                    snp.info=kegg9$snp.info, gene.info = kegg9$gene.info)

out.h <- Hyst(pvec = kegg9$nP, ldmatrix = kegg9$ldmatrix,
              snp.info=kegg9$snp.info, gene.info = kegg9$gene.info)

out.a <- aSPUsPath(kegg9$nP, corSNP = kegg9$ldmatrix, pow=c(1,2,4,8, Inf),
                   pow2 = c(1,2,4,8),
                   snp.info=kegg9$snp.info, gene.info = kegg9$gene.info,
                   n.perm=1000, Ps = TRUE)

out.g; out.h; out.a

As we can see from the gene-wise analysis, there is only one very significant gene LIPA. In this situation Gates-Simes works well and Gate-Simes p-value is r out.g. On the other hand, HYST works well when there are many significant genes with similar effects.

The aSPUsPath adaptively consider all SPUsPath(i,j). We can see that the p-value decrease as our 2nd parameter , pow2, increases. It makes sense because larger pow2 is more effective if there are fewer associated genes.