library(knitr) opts_chunk$set(fig.width = 12)
This vignette illustrates the use of the aSPU package with GWAS Summary Statistics.
We will consider the analysis of a coronary artery disease (CAD) data from the CARDIoGRAM and C4D consortium. The data set contains P value data for coronary artery disease (CAD). This study comprised 63,746 CAD cases and 130,681 controls. We mapped these SNPs to the 9th KEGG pathway (KEGG_STEROID_BIOSYNTHESIS) . Let's load this subset.
library(aSPU) data(kegg9)
The 9th Kegg pathway contains 16 genes.
kegg9$gene.info
The PPs
is a list object contains the SNP information for each genes.
## SNPs mapped on 3rd and 4th genes of 9th Kegg pathway kegg9$PPs[3:4]
The Ps
object contains p-value information for mapped
SNPs. Total r length(kegg9$Ps)
SNPs are mapped on 9th kegg pathway.
length(kegg9$Ps) kegg9$Ps[1:10]
Using plink,
we mapped our SNPs to the reference population ( Hapmap CEU phase 2
). We dropped the SNPs of minor allele frequency (MAF) less then 5
percent. Among r length(kegg9$Ps)
SNPs mapped on 9th Kegg pathway,
r length(kegg9$nP)
SNPs are mapped on the reference data. P-values of these
SNPs and correlation matirx of SNPs using the reference population,
saved on nP
and ldmatrix
object.
kegg9$nP[1:10] kegg9$ldmatrix[1:10,1:10]
The snp.info
data object have snp information for each mapped
SNPs. The 1st column is rsID, 2nd column is Chr, 3rd column is
location and 4th column is p-value.
kegg9$snp.info[1:10,]
Using the following code we can use aSPUs()
and GATES2()
function
to get gene-wise aSPU and GATES p-value for each gene in 9th Kegg pathway.
Gps<-NULL; gl <- NULL; for( g in kegg9$gene.info[,1]) { snps <- which( ( kegg9$snp.info[,2] == kegg9$gene.info[kegg9$gene.info[,1]==g,2]) & (kegg9$snp.info[,3] > kegg9$gene.info[kegg9$gene.info[,1] == g, 3])& (kegg9$snp.info[,3] < kegg9$gene.info[kegg9$gene.info[,1] == g, 4])) newP <- kegg9$nP[snps] ; ldsub <- kegg9$ldmatrix[snps, snps]; if( length(snps) > 1) { out <- aSPUs(newP, corSNP=ldsub , pow=c(1,2,4,8, Inf), n.perm=10000, Ps=TRUE) o.pvec = order(newP) ldmat <- ldsub[o.pvec, o.pvec] gatesp <- GATES2(ldmat, sort(newP))[1] Gps <- rbind(Gps, c(length(newP),out$pvs, gatesp)) gl <- c(gl, g) } else if (length(snps) == 1) { out <- newP gatesp <- newP Gps <- rbind(Gps, c(length(newP),rep(out,6), gatesp) ) gl <- c(gl, g) } } colnames(Gps)[1] <- "nSNP" rownames(Gps) <- gl Gps
The row of Gps
means each gene, 1st column indicate the number of
SNPs for each gene. 2nd to 6th column indicate SPUs p-values for each
power ( 1,2,4,8 and Inf), 7th column indicate aSPUs p-value and 8th
column indicate p-value of GATES method.
We can see that a gene LIPA is very significant.
g = "LIPA" snps <- which( ( kegg9$snp.info[,2] == kegg9$gene.info[kegg9$gene.info[,1]==g,2]) & (kegg9$snp.info[,3] > kegg9$gene.info[kegg9$gene.info[,1] == g, 3])& (kegg9$snp.info[,3] < kegg9$gene.info[kegg9$gene.info[,1] == g, 4])) newP <- kegg9$nP[snps] ; newP
LIPA have r length(newP)
SNPs mapped and we can see that there are
many significant SNPs. It makes sense that SPUs(1) have less p-value
then SPUs(inf) since there are multiple significant SNPs. GATES have
more power when there are small number of significant SNPs in the
Gene (similar to minP test), so aSPUs have less p-value than GATES.
We can get p-value for pathways as follows. Let's perform pathway based analysis using aSPUsPath, Gates-Simes and HYST.
out.g <- GatesSimes(pvec = kegg9$nP, ldmatrix = kegg9$ldmatrix, snp.info=kegg9$snp.info, gene.info = kegg9$gene.info) out.h <- Hyst(pvec = kegg9$nP, ldmatrix = kegg9$ldmatrix, snp.info=kegg9$snp.info, gene.info = kegg9$gene.info) out.a <- aSPUsPath(kegg9$nP, corSNP = kegg9$ldmatrix, pow=c(1,2,4,8, Inf), pow2 = c(1,2,4,8), snp.info=kegg9$snp.info, gene.info = kegg9$gene.info, n.perm=1000, Ps = TRUE) out.g; out.h; out.a
As we can see from the gene-wise analysis, there is only one very
significant gene LIPA
. In this situation Gates-Simes works well and
Gate-Simes p-value is r out.g
. On the other hand, HYST works well
when there are many significant genes with similar effects.
The aSPUsPath adaptively consider all SPUsPath(i,j). We can see that
the p-value decrease as our 2nd parameter , pow2
, increases. It makes
sense because larger pow2
is more effective if there are fewer
associated genes.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.