library(qqman) library(knitr) opts_chunk$set(comment=NA, fig.width=12, fig.height=9, message=FALSE, tidy=TRUE, dpi=75)
# This code used to generate the test data. Runs slow, but does the job. chrstats <- data.frame(chr=1:22, nsnps=1500) chrstats$nsnps <- with(chrstats, round(nsnps/chr^(1/3))) chrstats d <- data.frame(SNP=rep(NA, sum(chrstats$nsnps)), CHR=rep(NA, sum(chrstats$nsnps)), BP=rep(NA, sum(chrstats$nsnps)), P=rep(NA, sum(chrstats$nsnps))) snpi <- 1 set.seed(42) for (i in chrstats$chr) { for (j in 1:chrstats[i, 2]) { d[snpi, ]$SNP=paste0("rs", snpi) d[snpi, ]$CHR=i d[snpi, ]$BP=j d[snpi, ]$P=runif(1) snpi <- snpi+1 } } divisor <- c(seq(2,50,2), seq(50,2,-2)) divisor <- divisor^4 length(divisor) d[3026:3075, ]$P <- d[3026:3075, ]$P/divisor snpsOfInterest <- paste0("rs", 3001:3100) qq(d$P) manhattan(d, highlight=snpsOfInterest) gwasResults <- d save(gwasResults, file="data/gwasResults.RData")
The qqman package includes functions for creating manhattan plots and q-q plots from GWAS results. The gwasResults
data.frame included with the package has simulated results for 16,470 SNPs on 22 chromosomes. Take a look at the data:
str(gwasResults) head(gwasResults) tail(gwasResults)
How many SNPs on each chromosome?
as.data.frame(table(gwasResults$CHR))
Now, let's make a basic manhattan plot.
manhattan(gwasResults)
We can also pass in other graphical parameters. Let's add a title (main=
), increase the y-axis limit (ylim=
), reduce the point size to 60% (cex=
), and reduce the font size of the axis labels to 90% (cex.axis=
). While we're at it, let's change the colors (col=
), remove the suggestive and genome-wide significance lines, and supply our own labels for the chromosomes:
manhattan(gwasResults, main="Manhattan Plot", ylim=c(0,10), cex=0.6, cex.axis=0.9, col=c("blue4", "orange3"), suggestiveline=F, genomewideline=F, chrlabs=c(1:20, "P", "Q"))
Now, let's look at a single chromosome:
manhattan(subset(gwasResults, CHR==1))
Let's highlight some SNPs of interest on chromosome 3. The 100 SNPs we're highlighting here are in a character vector called snpsOfInterest
. You'll get a warning if you try to highlight SNPs that don't exist.
str(snpsOfInterest) manhattan(gwasResults, highlight=snpsOfInterest)
We can combine highlighting and limiting to a single chromosome, and use the xlim
graphical parameter to zoom in on a region of interest (between position 200-500):
manhattan(subset(gwasResults, CHR==3), highlight=snpsOfInterest, xlim=c(200, 500), main="Chr 3")
We can also annotate SNPs based on their p-value. By default, this only annotates the top SNP per chromosome that exceeds the annotatePval
threshold.
manhattan(gwasResults, annotatePval=0.01)
We can also annotate all SNPs that meet a threshold:
manhattan(gwasResults, annotatePval=0.005, annotateTop=FALSE)
Finally, the manhattan
function can be used to plot any value, not just p-values. Here, we'll simply call the function passing to the p=
argument the name of the column we want to plot instead of the default "P" column. In this example, let's create a test statistic ("zscore"), plot that instead of p-values, change the y-axis label, and remove the default log transformation. We'll also remove the genomewide and suggestive lines because these are only meaningful if you're plotting -log10(p-values).
# Add test statistics gwasResults <- transform(gwasResults, zscore=qnorm(P/2, lower.tail=FALSE)) head(gwasResults) # Make the new plot manhattan(gwasResults, p="zscore", logp=FALSE, ylab="Z-score", genomewideline=FALSE, suggestiveline=FALSE, main="Manhattan plot of Z-scores")
A few notes on creating manhattan plots:
str(gwasResults)
. Notice that the gwasResults
data.frame has SNP, chromosome, position, and p-value columns named SNP
, CHR
, BP
, and P
. If you're creating a manhattan plot and your column names are different, you'll have to pass the column names to the chr=
, bp=
, p=
, and snp=
arguments. See help(manhattan)
for details.fix(manhattan)
) to change the line designating the axis tick labels (labs <- unique(d$CHR)
) to set this to whatever you'd like it to be.col="blue"
, col="red"
, or col="green3"
to modify the suggestive line, genomewide line, and highlight colors, respectively.Creating Q-Q plots is straightforward - simply supply a vector of p-values to the qq()
function.
qq(gwasResults$P)
We can optionally supply many other graphical parameters.
qq(gwasResults$P, main="Q-Q plot of GWAS p-values", xlim=c(0,7), ylim=c(0,12), pch=18, col="blue4", cex=1.5, las=1)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.