ggman: R package to create Manhattan plot using ggplot.

Create a basic Manhattan plot

A toy GWAS dataset is made available along with the package. Let's look at the dimensions, head and tail of the dataset.

library(ggman)

dim(toy.gwas)
head(toy.gwas)
tail(toy.gwas)

To create a Manhattan plot, only the first 4 columns (chrom,snp,bp,pvalue) are required. Specific preformatting of the column classes is not required. The chromosome identifiers can be either numbers (1,2,3..) or strings("Chr1","Chr2"..).

ggman(toy.gwas, snp = "snp", bp = "bp", chrom = "chrom", pvalue = "pvalue")

Use relative positioning

By enabling the relative positioning, the base pair positions will be scaled in proportion to the real genome positions. Hence, the gaps with no SNPs can be visualized. Be default this is not enabled. To use the relative positions, use the option relative.positions = TRUE

ggman(toy.gwas, snp = "snp", bp = "bp", chrom = "chrom", pvalue = "pvalue", 
      relative.positions = TRUE)

Add labels

Specific set of points in the plot can be annotated by providing a data.frame with only the SNPs those need to be labelled. Let's take a subset of the main data frame toy.gwas.

#subset only the SNPs with -log10(pvalue) > 8
toy.gwas.sig <- toy.gwas[-log10(toy.gwas$pvalue)>8,]

# dimensions
dim(toy.gwas.sig)

#head 
head(toy.gwas.sig)

The main layer of Manhattan plot should be saved in a variable and provided subsequently to ggmanLabel function. The name of the columns with snps and labels has to be supplied. In this case, we will label with SNP identifiers.

## save the main layer in a variable
p1 <- ggman(toy.gwas, snp = "snp", bp = "bp", chrom = "chrom", pvalue = "pvalue", 
            relative.positions = TRUE)

##add label
ggmanLabel(p1, labelDfm = toy.gwas.sig, snp = "snp", label = "snp")

Annotations can be just text instead of labels. Use the type= argument.

#add text
ggmanLabel(p1, labelDfm = toy.gwas.sig, snp = "snp", label = "snp", type = "text")

The R package ggrepel is used for annotations. All the arguments that are applicable to geom_text_repel and geom_label_repel can be passed on to ggmanLabel. Lets change the size and colour of the labels.

ggmanLabel(p1, labelDfm = toy.gwas.sig, snp = "snp", label = "snp", colour = "black", size = 2)

Caution: providing the whole main data frame as labelDfm will fill the entire plot with text or might crash the R if the data frame is too big

Highlight a single group of points

The function ggmanHighlight can be used to highlight a single group of points. Be default, while highlighting specific points, the main layer of Manhattan plot is greyed out. We need to supply a vector object with SNP names to highlight. The example file toy.highlights comes along with package.

class(toy.highlights)
length(toy.highlights)
head(toy.highlights)

ggmanHighlight(p1, highlight = toy.highlights)

Highlight multiple groups of points with a legend

The function ggmanHighlightGroup can be used to highlight multiple groups of points and a legend can be added. Let's look at the example file toy.highlights.group.

class(toy.highlights.group)
dim(toy.highlights.group)
head(toy.highlights.group)

Unlike ggmanHighllight, the function ggmanHighlightGroup requires data.frame as an input. One of the column names should be supplied as a grouping variable. The size of the highlighted points can be changed with size argument. The legend title can be specified with legend.title argument.

ggmanHighlightGroup(p1, highlightDfm = toy.highlights.group, snp = "snp", group = "group", 
                    size = 0.5, legend.title = "Significant groups")

It is also possible to remove the legend using legend.remove argument.

ggmanHighlightGroup(p1, highlightDfm = toy.highlights.group, snp = "snp", group = "group", 
                    size = 0.5, legend.remove = TRUE)

Add SNP clumps

In a typical genome wide association study, it is a standard practice to display SNPs in linkage disequilibrium with the index SNP as clumps. The plink software has clumping procedure, which outputs clump file with .clumped extension.

Adding clumps to Manhattan plot involves four steps.

  1. Perform clumping using Plink --clump function
  2. Read the output file in to a data.frame. Suppose the name of the output file is plink.clumped then
gwas.clump <- read.table("plink.clumped", header = TRUE)

Here, the example file toy.clumped is a data.frame, which is created by reading the plink.clumped file and subsetting only the columns 'SNP' and 'SP2'.

  1. Convert the toy.clumped data.frame to a ggclumps object using the ggmanClumps function. The arguments index.snp.column and clumps.column are mandatory. The name of the column containing index SNPs ('SNP') should be passed to argument index.snp.column and the name of the column containing the clumps should be passed to argument clumps.column.
toy.clumps <- ggmanClumps(toy.clumped, index.snp.column = "SNP", clumps.column = "SP2") 
  1. Pass the toy.clumped object to clumps= argument of ggman function.
ggman(toy.gwas,clumps = toy.clumps, snp = "snp", bp = "bp", chrom = "chrom", pvalue = "pvalue", 
      relative.positions = TRUE, pointSize = 0.5)

highlight and label the clumps

The clumps can be grouped using a grouping variable and the index SNPs can be labelled with user prefered labels. All you need to do is to add additional columns in the plink.clumped file and specify them in the ggmanClumps function. Here in the example toy.clumped file, there are 2 extra columns with names 'group' and 'label'.

toy.clumps <- ggmanClumps(toy.clumped, index.snp.column = "SNP", clumps.column = "SP2", 
                          group.column = "group", label.column = "label") 
ggman(toy.gwas,clumps = toy.clumps, snp = "snp", bp = "bp", chrom = "chrom", 
      pvalue = "pvalue", relative.positions = TRUE, pointSize = 0.5)

Use legend.title to change the legend title. If you prefer plain text without box for labels, use clumps.label.type = 'text

ggman(toy.gwas,clumps = toy.clumps, snp = "snp", bp = "bp", chrom = "chrom", pvalue = "pvalue", 
      relative.positions = TRUE, pointSize = 0.5, legend.title = "Groups", clumps.label.type = 'text')

Zoom in to a specific chromosome

The function ggmanZoom can be used to create regional association plot. Plotting a single chromosome is very simple.

ggmanZoom(p1, chromosome = 1)

Zoom in to a specific region of a chromosome

To plot a specific region, the starting and the ending basepair positions have to be specified. Let's zoom in to the chromosome 1 region containing genes: GENE21, GENE22 and GENE23.

ggmanZoom(p1, chromosome = 1, start.position = 215388741, end.position = 238580695)

Highlight points in the zoomed region

It's also possible to highlight using specific grouping variable. Here we have a column named gene in the main data frame toy.gwas that was used to construct the main layer p1.

ggmanZoom(p1, chromosome = 1, start.position = 215388741, end.position = 238580695, 
          highlight.group = "gene", legend.title = "Genes")

Create an inverted Manhattan plot

An inverted Manhattan plot can be created by inverting the direction of p values of variants with negative beta values (or odds ratio < 1). Set the argument invert to TRUE to get an inverted Manhattan plot. If invert=TRUE, then invert.method and invert.var should be specified. The invert.method can be either or or beta. The invert.var is the name of the column containing the beta or odds ratio according to the value passed to invert.method

ggman(toy.gwas, snp = "snp", bp = "bp", chrom = "chrom", pvalue = "pvalue", invert = TRUE,
      invert.method = 'or', invert.var = "or")

Plot Odds ratio

ggman(toy.gwas, snp = "snp", bp = "bp", chrom = "chrom", pvalue = "or", 
      logTransform = FALSE, ymax = 3)

Plot beta

ggman(toy.gwas, snp = "snp", bp = "bp", chrom = "chrom", pvalue = "beta", 
      logTransform = FALSE, ymin = -2, ymax = 2)


veera-dr/ggman documentation built on May 3, 2019, 4:59 p.m.