A common problem in R is labelling scatter plots with large numbers of points and/or labels. We provide a utility for easy labelling of scatter plots and quick plotting of volcano plots and MA plots for gene expression analyses. Using an interactive shiny and plotly interface, users can hover over points to see where specific points are located and click on points to easily label them. Labels can be toggled on/off simply by clicking. An input box and batch input window provides an easy way to label points by name. Labels can be dragged around the plot to place them optimally. Notably we provide an easy way to export directly to PDF for publication.
knitr::opts_chunk$set( collapse = TRUE ) library(easylabel)
Install from CRAN
install.packages("easylabel") library(easylabel)
Install from Github
devtools::install_github("myles-lewis/easylabel") library(easylabel)
If you wish to use the optional useQ
argument with easyVolcano()
and
easyMAplot()
, you will need to install additional package qvalue
from
Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("qvalue")
If you wish to use the optional fullGeneNames
argument, you will need to
install packages AnnotationDbi
and org.Hs.eg.db
from Bioconductor:
BiocManager::install("AnnotationDbi") BiocManager::install("org.Hs.eg.db")
Use easylabel()
to open a shiny app and plot and label scatter plots. A table of
the main data is supplied in the Table tab for easy viewing of data.
data(mtcars) easylabel(mtcars, x = 'mpg', y = 'wt', colScheme = 'royalblue')
knitr::include_graphics("scatter1.png")
The output_shiny
option enables users to switch between invoking the shiny
app or directly outputting a plotly object. For example, to extract a plotly
figure with draggable annotations:
data(mtcars) p1 <- easylabel(mtcars, x = 'mpg', y = 'wt', col = 'cyl', startLabels = rownames(mtcars)[mtcars$gear == 5], output_shiny = FALSE) %>% layout(yaxis = list(zeroline = FALSE)) p2 <- easylabel(mtcars, x = 'mpg', y = 'drat', col = 'vs', colScheme = c("dodgerblue", "orange"), startLabels = rownames(mtcars)[mtcars$gear == 5], output_shiny = FALSE) %>% layout(xaxis = list(zeroline = FALSE)) plotly::subplot(p1, p2, nrows = 2, shareY = TRUE, titleX = TRUE, margin = 0.05)
knitr::include_graphics("plotly_output.png")
Note that while the labels and lines can still be dragged and moved in the exported plotly object, clicking on points to add them, exporting to PDF and borders around label text, which are all shiny/ PDF output features, are not available from a pure plotly object.
Plotly objects can also be exported as a return value by pressing the 'Export plotly & exit' button in the shiny app. In this case the object retains the labelled points and their edited positions.
Similar to ggplot2 and plotly, colour can either be set as a single colour,
using colScheme = 'blue'
or can be set to change with the level of a factor
variable within data
by setting col
. Transparency can be altered by setting
alpha
. Currently colour gradients are not supported.
easylabel(mtcars, x = 'mpg', y = 'wt', col = 'cyl')
Marker shapes can either be set as a single shape using shapeScheme = 21
,
based on the base graphics pch
symbol scheme (see points()
), or shapes can
be set to change with the level of a factor variable within data
by setting
shape
. Shapes and colours can be combined.
# gapminder data set if(!require(gapminder)) {install.packages("gapminder")} library(gapminder) easylabel(gapminder[gapminder$year == 2007, ], x = 'gdpPercap', y = 'lifeExp', col = 'continent', shape = 'continent', size = 10, labs = 'country', zeroline = FALSE)
knitr::include_graphics("scatter3.png")
The size of all markers can be adjusted by setting size
to a single number
e.g. size = 6
(default 8). Size can be assigned to a column of continuous data
within data
to produce a bubble chart. The size of points is automatically
scaled to fit within a range of point sizes, determined by scaleRange
.
library(gapminder) easylabel(gapminder[gapminder$year == 2007, ], x = 'gdpPercap', y = 'lifeExp', col = 'continent', labs = 'country', size = 'pop', alpha = 0.6, zeroline = FALSE)
knitr::include_graphics("bubble.png")
By default, easylabel()
takes the rownames of the main dataframe as the label
names. Any column can be selected for label names by setting labs
to the name
of a column within data
, as for example in the Gapminder dataset plots above.
Axis titles can be set using xlab
and ylab
. These accept R expressions to
allow maths symbols. This is primarily designed to work when exporting plots to
PDF via base graphics, since plotly does not natively understand expressions.
easylabel(xymatrix, x = 'x', y = 'y', col = 'col', colScheme = c('darkgrey', 'green3', 'gold3', 'blue'), xlab = expression("log"[2] ~ " fold change post-Rituximab"), ylab = expression("log"[2] ~ " fold change post-Tocilizumab"), showgrid = TRUE)
knitr::include_graphics("plot1.png")
xlim
and ylim
allow control over the range of each axis. Outlying points are
shown as diamonds. Outliers can be toggled off using
showOutliers = FALSE
. Outlier symbol, colour and line width can be set using
outlier_shape
, outlier_col
and outlier_lwd
respectively.
Axis tick marks can be controlled by setting xaxp
or yaxp
to a vector of
form c(x1, x2, n)
where x1
and x2
are the limits of the tick marks and
n
is the number of intervals between tick marks.
For full customisation of axis tick coordinates and labels use xticks
or
yticks
, which are specified as a list of two named vectors at = ...
and
labels = ...
.
# example axis ticks if(!require(gapminder)) {install.packages("gapminder")} library(gapminder) easylabel(gapminder[gapminder$year == 2007, ], x = 'gdpPercap', y = 'lifeExp', col = 'continent', shape = 'continent', size = 10, labs = 'country', zeroline = FALSE, xaxp = c(0, 50000, 10), yaxp = c(40, 85, 9), showgrid = TRUE)
Gridlines can be shown by setting showgrid = TRUE
. If showgrid
is set to
"x"
or "y"
then gridlines are shown only for that axis. Density of gridlines
can be controlled by setting xaxp
or yaxp
(see [Axis tick marks]).
Black lines through the origin can be controlled by setting zeroline
to
TRUE
or FALSE
.
Dashed grey horizontal lines can be added at specific levels by setting hline
with a vector of values. Similarly dashed grey vertical lines can be added by
setting vline
with a vector of values.
Setting showLegend
to TRUE
or FALSE
can be used to show or hide the
legend.
Legends can be positioned by setting legendxy
. This gives a vector of
coordinates for the position of the legend in plotly paper reference with
c(0, 0)
being the bottom left corner and c(1, 1)
being the top right corner
of the plot window.
Plotly has unusual behaviour in that the x coordinate always aligns the left
side of the legend. However, the y coordinate aligns the top, middle or
bottom of the legend dependent on whether it is in the top, middle or bottom
1/3 of the plot window. So c(1, 0)
positions the legend in the bottom right
corner outside the right margin of the plot, while c(1, 0.5)
centre aligns
the legend around the centre of y axis.
Further control of plotting can be achieved by passing other arguments to
plot() via R's ...
argument. For example, a box around the plot can be added
using bty = 'o'
(see par()
).
Base graphics' panel.last
argument is a flexible way to add plotting commands.
This provides a way to add trend lines, loess lines or any additional features
such as extra legends to the plot after the points are plotted, but before
labels are drawn. Note that panel.last
only works when saving to PDF, and is
not available in plotly.
# example adding a trend line using panel.last fit <- lm(xymatrix$y ~ xymatrix$x) easylabel(xymatrix, x = 'x', y = 'y', col = 'col', colScheme = c('darkgrey', 'green3', 'gold3', 'blue'), xlab = expression("log"[2] ~ " fold change post-Rituximab"), ylab = expression("log"[2] ~ " fold change post-Tocilizumab"), showgrid = TRUE, fullGeneNames = TRUE, panel.last = { abline(fit, col='red') })
This argument is used to reduce the number of scatter points displayed by
plotly, to prevent plotly becoming sluggish or unusable, which happens above
about 200,000 points. To plot large datasets such as genomic data, we recommend
adding a logical column to your dataset which filters out points of interest
for inspecting and labelling via plotly. plotly_filter
specifies the name of
this column. Saving to PDF retains the full dataset and all points will be
plotted. This argument is used by the easyManhattan()
function (see
[Manhattan plot]).
Use the easyVolcano()
function to quickly plot a volcano plot from DESeq2 or
EdgeR objects. The useQ
argument will switch to using q values for FDR.
# Example DESeq2 object head(volc1) # Example limma object head(volc2)
# Typical DESeq2 workflow volc1 <- results(dds) easyVolcano(volc1, useQ = TRUE)
If you want to specify your own columns within the dataset set x
to the column
containing log fold change, y
to the column containing raw p values and padj
to the column containing adjusted p values.
labs
can be used to specify the column containing label names, otherwise this
defaults to rownames(data)
.
# Manually specify columns easyVolcano(volc1, x = 'log2FoldChange', y = 'pvalue', padj = 'padj') # To use nominal unadjusted p value for significant genes easyVolcano(volc1, x = 'log2FoldChange', y = 'pvalue')
Note that if you specify y
for a column of p values but leave padj
blank,
the function assumes you are using nominal unadjusted p values for the cutoff
for significance.
Use the easyMAplot()
function to quickly plot an MA plot from DESeq2 or EdgeR
objects.
easyMAplot(volc2, useQ = TRUE)
knitr::include_graphics("MAplot1.png")
The fullGeneNames
argument will use Bioconductor package AnnotationDbi
and
the org.Hs.eg.db
human gene database to expand gene symbols in the Table tab.
Both will need to be installed from Bioconductor.
BiocManager::install("AnnotationDbi") BiocManager::install("org.Hs.eg.db") easyVolcano(volc1, useQ = TRUE, fullGeneNames = TRUE)
knitr::include_graphics("table1.png")
Full gene names are also shown when hovering over points in the scatter plot,
which can make it easier to label points of interest. For mouse genes, install
org.Mm.eg.db
and set AnnotationDb = org.Mm.eg.db
(note the lack of quotes).
BiocManager::install("org.Mm.eg.db") library(org.Mm.eg.db) easyVolcano(volc1, fullGeneNames = TRUE, AnnotationDb = org.Mm.eg.db)
For volcano plots a simple colour scheme with downregulated genes in blue and
upregulated genes in red can be rendered by setting fccut = 0
.
easyVolcano(volc2, useQ = TRUE, fccut = 0, main = "Volcano title")
knitr::include_graphics("plot7.png")
A title can be added using main = "Title"
. The font size of the title can be
modified using cex.main = 2
(default 1.2) if saving to PDF.
You can add left and right sided titles using Ltitle
and Rtitle
to explain
the direction of effect for up/downregulation. The use of expression
in the
example below shows how to add left/right arrow symbols to the titles. The
symbols only appear in the saved PDF - they are not compatible with plotly.
LRtitle_side = 1
puts these titles on the bottom while LRtitle_side = 3
puts
them on the top.
cex.lab
controls font size for these titles as well as axis titles.
cex.axis
controls font size for axis numbering.
easyVolcano(volc1, useQ = TRUE, fullGeneNames = TRUE, Ltitle = expression(symbol("\254") ~ "Non-responder"), Rtitle = expression("Responder" ~ symbol("\256")), LRtitle_side = 1, cex.lab = 0.9, cex.axis = 0.8, fccut = c(1, 2), fdrcutoff = 0.2, ylim = c(0, 6), xlim = c(-5, 5), colScheme = c('darkgrey', 'blue', 'orange', 'red'))
knitr::include_graphics("plot3.png")
The FDR cutoff is set using fdrcutoff
(volcano plots allow a single value,
while MA plots allow multiple values). In order to force significant
genes to be shown based on unadjusted P values, this can be achieved by a
workaround setting both y
and padj
manually to the unadjusted p value
column. Note this does mean the legend incorrectly states 'FDR < ' etc.
easyVolcano(volc1, y = 'pvalue', padj = 'pvalue', fdrcutoff = 0.01)
xlim
and ylim
allow control over the range of each axis. Outlying points are
shown as diamonds (see example above). Outliers can be toggled off using
showOutliers = FALSE
. Outlier symbol, colour and line width can be set using
outlier_shape
, outlier_col
and outlier_lwd
respectively.
The colour scheme system has been expanded to allow multiple fold change cut-offs. In the example above the colours are symmetrical.
In the next 2 plots, the colours range from blue for downregulated genes,
through to red for upregulated genes. Vertical lines can be added using vline
.
colScheme <- c('darkgrey', 'blue', 'lightblue', 'orange', 'red') easyVolcano(volc1, fccut = 1, fdrcutoff = 0.2, ylim = c(0, 6), xlim = c(-5, 5), colScheme = colScheme, vline = c(-1, 1))
knitr::include_graphics("plot4.png")
The next example has 6 colours and also shows how to remove the white outlines
around points using outline_col = NA
and use transparency instead via alpha
.
library(RColorBrewer) colScheme <- c('darkgrey', brewer.pal(9, 'RdYlBu')[c(9:7, 3:1)]) easyVolcano(volc1, fccut = c(1, 2), fdrcutoff = 0.2, ylim = c(0, 6), xlim = c(-5, 5), colScheme = colScheme, alpha = 0.75, outline_col = NA)
knitr::include_graphics("plot5.png")
Similarly 6 colours can be applied to MA plots using 3 levels of cut-off for FDR (note the colour scheme is in a slightly different order).
colScheme <- c('darkgrey', brewer.pal(9, 'RdYlBu')[c(7:9, 3:1)]) easyMAplot(volc2, fdrcutoff = c(0.05, 0.01, 0.001), size = 6, useQ = TRUE, alpha = 0.75, outline_col = NA, colScheme = colScheme)
knitr::include_graphics("MAplot2.png")
Label lines can be altered using the argument labelDir
or by selecting the
Label direction pulldown menu in the shiny app. A couple of examples are shown
below:
easyVolcano(volc1, labelDir = "horiz") easyMAplot(volc1, labelDir = "vert")
knitr::include_graphics(c("labdir_horiz.png", "labdir_vert.png"))
Label line colour can be set using line_col
. Label text colour can be set
using text_col
. Each of these can be set to match the colour of each point by
setting each to "match"
. Default label line length can be altered by changing
lineLength
(default 75 in pixels). cex.text
can be changed to alter the
font size of labels (default 0.72).
Rectangles can be drawn around the label text, using rectangles = TRUE
, and
will appear when saving to PDF (they are not supported in plotly). Rectangle
fill colour can be set using rect_col
and rectangle border colour can be set
using border_col
. Use border_col = NA
to turn off rectangle borders. The
amount of padding in pixels around label text can be specified by setting
padding
. The amount of roundedness in pixels of rectangle corners is specified
by border_radius
.
# Simple outlines easyVolcano(volc2, useQ = TRUE, fccut = 0, rectangles = TRUE) # Red outlined labels, rounded ends easyVolcano(volc2, useQ = TRUE, fullGeneNames = TRUE, rectangles = TRUE, padding = 5, border_radius = 10, line_col = 'red', border_col = 'red', text_col = 'red')
knitr::include_graphics("rect_red_outline.png")
# Transparent grey rectangles, rounded ends easyMAplot(volc2, fdrcut = c(0.05, 0.01, 0.001), size = 6, useQ = TRUE, alpha = 0.75, outline_col = NA, fullGeneNames = TRUE, colScheme = c('darkgrey', brewer.pal(9, 'RdYlBu')[c(7:9, 3:1)]), rectangles = TRUE, border_col = NA, padding = 5, rect_col = adjustcolor('grey', alpha.f = 0.6), border_radius = 20) # White text on black background, no rounding easyVolcano(volc2, useQ = TRUE, fullGeneNames = TRUE, fccut = 0, rectangles = TRUE, padding = 4, border_radius = 0, rect_col = 'black', text_col = 'white', border_col = NA)
knitr::include_graphics("rect_invert.png")
While label text and label lines can be adjusted by setting text_col
and
line_col
to individual colours, label text and label lines can also be set to
mirror the colour of each point by setting text_col = "match"
and
line_col = "match"
respectively. Label box fill colour or rectangle outline
colour can also be set to match each point's colour by setting
rect_col = "match"
or border_col = "match"
(label boxes are not supported in
plotly).
# Label text and label lines match point colours easylabel(gapminder[gapminder$year == 2007, ], x = 'gdpPercap', y = 'lifeExp', col = 'continent', labs = 'country', size = 'pop', alpha = 0.6, line_col = "match", text_col = "match", zeroline = FALSE, showgrid = "y")
knitr::include_graphics("match1.png")
# Rectangle fill colour and label line match point colours, rounded rectangles easylabel(gapminder[gapminder$year == 2007, ], x = 'gdpPercap', y = 'lifeExp', col = 'continent', labs = 'country', size = 'pop', alpha = 0.6, line_col = "match", text_col = "white", rectangles = TRUE, border_col = NA, rect_col = "match", border_radius = 20, padding = 5, zeroline = FALSE, showgrid = "y")
knitr::include_graphics("match2.png")
Manhattan plots can be labelled using the function easyManhattan()
. Plotly
struggles with more than about 100,000 points, so we use a filtering system to
reduce the number of points shown in the plotly interactive plot, while the full
plot with all points is produced when saving to PDF. An example is shown below:
# Manhattan plot using SLE GWAS data from Bentham et al 2015 # FTP download full summary statistics from # https://www.ebi.ac.uk/gwas/studies/GCST003156 library(data.table) SLE_gwas <- fread('../bentham_2015_26502338_sle_efo0002690_1_gwas.sumstats.tsv') # Simple Manhattan plot easyManhattan(SLE_gwas) # 4 colours for chromosomes easyManhattan(SLE_gwas, chromCols = RColorBrewer::brewer.pal(4, 'Paired'))
knitr::include_graphics("manhattan.png")
By default Manhattan plots display only the top 100,000 points by p value in
the plotly window and plot the top 1 million SNPs when saving to PDF. These
defaults can be altered by setting npoints
and nplotly
. Setting npoints
to NA
will plot all points when saving to PDF.
# Examples # 12 colours for chromosomes, no separate colour for significant points easyManhattan(SLE_gwas, chromCols = RColorBrewer::brewer.pal(12, 'Paired'), sigCol = NA) # Label peaks automatically, add horizontal gridlines easyManhattan(SLE_gwas, npeaks = 20, showgrid = "y") # Vertical version easyManhattan(SLE_gwas, transpose = TRUE, height = 1000, width = 600) # Chr1 only easyManhattan(SLE_gwas[SLE_gwas$chrom == 1, ]) # Add symbols for the significant SNPs easyManhattan(SLE_gwas, chromCols = RColorBrewer::brewer.pal(4, 'Paired'), size = 8, shape = 'col', shapeScheme = c(rep(20, 4), 18))
To look in a specific genomic region
# Create a locus plot over one chromosomal region library(plotly) p1 = easyManhattan(SLE_gwas[SLE_gwas$chrom == 6 & SLE_gwas$pos >= 28e6 & SLE_gwas$pos <= 34e6, ], output_shiny = FALSE, labs = "rsid", startLabels=c("rs115466242", "rs2853999"), npeaks = 3) # To annotate genes in that region source("https://raw.githubusercontent.com/KatrionaGoldmann/BioOutputs/master/R/bio_gene_locations.R") library(ggbio) library(gginnards) library(ggrepel) if (! "EnsDb.Hsapiens.v75" %in% rownames(installed.packages())) BiocManager::install("EnsDb.Hsapiens.v75") p2 = bio_gene_locations(6, c(28e6, 34e6), subset_genes = c('HLA-F', 'HLA-G', 'HLA-A', 'HLA-E', 'HLA-C', 'HLA-B', 'HLA-DRA', 'HLA-DRB5', 'HLA-DRB1', 'HLA-DQA1', 'HLA-DQB1', 'HLA-DQA2', 'HLA-DQB2', 'HLA-DOB', 'HLA-DMB', 'HLA-DMA', 'HLA-DOA', 'HLA-DPA1', 'HLA-DPB1')) plotly::subplot(p1, p2$plotly_location %>% layout(yaxis=list(range=c(0.25, 2))), shareY = T, titleX = T, margin=0.05, nrows=2, heights=c(0.7, 0.3))
knitr::include_graphics("locus.png")
scattergl
function, which uses webGL, seems to have a limit of
around 400,000 scatter points per plot. Above this level, the shiny/plotly
app becomes sluggish or freezes and becomes unusable. We recommend limiting
plots to around 100,000 points using the plotly_filter
argument. See
[plotly_filter].Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.