library(knitr) opts_chunk$set(fig.align = 'center', fig.show = 'hold', fig.width = 7, fig.height = 4) options(warnPartialMatchArgs = FALSE, tibble.print.max = 4, tibble.print.min = 4, dplyr.summarise.inform = FALSE)
The very popular R package 'ggrepel' does a great job at avoiding overlaps among data labels and between them and observations plotted as points. A difficulty that stems from the use of an algorithm based on random displacements is that the final location of the data labels can become more disordered than necessary. In addition when including smooth regression lines the data labels may partly occlude the fitted line and/or the confidence band.
Package 'ggpp' defines new position functions that save the starting position
like position_nudge_repel()
does but come in multiple flavors. Their use
together with repulsive geometries from 'ggrepel' makes it possible to give to
the data labels an initial "push" in a non-random direction. This helps a lot,
much more than what I expect initially, in obtaining more orderly
displacement of the data labels away from a cloud of observations or line.
Another problem sometimes encountered when using position functions is that combinations of pairs of displacements would be required. 'ggpp' does define such new position functions which can also be used together with repulsive geometries from package 'ggrepel'.
Because of the naming convention used, the new position functions remain fully
compatible with all geometries that have a formal parameter position
. However,
most examples below use geometries from packages 'ggrepel' or 'ggpp' to
create a plot layer containing data labels.
As we will use text and labels on the plotting area we change the default theme to an uncluttered one.
library(dplyr) library(lubridate) library(ggplot2) library(ggpp) library(ggrepel) old_theme <- theme_set(theme_bw())
Nudging shifts deterministically the coordinates giving an x and/or y position and also expands the limits of the corresponding scales to match. By default in 'ggplot2' geometries and position functions no nudging is applied.
Function position_nudge()
from package 'ggplot2' simply applies the
nudge, or x and/or y shifts based directly on the values passed to its
parameters x
and y
. Passing arguments to the nudge_x
and/or nudge_y
parameters of a geometry has the same effect as these values are passed to
position_nudge()
. Geometries also have a position
parameter to which
we can pass an expression based on a position function which opens the
door to more elaborate approaches to nudging.
A new variation on simple nudge is provided by function position_nudge_to()
,
which accepts the desired nudged coordinates directly.
We can do better than simply shifting all data to the same extent and direction
or to a fixed position. For example by nudging away from a focal point, a line
or a curve. In position_nudge_center()
and position_nudge_liner()
described
below, this reference alters only the direction (angle) along which nudge is
applied but not the extent of the shift. Advanced nudging works very well, but
only for some patterns of observations and may require manual adjustment of
positions, repulsion is more generally applicable but like jittering is
aleatory. Combining nudging and repulsion we can make repulsion more predictable
with little loss of its applicability.
These functions can be used with any geometry but if segments joining the labels
to the points are desired, ggrepel::geom_text_repel()
or
ggrepel::geom_label_repel()
should be used, possibly setting max.iter = 0
if
no repulsion is desired. Plotting of short segments can be forced with
min.segment.length = 0
. Please see the documentation for these geometries for
the details. Drawing of segments or arrows is made possible by storing in data
both the nudged and original x and y coordinates. This is made possible by
coordinated development of packages 'ggpp' and 'ggrepel' and a naming
convention for storing the original position.
Function position_nudge_keep()
is like ggplot2::position_nudge()
but keeps
(stores) the original x and y coordinates. It is similar to function
position_nudge_repel()
but uses a different naming convention for the
coordinates. Both work with geom_text_repel()
or geom_label_repel()
from
package 'ggrepel' (> 0.9.1), but only position_nudge_keep()
can be used
interchangeably with ggplot2::position_nudge()
with other geometries.
set.seed(84532) df <- data.frame( x = rnorm(20), y = rnorm(20, 2, 2), l = paste("label:", letters[1:20]) )
With position_nudge_keep()
from 'ggpp' used together with geom_text_repel()
or
geom_label_repel()
segments between a nudged and/or repelled label and the
original position (here indicated by a point) are drawn. As shown here, passing
max.iter = 0
disables repulsion.
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_text_repel(position = position_nudge_keep(x = 0.3), min.segment.length = 0, max.iter = 0)
With position_nudge()
from 'ggplot2' used together with geom_text_repel()
or
geom_label_repel()
segments connecting a nudged and/or repelled label and
the original position (here indicated by a point) are not drawn.
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_text_repel(position = position_nudge(x = 0.3), min.segment.length = 0, max.iter = 0)
position_nudge_keep()
and all other position functions described below can
be used with all 'ggplot2' geometries but the original position will be ignored
and no connecting segment drawn unless the geometry has been designed to work
together with them. Currently, geom_text_repel()
and geom_label_repel()
from
'ggrepel' and geom_text_s()
, geom_label_s()
and geom_point_s()
from package
'ggpp' draw connecting segments.
geom_text_s()
, geom_label_s()
and geom_point_s()
are different in some respects from geom_text_repel()
and geom_label_repel()
, adding a few features and lacking several of those available in geom_text_repel()
and geom_label_repel()
, in which as shown above repulsion can be disabled if desired. geom_text_s()
, geom_label_s()
and geom_point_s()
do not support repulsion and differ in that aesthetic mappings can be selectively applied to the different components of the label and/or to segments.
In addition, geom_text_s()
and geom_label_s()
use a different approach to justification than geom_text_repel()
, these geometries justify the text or label to the nearest edge to the original position, while geom_text_repel()
by default centres the text label on the nudged or displaced coordinates.
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_text_s(position = position_nudge_keep(x = 0.1), min.segment.length = 0) + expand_limits(x = 2.3)
geom_text_repel()
trims the segment to the edge of the text plus the padding. In contrast, geom_text_s()
uses justification to avoid the overlap and only the default justification "position"
and one of the edges, "left" in this case, are currently usable.
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_text_s(position = position_nudge_keep(x = 0.1), min.segment.length = 0, hjust = "left") + expand_limits(x = 2.3)
A usually more problematic example is the labeling of loadings in PCA and similar biplots.
## Example data frame where each species' principal components have been computed. df1 <- data.frame( Species = paste("Species",1:5), PC1 = c(-4, -3.5, 1, 2, 3), PC2 = c(-1, -1, 0, -0.5, 0.7)) ggplot(df1, aes(x=PC1, y = PC2, label = Species, colour = Species)) + geom_hline(aes(yintercept = 0), size=.2) + geom_vline(aes(xintercept = 0), size=.2) + geom_segment(aes(x=0, y=0, xend=PC1, yend=PC2), arrow = arrow(length = unit(0.1, "inches"))) + geom_label_repel(position = position_nudge_center(x = 0.2, y = 0.01, center_x = 0, center_y = 0), label.size = NA, label.padding = 0.1, fill = rgb(red = 1, green = 1, blue = 1, alpha = 0.75)) + xlim(-5, 5) + ylim(-2, 2) + # Stadard settings for displaying biplots coord_fixed() + theme(legend.position = "none")
Function position_nudge_to()
nudges to a given position instead of using
the same shift for each observation. Can be used to align labels for points
that are not themselves aligned.
ggplot(df, aes(x, y, label = ifelse(x < 0.5, "", l) )) + geom_point() + geom_text_repel(position = position_nudge_to(x = 2.3), min.segment.length = 0, segment.color = "red", arrow = arrow(length = unit(0.015, "npc")), direction = "y") + expand_limits(x = 3)
By providing to values for nudging with opposite sign, we can add labels alternating between sides. We use here geom_text_s()
but other geometries could have been used instead.
size_from_area <- function(x) {sqrt(max(0, x) / pi)} df2 <- data.frame(b = exp(seq(2, 4, length.out = 10))) ggplot(df2, aes(1, b, size = b)) + geom_text_s(aes(label = round(b,2)), position = position_nudge_to(x = c(1.1, 0.9)), box.padding = 0) + geom_point() + scale_size_area() + xlim(0, 2) + theme(legend.position = "none")
It is useful when labeling curves than end at different positions along the x axis.
keep <- c("Israel", "United States", "European Union", "China", "South Africa", "Qatar", "Argentina", "Chile", "Brazil", "Ukraine", "Indonesia", "Bangladesh") data <- readr::read_csv("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.csv", lazy = FALSE, show_col_types = FALSE) data %>% filter(location %in% keep) %>% select(location, date, total_vaccinations_per_hundred) %>% arrange(location, date) %>% filter(!is.na(total_vaccinations_per_hundred)) %>% mutate(location = factor(location), location = reorder(location, total_vaccinations_per_hundred)) %>% group_by(location) %>% # max(date) depends on the location! mutate(label = if_else(date == max(date), as.character(location), "")) -> owid ggplot(owid, aes(x = date, y = total_vaccinations_per_hundred, color = location)) + geom_line() + geom_text_repel(aes(label = label), size = 3, position = position_nudge_to(x = max(owid$date) + days(30)), segment.color = 'grey', point.size = 0, box.padding = 0.1, point.padding = 0.1, hjust = "left", direction="y") + scale_x_date(expand = expansion(mult = c(0.05, 0.2))) + labs(title = "Cumulative COVID-19 vaccination doses administered per 100 people", y = "", x = "Date (year-month)") + theme_bw() + theme(legend.position = "none")
We here the pass a vector of length one as argument for y
, but both x
and y
also accept longer vectors. In other words, it allows manual positioning of text
and labels.
In the next example we decrease the forces used for repulsion and the padding so that the labels remain close together. In this way, we can label the observations on the rug of a combined point and rug plot.
ggplot(df, aes(x, y, label = round(x, 2))) + geom_point(color = "red", size = 3) + geom_text_repel(position = position_nudge_to(y = -2.7), size = 3, color = "red", angle = 90, hjust = 0, box.padding = 0.05, min.segment.length = Inf, direction = "x", force = 0.1, force_pull = 0.1) + geom_rug(sides = "b", length = unit(0.02, "npc"), color = "red")
Function position_nudge_center()
can nudge radially away from a focal point if
both x
and y
are passed as arguments, or towards opposite sides of a
boundary vertical or horizontal virtual line if only one of x
or y
is
passed an argument. By default, the "center" is the centroid computed using
mean()
, but other functions or numeric values can be passed to override it.
When data are sparse, such nudging may be effective in avoiding label overlaps,
and achieving a visually pleasing positioning.
In all cases nudging shifts the coordinates giving an x and/or y position and also expands the limits of the corresponding scales to include the nudged coordinate values.
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_text_repel(position = position_nudge_center(x = 0.3, center_x = 0), min.segment.length = 0, max.iter = 0)
By default, split is away or towards the mean()
. Here we allow repulsion to
separate the labels (compare with previous plot).
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_text_repel(position = position_nudge_center(x = 0.3, direction = "split"), min.segment.length = 0)
We set a different split point as a constant value.
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_text_repel(position = position_nudge_center(x = 0.3, center_x = 1, direction = "split"), min.segment.length = 0)
We set a different split point as the value computed by a function function, by name.
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_text_repel(position = position_nudge_center(x = 0.3, center_x = median, direction = "split"), min.segment.length = 0)
We set a different split point as the value computed by an anonymous function function. Here que split on the first quartile along x.
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_text_repel(position = position_nudge_center(x = 0.3, center_x = function(x) { quantile(x, probs = 1/4, names = FALSE) }, direction = "split"), min.segment.length = 0)
The labels can be rotated as long as the geometry used supports this.
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_text_repel(angle = 90, position = position_nudge_center(y = 0.1, direction = "split"))
By requesting nudging along x and y and setting direction = "split"
nudging is applied according to the quadrants centred on the centroid of the data.
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_text_repel(position = position_nudge_center(x = 0.1, y = 0.15, direction = "split"))
With direction = "radial"
, the distance nudged away from the center is
the same for all labels.
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_text_repel(position = position_nudge_center(x = 0.25, y = 0.4, direction = "radial"), min.segment.length = 0)
As shown above for direction = "split"
we can set the coordinates of the
center also with direction = "radial"
. In the case of geom_text_s()
the
default justification is "position"
.
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_text_s(position = position_nudge_center(x = 0.125, y = 0.2, center_x = -0.5, direction = "radial"), min.segment.length = 0) + expand_limits(x = c(-2.7, +2.3))
We can also set the justification the text labels
although repulsion usually works best with labels justified at the centre,
which is the default in geom_text_repel()
.
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_text_repel(position = position_nudge_center(x = 0.125, y = 0.25, center_x = 0, center_y = 0, direction = "radial"), min.segment.length = 0, hjust = "outward", vjust = "outward") + expand_limits(x = c(-2.7, +2.3))
Nudging along one axis, here x, and setting the repulsion direction
along
the other axis, here y, tends to give a pleasant arrangement of labels.
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_text_repel(position = position_nudge_center(x = 0.2, center_x = 0, direction = "split"), aes(hjust = ifelse(x < 0, 1, 0)), direction = "y", min.segment.length = 0) + expand_limits(x = c(-3, 3))
Function position_nudge_line()
nudges away from a line, which can be a
user supplied straight line as well as a smooth spline or a polynomial
fitted to the observations themselves. The nudging is away and
perpendicular to the local slope of the straight or curved line. It
relies on the same assumptions as linear regression, assuming that x
values are not subject to error. This in most cases prevents labels from
overlaping a curve fitted to the data, even if not exactly based on the
same model fit. When observations are sparse, this may be enough to
obtain a nice arrangement of data labels.
set.seed(16532) df <- data.frame( x = -10:10, y = (-10:10)^2, yy = (-10:10)^2 + rnorm(21, 0, 4), yyy = (-10:10) + rnorm(21, 0, 4), l = letters[1:21] )
The first, simple example shows that position_nudge_line()
has shifted the
direction of the nudging based on the alignment of the observations along a
line. One could, of course, have in this case passed suitable values as
arguments to x and y using position_nudge()
from package 'ggplot2'.
However, position_nudge_line()
will work without change irrespective of the
slope or intercept along which the observations fall.
ggplot(df, aes(x, 2 * x, label = l)) + geom_point() + geom_abline(intercept = 0, slope = 2, linetype = "dotted") + geom_text(position = position_nudge_line(x = -0.5, y = -0.8))
With observations with high variation in y, a linear model fit may
need to be used. In this case fitted twice, once in stat_smooth()
and once in
position_nudge_line()
.
ggplot(subset(df, x >= 0), aes(x, yyy)) + geom_point() + stat_smooth(method = "lm", formula = y ~ x) + geom_text(aes(label = l), vjust = "center", hjust = "center", position = position_nudge_line(x = 0, y = 1.2, method = "lm", direction = "split"))
With lower variation in y, we can pass to line_nudge
a multiplier to
keep labels outside of the confidence band.
ggplot(subset(df, x >= 0), aes(y, yy)) + geom_point() + stat_smooth(method = "lm", formula = y ~ x) + geom_text(aes(label = l), position = position_nudge_line(method = "lm", x = 3, y = 3, line_nudge = 2.5, direction = "split"))
If we want the nudging based on an arbitrary straight line not computed
from data
, we can pass the intercept and slope in a numeric vector of
length two as an argument to parameter abline
.
ggplot(subset(df, x >= 0), aes(y, yy)) + geom_point() + geom_abline(intercept = 0, slope = 1, linetype = "dotted") + geom_text(aes(label = l), position = position_nudge_line(abline = c(0, 1), x = 3, y = 3, direction = "split"))
With observations that follow exactly a simple curve the defaults work well to
automate the nudging of individual data labels away from the implicit curve.
Positive values as arguments to x
and y
correspond to above and inside the
curve. One could, of course, pass also in this case suitable values as arguments
to x and y using position_nudge()
from package 'ggplot2', but these
arguments would need to be vectors with different nudge values for each
observation.
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_line(linetype = "dotted") + geom_text(position = position_nudge_line(x = 0.6, y = 6))
Negative values passed as arguments to x
and y
correspond to labels
below and outside the curve.
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_line(linetype = "dotted") + geom_text(position = position_nudge_line(x = -0.6, y = -6))
When the observations include random variation along y, it is
important that the smoother used for the line added to a plot and that
passed to position_nudge_line()
are similar. By default
stat_smooth()
uses "loess"
and position_nudge_line()
with method
"spline"
, smooth.sline()
, which are a good enough match.
ggplot(df, aes(x, yy)) + geom_point() + stat_smooth(method = "loess", formula = y ~ x) + geom_text(aes(y = yy, label = l), position = position_nudge_line(x = 0.6, y = 6, direction = "split"))
We can use other geometries, or rather we need to use a repulsive geometry when the label text is long or the labels are crowed near the line.
ggplot(df, aes(x, yy)) + geom_point() + stat_smooth(method = "loess", formula = y ~ x) + geom_label_repel(aes(y = yy, label = paste("point", l)), position = position_nudge_line(x = 0.6, y = 8, direction = "split"), box.padding = 0.2, min.segment.length = 0)
We can see by comparing the plot above with that below, that combining nudging away from a line with repulsion results in a more pleasant positioning of the data labels.
ggplot(df, aes(x, yy)) + geom_point() + stat_smooth(method = "loess", formula = y ~ x) + geom_label_repel(aes(y = yy, label = paste("point", l)), box.padding = 0.5, min.segment.length = 0)
When fitting a polynomial, "lm"
should be the argument passed to
method
and a model formula preferably based on poly()
, setting raw = TRUE
,
as argument to formula
.
Currently no other methods are implemented in position_nudge_line()
.
ggplot(df, aes(x, yy)) + geom_point() + stat_smooth(method = "lm", formula = y ~ poly(x, 2, raw = TRUE)) + geom_text(aes(y = yy, label = l), position = position_nudge_line(method = "lm", formula = y ~ poly(x, 2, raw = TRUE), x = 0.5, y = 5, direction = "split"))
Compared to pure repulsion adding nudging helps "direct" repulsion in the desired direction. Compare the plot below using no nudging with the one above.
ggplot(df, aes(x, yy)) + geom_point() + stat_smooth(method = "lm", formula = y ~ poly(x, 2, raw = TRUE)) + geom_text_repel(aes(y = yy, label = l), box.padding = 0.5, min.segment.length = Inf)
Using position_stacknudge()
together geom_label_repel()
makes it possible
to use repulsion for labeling sections of stacked column plots.
df <- tibble::tribble( ~y, ~x, ~grp, "a", 1, "some long name", "a", 2, "other name", "b", 1, "some name", "b", 3, "another name", "b", -1, "some long name" ) ggplot(data = df, aes(x, y, group = grp)) + geom_col(aes(fill = grp), width=0.5) + geom_vline(xintercept = 0) + geom_label_repel(aes(label = grp), position = position_stacknudge(vjust = 0.5, y = 0.4), label.size = NA)
I warmly thank Kamil Slowikowski for agreeing to make changes in 'ggrepel' that make the use of 'ggrepel' together with 'ggpp' possible and smooth. This document shows some use examples, but surely new ones will be found by users of R and 'ggplot2'.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.