In aphalo/ggpp: Grammar Extensions to 'ggplot2'

library(knitr)
opts_chunk$set(fig.align = 'center', 
               fig.show = 'hold', fig.width = 7, fig.height = 4)
options(warnPartialMatchArgs = FALSE,
        tibble.print.max = 4,
        tibble.print.min = 4,
        dplyr.summarise.inform = FALSE)

Introduction

The very popular R package 'ggrepel' does a great job at avoiding overlaps among data labels and between them and observations plotted as points. A difficulty that stems from the use of an algorithm based on random displacements is that the final location of the data labels can become more disordered than necessary. In addition when including smooth regression lines the data labels may partly occlude the fitted line and/or the confidence band.

Package 'ggpp' defines new position functions that save the starting position like position_nudge_repel() does but come in multiple flavors. Their use together with repulsive geometries from 'ggrepel' makes it possible to give to the data labels an initial "push" in a non-random direction. This helps a lot, much more than what I expect initially, in obtaining more orderly displacement of the data labels away from a cloud of observations or line.

Another problem sometimes encountered when using position functions is that combinations of pairs of displacements would be required. 'ggpp' does define such new position functions which can also be used together with repulsive geometries from package 'ggrepel'.

Because of the naming convention used, the new position functions remain fully compatible with all geometries that have a formal parameter position. However, most examples below use geometries from packages 'ggrepel' or 'ggpp' to create a plot layer containing data labels.

Preliminaries

As we will use text and labels on the plotting area we change the default theme to an uncluttered one.

library(dplyr)
library(lubridate)
library(ggplot2)
library(ggpp)
library(ggrepel)

old_theme <- theme_set(theme_bw())

Nudge position functions

Nudging shifts deterministically the coordinates giving an x and/or y position and also expands the limits of the corresponding scales to match. By default in 'ggplot2' geometries and position functions no nudging is applied.

Function position_nudge() from package 'ggplot2' simply applies the nudge, or x and/or y shifts based directly on the values passed to its parameters x and y. Passing arguments to the nudge_x and/or nudge_y parameters of a geometry has the same effect as these values are passed to position_nudge(). Geometries also have a position parameter to which we can pass an expression based on a position function which opens the door to more elaborate approaches to nudging.

A new variation on simple nudge is provided by function position_nudge_to(), which accepts the desired nudged coordinates directly.

We can do better than simply shifting all data to the same extent and direction or to a fixed position. For example by nudging away from a focal point, a line or a curve. In position_nudge_center() and position_nudge_liner() described below, this reference alters only the direction (angle) along which nudge is applied but not the extent of the shift. Advanced nudging works very well, but only for some patterns of observations and may require manual adjustment of positions, repulsion is more generally applicable but like jittering is aleatory. Combining nudging and repulsion we can make repulsion more predictable with little loss of its applicability.

These functions can be used with any geometry but if segments joining the labels to the points are desired, ggrepel::geom_text_repel() or ggrepel::geom_label_repel() should be used, possibly setting max.iter = 0 if no repulsion is desired. Plotting of short segments can be forced with min.segment.length = 0. Please see the documentation for these geometries for the details. Drawing of segments or arrows is made possible by storing in data both the nudged and original x and y coordinates. This is made possible by coordinated development of packages 'ggpp' and 'ggrepel' and a naming convention for storing the original position.

position_nudge_keep()

Function position_nudge_keep() is like ggplot2::position_nudge() but keeps (stores) the original x and y coordinates. It is similar to function position_nudge_repel() but uses a different naming convention for the coordinates. Both work with geom_text_repel() or geom_label_repel() from package 'ggrepel' (> 0.9.1), but only position_nudge_keep() can be used interchangeably with ggplot2::position_nudge() with other geometries.

set.seed(84532)
df <- data.frame(
  x = rnorm(20),
  y = rnorm(20, 2, 2),
  l = paste("label:", letters[1:20])
)

With position_nudge_keep() from 'ggpp' used together with geom_text_repel() or geom_label_repel() segments between a nudged and/or repelled label and the original position (here indicated by a point) are drawn. As shown here, passing max.iter = 0 disables repulsion.

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_text_repel(position = position_nudge_keep(x = 0.3),
                  min.segment.length = 0, max.iter = 0)

With position_nudge() from 'ggplot2' used together with geom_text_repel() or geom_label_repel() segments connecting a nudged and/or repelled label and the original position (here indicated by a point) are not drawn.

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_text_repel(position = position_nudge(x = 0.3),
                  min.segment.length = 0, 
                  max.iter = 0)

position_nudge_keep() and all other position functions described below can be used with all 'ggplot2' geometries but the original position will be ignored and no connecting segment drawn unless the geometry has been designed to work together with them. Currently, geom_text_repel() and geom_label_repel() from 'ggrepel' and geom_text_s(), geom_label_s() and geom_point_s() from package 'ggpp' draw connecting segments.

geom_text_s(), geom_label_s() and geom_point_s() are different in some respects from geom_text_repel() and geom_label_repel(), adding a few features and lacking several of those available in geom_text_repel() and geom_label_repel(), in which as shown above repulsion can be disabled if desired. geom_text_s(), geom_label_s() and geom_point_s() do not support repulsion and differ in that aesthetic mappings can be selectively applied to the different components of the label and/or to segments.

In addition, geom_text_s() and geom_label_s() use a different approach to justification than geom_text_repel(), these geometries justify the text or label to the nearest edge to the original position, while geom_text_repel() by default centres the text label on the nudged or displaced coordinates.

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_text_s(position = position_nudge_keep(x = 0.1),
              min.segment.length = 0) +
  expand_limits(x = 2.3)

geom_text_repel() trims the segment to the edge of the text plus the padding. In contrast, geom_text_s() uses justification to avoid the overlap and only the default justification "position" and one of the edges, "left" in this case, are currently usable.

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_text_s(position = position_nudge_keep(x = 0.1),
              min.segment.length = 0,
              hjust = "left") +
  expand_limits(x = 2.3)

A usually more problematic example is the labeling of loadings in PCA and similar biplots.

## Example data frame where each species' principal components have been computed.
df1 <- data.frame(
  Species = paste("Species",1:5),
  PC1     = c(-4, -3.5, 1, 2, 3),
  PC2     = c(-1, -1, 0, -0.5, 0.7)) 

ggplot(df1, aes(x=PC1, y = PC2, label = Species, colour = Species)) +
  geom_hline(aes(yintercept = 0), size=.2) +
  geom_vline(aes(xintercept = 0), size=.2) +
  geom_segment(aes(x=0, y=0, xend=PC1, yend=PC2), 
               arrow = arrow(length = unit(0.1, "inches"))) +
  geom_label_repel(position = position_nudge_center(x = 0.2, y = 0.01,
                                                    center_x = 0, center_y = 0),
                   label.size = NA,
                   label.padding = 0.1,
                   fill = rgb(red = 1, green = 1, blue = 1, alpha = 0.75)) +
  xlim(-5, 5) +
  ylim(-2, 2) +
  # Stadard settings for displaying biplots
  coord_fixed() +
  theme(legend.position = "none")

position_nudge_to()

Function position_nudge_to() nudges to a given position instead of using the same shift for each observation. Can be used to align labels for points that are not themselves aligned.

ggplot(df, aes(x, y, label = ifelse(x < 0.5, "", l) )) +
  geom_point() +
  geom_text_repel(position = 
                    position_nudge_to(x = 2.3),
                  min.segment.length = 0,
                  segment.color = "red",
                  arrow = arrow(length = unit(0.015, "npc")),
                  direction = "y") +
  expand_limits(x = 3)

By providing to values for nudging with opposite sign, we can add labels alternating between sides. We use here geom_text_s() but other geometries could have been used instead.

size_from_area <- function(x) {sqrt(max(0, x) / pi)}

df2 <- data.frame(b = exp(seq(2, 4, length.out = 10)))

ggplot(df2, aes(1, b, size = b)) + 
  geom_text_s(aes(label = round(b,2)),
              position = position_nudge_to(x = c(1.1, 0.9)),
              box.padding = 0) +
  geom_point() +
  scale_size_area() +
  xlim(0, 2) +
  theme(legend.position = "none")

It is useful when labeling curves than end at different positions along the x axis.

keep <- c("Israel", "United States", "European Union", "China", "South Africa", "Qatar",
          "Argentina", "Chile", "Brazil", "Ukraine", "Indonesia", "Bangladesh")

data <- readr::read_csv("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.csv", lazy = FALSE, show_col_types = FALSE) 

data %>%
  filter(location %in% keep) %>%
  select(location, date, total_vaccinations_per_hundred) %>%
  arrange(location, date) %>%
  filter(!is.na(total_vaccinations_per_hundred)) %>%
  mutate(location = factor(location),
         location = reorder(location, total_vaccinations_per_hundred)) %>%
  group_by(location) %>% # max(date) depends on the location!
  mutate(label = if_else(date == max(date), 
                         as.character(location), 
                         "")) -> owid

ggplot(owid,
       aes(x = date, 
           y = total_vaccinations_per_hundred,
           color = location)) +
  geom_line() +
  geom_text_repel(aes(label = label),
                  size = 3,
                  position = position_nudge_to(x = max(owid$date) + days(30)),
                  segment.color = 'grey',
                  point.size = 0,
                  box.padding = 0.1,
                  point.padding = 0.1,
                  hjust = "left",
                  direction="y") + 
  scale_x_date(expand = expansion(mult = c(0.05, 0.2))) +
  labs(title = "Cumulative COVID-19 vaccination doses administered per 100 people",
       y = "",
       x = "Date (year-month)") +
  theme_bw() +
  theme(legend.position = "none")

We here the pass a vector of length one as argument for y, but both x and y also accept longer vectors. In other words, it allows manual positioning of text and labels.

In the next example we decrease the forces used for repulsion and the padding so that the labels remain close together. In this way, we can label the observations on the rug of a combined point and rug plot.

ggplot(df, aes(x, y, label = round(x, 2))) +
  geom_point(color = "red", size = 3) +
  geom_text_repel(position = position_nudge_to(y = -2.7), 
            size = 3,
            color = "red",
            angle = 90,
            hjust = 0,
            box.padding = 0.05,
            min.segment.length = Inf,
            direction = "x",
            force = 0.1,
            force_pull = 0.1) +
  geom_rug(sides = "b", length = unit(0.02, "npc"), color = "red")

position_nudge_center()

Function position_nudge_center() can nudge radially away from a focal point if both x and y are passed as arguments, or towards opposite sides of a boundary vertical or horizontal virtual line if only one of x or y is passed an argument. By default, the "center" is the centroid computed using mean(), but other functions or numeric values can be passed to override it. When data are sparse, such nudging may be effective in avoiding label overlaps, and achieving a visually pleasing positioning.

In all cases nudging shifts the coordinates giving an x and/or y position and also expands the limits of the corresponding scales to include the nudged coordinate values.

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_text_repel(position = 
                    position_nudge_center(x = 0.3, center_x = 0),
                    min.segment.length = 0, max.iter = 0)

By default, split is away or towards the mean(). Here we allow repulsion to separate the labels (compare with previous plot).

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_text_repel(position = 
                    position_nudge_center(x = 0.3,
                                          direction = "split"),
                  min.segment.length = 0)

We set a different split point as a constant value.

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_text_repel(position = 
                    position_nudge_center(x = 0.3,
                                          center_x = 1,
                                          direction = "split"),
                  min.segment.length = 0)

We set a different split point as the value computed by a function function, by name.

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_text_repel(position = 
                    position_nudge_center(x = 0.3,
                                          center_x = median,
                                          direction = "split"),
                  min.segment.length = 0)

We set a different split point as the value computed by an anonymous function function. Here que split on the first quartile along x.

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_text_repel(position = 
                    position_nudge_center(x = 0.3,
                                          center_x = function(x) {
                                            quantile(x, 
                                                     probs = 1/4, 
                                                     names = FALSE)
                                          },
                                          direction = "split"),
                  min.segment.length = 0)

The labels can be rotated as long as the geometry used supports this.

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_text_repel(angle = 90,
                  position = 
                    position_nudge_center(y = 0.1,
                                          direction = "split"))

By requesting nudging along x and y and setting direction = "split" nudging is applied according to the quadrants centred on the centroid of the data.

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_text_repel(position = 
                    position_nudge_center(x = 0.1,
                                          y = 0.15,
                                          direction = "split"))

With direction = "radial", the distance nudged away from the center is the same for all labels.

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_text_repel(position = 
                    position_nudge_center(x = 0.25,
                                          y = 0.4,
                                          direction = "radial"),
                  min.segment.length = 0)

As shown above for direction = "split" we can set the coordinates of the center also with direction = "radial". In the case of geom_text_s() the default justification is "position".

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_text_s(position = 
                    position_nudge_center(x = 0.125,
                                          y = 0.2,
                                          center_x = -0.5,
                                          direction = "radial"),
                  min.segment.length = 0) +
  expand_limits(x = c(-2.7, +2.3))

We can also set the justification the text labels although repulsion usually works best with labels justified at the centre, which is the default in geom_text_repel().

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_text_repel(position = 
                    position_nudge_center(x = 0.125,
                                          y = 0.25,
                                          center_x = 0,
                                          center_y = 0,
                                          direction = "radial"),
                  min.segment.length = 0,
                  hjust = "outward", vjust = "outward") +
  expand_limits(x = c(-2.7, +2.3))

Nudging along one axis, here x, and setting the repulsion direction along the other axis, here y, tends to give a pleasant arrangement of labels.

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_text_repel(position = 
                    position_nudge_center(x = 0.2,
                                          center_x = 0,
                                          direction = "split"),
                  aes(hjust = ifelse(x < 0, 1, 0)),
                  direction = "y",
                  min.segment.length = 0) +
  expand_limits(x = c(-3, 3))

position_nudge_line()

Function position_nudge_line() nudges away from a line, which can be a user supplied straight line as well as a smooth spline or a polynomial fitted to the observations themselves. The nudging is away and perpendicular to the local slope of the straight or curved line. It relies on the same assumptions as linear regression, assuming that x values are not subject to error. This in most cases prevents labels from overlaping a curve fitted to the data, even if not exactly based on the same model fit. When observations are sparse, this may be enough to obtain a nice arrangement of data labels.

set.seed(16532)
df <- data.frame(
  x = -10:10,
  y = (-10:10)^2,
  yy = (-10:10)^2 + rnorm(21, 0, 4),
  yyy = (-10:10) + rnorm(21, 0, 4),
  l = letters[1:21]
)

The first, simple example shows that position_nudge_line() has shifted the direction of the nudging based on the alignment of the observations along a line. One could, of course, have in this case passed suitable values as arguments to x and y using position_nudge() from package 'ggplot2'. However, position_nudge_line() will work without change irrespective of the slope or intercept along which the observations fall.

ggplot(df, aes(x, 2 * x, label = l)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 2, linetype = "dotted") +
  geom_text(position = position_nudge_line(x = -0.5, y = -0.8))

With observations with high variation in y, a linear model fit may need to be used. In this case fitted twice, once in stat_smooth() and once in position_nudge_line().

ggplot(subset(df, x >= 0), aes(x, yyy)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x) +
  geom_text(aes(label = l),
            vjust = "center", hjust = "center",
            position = position_nudge_line(x = 0, y = 1.2,
                                           method = "lm",
                                           direction = "split"))

With lower variation in y, we can pass to line_nudge a multiplier to keep labels outside of the confidence band.

ggplot(subset(df, x >= 0), aes(y, yy)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x) +
  geom_text(aes(label = l),
            position = position_nudge_line(method = "lm",
                                           x = 3, y = 3, 
                                           line_nudge = 2.5,
                                           direction = "split"))

If we want the nudging based on an arbitrary straight line not computed from data, we can pass the intercept and slope in a numeric vector of length two as an argument to parameter abline.

ggplot(subset(df, x >= 0), aes(y, yy)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, linetype = "dotted") +
  geom_text(aes(label = l),
            position = position_nudge_line(abline = c(0, 1),
                                           x = 3, y = 3, 
                                           direction = "split"))

With observations that follow exactly a simple curve the defaults work well to automate the nudging of individual data labels away from the implicit curve. Positive values as arguments to x and y correspond to above and inside the curve. One could, of course, pass also in this case suitable values as arguments to x and y using position_nudge() from package 'ggplot2', but these arguments would need to be vectors with different nudge values for each observation.

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_line(linetype = "dotted") +
  geom_text(position = position_nudge_line(x = 0.6, y = 6))

Negative values passed as arguments to x and y correspond to labels below and outside the curve.

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_line(linetype = "dotted") +
  geom_text(position = position_nudge_line(x = -0.6, y = -6))

When the observations include random variation along y, it is important that the smoother used for the line added to a plot and that passed to position_nudge_line() are similar. By default stat_smooth() uses "loess" and position_nudge_line() with method "spline", smooth.sline(), which are a good enough match.

ggplot(df, aes(x, yy)) +
  geom_point() +
  stat_smooth(method = "loess", formula = y ~ x) +
  geom_text(aes(y = yy, label = l),
            position = position_nudge_line(x = 0.6, 
                                           y = 6,
                                           direction = "split"))

We can use other geometries, or rather we need to use a repulsive geometry when the label text is long or the labels are crowed near the line.

ggplot(df, aes(x, yy)) +
  geom_point() +
  stat_smooth(method = "loess", formula = y ~ x) +
  geom_label_repel(aes(y = yy, label = paste("point", l)),
                   position = position_nudge_line(x = 0.6, 
                                                  y = 8,
                                                  direction = "split"),
                   box.padding = 0.2,
                   min.segment.length = 0)

We can see by comparing the plot above with that below, that combining nudging away from a line with repulsion results in a more pleasant positioning of the data labels.

ggplot(df, aes(x, yy)) +
  geom_point() +
  stat_smooth(method = "loess", formula = y ~ x) +
  geom_label_repel(aes(y = yy, label = paste("point", l)),
                  box.padding = 0.5,
                  min.segment.length = 0)

When fitting a polynomial, "lm" should be the argument passed to method and a model formula preferably based on poly(), setting raw = TRUE, as argument to formula.

Currently no other methods are implemented in position_nudge_line().

ggplot(df, aes(x, yy)) +
  geom_point() +
  stat_smooth(method = "lm", 
              formula = y ~ poly(x, 2, raw = TRUE)) +
  geom_text(aes(y = yy, label = l),
            position = position_nudge_line(method = "lm",
                                           formula = y ~ poly(x, 2, raw = TRUE),
                                           x = 0.5, 
                                           y = 5,
                                           direction = "split"))

Compared to pure repulsion adding nudging helps "direct" repulsion in the desired direction. Compare the plot below using no nudging with the one above.

ggplot(df, aes(x, yy)) +
  geom_point() +
  stat_smooth(method = "lm", 
              formula = y ~ poly(x, 2, raw = TRUE)) +
  geom_text_repel(aes(y = yy, label = l),
                  box.padding = 0.5,
                  min.segment.length = Inf)

Combined position functions

Using position_stacknudge() together geom_label_repel() makes it possible to use repulsion for labeling sections of stacked column plots.

df <- tibble::tribble(
  ~y, ~x, ~grp,
  "a", 1,  "some long name",
  "a", 2,  "other name",
  "b", 1,  "some name",
  "b", 3,  "another name",
  "b", -1, "some long name"
)

ggplot(data = df, aes(x, y, group = grp)) +
  geom_col(aes(fill = grp), width=0.5) +
  geom_vline(xintercept = 0) +
  geom_label_repel(aes(label = grp),
                   position = position_stacknudge(vjust = 0.5, y = 0.4),
                   label.size = NA)

Acknowledgements

I warmly thank Kamil Slowikowski for agreeing to make changes in 'ggrepel' that make the use of 'ggrepel' together with 'ggpp' possible and smooth. This document shows some use examples, but surely new ones will be found by users of R and 'ggplot2'.

aphalo/ggpp documentation built on Feb. 27, 2025, 10:19 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com