In aphalo/ggpmisc: Miscellaneous Extensions to 'ggplot2'

library(knitr)
opts_chunk$set(fig.align = 'center', 
               fig.show = 'hold', fig.width = 7, fig.height = 4)
options(warnPartialMatchArgs = FALSE,
        tibble.print.max = 4,
        tibble.print.min = 4,
        dplyr.summarise.inform = FALSE)

Preliminaries

As we will use text and labels on the plotting area we change the default theme to an uncluttered one.

library(ggpmisc)
old_theme <- theme_set(theme_bw())

Position functions

Function position_nudge() from package 'ggplot2' simply applies the nudge, or x and y shift based directly on the values passed to its parameters nudge_x and nudge_y. In some cases, this means manually adjusting the nudge based on the plotted data.

We can do better, by setting a point or line away from which the nudge should be applied. In the new functions this reference alters only the direction (angle) along which nudge is applied but not the distance.

position_nudge_center()

Function position_nudge_center() limits the directions to radiate away from a reference point if both nudge_x and nudge_y are passed an argument, or if only one of nudge_x or nudge_y is passed an argument, nudge right and left, or nudge up and down, respectively. By default, the "center" is the centroid computed using mean(), but other functions or numeric values can be passed to override it. When data are sparse, such nudging may be effective in avoiding label overlaps, and achieving a visually pleasing posiitoning.

set.seed(84532)
df <- data.frame(
  x = rnorm(20),
  y = rnorm(20, 2, 2),
  l = sample(letters[1:6], 20, replace = TRUE)
)

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_text(position = position_nudge_center(x = 0.1))

By default, split based on mean().

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_text(position = position_nudge_center(x = 0.1,
                                             direction = "split"))

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_text(position = position_nudge_center(x = 0.1,
                                             center_x = 1,
                                             direction = "split"))

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_text(position = position_nudge_center(x = 0.1,
                                             y = 0.15,
                                             direction = "split"))

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_text(position = position_nudge_center(x = 0.1,
                                             y = 0.3,
                                             direction = "radial"))

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_text(position = position_nudge_center(x = 0.1,
                                             y = 0.3,
                                             center_x = 0,
                                             center_y = 0,
                                             direction = "radial"))

position_nudge_line()

Function position_nudge_line() nudges away from a line, which can be a user supplied straight line as well as a smooth spline or a polynomial fitted to the observations themselves. The nudging is away and perpendicular to the local slope of the straight or curved line. It relies on the same assumptions as linear regression, assuming that x values are not subject to error. This in most cases prevents labels from overlaping a curve fitted to the data, even if not exactly based on the same model fit. When observations are sparse, this may be enough to obtain a nice arrangement of data labels.

set.seed(16532)
df <- data.frame(
  x = -10:10,
  y = (-10:10)^2,
  yy = (-10:10)^2 + rnorm(21, 0, 4),
  yyy = (-10:10) + rnorm(21, 0, 4),
  l = letters[1:21]
)

The first, simple example shows that position_nudge_line() has shifted the direction of the nudging based on the alignment of the observations along a line. One could, of course, have in this case passed suitable values as arguments to x and y using position_nudge() from package 'ggplot2'.

Default in 'ggplot2' is no nudging.

ggplot(df, aes(x, 2 * x, label = l)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 2, linetype = "dotted") +
  geom_text(position = position_nudge(x = 0.4, y = -0.7))

Same for position_nudge_line().

ggplot(df, aes(x, 2 * x, label = l)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 2, linetype = "dotted") +
  geom_text(position = position_nudge_line(x = -0.5, y = -0.8))

However, position_nudge_line() will work without change irrespective of the slope or intercept along which the observations fall.

ggplot(df, aes(x, -3 * x, label = l)) +
  geom_point() +
  geom_abline(intercept = 0, slope = -3, linetype = "dotted") +
  geom_text(position = position_nudge_line(x = -0.5, y = -0.8))

With observations with high variation in y, a linear model fit may need to be used.

ggplot(subset(df, x >= 0), aes(x, yyy)) +
  geom_point() +
  stat_smooth(method = "lm") +
  geom_text(aes(label = l),
            vjust = "center", hjust = "center",
            position = position_nudge_line(x = 0, y = 1.2,
                                           method = "lm",
                                           direction = "split"))

With lower variation in y, we can pass to line_nudge a multiplier to keep labels outside of the confidence band.

ggplot(subset(df, x >= 0), aes(y, yy)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x) +
  geom_text(aes(label = l),
            position = position_nudge_line(method = "lm",
                                           x = 3, y = 3, 
                                           line_nudge = 2.5,
                                           direction = "split"))

If we want the nudging based on an arbitrary straight line not computed from data, we can pass the intercept and slope in a numeric vector of length two as an argument to parameter abline.

ggplot(subset(df, x >= 0), aes(y, yy)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, linetype = "dotted") +
  geom_text(aes(label = l),
            position = position_nudge_line(abline = c(0, 1),
                                           x = 3, y = 3, 
                                           direction = "split"))

With observations that follow exactly a simple curve the defaults work well to automate the nudging of individual data labels away from the implicit curve. Positive values as arguments to x and y correspond to above and inside the curve. One could, of course, pass also in this case suitable values as arguments to x and y using position_nudge() from package 'ggplot2', but these arguments would need to be vectors with different nudge values for each observation.

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_line(linetype = "dotted") +
  geom_text(position = position_nudge_line(x = 0.6, y = 6))

Negative values passed as arguments to x and y correspond to labels below and outside the curve.

ggplot(df, aes(x, y, label = l)) +
  geom_point() +
  geom_line(linetype = "dotted") +
  geom_text(position = position_nudge_line(x = -0.6, y = -6))

When the observations include random variation along y, it is important that the smoother used for the line added to a plot and that passed to position_nudge_line() are similar. By default stat_smooth() uses "loess" and position_nudge_line() with methid "spline", smooth.sline(), which are a good enough match.

ggplot(df, aes(x, yy, label = l)) +
  geom_point() +
  stat_smooth() +
  geom_text(aes(y = yy, label = l),
            position = position_nudge_line(x = 0.6, 
                                           y = 6,
                                           direction = "split"))

When fitting a polynomial, "lm" should be the argument passed to method and a model formula preferably based on poly(), setting raw = TRUE, as argument to formula. Currently no other methods are implemented in position_nudge_line().

ggplot(df, aes(x, yy, label = l)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ poly(x, 2, raw = TRUE)) +
  geom_text(aes(y = yy, label = l),
            position = position_nudge_line(method = "lm",
                                           x = 0.6, 
                                           y = 6,
                                           formula = y ~ poly(x, 2, raw = TRUE),
                                           direction = "split"))

Grouping is supported.

df <- data.frame(x = rep(1:10, 2),
                 y = c(1:10, 10:1),
                 group = rep(c("a", "b"), c(10, 10)),
                 l = "+")

ggplot(df, aes(x, y, label = l, color = group)) +
  geom_line(linetype = "dotted") +
  geom_text() +
  geom_text(position = position_nudge_line(x = 0.25, y = 1)) +
  geom_text(position = position_nudge_line(x = -0.25, y = -1)) +
  coord_equal(ratio = 0.5)

One needs to ensure that grouping is in effect in the geoms with nudging.

ggplot(df, aes(x, y, label = l, color = group, group = group)) +
  geom_line(linetype = "dotted") +
  geom_text() +
  geom_text(color = "red",
            position = position_nudge_line(x = 1, y = 1)) +
  geom_text(color = "blue",
            position = position_nudge_line(x = -1, y = -1)) +
  coord_equal()

Facets are also supported.

ggplot(df, aes(x, y, label = l)) +
  geom_line(linetype = "dotted") +
  geom_text() +
  geom_text(position = position_nudge_line(x = 1, y = 1),
            color = "red") +
  geom_text(position = position_nudge_line(x = -1, y = -1),
            color = "blue") +
  facet_wrap(~group) +
  coord_equal(ratio = 1)