library(knitr) opts_chunk$set(fig.align = 'center', fig.show = 'hold', fig.width = 7, fig.height = 4) options(warnPartialMatchArgs = FALSE, tibble.print.max = 4, tibble.print.min = 4, dplyr.summarise.inform = FALSE)
As we will use text and labels on the plotting area we change the default theme to an uncluttered one.
library(ggpmisc) old_theme <- theme_set(theme_bw())
Function position_nudge()
from package 'ggplot2' simply applies the
nudge, or x and y shift based directly on the values passed to its
parameters nudge_x
and nudge_y
. In some cases, this means manually
adjusting the nudge based on the plotted data.
We can do better, by setting a point or line away from which the nudge should be applied. In the new functions this reference alters only the direction (angle) along which nudge is applied but not the distance.
Function position_nudge_center()
limits the directions to radiate away
from a reference point if both nudge_x and nudge_y are passed an
argument, or if only one of nudge_x or nudge_y is passed an
argument, nudge right and left, or nudge up and down, respectively. By
default, the "center" is the centroid computed using mean()
, but other
functions or numeric values can be passed to override it. When data are
sparse, such nudging may be effective in avoiding label overlaps, and
achieving a visually pleasing posiitoning.
set.seed(84532) df <- data.frame( x = rnorm(20), y = rnorm(20, 2, 2), l = sample(letters[1:6], 20, replace = TRUE) )
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_text(position = position_nudge_center(x = 0.1))
By default, split based on mean()
.
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_text(position = position_nudge_center(x = 0.1, direction = "split"))
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_text(position = position_nudge_center(x = 0.1, center_x = 1, direction = "split"))
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_text(position = position_nudge_center(x = 0.1, y = 0.15, direction = "split"))
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_text(position = position_nudge_center(x = 0.1, y = 0.3, direction = "radial"))
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_text(position = position_nudge_center(x = 0.1, y = 0.3, center_x = 0, center_y = 0, direction = "radial"))
Function position_nudge_line()
nudges away from a line, which can be a
user supplied straight line as well as a smooth spline or a polynomial
fitted to the observations themselves. The nudging is away and
perpendicular to the local slope of the straight or curved line. It
relies on the same assumptions as linear regression, assuming that x
values are not subject to error. This in most cases prevents labels from
overlaping a curve fitted to the data, even if not exactly based on the
same model fit. When observations are sparse, this may be enough to
obtain a nice arrangement of data labels.
set.seed(16532) df <- data.frame( x = -10:10, y = (-10:10)^2, yy = (-10:10)^2 + rnorm(21, 0, 4), yyy = (-10:10) + rnorm(21, 0, 4), l = letters[1:21] )
The first, simple example shows that position_nudge_line()
has shifted
the direction of the nudging based on the alignment of the observations
along a line. One could, of course, have in this case passed suitable
values as arguments to x and y using position_nudge()
from package
'ggplot2'.
Default in 'ggplot2' is no nudging.
ggplot(df, aes(x, 2 * x, label = l)) + geom_point() + geom_abline(intercept = 0, slope = 2, linetype = "dotted") + geom_text(position = position_nudge(x = 0.4, y = -0.7))
Same for position_nudge_line()
.
ggplot(df, aes(x, 2 * x, label = l)) + geom_point() + geom_abline(intercept = 0, slope = 2, linetype = "dotted") + geom_text(position = position_nudge_line(x = -0.5, y = -0.8))
However, position_nudge_line()
will work without change irrespective
of the slope or intercept along which the observations fall.
ggplot(df, aes(x, -3 * x, label = l)) + geom_point() + geom_abline(intercept = 0, slope = -3, linetype = "dotted") + geom_text(position = position_nudge_line(x = -0.5, y = -0.8))
With observations with high variation in y, a linear model fit may need to be used.
ggplot(subset(df, x >= 0), aes(x, yyy)) + geom_point() + stat_smooth(method = "lm") + geom_text(aes(label = l), vjust = "center", hjust = "center", position = position_nudge_line(x = 0, y = 1.2, method = "lm", direction = "split"))
With lower variation in y, we can pass to line_nudge
a multiplier to
keep labels outside of the confidence band.
ggplot(subset(df, x >= 0), aes(y, yy)) + geom_point() + stat_smooth(method = "lm", formula = y ~ x) + geom_text(aes(label = l), position = position_nudge_line(method = "lm", x = 3, y = 3, line_nudge = 2.5, direction = "split"))
If we want the nudging based on an arbitrary straight line not computed
from data
, we can pass the intercept and slope in a numeric vector of
length two as an argument to parameter abline
.
ggplot(subset(df, x >= 0), aes(y, yy)) + geom_point() + geom_abline(intercept = 0, slope = 1, linetype = "dotted") + geom_text(aes(label = l), position = position_nudge_line(abline = c(0, 1), x = 3, y = 3, direction = "split"))
With observations that follow exactly a simple curve the defaults work well to
automate the nudging of individual data labels away from the implicit curve.
Positive values as arguments to x
and y
correspond to above and inside the
curve. One could, of course, pass also in this case suitable values as arguments
to x and y using position_nudge()
from package 'ggplot2', but these
arguments would need to be vectors with different nudge values for each
observation.
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_line(linetype = "dotted") + geom_text(position = position_nudge_line(x = 0.6, y = 6))
Negative values passed as arguments to x
and y
correspond to labels
below and outside the curve.
ggplot(df, aes(x, y, label = l)) + geom_point() + geom_line(linetype = "dotted") + geom_text(position = position_nudge_line(x = -0.6, y = -6))
When the observations include random variation along y, it is
important that the smoother used for the line added to a plot and that
passed to position_nudge_line()
are similar. By default
stat_smooth()
uses "loess"
and position_nudge_line()
with methid
"spline"
, smooth.sline()
, which are a good enough match.
ggplot(df, aes(x, yy, label = l)) + geom_point() + stat_smooth() + geom_text(aes(y = yy, label = l), position = position_nudge_line(x = 0.6, y = 6, direction = "split"))
When fitting a polynomial, "lm"
should be the argument passed to
method
and a model formula preferably based on poly()
, setting raw = TRUE
,
as argument to formula
.
Currently no other methods are implemented in position_nudge_line()
.
ggplot(df, aes(x, yy, label = l)) + geom_point() + stat_smooth(method = "lm", formula = y ~ poly(x, 2, raw = TRUE)) + geom_text(aes(y = yy, label = l), position = position_nudge_line(method = "lm", x = 0.6, y = 6, formula = y ~ poly(x, 2, raw = TRUE), direction = "split"))
Grouping is supported.
df <- data.frame(x = rep(1:10, 2), y = c(1:10, 10:1), group = rep(c("a", "b"), c(10, 10)), l = "+")
ggplot(df, aes(x, y, label = l, color = group)) + geom_line(linetype = "dotted") + geom_text() + geom_text(position = position_nudge_line(x = 0.25, y = 1)) + geom_text(position = position_nudge_line(x = -0.25, y = -1)) + coord_equal(ratio = 0.5)
One needs to ensure that grouping is in effect in the geoms with nudging.
ggplot(df, aes(x, y, label = l, color = group, group = group)) + geom_line(linetype = "dotted") + geom_text() + geom_text(color = "red", position = position_nudge_line(x = 1, y = 1)) + geom_text(color = "blue", position = position_nudge_line(x = -1, y = -1)) + coord_equal()
Facets are also supported.
ggplot(df, aes(x, y, label = l)) + geom_line(linetype = "dotted") + geom_text() + geom_text(position = position_nudge_line(x = 1, y = 1), color = "red") + geom_text(position = position_nudge_line(x = -1, y = -1), color = "blue") + facet_wrap(~group) + coord_equal(ratio = 1)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.