library(knitr) opts_chunk$set(fig.align = 'center', fig.show = 'hold', fig.width = 7, fig.height = 4) options(warnPartialMatchArgs = FALSE, tibble.print.max = 4, tibble.print.min = 4, dplyr.summarise.inform = FALSE) eval_flag <- TRUE # evaluate all code chunks
The very popular R package 'ggrepel' does a great job at avoiding overlaps among data labels and between them and observations plotted as points. A difficulty that stems from the use of an algorithm based on random displacements is that the final location of the data labels can bacome more disordered than necessary and when including smooth regression lines the data labels may partly occlude the fitted line and/or the confidence band when drawn.
Package 'ggpp' defined new position functions that save the starting position like position_nudge_repel()
does but come in multiple flavours. Their use together with repulsive geoms from 'ggrepel' makes it possible to give to the data labels an initial "push" in a non-random direction. This helps a lot, something that initially was a surprise to me, in obtaining more orderly displacement of the data labels away from a cloud of observations.
Another problem sometimes encountered when using position functions is that combinations of pairs of displacements would be required. 'ggpp' does also provide such new position functions which can also be used together with repulsive geomeries from package 'ggrepel'.
In the rest of this article I provide examples, both from scratch and as answers to issues raised in the 'ggrepel' and 'ggpp' git repositories.
We load all the packages used in the examples.
# if this is file is to be added as a vignette DESCRIPTION will need # to be updated adding suggests and running of chunks here made # conditional on package availability library(ggrepel) library(ggpp) library(dplyr) library(readr) library(lubridate)
This section includes a few examples based mainly on issues requesting enhancements to the repulsive geometries from 'ggrepel'. The solutions presented here make use of position functions to provide the functionality requested without need to change the implementation of the repulsive geometries.
geom_dotplot
issue #207 raised by svenbioinfQ: When creating a plot with geom_dotplot()
, geom_text_repel()
does not draw the label arrows to the exact point location but rather to the center. (See points B1,B2,B3) Is there any way to do this?
df <- data.frame(name = c("A1","A2","B1","B2","B3"), value = c(1,2,2,2,2), group = c("A","A","B","B","B")) ggplot(df, aes(x = group,y = value,label = name)) + geom_dotplot(binaxis = 'y', stackdir = 'center', dotsize = 0.7) + geom_text_repel(aes(segment.color = "red"), box.padding = 1, max.overlaps = 10)
A1: As geom_dotplot()
does not have a stat
formal parameter and modifies the data, it is impossible to access or reproduce the positions of the points in another layer! As group
is a factor a hypothetical stat_dotplot()
would not easily work unless it uses a compute panel function. Alternatively, one would need to define a new geom geom_labelled_dotoplot
to do the labeling in the same layer as the plotting of the dots, but this would make difficult to apply different aesthetic mappings to data labels and points.
A2: A more general approach would be to use geom_point()
and geom_label_repel()
together with a newly defined position function. This new function, possibly named position_spread()
, would displace the observations' coordinates using the same or similar methods as geom_dotplot()
. This function does not yet exist, but would allow a simple solution within the grammar of graphics. Additional methods for spreading the points could make possible, for example, to create sunflower plots.
geom_segment()
issue #207 raised by ekatko1Q: Currently, the repel works to repel text away from single points. However, it would be great if it could repel text away from segments as well. This would be especially useful for automatically generating nice biplots for multivariate analysis (PCA, RDA, etc.).
## Example data frame where each species' principal components have been computed. df = data.frame( Species = paste("Species",1:5), PC1 = c(-4, -3.5, 1, 2, 3), PC2 = c(-1, -1, 0, -0.5, 0.7)) ## We plot the two primary principal componenets and add species labels ggplot(df, aes(x=PC1, y = PC2, label=Species)) + geom_segment(aes(x=0, y=0, xend=PC1, yend=PC2), arrow = arrow(length = unit(0.1, "inches"))) + geom_text_repel() + xlim(-5, 5) + ylim(-2, 2) + # Stadard settings for displaying biplots geom_hline(aes(yintercept = 0), size=.2) + geom_vline(aes(xintercept = 0), size=.2) + coord_fixed()
A: Repulsion from segments is probably not the best approach, or at least useful only in sparse data cases. In the case of biplots a good approach is to use nudging away from the origin. Package 'ggpp' provides position functions implementing computed nudging that can be different for each individual observation. Nudging moves the initial position from which labels are repulsed, making their use together with repulsion usually effective.
## We nudge the starting coordinates of the labels away from the origin ggplot(df, aes(x=PC1, y = PC2, label=Species)) + geom_segment(aes(x=0, y=0, xend=PC1, yend=PC2), arrow = arrow(length = unit(0.1, "inches"))) + geom_text_repel(position = position_nudge_centre(x = 0.2, y = 0.01, center_x = 0, center_y = 0)) + xlim(-5, 5) + ylim(-2, 2) + # Stadard settings for displaying biplots geom_hline(aes(yintercept = 0), size=.2) + geom_vline(aes(xintercept = 0), size=.2) + coord_fixed()
In a more crowded plot we may want to allow overlaps but improve the visibility of the text with a semitransparent fill using geom_label_repel()
.
## We nudge the labels manually ggplot(df, aes(x=PC1, y = PC2, label=Species)) + geom_hline(aes(yintercept = 0), size=.2) + geom_vline(aes(xintercept = 0), size=.2) + geom_segment(aes(x=0, y=0, xend=PC1, yend=PC2), arrow = arrow(length = unit(0.1, "inches"))) + geom_label_repel(position = position_nudge_center(x = 0.2, y = 0.01, center_x = 0, center_y = 0), label.size = NA, label.padding = 0.1, fill = rgb(red = 1, green = 1, blue = 1, alpha = 0.75)) + xlim(-5, 5) + ylim(-2, 2) + # Stadard settings for displaying biplots coord_fixed()
In a plot of sparse data we may try to dispense with repulsion and rely purely on nudging using geom_label_s()
from package 'ggpp'. For the current example this results in overlapping data labels.
## We nudge the labels manually ggplot(df, aes(x=PC1, y = PC2, label=Species)) + geom_hline(aes(yintercept = 0), size=.2) + geom_vline(aes(xintercept = 0), size=.2) + geom_segment(aes(x=0, y=0, xend=PC1, yend=PC2), arrow = arrow(length = unit(0.1, "inches"))) + geom_label_s(position = position_nudge_center(x = 0.2, y = 0.3, center_x = 0, center_y = 0), linewidth = 0, label.padding = unit(0.1, "lines"), fill = rgb(red = 1, green = 1, blue = 1, alpha = 0.75)) + xlim(-5, 5) + ylim(-2, 2) + # Stadard settings for displaying biplots coord_fixed()
scale_size_area()
issue #205 raised by sangwon-hyunQ: When using scale_size_area()
, point.size
needs to be adjusted, probably by a square root. It seems this should be the default behavior without the user specifying it.
A1: Rather than automating this, adding a parameter
called point.size.area
seems an approach consistent with 'ggplot2' and very easy to implement. Meanwhile, an equivalent manual approach is as follows.
size_from_area <- function(x) {sqrt(max(0, x) / pi)} df <- data.frame(b = exp(seq(2, 4, length.out = 10))) ggplot(df, aes(1, b, size = b)) + geom_point() + scale_size_area() + geom_text_repel(aes(label = round(b,2), point.size = size_from_area(b) * 2), box.padding = 0, min.segment.length = 0, direction = "x")
A2: In this case we can dispense with repulsion.
size_from_area <- function(x) {sqrt(max(0, x) / pi)} df <- data.frame(b = exp(seq(2, 4, length.out = 10))) ggplot(df, aes(1, b, size = b)) + geom_point() + scale_size_area() + geom_text_s(aes(label = round(b,2), point.padding = size_from_area(b) / 15), position = position_nudge_to(x = c(1.01, 0.99)), label.padding = 0, box.pading = 0) + xlim(0.9, 1.1)
Q: Related to issue #200 that I filed today. It would be neat to order the boxes in decreasing or ascending order from left to right, bottom to top. I.e. now the order depends on the seed but would it be possible to have a logical order?
A: Nudging can help. In this case applying nudging by specifying the nudged y position for each label. Here we add increasing amounts of nudging. It would be better to start afresh by groups of labels. We need to sort the row in the data according to the position along the x axis for this approach to work.
set.seed(42) mpg %>% group_by(model) %>% summarize(cty = mean(cty, na.rm = TRUE)) %>% arrange(cty) %>% ggplot(aes(x = cty, y = 1, label = model)) + geom_point(alpha = .3) + geom_label_repel( position = position_nudge_to(y = 2 + (1:length(unique(mpg$model)))/ 10), fill = "grey90", direction = "y", max.iter = 1e4, max.time = 1, size = 3, ## tiny for reprex lineheight = .9, box.padding = 0.1, label.padding = 0.1 ) + ylim(1, NA) + scale_x_continuous(expand = expansion(mult = c(0.1, 0.1))) + theme_void()
Q: When forcing geom_text\|label_repel() to draw vertical lines, one needs to use hjust = 0.5. Unfortunately, this argument applies justification to both the text and the textbox.
A: There is currently no solution available within 'ggrepel'.
set.seed(42) ## with left-aligned text (and thus no straight lines) mpg %>% group_by(manufacturer, model) %>% summarize(cty = mean(cty, na.rm = TRUE)) -> mpg.summaries ggplot(mpg.summaries, aes(x = cty, y = 1, label = paste0(manufacturer, "\n", model))) + geom_point(alpha = .3) + geom_label_repel( fill = "grey90", nudge_y = .03, direction = "y", hjust = 0, max.iter = 1e4, max.time = 1, size = 1, ## tiny for reprex lineheight = .9 ) + xlim(5, 30) + ylim(1, 0.95) + theme_void()
A: This would require separate parameters or a vector argument to set separately the justification for the text and for the anchor point of the segment. Should be easy to implement in geom_label_s()
if needed.
A: The original issue included a much more complex solution to this problem involving in addition formatting unrelated to 'ggrepel'. The simplification is achieved by using nudging.
keep <- c("Israel", "United States", "European Union", "China", "Russia", "Brazil", "World", "Indonesia", "Bangladesh") data <- readr::read_csv("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.csv", lazy = FALSE, show_col_types = FALSE) data %>% filter(location %in% keep) %>% select(location, date, total_vaccinations_per_hundred) %>% arrange(location, date) %>% filter(!is.na(total_vaccinations_per_hundred)) %>% mutate(location = factor(location), location = reorder(location, total_vaccinations_per_hundred)) %>% group_by(location) %>% # max(date) depends on the location! mutate(label = if_else(date == max(date), as.character(location), "")) -> owid ggplot(owid, aes(x = date, y = total_vaccinations_per_hundred, color = location)) + geom_line() + geom_text_repel(aes(label = label), size = 3, position = position_nudge_to(x = max(owid$date) + days(30)), segment.color = 'grey', point.size = 0, box.padding = 0.1, point.padding = 0.1, hjust = "left", direction="y") + scale_x_date(expand = expansion(mult = c(0.05, 0.2))) + labs(title = "Cumulative COVID-19 vaccination doses administered per 100 people", y = "", x = "Date (year-month)") + theme_bw() + theme(legend.position = "none")
Q: I was trying to achieve something similar as in this question: Is there a possibility to combine position_stack and nudge_x in a stacked bar chart in ggplot2?, but with ggrepel. I think I succeeded.
Having the following plot, I would like to move the names out of the bars and point to them with arrows.
A: position_stacknudge()
from package 'ggpp' solves this.
df <- tibble::tribble( ~y, ~x, ~grp, "a", 1, "some long name", "a", 2, "other name", "b", 1, "some name", "b", 3, "another name", "b", -1, "some long name" ) ggplot(data = df, aes(x, y, group = grp)) + geom_col(aes(fill = grp), width=0.5) + geom_vline(xintercept = 0) + geom_label_repel(aes(label = grp), position = position_stacknudge(vjust = 0.5, y = 0.4), label.size = NA)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.