gghighlight

Highlight lines and points in ggplot2.

Installation

install.packages("gghighlight")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("yutannihilation/gghighlight")

Example

Suppose the data has a lot of series.

library(dplyr, warn.conflicts = FALSE)

set.seed(1)
d <- tibble(
  idx = 1:10000,
  value = runif(idx, -1, 1),
  type = sample(letters, size = length(idx), replace = TRUE)
) %>%
  group_by(type) %>%
  mutate(value = cumsum(value)) %>%
  ungroup()

It is difficult to distinguish them by colour.

library(ggplot2)

ggplot(d) +
  geom_line(aes(idx, value, colour = type))

So we are motivated to highlight only important series, like this:

library(gghighlight)

gghighlight_line(d, aes(idx, value, colour = type), max(value) > 20)
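
To make the idea concrete, here is a rough sketch of what "highlighting" means, written with plain dplyr and ggplot2 only. It is not how gghighlight is implemented; `toy` is a small made-up stand-in for `d`, and the grey colour is an arbitrary choice.

```r
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)

# Hypothetical toy data: three short series, only "a" ever exceeds 20
toy <- tibble(
  idx   = rep(1:3, times = 3),
  type  = rep(c("a", "b", "c"), each = 3),
  value = c(1, 15, 25, 2, 4, 6, -1, -3, -5)
)

# Keep only the series whose maximum exceeds the threshold
toy_highlighted <- toy %>%
  group_by(type) %>%
  filter(max(value) > 20) %>%
  ungroup()

# Draw every series in grey, then draw the kept series in colour on top
p <- ggplot(mapping = aes(idx, value, group = type)) +
  geom_line(data = toy, colour = "grey80") +
  geom_line(data = toy_highlighted, aes(colour = type))
p
```

The point of the package is to spare us this filter-and-overlay boilerplate.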

As gghighlight_*() returns a ggplot object, the result can be customized just like any other ggplot2 plot. (Note that, while gghighlight itself doesn't require ggplot2 to be loaded, ggplot2 needs to be loaded to customize the plot.)

gghighlight_line(d, aes(idx, value, colour = type), max(value) > 20) +
  theme_minimal()

The plot can also be faceted:

gghighlight_line(d, aes(idx, value, colour = type), max(value) > 20) +
  facet_wrap(~ type)

Supported geoms

Line

library(gghighlight)

gghighlight_line(d, aes(idx, value, colour = type), max(value) > 20)

Point

set.seed(10)
d2 <- sample_n(d, 20)

gghighlight_point(d2, aes(idx, value), value > 0)
#> Warning in gghighlight_point(d2, aes(idx, value), value > 0): Using type as
#> label for now, but please provide the label_key explicity!

Grouped vs ungrouped

You may notice that gghighlight_line() and gghighlight_point() have different semantics.

By default, gghighlight_line() evaluates the predicate per group; more precisely, it uses dplyr::group_by() + dplyr::summarise(). So if the predicate expression returns more than one value per group, it fails with an error like this:

gghighlight_line(d, aes(idx, value, colour = type), value > 20)
#> Error in summarise_impl(.data, dots): Column `predicate..........` must be length 1 (a summary value), not 387
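
As an illustration of why a summarising predicate such as max(value) > 20 works here, the following sketch runs the same group_by() + summarise() pattern by hand. The data frame `toy` and the column name `predicate` are made up for this example, and the code only mimics the idea, not the package internals.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data: two series, only "a" ever exceeds 20
toy <- tibble(
  type  = c("a", "a", "b", "b"),
  value = c(5, 25, 3, 10)
)

# max() collapses each group to a single number,
# so the comparison yields exactly one logical value per group
keep <- toy %>%
  group_by(type) %>%
  summarise(predicate = max(value) > 20)

# The groups whose predicate is TRUE are the ones to highlight
highlighted <- keep$type[keep$predicate]
highlighted
#> [1] "a"
```

By contrast, value > 20 yields one value per row rather than one per group, which is what the error above complains about.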

On the other hand, gghighlight_point() evaluates the predicate per row by default. This behaviour can be controlled via the use_group_by argument, like this:

gghighlight_point(d2, aes(idx, value, colour = type), max(value) > 0, use_group_by = TRUE)
#> Warning in gghighlight_point(d2, aes(idx, value, colour = type), max(value)
#> > : Using type as label for now, but please provide the label_key
#> explicity!
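
The difference between the two modes can be sketched with dplyr alone. This is only an approximation of the semantics, under the assumption that a grouped predicate marks every point of a matching group; `toy` and the `highlight` column are hypothetical names.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data: group "a" has one positive value, group "b" has none
toy <- tibble(
  type  = c("a", "a", "b", "b"),
  value = c(-2, 3, -1, -4)
)

# Per row: each point is judged on its own value
per_row <- toy %>%
  mutate(highlight = value > 0)

# Per group: every point shares the verdict of its whole group
per_group <- toy %>%
  group_by(type) %>%
  mutate(highlight = max(value) > 0) %>%
  ungroup()

per_row$highlight
#> [1] FALSE  TRUE FALSE FALSE
per_group$highlight
#> [1]  TRUE  TRUE FALSE FALSE
```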

While gghighlight_line() also has a use_group_by argument, I don't think ungrouped lines are interesting, because data that can be drawn as lines must consist of series, or groups.

Non-logical predicate

To construct a predicate expression like the one below, we need to choose a threshold (in this example, 20). But it is difficult to choose a good one before drawing the plot.

max(value) > 20

So, gghighlight_*() also accepts predicates that return numeric (or character) results. The values are used to sort the data, and the top max_highlight rows/groups are highlighted:

gghighlight_line(d, aes(idx, value, colour = type), max(value), max_highlight = 5L)
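
The ranking idea can be sketched by hand with dplyr. This assumes that larger values rank first, which matches "top" above but is still an assumption about the ordering; `toy`, `score` and `top_types` are made-up names, and the code is not taken from the package.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data: three series with maxima 25, 10 and 40
toy <- tibble(
  type  = c("a", "a", "b", "b", "c", "c"),
  value = c(5, 25, 3, 10, 7, 40)
)
max_highlight <- 2L

# Score each group, sort by score (largest first), keep the top groups
top_types <- toy %>%
  group_by(type) %>%
  summarise(score = max(value)) %>%
  arrange(desc(score)) %>%
  head(max_highlight) %>%
  pull(type)

top_types
#> [1] "c" "a"
```

No threshold has to be guessed this way; only the number of series to highlight is chosen.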



gghighlight documentation built on Oct. 5, 2017, 5:03 p.m.