library(knitr) opts_chunk$set(fig.align = 'center', fig.show = 'hold', fig.width = 7, fig.height = 4) options(warnPartialMatchArgs = FALSE)
The functions described here are not expected to be useful in everyday plotting
as when using the grammar of graphics one can simply change the order in which
layers are added to a ggplot, or remove unused variables from the data before
passing it as argument to the ggplot()
constructor.
However, if one uses high level methods like autoplot()
or other functions
that automatically produce a full plot using 'ggplot2' internally, one may need
to add, move or delete layers so as to profit from such canned methods and
retain enough flexibility.
Some time ago I needed to manipulate the layers of a ggplot
, and found a
matching question in
Stackoverflow.
I used the answers found in Stackoverflow as the starting point for writing the
functions described in the first part of this vignette.
In a ggplot
object, layers reside in a list, and their positions in the list
determine the plotting order when generating the graphical output. The grammar
of graphics treats the list of layers as a stack using only push
operations. In other words, always the most recently added layer resides at the
end of the list, and during rendering over-plots all layers previously added.
The functions described in this vignette allow overriding the normal syntax
at the cost of breaking the expectations of the grammar. These functions are, as
told above, to be used only in exceptional cases. This notwithstanding, they are
rather easy to use and the user interface is consistent across all of them.
Moreover, they are designed to return objects that are identical to objects
created using the normal syntax rules of the grammar of graphics. The table
below list the names and purpose of these functions.
Function | Use
-------- | -------------------------------------
delete_layers()
| delete one or more layers
append_layers()
| append layers at a specific position
move_layers()
| move layers to an absolute position
shift_layers()
| move layers to a relative position
which_layers()
| obtain the index positions of layers
extract_layers()
| extract matched or indexed layers
num_layers()
| obtain number of layers
top_layer()
| obtain position of top layer
bottom_layer()
| obtain position of bottom layer
Although their definitions do not rely on code internal to 'ggplot2', they rely
on the internal structure of objects belonging to class gg
and ggplot
.
Consequently, long-term backwards and forward compatibility cannot be
guaranteed, or even expected.
library(ggplot2) library(gginnards) library(tibble) library(magrittr) library(stringr) eval_pryr <- requireNamespace("pryr", quietly = TRUE)
We generate some artificial data and create a data frame with them.
set.seed(4321) # generate artificial data my.data <- data.frame( group = factor(rep(letters[1:4], each = 30)), panel = factor(rep(LETTERS[1:2], each = 60)), y = rnorm(40), unused = "garbage" )
We add attributes to the data frame with the fake data.
attr(my.data, "my.atr.char") <- "my.atr.value" attr(my.data, "my.atr.num") <- 12345678
We change the default theme to an uncluttered one.
old_theme <- theme_set(theme_bw())
We generate a plot to be used later to demonstrate the use of the functions. We
ue expand_limits()
to ensure that the effect of later manipulations is easier
to notice.
p <- ggplot(my.data, aes(group, y)) + geom_point() + stat_summary(fun.data = mean_se, colour = "cornflowerblue", size = 1) + facet_wrap(~panel, scales = "free_x", labeller = label_both) + expand_limits(y = c(-2, 2)) p
To display summary textual information about a gg
object we use method
summary()
from package 'ggplot2', while methods print()
and plot()
will
display the actual plot.
summary(p)
Layers in a ggplot object are stored in a list as nameless members. This means that they have to be accessed using numerical indexes, and that we need to use some indirect way of finding the indexes corresponding to the layers of interest.
names(p$layers)
The output of summary()
is compact.
summary(p$layers)
The default print()
method for a list of layers displays only a small part of
the information in a layer.
print(p$layers)
To see all the fields, we need to use str()
, which we use here for a single
layer.
str(p$layers[[1]])
We start by using which_layers()
as it produces simply a vector of indexes
into the list of layers. The third statement is useless here, but demonstrates
how layers are selected in all the functions described in this document. We can
see that each layer, as described in the first volume of this User Guide,
contains one geometry and one statistic.
which_layers(p, "GeomPoint") which_layers(p, "StatIdentity") which_layers(p, "GeomPointrange") which_layers(p, "StatSummary") which_layers(p, idx = 1L)
We can also easily extract matching layers with extract_layers()
. Here one
layer is returned, and displayed using the default print()
method. Method
str()
can also be used as shown above.
extract_layers(p, "GeomPoint")
With delete_layers()
we can remove layers from a plot, selecting them using
the match to a class, as shown here, or by a positional index as shown next.
delete_layers(p, "GeomPoint")
delete_layers(p, idx = 1L)
delete_layers(p, "StatSummary")
With move_layers()
we can alter the stacking order of layers. The layers to
move are selected in the same way as in the examples above, while position
gives where to move the layers to. Two character strings, "top"
and "bottom"
are accepted as position
argument, as well as integer
s. In the later case,
the layer(s) is/are appended after the supplied position with reference to the
list of layers not being moved.
move_layers(p, "GeomPoint", position = "top")
The equivalent operation using a relative position. A positive value for shift
is interpreted as an upward displacement and a negative one as downwards
displacement.
shift_layers(p, "GeomPoint", shift = +1)
Here we show how to add a layer behind all other layers.
append_layers(p, geom_line(colour = "orange", size = 1), position = "bottom")
It is also possible to append the new layer immediately above an arbitrary
existing layer using a numeric index, which as shown here can be also obtained
by matching to a class name. In this example we insert a new layer in-between
two layers already present in the plot. As with the +
operator of the Grammar
of Graphics, object
also accepts a list of layers as argument (no example
shown).
append_layers(p, object = geom_line(colour = "orange", size = 1), position = which_layers(p, "GeomPoint"))
Annotations add layers, so they can be manipulated in the same way as other layers.
p1 <- p + annotate("text", label = "text label", x = 1.1, y = 0, hjust = 0) p1
delete_layers(p1, "GeomText")
Elements that are normally added to a ggplot with operator
+
, such as scales, themes, aesthetics can be replaced with the %+%
operator.
The situation with layers is different as a plot may contain multiple layers
and layers are nameless. With layers %+%
is not a replacement operator.
num_layers(p) num_layers(p %+% geom_point(colour = "blue")) num_layers(p + geom_point(colour = "blue"))
p1 <- p + theme_bw() p1 p1 + theme_void() p1 %+% theme_void()
Method summary()
is available for themes.
summary(theme_bw())
However, to see the actual values stored, we need to use str()
. To avoid
excessive output we first find the names for the elements of the theme and then
look as how the default text settings are stored.
names(theme_bw())
str(theme_bw()$text)
Themes can be modified using theme()
. See the 'ggplot2' documentation for
details.
The argument passed through data
to ggplot()
or a layer is stored in whole
in the ggplot
object, even the data columns not mapped to any aesthetic. In
most cases this does not matter, but in the case of huge datasets, the use of
RAM and disk space can add up, and occasionally printing of each plot can slow
down. The reason for storing the whole data set is that it is always possible to
add layers with the grammar of graphics to an existing plot and consequently
only the user can know which variables can be removed or not.
One obvious way of not storing unused data in ggplot
objects is for the user
to select the required variables and pass only these to the ggplot()
constructor or layers. A less efficient alternative, but possibly easier to use
for some users, is for users to drop the unused variables when they consider
that a plot is ready. We show here how to do this, with a function that started
as a self-imposed exercise.
To simplify the embedded data objects we need to find which variables are mapped to aesthetics and which are not. Here is a naive attempt at handling the possibility of mappings to expressions involving computations and multiple variables per mapping, and facets. This is naive in that it ignores mapping within layers and variables used for faceting.
mapped.vars <- gsub("[~*\\%^]", " ", as.character(p$mapping)) %>% str_split(boundary("word")) %>% unlist() %>% c(names(p$facet$params$facets))
We need also to find which variables are present in the data.
data.vars <- names(p$data)
Next we identify which variables in data
are not used, and delete them.
unused.vars <- setdiff(data.vars, c(mapped.vars)) keep.idxs <- which(!data.vars %in% unused.vars)
p1 <- p p1$data <- p$data[ , keep.idxs]
For a data set this small, removing a single column saves very little space.
object.size(my.data) object.size(p) object.size(p1)
names(my.data) names(p$data) names(p1$data)
The plot has not changed.
p1
We can assemble all the code into a function for convenience, and expand the
code to also recognize mappings within layers and variables used in faceting.
Such a function, only cursorily tested is included in the package as
drop_vars()
. Given its design the most likely failure mode is keeping too many
variables rather than removing too many.
drop_vars(p)
When saving ggplot
objects to disk avoiding to carry along unused
data can be beneficial. Of course, removing unused data means that they will not
be available at a later time if we want to add more layers to the same saved
ggplot object.
It was not clear to me when R does make a copy of the data embedded in a
ggplot
object and when not. R's policy is to copy data objects lazily, or
only when modified. Does the 'ggplot2' code modify the
argument passed to its data
parameter triggering a real copy operation or not.
We can check this with the help of package 'pryr'.
pryr::address(my.data) z <- p$data pryr::address(z)
In this case, R has not created a copy. So, from the point of view of total
memory usage, deleting the unused columns in p
is not always beneficial. If
the object is saved to disk or my.data
modified in any way after p
was
created a copy of my.data
will be created at this later time. In this simple
example we modify the value of an attribute.
attr(my.data, "my.atr.num") <- 1324567 pryr::address(z) pryr::address(my.data)
'ggplot2' version 3.1.0 and later preserves most attributes of the object passed
as argument to the data parameter of the ggplot()
constructor. The class of
the object seems to be modified if it is derived from data frame or tibble, but
other attributes are retained in the copy stored in the gg
object.
data_attributes(p)
Another interesting question is whether these user attributes are copied when
data are passed to geometries and statistics. We can find out with
geom_debug_panel()
that they are not.
p + geom_debug_panel(dbgfun.data = attributes, dbgfun.params = NULL)
The are many other things that we could explore about ggplot objects, but a package to be submitted to CRAN cannot have too many pages of documentation, so we hope this package and its documentation can serve as a starting point for further exploration.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.