knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 8,
  fig.height = 4.94
)

old <- options()
on.exit(options(old))
options(rmarkdown.html_vignette.check_title = FALSE)

pkgs <- c("colorspace", "ggplot2", "ggthemes", "ggseqplot", "hrbrthemes",
          "patchwork", "purrr", "TraMineR")

# Load all packages to library and adjust options
lapply(pkgs, library, character.only = TRUE)
Following Fasang and Liao [-@fasang2014], we distinguish between sequence representation and summarization graphs. The latter aggregate and summarize the information stored in the sequence data without plotting actual observed sequences. Given the complexity of sequence data, these types of plots focus on one or two dimensions of information stored in sequence data [@brzinsky-fay2014]. Among the diverse members of the family of summarization graphs are transition rate plots, Kaplan-Meier survival curves, modal state plots, mean time plots, state distribution plots, and entropy plots [@fasang2014; @raab2022].
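{ggseqplot} includes five summarization graphs:

- state distribution plot (ggseqdplot)
- entropy line plot (ggseqeplot)
- modal state sequence plot (ggseqmsplot)
- mean time plot (ggseqmtplot)
- transition rate plot (ggseqtrplot)

Whereas summarization graphs aggregate the sequence data, representation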
plots always display actually observed sequences. In the most basic form
of the traditional sequence index plot all observed sequences are
displayed. In data sets with several hundred cases this kind of
visualization, however, cannot be reasonably applied because of the
issue of overplotting. In such a scenario individual sequences are
partly plotted on top of each other and the resulting graph would be an
inaccurate representation of the underlying sequence data. In response
to this issue alternative representation plots which only render a
subset of the sequences have been suggested. {ggseqplot} can
render both the traditional sequence index plots and representation
plots of subsets of sequences. Specifically, the library contains the
following plot types:
- sequence index plot (ggseqiplot)
- sequence frequency plot (ggseqfplot)
- representative sequence plot (ggseqrplot)
- relative frequency sequence plot (ggseqrfplot)

For a more detailed discussion of sequence visualization, I recommend the following articles and book chapters: Brzinsky-Fay [-@brzinsky-fay2014], Fasang and Liao [-@fasang2014], and Chapter 2 of Raab & Struffolino [-@raab2022].
With the exception of the transition rate plot, all of the plots listed
above can also be produced with {TraMineR}. In
total, those two libraries provide a much more comprehensive set of
plots and often allow for more options than {ggseqplot}. Hence, if you
are an experienced user of base R's plot (which is used to render the
{TraMineR} plots), there is no real need to make yourself acquainted
with {ggseqplot}.
{ggseqplot} was written because, like many other R users, I prefer
{ggplot2} to base R's plot environment for visualizing data.
{TraMineR} [@gabadinho2011] was developed before {ggplot2}
[@wickham2016] was as popular as it is today, and back then many users
were more familiar with coding base R plots. Today, however, many
researchers and students are more accustomed to using {ggplot2} and
prefer to draw on the related skills and experiences instead of learning
how to refine base R plots just for the single purpose of visualizing
sequence data.
This vignette outlines how sequence data generated with
TraMineR::seqdef are reshaped to plot them as ggplot2-style figures
using {ggseqplot}. More specifically, it gives an overview of the
general procedure and shows which {TraMineR} and {ggplot2} functions
are used to render the plots.
The vignette further illustrates how the appearance of plots produced
with {ggseqplot} can be changed using {ggplot2} functions and
extensions.
We start by loading the required libraries and defining the sequence
data to be plotted. The examples are set up with example data sets
shipped with {TraMineR}.
Click to see code for installing and loading required packages
## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## Load and download (if necessary) required packages ----
## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Save package names as a vector of strings
pkgs <- c("colorspace", "ggplot2", "ggthemes", "hrbrthemes",
          "patchwork", "purrr", "TraMineR")

## Install uninstalled packages
lapply(pkgs[!(pkgs %in% installed.packages())],
       install.packages,
       repos = getOption("repos")["CRAN"])

## Load all packages to library and adjust options
lapply(pkgs, library, character.only = TRUE)

## Don't forget to load ggseqplot
library(ggseqplot)
## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## Creating state sequence objects from example data sets ----
## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## biofam data
data(biofam)
biofam.lab <- c("Parent", "Left", "Married", "Left+Marr",
                "Child", "Left+Child", "Left+Marr+Child", "Divorced")
biofam.seq <- seqdef(biofam[501:600, ], 10:25, # we only use a subsample
                     labels = biofam.lab,
                     weights = biofam$wp00tbgs[501:600])

## actcal data
data(actcal)
actcal.lab <- c("> 37 hours", "19-36 hours", "1-18 hours", "no work")
actcal.seq <- seqdef(actcal, 13:24, labels = actcal.lab)

## ex1 data
data(ex1)
ex1.seq <- seqdef(ex1, 1:13, weights = ex1$weights)
Note that the default figure size in this document is specified as:
fig.width=8, fig.height=4.94
In general, all {ggseqplot} functions operate in a similar way: they
extract the data to be plotted using a state sequence object generated
with TraMineR::seqdef as a starting point. The functions either simply
use (a subset of) the sequence data stored in this object or call other
{TraMineR} functions such as TraMineR::seqstatd to obtain the
information to be plotted. Under the hood, {ggseqplot} reshapes those
data to visualize them using {ggplot2} functions. Usually this means
that the data have to be reshaped into a long (tidy) format.
The following example illustrates the procedure for the case of a state distribution plot. The cross-sectional state distributions across the positions of the sequence data can be obtained by:
seqstatd(actcal.seq)
When calling ggseqdplot these distributional data are reshaped into a
long data set in which every row stores the (weighted) relative
frequency of a given state at a given position along the sequence. The
example data actcal.seq contain sequences of length 12 with an
alphabet comprising 4 states. The reshaped data serving as source for
the {ggplot2} call thus contain $12\times4=48$ rows. If a group vector
is specified, the respective data will comprise 48 rows for each group.
The data set produced by ggseqdplot can be accessed if the function's
output is assigned to an object. The resulting list object stores the
data as its first element (named data).
dplot <- ggseqdplot(actcal.seq)
dplot$data
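The wide-to-long reshaping described above can be sketched in base R without any sequence packages. This is only an illustration of the idea, not the code {ggseqplot} runs internally: the matrix `freq`, its random values, and the state and position names are made up for this example.

```r
# Hypothetical cross-sectional state distribution:
# 4 states (rows) x 12 positions (columns), values are invented
set.seed(1)
states <- c("A", "B", "C", "D")
counts <- matrix(sample(1:20, 48, replace = TRUE), nrow = 4,
                 dimnames = list(state = states,
                                 position = paste0("t", 1:12)))

# Column-wise relative frequencies: each position sums to 1
freq <- prop.table(counts, margin = 2)

# Wide matrix -> long data frame with one row per state/position pair
long <- as.data.frame(as.table(freq), responseName = "value")

nrow(long)   # 4 states * 12 positions
head(long)
```

A long data frame of this kind, with one column for the state, one for the position, and one for the value, is the shape that a stacked bar chart in {ggplot2} expects.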
Once the data are in the right shape, {ggseqplot} functions produce
graphs using {ggplot2} functions. In the case of the state
distribution plot, for instance, ggseqdplot renders stacked bar charts
for each sequence position using ggplot2::geom_bar.
The following table gives an overview of the most important internal
function calls used to render different plot types with {ggseqplot}:

| ggseqplot function | TraMineR function | ggplot2 geoms and extensions |
|---------------|----------------|-----------------------------------------|
| ggseqdplot | TraMineR::seqstatd | ggplot2::geom_bar; optional: geom_line |
| ggseqeplot | TraMineR::seqstatd | ggplot2::geom_line |
| ggseqmsplot | TraMineR::seqmodst | ggplot2::geom_bar |
| ggseqmtplot | TraMineR::seqmeant | ggplot2::geom_bar |
| ggseqtrplot | TraMineR::seqtrate | ggplot2::geom_tile |
| ggseqiplot | TraMineR::seqformat | ggplot2::geom_rect |
| ggseqfplot | TraMineR::seqtab | ggplot2::geom_rect; {ggh4x} (for the axis labeling if group has been specified) |
| ggseqrplot | TraMineR::seqrep | ggplot2::geom_rect; ggrepel::geom_text_repel; {ggtext} (for optional colored axis labels); {patchwork} (to combine plots) |
| ggseqrfplot | TraMineR::seqrfplot | ggplot2::geom_rect; ggplot2::geom_boxplot; {patchwork} (to combine plots) |
The appearance of most of the plots generated with {ggseqplot}
can be adjusted just like every other ggplot (e.g., by changing the theme
or the scale using + and the respective functions). Representative sequence
plots and relative frequency sequence plots, however, behave differently because
they are composed of multiple plots which are arranged by the {patchwork}
library. The following sections illustrate how the appearance of the plots can
be changed.
As mentioned above, most plots rendered by {ggseqplot} are of class
c("gg", "ggplot") and can be adjusted just like other plots rendered with
{ggplot2}.
In our first example we illustrate this for the state distribution plot.
We start with the most basic version of the plot visualizing the
state distributions of actcal.seq without changing any of the
defaults.
# ggseqplot::ggseqdplot
ggseqdplot(actcal.seq)
We proceed by illustrating how {ggplot2} functions & extensions can be
used to refine the default outcome. Just like every other {ggplot2}
figure, the appearance of plots generated with {ggseqplot} functions
can be dramatically changed with a few adjustments:
ggseqdplot(actcal.seq) +
  scale_fill_discrete_sequential("heat") +
  scale_x_discrete(labels = month.abb) +
  labs(title = "State distribution plot",
       x = "Month") +
  guides(fill = guide_legend(title = "Alphabet")) +
  theme_ipsum(base_family = "") + # ensures that this works on different OS
  theme(plot.title = element_text(size = 30,
                                  margin = margin(0, 0, 20, 0)),
        plot.title.position = "plot")
In the following example we again illustrate a few {ggplot2} functions
& extensions by composing a figure comprising two plots produced with
ggseqdplot. Both visualize the same data but only the first plot
considers weights. In addition to state distributions, the plots display
the accompanying entropies as a line plot (geom_line). Finally, the
plots are brought together using the {patchwork} library
[@pedersen2020].
# Save plot using weights
p1 <- ggseqdplot(ex1.seq,
                 with.entropy = TRUE) +
  ggtitle("Weighted data")

# Save same plot without using weights
p2 <- ggseqdplot(ex1.seq,
                 with.entropy = TRUE,
                 weighted = FALSE) +
  ggtitle("Unweighted data")

# Arrange and refine plots using patchwork
p1 + p2 +
  plot_layout(guides = "collect") &
  scale_fill_manual(values = canva_palettes$`Fun and tropical`[1:4]) &
  theme_ipsum(base_family = "") &
  theme(plot.title = element_text(size = 20, hjust = 0.5),
        legend.position = "bottom",
        legend.title = element_blank())
The second set of examples illustrates how to refine a figure of combined
transition rate plots. ggseqtrplot calls TraMineR::seqtrate to obtain
the transition rates between the states of the alphabet. TraMineR::seqtrate
stores these rates in a square matrix, which ggseqtrplot internally reshapes
into a long format with one row for every combination of states (i.e., the
squared size of the sequence alphabet). The reshaped data are
the input for a {ggplot2} call using geom_tile.
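To make the shape of these data concrete, here is a small base R sketch that computes transition rates for three made-up sequences and reshapes the resulting matrix into long format. It only mimics the idea and does not reproduce the internals of TraMineR::seqtrate or ggseqtrplot; the object names and toy sequences are assumptions for illustration.

```r
# Three toy sequences of length 4 over the alphabet A, B, C (invented data)
seqs <- rbind(c("A", "A", "B", "C"),
              c("A", "B", "B", "C"),
              c("C", "C", "A", "A"))
alphabet <- c("A", "B", "C")

# Pair each state with its successor: positions 1..3 with positions 2..4
from <- factor(as.vector(seqs[, -ncol(seqs)]), levels = alphabet)
to   <- factor(as.vector(seqs[, -1]), levels = alphabet)

# Row-wise proportions: rate of moving from the row state to the column state
trate <- prop.table(table(from, to), margin = 1)
trate

# Square matrix -> long format: one row per origin/destination pair
long <- as.data.frame(trate, responseName = "rate")
nrow(long)   # squared alphabet size: 3 * 3
```

In this toy example the rate from A to B is 0.5 while the rate from B to A is 0, so the matrix is square but not symmetric. Each row of the matrix sums to 1.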
We start with a simple example that only takes the sequence data and the group argument as inputs. The output is a faceted plot visualizing two transition rate matrices of DSS sequence data.
ggseqtrplot(actcal.seq, group = actcal$sex)
In the second example we specify additional arguments and once again use
the {patchwork} library to compose a figure that compares the
transition matrices of sequences stored in the STS and the DSS format.
We use x_n.dodge = 2 to prevent overlapping of the state labels on the
x-axis, slightly reduce the size of the value labels displayed
within the tiles, and use dss = FALSE to compute and display the
transition rates of the STS sequences.
p1 <- ggseqtrplot(biofam.seq, dss = FALSE, x_n.dodge = 2, labsize = 3) +
  ggtitle("STS Sequences") +
  theme(plot.margin = unit(c(5, 10, 5, 5), "points"))

p2 <- ggseqtrplot(biofam.seq, x_n.dodge = 2, labsize = 3) +
  ggtitle("DSS Sequences") +
  theme(plot.margin = unit(c(5, 5, 5, 10), "points"))

p1 + p2 &
  theme(plot.title = element_text(size = 20, hjust = 0.5))
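The difference between the two formats can be illustrated with base R's rle. The sketch below uses an invented toy sequence and is not how {TraMineR} handles state sequence objects (it provides TraMineR::seqdss for that purpose). It only shows what "distinct successive states" means.

```r
# Toy sequence in STS format: one state per position (invented data)
sts <- c("A", "A", "B", "B", "B", "A")

# rle() collapses runs of identical values; the run values are the
# sequence of distinct successive states (DSS)
runs <- rle(sts)
dss  <- runs$values

dss            # the states visited, in order, without repetitions
runs$lengths   # the duration of each spell
```

Because consecutive repetitions are removed, a DSS sequence never stays in the same state from one position to the next, whereas STS transition rates also include the rate of remaining in a state.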
Unlike the grouped version of the plot, this composed figure contains
the y-axis title and labels twice. This can be changed with small
adjustments of the corresponding theme arguments.
p2 <- p2 +
  theme(axis.text.y = element_blank(),
        axis.title.y = element_blank())

p1 + p2 &
  theme(plot.title = element_text(size = 20, hjust = 0.5))
We conclude this section by illustrating that it is also possible to flip
the coordinates of the plots rendered by {ggseqplot}, a procedure that is
widely used in the {ggplot2} universe (although the coordinates could also
be swapped in the aes(x, y, ...) specification).
In the example below we illustrate the procedure for a mean time plot and a sequence index plot. In each case we present both the default plot and the flipped version:
## default plot
ggseqmtplot(actcal.seq, no.n = TRUE, error.bar = "SE")

## flipped version
ggseqmtplot(actcal.seq, no.n = TRUE, error.bar = "SE") +
  coord_flip() +
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        panel.grid.major.y = element_blank(),
        legend.position = "top")
While in the example above the flipped plot might be in greater accordance with most people's aesthetic preferences, flipping the coordinates in the case of sequence index plots might be more of an opinionated design choice. Most scholars prefer to display time on the horizontal axis. However, if you favor time to run from the bottom to the top (like in Piccarreta and Lior [-@piccarreta2010]) instead of left to right, your preferences can be easily met.
## default plot
ggseqiplot(actcal.seq, sortv = "from.end") +
  scale_x_discrete(labels = month.abb)

## flipped version
ggseqiplot(actcal.seq, sortv = "from.end") +
  scale_x_discrete(labels = month.abb) +
  coord_flip()
Two types of plots differ from the other {ggseqplot} functions because they
are composed of two subplots which are arranged into a joint figure with the
{patchwork} library. The output of those functions cannot be changed in the
same way as for the other functions. For details on the {patchwork}
library we recommend the package's website.
Some of the adjustments of a combined {patchwork} plot are pretty similar
to the default {ggplot2} procedure.
In the example below we change the theme and add a
title to the plot. Note that the corresponding functions are not added by
+ but with & instead.
## compute dissimilarity matrix required for plot
diss <- seqdist(biofam.seq, method = "LCS")

## Relative Frequency Sequence Plot
## default version
ggseqrfplot(biofam.seq, diss = diss, k = 12)

## adjusted version
ggseqrfplot(biofam.seq, diss = diss, k = 12) &
  theme_ipsum(base_family = "") &
  theme(legend.position = "bottom",
        legend.title = element_blank(),
        plot.title = element_text(size = 12)) &
  plot_annotation(title = "Relative Frequency Sequence Plot")
If you want to manipulate the appearance of a specific subplot, however, your usual
code might not work. If you want to change the labels of the index plot, for instance,
the following code will not produce the desired result, because scale_x_discrete will
change the appearance of the boxplot, i.e., the last plot used when composing the plot
with {patchwork}.
ggseqrfplot(biofam.seq, diss = diss, k = 12) + scale_x_discrete(labels = 15:30)
The appearance of the subplots, however, can be changed once you save the composite plot and then adjust its components.
## save & view original plot
p <- ggseqrfplot(biofam.seq, diss = diss, k = 12)
p

## change appearance of sub-plots
## first component: index plot
p[[1]] <- p[[1]] + scale_x_discrete(labels = 15:30)

## second component: boxplot
p[[2]] <- p[[2]] + labs(title = "Changed title")

## adjusted plot
p
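The three ways of modifying a composite plot used in this section can be compared side by side in a minimal sketch. It uses two generic {ggplot2} plots built from the mtcars data as stand-ins for the subplots returned by ggseqrfplot; the plots and object names are assumptions made for illustration only.

```r
library(ggplot2)
library(patchwork)

# Two simple stand-in plots (not sequence plots)
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()

combo <- p1 + p2

# `+` modifies only the last plot added to the patchwork (here: p2)
last_only <- combo + theme_minimal()

# `&` applies the modification to every plot in the patchwork
all_panels <- combo & theme_minimal()

# `[[` extracts a single subplot, which can be changed and put back
combo[[1]] <- combo[[1]] + labs(title = "Left panel only")
combo
```

Printing last_only and all_panels next to each other shows the difference: in the first only the bar chart uses the minimal theme, in the second both panels do.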
Note that things become a little bit more complex in the case of a grouped representative sequence plot. In such a plot each group contributes two subplots: one provides information on the "quality" of the representative sequences, and the other contains the corresponding index plots. If we want to change the x-axis labels of the following plot, we have to extract and change the index plots which appear in the second row of the combined plot. The plots are arranged by row. Hence, the index plots are subplots 3 and 4.
## Compute a pairwise dissimilarity matrix
diss <- seqdist(actcal.seq, method = "LCS")

## original plot
p <- ggseqrplot(actcal.seq, diss = diss, nrep = 3, group = actcal$sex)
p

## adjusted sequence index subplots
p[[3]] <- p[[3]] + scale_x_discrete(labels = month.abb)
p[[4]] <- p[[4]] + scale_x_discrete(labels = month.abb)
p
Grouped rfplots are currently not implemented for ggseqrfplot and have to be
created manually using the {patchwork} library.
In the following example we use purrr::map2 to create one relative frequency
sequence plot per group and {patchwork}'s tag annotation to assign
group-specific tags to the first plot in each row of the final patchwork plot.
We also remove the legend of the first plot (p$man), so that it only appears once.
Technically speaking, the resulting plot is a nested patchwork plot. According
to the documentation of {patchwork},
[i]t is important to note that plot annotations only have an effect on the top-level patchwork. Any annotation added to nested patchworks are (currently) lost. If you need to have annotations for a nested patchwork you’ll need to wrap it in wrap_elements() with the side-effect that alignment no longer works.^[https://patchwork.data-imaginist.com/articles/guides/annotation.html#titles-subtitles-and-captions]
For this reason the group-specific titles were not added with
patchwork::plot_annotation or ggplot2::ggtitle; we resorted to tags instead.
diss <- seqdist(biofam.seq, method = "LCS")
sex <- biofam[501:600, "sex"]

p <- map2(
  levels(sex),        # input x
  c("Men", "Women"),  # input y
  function(x, y) {
    p <- ggseqrfplot(biofam.seq[sex == x, ],
                     diss = diss[sex == x, sex == x],
                     k = 12)
    p[[1]] <- p[[1]] + labs(tag = y)
    return(p)
  }
)

names(p) <- levels(sex)

(p$man & theme(legend.position = "none")) / p$woman