knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(ggsankeyfier)
The ggsankeyfier
package needs data in a data.frame
in order to plot it. This data.frame
does
not necessarily has the same format as the data while you are processing it. In most cases data is
probably organised in a wide format with stages of the Sankey diagram in columns of the data.frame
.
For plotting this needs to be converted into a long format. Why and how to do this, is
discussed below.
A wide format would be typically used when working with
the data. This can be best understood when the framework you wish to visualise represents a
collection of cause-effect chains. In those cases each stage would represent a link in the chain.
So in a wide format each stage (or link in the cause-effect chain) would be represented by a column
in a data.frame
. Also, in the wide format, each row would represent a unique cause-effect chain.
The wide format is therefore suitable for describing such chains.
So why do we need the long format? This is because in a Sankey diagram, we want to visualise how information flows between stages. Moreover, we might even want to distinguish between the start and the end of such a flow. So, rather than the entire 'chain', the data format revolves around the flow ends. This requires a long format where each row contains information on a flow end.
Now, when do we work with either the wide or the long format? When working with information
on chains, it makes sense to work with a wide format. When plotting with ggsankeyfier
or
modification of flow information is required, a long format is more suitable.
This package comes with a function that allow you to pivot information with stages organised as columns (i.e., wide format) to a long format. All you need to do is specify which columns represent the stages, which numeric column quantifies the size of each flow and if there are specific aspects you wish to include as aesthetics in your plot.
This is illustrated with data from Piet et al. (submitted),
this data set describes risks
to provide ecosystem services via cause-effect chains (note that the package contains a
simplified, highly aggregated, version of the data). Where the services are affected by
activities that exert pressures onto the ecosystem components that supply those services.
Risk is expressed as a numeric indicator (RCSES).
Stages are formed by the columns named "activity_type"
, "pressure_cat"
, "biotic_group"
and
"service_division"
. This is pivoted from its wide format to the long format required for plotting
as follows:
## first get a data set with a wide format: data(ecosystem_services) ## pivot to long format for plotting: es_long <- pivot_stages_longer( ## the data.frame we wish to pivot: data = ecosystem_services, ## the columns that represent the stages: stages_from = c("activity_type", "pressure_cat", "biotic_realm", "service_division"), ## the column that represents the size of the flows: values_from = "RCSES" )
After pivoting to the long format as illustrated above you will note two additional columns
that contain information that was not available in the wide format. Namely the columns edge_id
and connector
.
names(es_long)
These columns are added as the ggsankeyfier
functions need them for plotting the data in a
Sankey diagram. More specifically, these columns are required to distinguish between the head
and tail of a flow. This allows to apply different aesthetics to both ends of a flow (not yet fully
implemented) and to visualise feedback loops (see vignette("loopdeloop")
),
which would otherwise not be possible. The
column connector
should contain either "from"
(start of a flow) or "to"
(end of a flow).
The edge_id
contains a unique identifier for each edge (flow), this determines which "from"
should be connected to which "to"
connector. When applying the pivot_stages_longer
function,
the "from"
end "to"
connector for each edge will have an identical value.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.