```r
knitr::opts_chunk$set(
  echo     = TRUE,
  message  = FALSE,
  collapse = TRUE,
  comment  = "#>",
  fig.height = 5,
  fig.width  = 7)
```
The `quitte` package is a grab-bag of utility functions to work with REMIND/IAM results (or any kind of data) in a `.mif`-like format. It builds on the capabilities of the `dplyr`, `tidyr`, and `ggplot2` packages.
```r
library(tidyverse)
library(quitte)
```
Generally, when working with `quitte`, you will need the `dplyr` and `tidyr` packages. `tidyverse` is a "collection" package that loads these two, as well as the `ggplot2` package and a couple of others.
As `quitte` builds on the capabilities and practices of the tidy data packages, it is highly recommended to read this excellent tutorial introducing `dplyr` and `tidyr`. You should be comfortable with the concept of piping (`%>%`), as well as the `filter()`, `select()`, `group_by()`, `mutate()`, and `summarise()` functions. Understanding the join functions (especially `inner_join()` and `full_join()`) will help immensely.
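As a minimal sketch of the difference, consider two toy data frames (the lookup table and values below are made up for illustration):

```r
library(dplyr)

# hypothetical lookup table mapping run names to display names
scenario_names <- data.frame(
  scenario  = c('REF', '2C'),
  scen.name = c('Reference', '2°C'))

runs <- data.frame(
  scenario = c('REF', '2C', '1p5C'),
  value    = c(1, 2, 3))

# inner_join() keeps only rows with a match in both data frames,
# so the '1p5C' row is dropped
inner_join(runs, scenario_names, by = 'scenario')

# full_join() keeps all rows from both data frames, filling
# missing matches with NA
full_join(runs, scenario_names, by = 'scenario')
```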
Note: We first load some example data to follow along with the sections below. If anything is unclear, just keep reading; it will be explained further down.
`read.quitte()` reads `.mif` files and converts them into five-dimensional long-format data frames with columns `model`, `scenario`, `region`, `variable`/`unit`, `period`, and `value`.
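To illustrate, here is a hand-built sketch of that long format, one row per model/scenario/region/variable/period combination (the numbers are invented, not actual model output):

```r
library(tibble)

# sketch of the long format read.quitte() produces; values are made up
tribble(
  ~model,   ~scenario,   ~region, ~variable, ~unit,   ~period, ~value,
  'REMIND', 'Reference', 'World', 'PE',      'EJ/yr', 2020,    560,
  'REMIND', 'Reference', 'World', 'PE',      'EJ/yr', 2030,    590,
  'REMIND', '2°C',       'World', 'PE',      'EJ/yr', 2020,    560,
  'REMIND', '2°C',       'World', 'PE',      'EJ/yr', 2030,    540)
```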
First, select all the `.mif` files you want to load:
```r
base.path <- 'some/directory/'
data.files <- c(
  'REMIND_generic_r7552c_1p5C_Def-rem-5.mif',
  'REMIND_generic_r7552c_1p5C_UBA_Sust-rem-5.mif',
  'REMIND_generic_r7552c_2C_Def-rem-5.mif',
  'REMIND_generic_r7552c_2C_UBA_Sustlife-rem-5.mif',
  'REMIND_generic_r7552c_REF_Def05-rem-5.mif',
  # 'REMIND_generic_r7552c_REF_Def05-rem-5_tab.mif',
  NULL)
```
Tip! Keep your data input formatted in a way that is easy to read and to modify. Separate the directory from the file names if there are many files in the same directory. Having each path (or other item) on a separate line (and ending the vector with `NULL`) makes it easy to comment out individual files during development or debugging.
Then load all the files:
```r
data <- read.quitte(file.path(base.path, data.files))
```
This is what you will do normally. But since the example data is included with the `quitte` package as a data object (in case you just want to try something or give an example), we can also load it like this:
```r
(data <- quitte_example_data)
```
If you know which data you will need, you can trim down the data frame early, reducing memory needs and increasing processing speed.
`inline.data.frame()` is a wrapper function for entering tabular data in an easy-to-read fashion.
```r
scenarios <- inline.data.frame(
  'scenario;                       scen.name',
  'r7552c_REF_Def05-rem-5;        Reference',
  # 'r7552c_1p5C_Def-rem-5;       1.5°C-def',
  'r7552c_2C_Def-rem-5;           2°C',
  'r7552c_1p5C_UBA_Sust-rem-5;    1.5°C-sust',
  # 'r7552c_2C_UBA_Sustlife-rem-5; 2°C-sust',
  NULL)

tmax <- 2100
```
```r
data <- data %>%
  filter(scenario %in% scenarios$scenario,
         tmax >= period)
```
`replace_column()` can be used to replace a column containing hard-to-read or ugly entries with more pleasant ones. This is quite useful for preparing data for plots or tables. `order.levels()` arranges factor levels (e.g. scenario names) in a specific order, thus controlling the order in which items appear in plots.
```r
(data <- data %>%
   replace_column(scenarios, scenario, scen.name) %>%
   order.levels(scenario = scenarios$scen.name))
```
The pipe operator `%>%` is a shortcut for stringing any number of operations together. So instead of writing a long code worm like

```r
ungroup(
  mutate(
    group_by(
      select(
        filter(data, 'Reference' == scenario),
        region, period, value),
      region),
    value = cumsum(value)))
```

that you would have to read from the inside out, you can write each function call on a separate line, greatly improving readability as the data "flows" from top to bottom, with each function applied in succession.
```r
data %>%
  filter('Reference' == scenario) %>%
  select(region, period, value) %>%
  group_by(region) %>%
  mutate(value = cumsum(value)) %>%
  ungroup()
```
`filter()` is a more useful and versatile variant of the data frame extraction operator `[`. Instead of

```r
data[data$period == 2030 & data$scenario != 'Reference',]
```

use

```r
data %>%
  filter(period == 2030,
         scenario != 'Reference')
```

again improving readability and decreasing proneness to error. Predicates (the tests) are combined via logical AND, you can use as many of them as you want, and there are lots of helper functions available in the `dplyr` package.
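For instance, `dplyr::between()` tests numeric ranges and combines naturally with base-R helpers like `grepl()`; a small sketch on a toy data frame:

```r
library(dplyr)

toy <- data.frame(
  period   = c(2020, 2035, 2050, 2070),
  variable = c('PE|Coal', 'PE|Solar', 'SE|Heat', 'PE|Wind'))

toy %>%
  filter(between(period, 2030, 2060),   # numeric range (dplyr helper)
         grepl('^PE\\|', variable))     # pattern matching (base R)
# keeps only the 2035 'PE|Solar' row
```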
`select()` selects columns from a data frame. So if you only need the columns `scenario` and `region`, do

```r
data %>% select(scenario, region)
```

If you need all but `model`, do

```r
data %>% select(-model)
```

You can also rename columns on the way:

```r
data %>% select(scenario, year = period)
```
`group_by()` subdivides a data frame into groups, such that so-called window functions operate only on the items within a group. If, for example, you want to calculate the cumulated energy production over time for different scenarios and regions, you need to group by those dimensions first.
```r
data %>%
  filter(variable == 'PE') %>%
  group_by(scenario, region) %>%
  mutate(value = cumsum(value)) %>%
  ungroup()
```
`mutate()` is the low-level function for modifying existing or creating new columns. Its output has the same number of rows as its input.
```r
data %>%
  filter(variable == 'PE') %>%
  group_by(scenario, region) %>%
  mutate(value = cumsum(value)) %>%
  ungroup()
```
`summarise()` is the low-level function for aggregating variables. Its output has one row for every group. All ungrouped columns are dropped.
```r
data %>%
  group_by(scenario) %>%
  summarise(value = mean(value)) %>%
  ungroup()
```
`sum_total()` sums values up across regions
```r
data %>%
  filter(scenario == 'Reference',
         variable == 'Consumption',
         period == 2050,
         region %in% c('CHN', 'IND', 'JPN', 'OAS')) %>%
  sum_total(group = region, name = 'all Asia')
```
or across other dimensions
```r
data %>%
  filter(scenario == '2°C',
         region == 'RUS',
         period == 2035,
         variable %in% paste0('SE|Heat|', c('Coal', 'Gas'))) %>%
  sum_total(variable, name = 'SE|Heat|fossil')
```
`calc_addVariable()` calculates new variables from existing ones, using generic formulas. Variable names that are not valid R identifiers (basically everything inside a `.mif` file) need to be escaped using backticks (`` ` ``).
```r
data %>%
  filter(region == 'EUR',
         period == 2025,
         variable %in% c('PE', 'Population')) %>%
  calc_addVariable(
    # EJ/yr / million / (3.6e-9 EJ/MWh) * 1e6 1/million = MWh/yr
    '`PE per capita`' = 'PE / Population / 3.6e-3',
    units = 'MWh/year')
```
Tip! Note down all unit conversions you do where you do them, and include the calculation. This will save you hours of debug-time in the future. (Trust me.)
It is also possible to keep only the newly calculated values for further manipulation, by setting `only.new = TRUE`.
```r
data %>%
  filter(region == 'OAS',
         period == 2070,
         scenario != 'Reference') %>%
  calc_addVariable(
    # EJ/yr / $bn/yr * 1e-15 kJ/EJ * 1e9 $/$bn = kJ/$
    '`PE|Coal|GDP intensity`'       = '`PE|Coal` / `GDP|PPP` * 1e6',
    '`PE|Oil|GDP intensity`'        = '`PE|Oil` / `GDP|PPP` * 1e6',
    '`PE|Gas|GDP intensity`'        = '`PE|Gas` / `GDP|PPP` * 1e6',
    '`PE|Nuclear|GDP intensity`'    = '`PE|Nuclear` / `GDP|PPP` * 1e6',
    '`PE|Biomass|GDP intensity`'    = '`PE|Biomass` / `GDP|PPP` * 1e6',
    '`PE|Hydro|GDP intensity`'      = '`PE|Hydro` / `GDP|PPP` * 1e6',
    '`PE|Geothermal|GDP intensity`' = '`PE|Geothermal` / `GDP|PPP` * 1e6',
    '`PE|Wind|GDP intensity`'       = '`PE|Wind` / `GDP|PPP` * 1e6',
    '`PE|Solar|GDP intensity`'      = '`PE|Solar` / `GDP|PPP` * 1e6',
    units = 'kJ/$',
    only.new = TRUE)
```
`calc_growthrate()` calculates the growth rate (in `%/yr`) of a time series, appending `[Growth Rate]` to the variable name:

```r
data %>%
  calc_growthrate(only.new = TRUE, filter.function = "Consumption")
```

returns a data frame with the variable `Consumption [Growth Rate]`. `filter.function` can be a vector of variable names or a function that filters `data`.
`calcCumulatedDiscount()` calculates the net present value of a time series.
```r
data %>%
  filter(scenario != 'Reference',
         region == 'World') %>%
  calcCumulatedDiscount(nameVar = 'Policy Cost|Consumption Loss') %>%
  select(-model, -region, -unit) %>%
  pivot_wider(names_from = 'scenario')
```
`calc_quantiles()` makes it easy to produce sample quantiles corresponding to given probabilities from data frames.
```r
(data.plot <- data %>%
   filter(region != 'World',
          period == 2040) %>%
   calc_addVariable(
     # MtCO2/$bn * 1e9 kg/Mt * 1e-9 $/$bn = kgCO2/$
     '`CO2 intensity`' = '`Emi|CO2` / `GDP|PPP`',
     units = 'kGCO2/US$2005',
     only.new = TRUE) %>%
   group_by(model, scenario, variable, unit, period) %>%
   calc_quantiles() %>%
   pivot_wider(names_from = 'quantile'))
```
This is quite useful for plotting uncertainty ranges.
```r
ggplot() +
  geom_col(
    data = data %>%
      filter(region == 'World',
             period == 2040) %>%
      calc_addVariable(
        # MtCO2/$bn * 1e9 kg/Mt * 1e-9 $/$bn = kgCO2/$
        '`CO2 intensity`' = '`Emi|CO2` / `GDP|PPP`',
        units = 'kGCO2/US$2005',
        only.new = TRUE),
    mapping = aes(x = scenario, y = value, fill = scenario)) +
  scale_fill_discrete(guide = 'none') +
  geom_boxplot(
    data = data.plot,
    mapping = aes(x = scenario, ymin = q0, lower = q25, middle = q50,
                  upper = q75, ymax = q100),
    stat = 'identity',
    width = 0.5)
```
It can also be done for arbitrary quantiles:
```r
data %>%
  filter(region == 'World',
         period == 2040) %>%
  calc_addVariable(
    # MtCO2/$bn * 1e9 kg/Mt * 1e-9 $/$bn = kgCO2/$
    '`CO2 intensity`' = '`Emi|CO2` / `GDP|PPP`',
    units = 'kGCO2/US$2005',
    only.new = TRUE) %>%
  group_by(model, scenario, variable, unit, period) %>%
  calc_quantiles(probs = c(low = 1/3, high = 2/3, 'very high' = 4/5)) %>%
  pivot_wider(names_from = 'quantile')
```
`interpolate_missing_periods()` adds missing periods to a data frame and interpolates missing values. It uses either linear or spline interpolation, and can extend missing data before/after the first/last period in the original data.
```r
data.example <- tribble(
  ~x,   ~value,
  2010, 0.1,
  2015, 0.7,
  2020, 0.9,
  2035, 0.6,
  2040, 0.3)

data.plot <- bind_rows(
  data.example %>%
    interpolate_missing_periods(x = seq(2005, 2045, 1),
                                expand.values = TRUE,
                                method = 'linear') %>%
    rename(linear = value) %>%
    gather(interpolation, value, -x),

  data.example %>%
    interpolate_missing_periods(x = seq(2005, 2045, 1),
                                expand.values = TRUE,
                                method = 'spline') %>%
    rename(spline = value) %>%
    gather(interpolation, value, -x)
)
```
Be careful when using this, as the results may be nonsensical.
```r
ggplot(mapping = aes(x = x, y = value)) +
  geom_line(data = data.plot, mapping = aes(colour = interpolation)) +
  geom_point(data = data.example) +
  coord_cartesian(ylim = c(0, 1))
```
In this case, values below 0 might be meaningless. So use spline extensions with care.
Data in `quitte` format is easily passed to the plot functions in the `mip` package.
```r
data %>%
  filter('World' == region,
         grepl('^PE\\|', variable)) %>%
  mip::mipArea(x = .) +
  facet_wrap(~ scenario)
```
```r
data %>%
  filter(period %in% c(2010, 2030, 2050, 2100),
         grepl('^PE\\|', variable)) %>%
  mip::mipBarYearData(x = .) +
  facet_wrap(~ region, scales = 'free_y')
```
Note: You don't have to call the `mip` functions via the double colon (`::`) operator; normally you would load the package with `library(mip)` instead. The operator is only needed in this vignette, since `quitte` can't import the `mip` package.
If bar plots are used naively (as in: without extra care), the bars for periods after 2055 are too narrow and either in the wrong places
```r
colours_PE <- c('Coal'       = '#0C0C0C',
                'Oil'        = '#663A00',
                'Gas'        = '#E5E5B2',
                'Nuclear'    = '#FF33FF',
                'Hydro'      = '#191999',
                'Biomass'    = '#005900',
                'Geothermal' = '#E51900',
                'Wind'       = '#337FFF',
                'Solar'      = '#FFCC00')

data_plot <- quitte_example_data %>%
  filter('r7552c_1p5C_Def-rem-5' == scenario,
         'EUR' == region,
         grepl('^PE\\|', variable),
         2100 >= period) %>%
  mutate(variable = sub('^PE\\|', '', variable)) %>%
  order.levels(variable = rev(names(colours_PE)))
```
```r
ggplot() +
  geom_col(data = data_plot,
           mapping = aes(x = factor(period), y = value, fill = variable)) +
  scale_fill_manual(values = colours_PE, name = NULL) +
  labs(x = NULL, y = 'EJ/yr')
```
or spread out with large gaps in between.
```r
ggplot() +
  geom_col(data = data_plot,
           mapping = aes(x = period, y = value, fill = variable)) +
  scale_fill_manual(values = colours_PE, name = NULL) +
  scale_x_continuous(breaks = unique(data_plot$period), name = NULL) +
  labs(y = 'EJ/yr')
```
In both cases, the visual representation of the data is distorted, since the area of the bars should correspond to the integral of the variable over time. (3 EJ/yr over 10 years should result in a bar with twice the area of 3 EJ/yr over 5 years.)
To correct this, you can use the function `add_remind_timesteps_columns()`, which adds two columns, `xpos` and `width`, to the data frame; these can be used to create a 'correct' bar plot:
```r
ggplot() +
  geom_col(data = data_plot %>%
             add_remind_timesteps_columns(gaps = 0.1),
           mapping = aes(x = xpos, width = width, y = value,
                         fill = variable)) +
  scale_fill_manual(values = colours_PE, name = NULL) +
  scale_x_continuous(breaks = unique(data_plot$period), name = NULL) +
  labs(y = 'EJ/yr')
```
This is also available through the utility function `ggplot_bar_remind_vts()`:
```r
periods <- unique(data_plot$period)
periods <- periods[which(!periods %% 10)]

ggplot_bar_remind_vts(data_plot) +
  scale_fill_manual(values = colours_PE, name = NULL) +
  scale_x_continuous(breaks = periods, name = NULL) +
  labs(y = 'EJ/yr')
```