knitr::opts_chunk$set(
    echo = TRUE,
    message = FALSE,
    collapse = TRUE,
    comment = "#>",
    fig.height = 5, 
    fig.width = 7
)

The quitte package is a grab-bag of utility functions to work with REMIND/IAM results (or any kind of data) in a .mif-like format. It builds on the capabilities of the dplyr, tidyr, and ggplot2 packages.

Load required packages

library(tidyverse)
library(quitte)

Generally, when working with quitte, you will need the dplyr and tidyr packages. tidyverse is a "collection" package loading these two, as well as the ggplot2 package and a couple of others.

Data Wrangling

As quitte builds on the capabilities and practices of the tidy data packages, it is highly recommended to read this excellent tutorial introducing dplyr and tidyr . You should be comfortable with the concept of piping (%>%), as well as the filter(), select(), group_by(), mutate(), and summarise() functions. Understanding the join functions (especially inner_join() and full_join() will help immensely).

Load Data

Note: We first load some example data for following along with the sections below. If anything is unclear, just keep reading, it will be explained further down.

read.quitte() reads .mif-files and converts them into five-dimensional long format data frames with columns model, scenario, region, variable/unit, period, and value.

First, select all the .mif files you want to load:

base.path <- 'some/directory/'

data.files <- c(
    'REMIND_generic_r7552c_1p5C_Def-rem-5.mif',
    'REMIND_generic_r7552c_1p5C_UBA_Sust-rem-5.mif',
    'REMIND_generic_r7552c_2C_Def-rem-5.mif',
    'REMIND_generic_r7552c_2C_UBA_Sustlife-rem-5.mif',
    'REMIND_generic_r7552c_REF_Def05-rem-5.mif',
    # 'REMIND_generic_r7552c_REF_Def05-rem-5_tab.mif',
    NULL)

Tip! Keep your data input formatted in a way that is easy to read and to modify. Separate the directory from the path if there are many files in the same directory. Having each path (or other item) on a separate line (and ending with NULL) makes it easy to comment out individual files during development or debugging.

Then load all the files:

data <- read.quitte(file.path(base.path, data.files))

, we can also load it like this: This is what you will do normally. Since the example data is included with the quitte package as a data object in case you just want to try something or give an example for something, we can also load it like this:

(data <- quitte_example_data)

Trim Data

If you know which data you will need you can trim down the data frame early, reducing memory needs and increasing processing speed.

inline.data.frame() is a wrapper function for entering tabular data in an easy-to-read fashion.

scenarios <- inline.data.frame(
    'scenario;                       scen.name',
    'r7552c_REF_Def05-rem-5;         Reference',
    # 'r7552c_1p5C_Def-rem-5;          1.5°C-def',
    'r7552c_2C_Def-rem-5;            2°C',
    'r7552c_1p5C_UBA_Sust-rem-5;     1.5°C-sust',
    # 'r7552c_2C_UBA_Sustlife-rem-5;   2°C-sust',
    NULL)

tmax <- 2100
data <- data %>% 
    filter(scenario %in% scenarios$scenario,
           tmax >= period)

Use Pretty Names

replace_column() can be used to replace a column with hard-to-read or ugly entries with more pleasant ones. This is quite useful for preparing data for plots or tables.

order.levels() arranges factor levels (e.g. scenario names) in a specific order, thus controlling the order with which items appear in plots.

(data <- data %>% 
    replace_column(scenarios, scenario, scen.name) %>% 
     order.levels(scenario = scenarios$scen.name))

Piping

The pipe operator %>% is a shortcut to string any number of operations together. So instead of writing a long code worm like

ungroup(mutate(group_by(select(filter(data, 'Reference' == scenario), region, period, value), region), value = cumsum(value)))

that you would have to read from the inside out, you can write each function call on a separate line, greatly improving readability as data "flows" from top to bottom, each function being applied in succession.

data %>% 
    filter('Reference' == scenario) %>% 
    select(region, period, value) %>% 
    group_by(region) %>% 
    mutate(value = cumsum(value)) %>% 
    ungroup()

Filtering

filter() is a more useful and versatile variant of the data frame extraction operator [. Instead of

data[data$period == 2030 & df$scenario != 'Reference',]

use

data %>% 
    filter(period == 2030,
           scenario != 'Reference')

again, improving readability and decreasing proneness to error. Predicates (the test) are combined via logical AND, you can use as many of them as you want, and there are lots of helper functions available in the dplyr package.

Selecting

select() selects columns from a data frame. So if you only need columns scenario and region, do

data %>% 
    select(scenario, region)

If you need all but model, do

data %>% 
    select(-model)

You can rename columns on the way

data %>% 
    select(scenario, year = period)

Grouping

group_by() subdivides a data frame into groups, such that so-called window functions operate only on the items within a group. If you, for example, want to calculate the cumulated energy production over time for different scenarios and regions, you need to group by those dimensions first.

data %>% 
    filter(variable == 'PE') %>% 
    group_by(scenario, region) %>% 
    mutate(value = cumsum(value)) %>% 
    ungroup()

Mutating

mutate() is the low-level function for modifying existing or creating new columns. Its output has the same number of rows as its input.

data %>% 
    filter(variable == 'PE') %>% 
    group_by(scenario, region) %>% 
    mutate(value = cumsum(value)) %>% 
    ungroup()

Summarising

summarise() is the low-level function for aggregating variables. Its output has one row for every group. All ungrouped columns are dropped.

data %>% 
    group_by(scenario) %>% 
    summarise(value = mean(value)) %>% 
    ungroup()

Sum Things Up

sum_total() sums values up across regions

data %>% 
    filter(scenario == 'Reference', 
           variable == 'Consumption', 
           period == 2050, 
           region %in% c('CHN', 'IND', 'JPN', 'OAS')) %>% 
    sum_total(group = region, name = 'all Asia')

or across other dimensions

data %>% 
    filter(scenario == '2°C', 
           region == 'RUS',
           period == 2035,
           variable %in% paste0('SE|Heat|', c('Coal', 'Gas'))) %>% 
    sum_total(variable, name = 'SE|Heat|fossil')

Add New Variables

calc_addVariable() calculates new variables from existing ones, using generic formulas. Variable names that are not valid R identifiers -- basically everything inside a .mif-file, need to be escaped using backticks ("`").

data %>% 
    filter(region == 'EUR',
           period == 2025,
           variable %in% c('PE', 'Population')) %>% 
    calc_addVariable(
        # EJ/yr / million / (3.6e-9 EJ/MWh) * 1e6 1/million = MWh/yr
        '`PE per capita`' = 'PE / Population / 3.6e-3',
        units = 'MWh/year')

Tip! Note down all unit conversions you do where you do them, and include the calculation. This will save you hours of debug-time in the future. (Trust me.)

It is also possible to only keep the newly calculated values for further manipulation, by settting only.new = TRUE.

data %>% 
    filter(region == 'OAS',
           period == 2070,
           scenario != 'Reference') %>% 
    calc_addVariable(
        # EJ/yr / $bn/yr * 1e-15 kJ/EJ * 1e9 $/$bn = kJ/$
        '`PE|Coal|GDP intensity`'       = '`PE|Coal`       / `GDP|PPP` * 1e6',
        '`PE|Oil|GDP intensity`'        = '`PE|Oil`        / `GDP|PPP` * 1e6',
        '`PE|Gas|GDP intensity`'        = '`PE|Gas`        / `GDP|PPP` * 1e6',
        '`PE|Nuclear|GDP intensity`'    = '`PE|Nuclear`    / `GDP|PPP` * 1e6',
        '`PE|Biomass|GDP intensity`'    = '`PE|Biomass`    / `GDP|PPP` * 1e6',
        '`PE|Hydro|GDP intensity`'      = '`PE|Hydro`      / `GDP|PPP` * 1e6',
        '`PE|Geothermal|GDP intensity`' = '`PE|Geothermal` / `GDP|PPP` * 1e6',
        '`PE|Wind|GDP intensity`'       = '`PE|Wind`       / `GDP|PPP` * 1e6',
        '`PE|Solar|GDP intensity`'      = '`PE|Solar`      / `GDP|PPP` * 1e6',
        units = 'kJ/$',
        only.new = TRUE)

Net Present Value of a Time Series

calcCumulatedDiscount() calculates the net present value of a time series.

data %>% 
    filter(scenario != 'Reference',
           region == 'World') %>% 
    calcCumulatedDiscount(nameVar = 'Policy Cost|Consumption Loss') %>% 
    select(-model, -region, -unit) %>% 
    pivot_wider(names_from = 'scenario')

Calculate Sample Quantiles

calc_quantiles() makes it easy to produce sample quantiles corresponding given probabilities from data frames.

(data.plot <- data %>% 
    filter(region != 'World',
           period == 2040) %>% 
    calc_addVariable(
        # MtCO2/$bn * 1e9 kg/Mt * 1e-9 $/$bn = kgCO2/$
        '`CO2 intensity`' = '`Emi|CO2` / `GDP|PPP`',
        units = 'kGCO2/US$2005',
        only.new = TRUE) %>% 
    group_by(model, scenario, variable, unit, period) %>% 
    calc_quantiles() %>% 
    pivot_wider(names_from = 'quantile'))

This is quite useful for plotting uncertainty ranges.

ggplot() +
    geom_col(
        data = data %>% 
            filter(region == 'World',
                   period == 2040) %>% 
            calc_addVariable(
                # MtCO2/$bn * 1e9 kg/Mt * 1e-9 $/$bn = kgCO2/$
                '`CO2 intensity`' = '`Emi|CO2` / `GDP|PPP`',
                units = 'kGCO2/US$2005',
                only.new = TRUE),
        mapping = aes(x = scenario, y = value, fill = scenario)) +
    scale_fill_discrete(guide = 'none') +
    geom_boxplot(
        data = data.plot,
        mapping = aes(
            x = scenario, 
            ymin = q0, lower = q25, middle = q50, upper = q75, ymax = q100),
        stat = 'identity',
        width = 0.5)

It can also be done for arbitrary quantiles

data %>% 
    filter(region == 'World',
           period == 2040) %>% 
    calc_addVariable(
        # MtCO2/$bn * 1e9 kg/Mt * 1e-9 $/$bn = kgCO2/$
        '`CO2 intensity`' = '`Emi|CO2` / `GDP|PPP`',
        units = 'kGCO2/US$2005',
        only.new = TRUE) %>% 
    group_by(model, scenario, variable, unit, period) %>% 
    calc_quantiles(probs = c(low = 1/3, high = 2/3, 'very high' = 4/5)) %>% 
    pivot_wider(names_from = 'quantile')

Interpolate Missing Periods

interpolate_missing_periods() adds missing periods to a data frame and interpolates missing values. It uses either linear or spline interpolation and can extend missing data before/after the first/last period in the original data.

data.example <- tribble(
    ~x,   ~value,
    2010,    0.1,
    2015,    0.7,
    2020,    0.9,
    2035,    0.6,
    2040,    0.3)

data.plot <- bind_rows(
    data.example %>% 
        interpolate_missing_periods(
            x = seq(2005, 2045, 1), expand.values = TRUE, 
            method = 'linear') %>% 
        rename(linear = value) %>% 
        gather(interpolation, value, -x),

    data.example %>% 
        interpolate_missing_periods(
            x = seq(2005, 2045, 1), expand.values = TRUE, method = 'spline') %>% 
        rename(spline = value) %>% 
        gather(interpolation, value, -x)
)

Be careful when using this, as the results may be nonsensical.

ggplot(mapping = aes(x = x, y = value)) +
    geom_line(data = data.plot, mapping = aes(colour = interpolation)) +
    geom_point(data = data.example) +
    coord_cartesian(ylim = c(0, 1))

In this case, values below 0 might be meaningless. So use spline extensions with care.

Plotting with mip

Data in quitte format is easily passed to the plot functions in the mip package.

data %>% 
    filter('World' == region,
           grepl('^PE\\|', variable)) %>% 
    mip::mipArea(x = .) +
    facet_wrap(~ scenario)
data %>% 
    filter(period %in% c(2010, 2030, 2050, 2100),
           grepl('^PE\\|', variable)) %>% 
    mip::mipBarYearData(x = .) +
    facet_wrap(~ region, scales = 'free_y')

Note: You don't have to call the mip-functions via the double colon (::) operator, but would load the package instead with library(mip). This is only needed in this vignette, since quitte can't import the mip package.)

Plotting 'correct' Bars

If bar plots are used naively (as in: without extra care), the bars for periods after 2055 are too narrow and either in the wrong places

colours_PE <- c('Coal'       = '#0C0C0C',
                'Oil'        = '#663A00',
                'Gas'        = '#E5E5B2',
                'Nuclear'    = '#FF33FF',
                'Hydro'      = '#191999',
                'Biomass'    = '#005900',
                'Geothermal' = '#E51900',
                'Wind'       = '#337FFF',
                'Solar'      = '#FFCC00')

data_plot <- quitte_example_data %>%
            filter('r7552c_1p5C_Def-rem-5' == scenario,
                   'EUR' == region,
                   grepl('^PE\\|', variable),
                   2100 >= period) %>%
            mutate(variable = sub('^PE\\|', '', variable)) %>%
            order.levels(variable = rev(names(colours_PE)))
ggplot() +
    geom_col(
        data = data_plot,
        mapping = aes(x = factor(period), y = value, fill = variable)) +
    scale_fill_manual(values = colours_PE,
                      name = NULL) +
    labs(x = NULL, y = 'EJ/yr')

or spread out with large gaps in between.

ggplot() +
    geom_col(
        data = data_plot,
        mapping = aes(x = period, y = value, fill = variable)) +
    scale_fill_manual(values = colours_PE,
                      name = NULL) +
    scale_x_continuous(breaks = unique(data_plot$period),
                       name = NULL) +
    labs(y = 'EJ/yr')

In both cases, the visual representation of the data is distorted, since the are of the bars should correspond to the integral of the variable over time. (3 EJ over 10 years should result in a bar with twice the area than 3 EJ over 5 years.)

To correct this, you can use the function add_remind_timesteps_columns, that adds two columns to the data frame, xpos and width, which can be used to create a 'correct' bar plot:

ggplot() +
    geom_col(
        data = data_plot %>%
            add_remind_timesteps_columns(gaps = 0.1),
        mapping = aes(x = xpos, width = width, y = value, fill = variable)) +
    scale_fill_manual(values = colours_PE,
                      name = NULL) +
    scale_x_continuous(breaks = unique(data_plot$period),
                       name = NULL) +
    labs(y = 'EJ/yr')

This is also available through the utility function ggplot_bar_remind_vts:

periods <- unique(data_plot$period)
periods <- periods[which(!periods %% 10)]

ggplot_bar_remind_vts(data_plot) +
    scale_fill_manual(values = colours_PE, name = NULL) +
    scale_x_continuous(breaks = periods, name = NULL) +
    labs(y = 'EJ/yr')


pik-piam/quitte documentation built on April 26, 2024, 12:58 a.m.