knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(dplyr)
library(IEATools)

Introduction

IEATools is an R package that provides functions to analyze extended energy balance data from the International Energy Agency (IEA). The IEA's data are not free. But if you have access to the data, you can follow the instructions in this vignette to analyze it.

Getting started

To get started, data must be obtained and pulled into an R data frame.

Obtain IEA data

Purchase the IEA extended energy balances product from (https://data.iea.org). Download the complete extended energy balance data for at least one country in a .csv file format as shown in the following figure.

knitr::include_graphics("figs/original_header.pdf")

Example data from two countries, Ghana (GHA) and South Africa (ZAF), for two years, 1971 and 2000, are provided in the IEATools package.

library(dplyr)
library(IEATools)
# Define the file location
IEA_path <- file.path("extdata", "GH-ZA-ktoe-Extended-Energy-Balances-sample.csv") %>% 
  system.file(package = "IEATools")
# Or use the function
IEA_path <- sample_iea_data_path()
readChar(IEA_path, nchars = 256)

Check the integrity of the IEA data file (iea_file_OK())

When starting to work with an IEA data file, it is important to verify its integrity. iea_file_OK() performs This function performs some validation tests on its argument. Specifically, this function confirms that every country has the same FLOW and PRODUCT rows in the same order. Note that the IEA data file is read internally with data.table::fread() without stripping white space. Also, if the IEA data file passes the integrity checks, a data frame representing the file (without modification) is returned.

IEA_path %>% 
  iea_file_OK() %>% 
  glimpse()

Convert the IEA data into an R data frame (iea_df())

Extended energy balance data can be pulled directly into an R data frame by the iea_df() function. iea_df() fixes the header shown in the figure above, converting it to a single row. Furthermore, iea_df() deals with the IEA's indicators for not applicable values ("x"), for unavailable values (".."), and for confidential values ("c"). (See "World Energy Balances: Database Documentation (2018 edition)" at (http://wds.iea.org/wds/pdf/worldbal_documentation.pdf).)

R has three concepts that could be used for "x" and "..": 0 would indicate value known to be zero. NULL would indicate an undefined value. NA would indicate a value that is unavailable.

In theory, mapping from the IEA's indicators to R should be done as follows: ".." (unavailable) and "c" (confidential) would be converted to NA in R. "x" (not applicable) would be converted to 0 in R. "NULL" would not be used. However, the IEA are not consistent with their coding. In some places ".." (unavailable) is used for not applicable values, e.g. World Anthracite supply in 1971. (World Anthracite supply in 1971 is actually not applicable ("x"), because Anthracite was classified under "Hard coal (if no detail)" in 1971.) On the other hand, ".." is used correctly for data in the most recent year when those data have not yet been incorporated into the database.

In the face of IEA's inconsistencies, the only rational way to proceed is to convert "x", "..", and "c" to "0". iea_df() performs that task.

IEA_data <- IEA_path %>% 
  iea_df()
IEA_data %>% 
  glimpse()

Next steps

There are several reasons why the IEA extended energy balance data are not convenient for immediate use.

  1. Column titles are SHOUTING.
  2. The COUNTRY column contains full (long) country names. It would be nicer if countries were identified by their 2- or 3-letter ISO codes.
  3. The demarcation between the supply and consumption sides of the ledger is unclear to new users.
  4. How data should be aggregated is unclear to new users.
  5. The data are not in tidy format.

The IEATools package has functions to address each of these issues.

Fix column titles (rename_iea_df_cols())

To unshout the column titles, use the rename_iea_df_cols() function.

IEA_data %>% 
  rename_iea_df_cols() %>% 
  glimpse()

Note that the example above uses the rename_iea_df_cols() function without arguments, because the default arguments are correct for the extended energy balance data as it arrives from the IEA. However, both the old and new names can be specified as arguments. If you despise both capitals letters and vowels, you could do the following.

IEA_data %>% 
  rename_iea_df_cols(new_country = "cntry", new_flow = "flw", new_product = "prdct")  %>% 
  glimpse()

The default arguments are consistent throughout the IEATools package. Thus, it is recommended that you use the default argument values, wherever possible.

Change to 3-letter ISO country abbreviations (use_iso_countries())

To reduce the length of character strings representing countries, the use_iso_countries() function replaces country character strings with each country's 3-letter abbreviation.

IEA_data %>% 
  rename_iea_df_cols() %>% 
  use_iso_countries() %>% 
  glimpse()

Remove aggregation and memo data (remove_agg_memo_flows())

The IEA's extended energy balances contain many memos and aggregations in the data itself. For the purposes of manipulations and calculations, it is often advisable to remove all memos and aggregations. The remove_agg_memo_flows() function performs that task.

IEA_data %>% 
  rename_iea_df_cols() %>% 
  filter(Flow == "Total primary energy supply") %>% 
  glimpse()
# Total primary energy supply is an aggregation row,
# so its rows should be absent after calling remove_agg_memo_flows().
IEA_data %>% 
  rename_iea_df_cols() %>% 
  remove_agg_memo_flows() %>% 
  filter(Flow == "Total primary energy supply")

Augment with ledger side and aggregation point information (augment_iea_df())

The IEA's extended energy balance data can be confusing for first-time users to understand. In particular, both

(a) the demarcation between the supply and consumption sides of the ledger and (b) how data should be aggregated

are unclear.

To clarify these issues and make later aggregation calculations possible, call the augment_iea_data() function. augment_iea_df() has many default arguments that work fine for as-delivered IEA extended energy balance data in kilotons of oil equivalent units. augment_iea_df() adds new columns Ledger.side, Flow.aggregation.point, Energy.type, and Unit.

IEA_data %>% 
  rename_iea_df_cols() %>% 
  augment_iea_df() %>% 
  glimpse()

Convert to a tidy data frame (tidy_iea_df())

The tidy format is a data frame where each datum is located on its own row and columns provide metadata for each datum. As delivered, the IEA's data are not tidy: years are spread to the right rather than being a single column. The tidy_iea_df() function converts to a tidy format. By default, tidy_iea_df() removes zeroes from the data frame, thereby reducing memory footprint.

Tidy_IEA_df <- IEA_data %>% 
  rename_iea_df_cols() %>% 
  use_iso_countries() %>% 
  remove_agg_memo_flows() %>% 
  augment_iea_df() %>% 
  tidy_iea_df()
Tidy_IEA_df %>% 
  glimpse()

The load_tidy_iea_df() function

For simplicity, all steps above are rolled into load_tidy_iea_df(). By default, load_tidy_iea_df() loads the sample data bundled with the package and converts it to a tidy IEA data frame, assuming default arguments for all functions. But you can supply the path to any IEA data file in the .iea_file argument to load_tidy_iea_df().

Simple <- load_tidy_iea_df()
Complicated <- sample_iea_data_path() %>% 
  iea_df() %>%
  rename_iea_df_cols() %>% 
  remove_agg_memo_flows() %>% 
  use_iso_countries() %>% 
  augment_iea_df() %>% 
  tidy_iea_df()
all(Simple == Complicated)

At this point, the integrity of the IEA data can be checked.

Check energy balances

The IEA extended energy balance data are usually close to balanced, but not quite. To check the integrity of the energy balances, use the calc_tidy_iea_df_balances() function. calc_tidy_iea_df_balances() adds several new columns to the tidy IEA data frame, including:

Balances <- Tidy_IEA_df %>% 
  calc_tidy_iea_df_balances()
Balances %>% 
  glimpse()
Balances %>% 
  tidy_iea_df_balanced()

Notice that the IEA data are not quite energy balanced. To fix the energy balance, use the fix_tidy_iea_df_balances() function. fix_tidy_iea_df_balances() adjusts the Statistical differences flow to achieve perfect energy balance.

# Fix product-level balances within each country
Tidy_IEA_df %>% 
  fix_tidy_iea_df_balances() %>% 
  calc_tidy_iea_df_balances() %>% 
  glimpse()

Sorting

When manipulating IEA data frames, rows may attain different sorting relative to the standard IEA order. The sort_iea_df() function can return a data frame to IEA sort order.

  Tidy_IEA_df <- load_tidy_iea_df()
  Tidy_IEA_df %>% dplyr::select(Year, Flow, Product, E.dot)
  # Move the first row to the bottom to put everything out of order
  Unsorted <- Tidy_IEA_df[-1, ] %>% dplyr::bind_rows(Tidy_IEA_df[1, ])
  Unsorted %>% dplyr::select(Year, Flow, Product, E.dot)
  # Now sort it to put everything right again
  Sorted <- sort_iea_df(Unsorted)
  Sorted %>% dplyr::select(Year, Flow, Product, E.dot)

sort_iea_df() also works with wide data frames in which years are spread to the right.

  # Pivoting the data results in unconventional ordering
  Unsorted_wide <- Tidy_IEA_df %>% tidyr::pivot_wider(names_from = Year, values_from = E.dot)
  Unsorted_wide %>% dplyr::select(Flow, Product, `1971`, `2000`)
  # The wide data frame is not sorted correctly. Sort it.
  Sorted_wide <- sort_iea_df(Unsorted_wide)
  Sorted_wide %>% dplyr::select(Flow, Product, `1971`, `2000`)

Conclusion

At this point, IEA extended energy balance data have been imported to R and are ready to be used. The next step could be to specify some of the IEA energy flows. See the specify vignette for details.



MatthewHeun/IEATools documentation built on Feb. 6, 2024, 3:29 p.m.