knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(dplyr) library(IEATools)
IEATools
is an R
package that provides functions to analyze
extended energy balance data from the
International Energy Agency (IEA).
The IEA's data are not free.
But if you have access to the data, you can follow the instructions in this vignette to
analyze it.
To get started, data must be obtained and pulled into an R data frame.
Purchase the IEA extended energy balances product from (https://data.iea.org). Download the complete extended energy balance data for at least one country in a .csv file format as shown in the following figure.
knitr::include_graphics("figs/original_header.pdf")
Example data from two countries, Ghana (GHA) and South Africa (ZAF),
for two years, 1971 and 2000, are provided in the IEATools
package.
library(dplyr) library(IEATools) # Define the file location IEA_path <- file.path("extdata", "GH-ZA-ktoe-Extended-Energy-Balances-sample.csv") %>% system.file(package = "IEATools") # Or use the function IEA_path <- sample_iea_data_path() readChar(IEA_path, nchars = 256)
iea_file_OK()
)When starting to work with an IEA data file,
it is important to verify its integrity.
iea_file_OK()
performs This function performs some validation tests on its argument.
Specifically, this function confirms that every country has the same FLOW
and PRODUCT
rows in the same order.
Note that the IEA data file is read internally with data.table::fread()
without stripping white space.
Also, if the IEA data file passes the integrity checks,
a data frame representing the file (without modification) is returned.
IEA_path %>% iea_file_OK() %>% glimpse()
iea_df()
)Extended energy balance data can be pulled directly into an R
data frame
by the iea_df()
function.
iea_df()
fixes the header shown in the figure above,
converting it to a single row.
Furthermore, iea_df()
deals with the IEA's indicators for
not applicable values ("x
"),
for unavailable values ("..
"), and
for confidential values ("c
").
(See "World Energy Balances: Database Documentation (2018 edition)" at
(http://wds.iea.org/wds/pdf/worldbal_documentation.pdf).)
R
has three concepts that could be used for "x
" and "..
":
0
would indicate value known to be zero.
NULL
would indicate an undefined value.
NA
would indicate a value that is unavailable.
In theory, mapping from the IEA's indicators to R
should be done as follows:
"..
" (unavailable) and "c
" (confidential) would be converted to NA
in R
.
"x
" (not applicable) would be converted to 0
in R
.
"NULL
" would not be used.
However, the IEA are not consistent with their coding.
In some places "..
" (unavailable) is used for not applicable values,
e.g. World Anthracite supply in 1971.
(World Anthracite supply in 1971 is actually not applicable ("x
"),
because Anthracite was classified under "Hard coal (if no detail)" in 1971.)
On the other hand, "..
" is used correctly for data in the most recent year
when those data have not yet been incorporated into the database.
In the face of IEA's inconsistencies,
the only rational way to proceed is to convert
"x
", "..
", and "c
" to "0
".
iea_df()
performs that task.
IEA_data <- IEA_path %>% iea_df() IEA_data %>% glimpse()
There are several reasons why the IEA extended energy balance data are not convenient for immediate use.
COUNTRY
column contains full (long) country names.
It would be nicer if countries were identified by their 2- or 3-letter ISO codes.The IEATools
package has functions to address each of these issues.
rename_iea_df_cols()
)To unshout the column titles, use the rename_iea_df_cols()
function.
IEA_data %>% rename_iea_df_cols() %>% glimpse()
Note that the example above uses the rename_iea_df_cols()
function without arguments,
because the default arguments are correct for the extended energy balance data
as it arrives from the IEA.
However, both the old and new names can be specified as arguments.
If you despise both capitals letters and vowels, you could do the following.
IEA_data %>% rename_iea_df_cols(new_country = "cntry", new_flow = "flw", new_product = "prdct") %>% glimpse()
The default arguments are consistent throughout the IEATools
package.
Thus, it is recommended that you use the default argument values,
wherever possible.
use_iso_countries()
)To reduce the length of character strings representing countries,
the use_iso_countries()
function replaces country character strings
with each country's 3-letter abbreviation.
IEA_data %>% rename_iea_df_cols() %>% use_iso_countries() %>% glimpse()
remove_agg_memo_flows()
)The IEA's extended energy balances contain many memos and aggregations in the data itself.
For the purposes of manipulations and calculations,
it is often advisable to remove all memos and aggregations.
The remove_agg_memo_flows()
function performs that task.
IEA_data %>% rename_iea_df_cols() %>% filter(Flow == "Total primary energy supply") %>% glimpse() # Total primary energy supply is an aggregation row, # so its rows should be absent after calling remove_agg_memo_flows(). IEA_data %>% rename_iea_df_cols() %>% remove_agg_memo_flows() %>% filter(Flow == "Total primary energy supply")
augment_iea_df()
)The IEA's extended energy balance data can be confusing for first-time users to understand. In particular, both
(a) the demarcation between the supply and consumption sides of the ledger and (b) how data should be aggregated
are unclear.
To clarify these issues and make later aggregation calculations possible,
call the augment_iea_data()
function.
augment_iea_df()
has many default arguments that work fine
for as-delivered IEA extended energy balance data
in kilotons of oil equivalent units.
augment_iea_df()
adds new columns
Ledger.side
, Flow.aggregation.point
, Energy.type
, and Unit
.
IEA_data %>% rename_iea_df_cols() %>% augment_iea_df() %>% glimpse()
tidy_iea_df()
)The tidy format
is a data frame where each datum is located on its own row
and columns provide metadata for each datum.
As delivered, the IEA's data are not tidy:
years are spread to the right
rather than being a single column.
The tidy_iea_df()
function converts to a tidy format.
By default, tidy_iea_df()
removes zeroes from the data frame,
thereby reducing memory footprint.
Tidy_IEA_df <- IEA_data %>% rename_iea_df_cols() %>% use_iso_countries() %>% remove_agg_memo_flows() %>% augment_iea_df() %>% tidy_iea_df() Tidy_IEA_df %>% glimpse()
load_tidy_iea_df()
functionFor simplicity,
all steps above are rolled into load_tidy_iea_df()
.
By default, load_tidy_iea_df()
loads the sample data bundled with the package and
converts it to a tidy IEA data frame,
assuming default arguments for all functions.
But you can supply the path to any IEA data file
in the .iea_file
argument to load_tidy_iea_df()
.
Simple <- load_tidy_iea_df() Complicated <- sample_iea_data_path() %>% iea_df() %>% rename_iea_df_cols() %>% remove_agg_memo_flows() %>% use_iso_countries() %>% augment_iea_df() %>% tidy_iea_df() all(Simple == Complicated)
At this point, the integrity of the IEA data can be checked.
The IEA extended energy balance data are usually close to balanced, but not quite.
To check the integrity of the energy balances,
use the calc_tidy_iea_df_balances()
function.
calc_tidy_iea_df_balances()
adds several new columns to the tidy IEA data frame,
including:
supply_sum
(the sum of all flows on the supply side of the ledger),consumption_sum
(the sum of all flows on the consumption side of the ledger),supply_minus_consumption
(the difference between supply and consumption), balance_OK
(a logical telling whether the energy balance is acceptable), and err
(supply sum when consumption_sum
is NA
or
supply_minus_consumption
when consumption_sum
is not NA
).Balances <- Tidy_IEA_df %>% calc_tidy_iea_df_balances() Balances %>% glimpse() Balances %>% tidy_iea_df_balanced()
Notice that the IEA data are not quite energy balanced.
To fix the energy balance, use the fix_tidy_iea_df_balances()
function.
fix_tidy_iea_df_balances()
adjusts the Statistical differences
flow
to achieve perfect energy balance.
# Fix product-level balances within each country Tidy_IEA_df %>% fix_tidy_iea_df_balances() %>% calc_tidy_iea_df_balances() %>% glimpse()
When manipulating IEA data frames, rows may attain different sorting relative to the standard IEA order.
The sort_iea_df()
function can return a data frame to IEA sort order.
Tidy_IEA_df <- load_tidy_iea_df() Tidy_IEA_df %>% dplyr::select(Year, Flow, Product, E.dot) # Move the first row to the bottom to put everything out of order Unsorted <- Tidy_IEA_df[-1, ] %>% dplyr::bind_rows(Tidy_IEA_df[1, ]) Unsorted %>% dplyr::select(Year, Flow, Product, E.dot) # Now sort it to put everything right again Sorted <- sort_iea_df(Unsorted) Sorted %>% dplyr::select(Year, Flow, Product, E.dot)
sort_iea_df()
also works with wide data frames in which years are spread to the right.
# Pivoting the data results in unconventional ordering Unsorted_wide <- Tidy_IEA_df %>% tidyr::pivot_wider(names_from = Year, values_from = E.dot) Unsorted_wide %>% dplyr::select(Flow, Product, `1971`, `2000`) # The wide data frame is not sorted correctly. Sort it. Sorted_wide <- sort_iea_df(Unsorted_wide) Sorted_wide %>% dplyr::select(Flow, Product, `1971`, `2000`)
At this point, IEA extended energy balance
data have been imported to R
and
are ready to be used.
The next step could be to specify some of the IEA energy flows.
See the specify vignette for details.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.