slurp_iea_to_raw_df: Slurp an IEA extended energy balance data file

View source: R/initialize.R

slurp_iea_to_raw_df    R Documentation

Slurp an IEA extended energy balance data file

Description

This internal helper function reads an IEA extended energy balances .csv file and converts it to a data frame with appropriately labeled columns. Exactly one of .iea_file or text must be specified. The first line of .iea_file or text is expected to start with expected_1st_line_start, and the second line is expected to start with expected_2nd_line_start, optionally followed by any number of commas. (The extra commas might come from opening and re-saving the file in Excel.) Alternatively, the file may have a single header line that starts with expected_simple_start. If neither condition is met, execution is halted and an error message is provided. Files should have a newline (return) character at the end of their final line.
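The header check described above can be sketched in base R. This is only an illustration, not the package's implementation: detect_header_format() is a hypothetical helper, and treating the second line as an exact match once trailing commas are stripped is an assumption.

```r
# Hypothetical helper (not part of IEATools): classify the header of an
# IEA file from its first lines, or stop with an error.
detect_header_format <- function(lines,
                                 expected_1st_line_start = ",,TIME",
                                 expected_2nd_line_start = "COUNTRY,FLOW,PRODUCT",
                                 expected_simple_start = expected_2nd_line_start) {
  if (length(lines) < 1) stop("No lines to inspect.")
  first <- lines[1]
  second <- if (length(lines) >= 2) lines[2] else ""
  # Assumption: extra commas (e.g. added by Excel) appear only at the end
  # of the second line, so they are stripped before comparing.
  second_clean <- sub(",*$", "", second)
  if (startsWith(first, expected_1st_line_start) &&
      identical(second_clean, expected_2nd_line_start)) {
    return("two_line")
  }
  if (startsWith(first, expected_simple_start)) {
    return("simple")
  }
  stop("Unrecognized header. First line was: ", first)
}

detect_header_format(c(",,TIME,1960,1961", "COUNTRY,FLOW,PRODUCT,,,"))
detect_header_format(c("COUNTRY,FLOW,PRODUCT,1960,1961"))
```

The two-line check runs first, so a file whose first line starts with ",,TIME" is never mistaken for the simple format.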

Usage

slurp_iea_to_raw_df(
  .iea_file = NULL,
  text = NULL,
  expected_1st_line_start = ",,TIME",
  country = "COUNTRY",
  expected_2nd_line_start = paste0(country, ",FLOW,PRODUCT"),
  expected_simple_start = expected_2nd_line_start,
  ensure_ascii_countries = TRUE
)

Arguments

.iea_file

The path to the raw IEA data file to be read. Can be a vector of file paths, in which case each file is loaded sequentially and the results are stacked together with dplyr::bind_rows().

text

A string containing text to be parsed as an IEA file. Can be a vector of text strings, in which case each string is processed sequentially and stacked together with dplyr::bind_rows().

expected_1st_line_start

The expected start of the first line of .iea_file or text. Default is ",,TIME".

country

The name of the country column. Default is "COUNTRY".

expected_2nd_line_start

The expected start of the second line of .iea_file or text. Default is paste0(country, ",FLOW,PRODUCT"), which is "COUNTRY,FLOW,PRODUCT" when country has its default value.

expected_simple_start

The expected start of the first line of .iea_file or text when the file has a single header line. Default is the value of expected_2nd_line_start. Note that this simple format is sometimes encountered in data supplied by the IEA. It may also be the format of the file after somebody "helpfully" fiddles with the raw data from the IEA.

ensure_ascii_countries

A boolean that tells whether to convert country names to pure ASCII, removing diacritical marks and accents. Default is TRUE.
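As a rough illustration of what converting country names to pure ASCII involves, the sketch below uses base R's iconv() with the "//TRANSLIT" suffix. Whether the package uses this approach is not stated here, to_ascii() is a hypothetical helper, and transliteration results are platform dependent, so no output is shown.

```r
# Hypothetical helper (not part of IEATools): transliterate strings to
# ASCII. The "//TRANSLIT" behavior depends on the platform's iconv, and
# unconvertible inputs may come back as NA.
to_ascii <- function(x) {
  iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT")
}

# Country names with diacritical marks, written with Unicode escapes so
# the example does not depend on the encoding of the script file.
countries <- c("World", "C\u00f4te d'Ivoire", "Cura\u00e7ao")
to_ascii(countries)
```

Strings that are already pure ASCII, such as "World", pass through unchanged.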

Details

This function is designed to work as more years are added in columns at the right of the .iea_file, because column names in the output are constructed from the header line(s) of .iea_file (which contain years and country, flow, product information).
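To illustrate why building column names from the header line(s) keeps working as years are added, here is a minimal base R sketch. parse_iea_text() is a hypothetical helper, not the package's code, and it assumes well-formed data rows with one value per header column.

```r
# Hypothetical helper (not part of IEATools): parse IEA-style text into a
# data frame whose column names come from the header line(s).
parse_iea_text <- function(text) {
  lines <- strsplit(text, "\n", fixed = TRUE)[[1]]
  # Drop trailing commas (e.g. added by Excel) before splitting a header.
  split_header <- function(line) {
    strsplit(sub(",*$", "", line), ",", fixed = TRUE)[[1]]
  }
  if (startsWith(lines[1], ",,TIME")) {
    # Two-line header: years come from line 1 (after ",,TIME"),
    # identifier columns from line 2.
    years <- split_header(lines[1])[-(1:3)]
    ids <- split_header(lines[2])
    col_names <- c(ids, years)
    body <- lines[-(1:2)]
  } else {
    # Simple header: a single line holds all column names.
    col_names <- split_header(lines[1])
    body <- lines[-1]
  }
  df <- utils::read.csv(text = paste(body, collapse = "\n"), header = FALSE,
                        stringsAsFactors = FALSE)
  stopifnot(ncol(df) == length(col_names))
  names(df) <- col_names
  df
}

parse_iea_text(paste0(",,TIME,1960,1961\n",
                      "COUNTRY,FLOW,PRODUCT\n",
                      "World,Production,Hard coal (if no detail),42,43"))
```

Because the year names are read from the header rather than hard-coded, a file with an additional year column needs no change to the parsing code.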

Extended energy balance data can be obtained from the IEA as a *.ivt file. To export the data for use with the IEATools package, perform the following actions:

  1. Open the *.ivt file in the Beyond 20/20 browser on a relatively high-powered computer with lots of memory, because the file is very large.

  2. Arrange the columns in the following order: "COUNTRY", "FLOW", "PRODUCT", followed by years.

  3. Change to the unit (ktoe or TJ) desired.

  4. Save the results in .csv format. (Saving may take a while.)

This function is vectorized over .iea_file.
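The read-each-input-then-stack pattern behind this vectorization can be sketched as follows. The sketch assumes the simple single-header format and uses base R's rbind() as a stand-in for dplyr::bind_rows(). Unlike bind_rows(), rbind() requires every piece to have the same columns. read_one() and stack_texts() are hypothetical names.

```r
# Hypothetical helpers (not part of IEATools).
# Read one simple-format text block, keeping year column names as-is.
read_one <- function(txt) {
  utils::read.csv(text = txt, check.names = FALSE, stringsAsFactors = FALSE)
}

# Process each element in turn, then stack the resulting data frames.
stack_texts <- function(texts) {
  do.call(rbind, lapply(texts, read_one))
}

texts <- c(
  "COUNTRY,FLOW,PRODUCT,1960,1961\nWorld,Production,Hard coal (if no detail),42,43",
  "COUNTRY,FLOW,PRODUCT,1960,1961\nWorld,Imports,Hard coal (if no detail),1,2"
)
stacked <- stack_texts(texts)
```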

Value

A raw data frame of IEA extended energy balance data with appropriate column titles.

Examples

# 2018 and earlier file format
slurp_iea_to_raw_df(text = paste0(",,TIME,1960,1961\n",
                     "COUNTRY,FLOW,PRODUCT\n",
                     "World,Production,Hard coal (if no detail),42,43"))
# With extra commas on the 2nd line
slurp_iea_to_raw_df(text = paste0(",,TIME,1960,1961\n",
                     "COUNTRY,FLOW,PRODUCT,,,\n",
                     "World,Production,Hard coal (if no detail),42,43"))
# With a clean first line (2019 file format)
slurp_iea_to_raw_df(text = paste0("COUNTRY,FLOW,PRODUCT,1960,1961\n",
                     "World,Production,Hard coal (if no detail),42,43"))

MatthewHeun/IEATools documentation built on Dec. 14, 2024, 12:08 a.m.