iea_df | R Documentation |
If .slurped_iea_df is supplied, arguments .iea_file and text are ignored.
If .slurped_iea_df is absent, either .iea_file or text is required, and the helper function slurp_iea_to_raw_df() is called internally to load a raw data frame of data.
iea_df(
.iea_file = NULL,
text = NULL,
expected_1st_line_start = ",,TIME",
expected_2nd_line_start = "COUNTRY,FLOW,PRODUCT",
expected_simple_start = expected_2nd_line_start,
.slurped_iea_df = NULL,
flow = "FLOW",
missing_data = "..",
not_applicable_data = "x",
confidential_data = "c",
estimated_year = "E"
)
.iea_file |
A string containing the path to a .csv file of extended energy balances from the IEA.
Can be a vector of file paths, in which case
each file is loaded sequentially and stacked together
with |
text |
A character string that can be parsed as IEA extended energy balances. (This argument is useful for testing.) |
expected_1st_line_start |
the expected start of the first line of |
expected_2nd_line_start |
the expected start of the second line of |
expected_simple_start |
the expected start of the first line of |
.slurped_iea_df |
a data frame created by |
flow |
the name of the flow column, entries of which are stripped of leading and trailing white space. Default is "FLOW". |
missing_data |
a string that identifies missing data. Default is "..". |
not_applicable_data |
a string that identifies not-applicable data. Default is "x".
Entries of |
confidential_data |
a string that identifies confidential data. Default is "c".
Entries of |
estimated_year |
a string that identifies an estimated year. Default is "E". E.g., in "2014E", the "E" indicates that data for 2014 are estimated. Data from estimated years are removed from output. |
Next, this function does some cleaning of the data.
In the IEA's data, some entries in the "FLOW" column are quoted to avoid creating too many columns.
For example, "Paper, pulp and printing" is quoted in the raw .csv file:
" Paper, pulp and printing".
Internally, this function uses data.table::fread(), which, unfortunately, does not strip leading and trailing white space from quoted entries.
So the function uses base::trimws() to finish the job.
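The trimming step can be sketched in base R. This is a minimal illustration, not the package's internal code: utils::read.csv() stands in for data.table::fread() to keep the sketch free of dependencies, and the sample row (including the "Total" product) is made up.

```r
# Hypothetical raw text in the style of an IEA file. The FLOW entry is quoted
# because it contains a comma, and it carries a leading space.
raw_text <- paste0(
  "COUNTRY,FLOW,PRODUCT,1960,1961\n",
  "World,\" Paper, pulp and printing\",Total,42,43"
)

# read.csv() is an assumption standing in for data.table::fread().
# check.names = FALSE keeps the year column names as "1960" and "1961".
df <- read.csv(text = raw_text, check.names = FALSE, strip.white = FALSE,
               stringsAsFactors = FALSE)

# The quoted entry still has its leading space at this point,
# so trim leading and trailing white space from the flow column.
flow <- "FLOW"
df[[flow]] <- trimws(df[[flow]])
df[[flow]]
```

The flow column name is held in a variable to mirror the flow argument of iea_df().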
When the IEA includes estimated data for a year, the column name of the estimated year includes an "E" appended. (E.g., "2017E".) This function eliminates estimated columns.
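One way to drop estimated-year columns is sketched below in base R. The data frame and the regular expression are assumptions made for illustration; the function's actual implementation may differ.

```r
# Hypothetical data frame with one ordinary year and one estimated year.
df <- data.frame(COUNTRY = "World", FLOW = "Production", PRODUCT = "Total",
                 `2016` = 42, `2017E` = 43, check.names = FALSE)

# Marker appended to estimated years (the default of the estimated_year argument).
estimated_year <- "E"

# Flag column names that consist of a four-digit year followed by the marker.
is_estimated <- grepl(paste0("^[0-9]{4}", estimated_year, "$"), names(df))

# Keep only the columns that are not estimated.
df_clean <- df[, !is_estimated, drop = FALSE]
names(df_clean)
```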
The IEA data have indicators for not applicable values ("x") and for unavailable values ("..").
(See "World Energy Balances: Database Documentation (2018 edition)" at http://wds.iea.org/wds/pdf/worldbal_documentation.pdf.)
R has three concepts that could be used for "x" and "..":
0 would indicate a value known to be zero.
NULL would indicate an undefined value.
NA would indicate a value that is unavailable.
In theory, mapping from the IEA's indicators to R should proceed as follows:
".." (unavailable) in the IEA data would be converted to NA in R.
"x" (not applicable) in the IEA data would be converted to 0 in R.
"NULL" would not be used.
However, the IEA are not consistent with their coding.
In some places ".." (indicating unavailable) is used for not applicable values, e.g., World Anthracite supply in 1971.
(World Anthracite supply in 1971 is actually not applicable, because Anthracite was classified under "Hard coal (if no detail)" in 1971.)
On the other hand, ".." is used for data in the most recent year when those data have not yet been incorporated into the database.
In the face of IEA's inconsistencies, the only rational way to proceed is to convert both "x" and ".." in the IEA files to "0" in the output data frame from this function.
Furthermore, confidential data (coded by the IEA as "c") is also interpreted as 0.
(What else can we do?)
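The recoding described above can be sketched in base R as follows. The data values are made up, and the loop is only an illustration of the idea, not the function's actual implementation.

```r
# Hypothetical data read as character, so the IEA indicators survive.
df <- data.frame(COUNTRY = "World", FLOW = "Production",
                 PRODUCT = c("Anthracite", "Hard coal (if no detail)"),
                 `1971` = c("..", "42"), `1972` = c("x", "c"),
                 check.names = FALSE, stringsAsFactors = FALSE)

# Defaults of the missing_data, not_applicable_data, and confidential_data arguments.
indicators <- c("..", "x", "c")

# Treat columns whose names are four-digit years as data columns (an assumption).
year_cols <- grep("^[0-9]{4}$", names(df), value = TRUE)

for (col in year_cols) {
  values <- df[[col]]
  values[values %in% indicators] <- "0"   # all three indicators become zero
  df[[col]] <- as.numeric(values)         # then convert the column to numeric
}
df
```

After this step, a reader of the output cannot tell a true zero from a value that was missing, not applicable, or confidential, which is the trade-off the text above accepts.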
The data frame returned from this function is not ready to be used in R, because rows are not unique.
To further prepare the data frame for use, call augment_iea_df(), passing the output of this function to the .iea_df argument of augment_iea_df().
This function is vectorized over .iea_file.
a data frame containing the IEA extended energy balances data
# Original file format
iea_df(text = paste0(",,TIME,1960,1961\n",
"COUNTRY,FLOW,PRODUCT\n",
"World,Production,Hard coal (if no detail),42,43"))
# With extra commas on the 2nd line
iea_df(text = paste0(",,TIME,1960,1961\n",
"COUNTRY,FLOW,PRODUCT,,,\n",
"World,Production,Hard coal (if no detail),42,43"))
# With a clean first line
iea_df(text = paste0("COUNTRY,FLOW,PRODUCT,1960,1961\n",
"World,Production,Hard coal (if no detail),42,43"))