Most input files are comma-separated value (CSV) files. We require CSV files to have certain metadata headers, as described below.
All CSV input files should have a header that looks something like this:
    # File: filename [REQUIRED]
    # Title: Title of dataset [REQUIRED]
    # Units: Units [REQUIRED]
    # Comments: Comments, may be multiline
    # Description: A synonym for comments, may be multiline
    # Source: Citation or source if available
    # Column types: iiccnn [REQUIRED]
    (data starts here)
You may use "Unit" or "Units"; "Comments" or "Description"; and "Reference", "References", "Source", or "Sources".
The "Column types" header line above is now mandatory, and gives the type (technically, the R class) that should be assigned to each column. The most common types are "i"=integer, "n"=numeric, "c"=character (string), and "l"=logical. A full list of supported types can be found here. Note that an "admin" (i.e., not exported) function called `add_column_types_header_line` is provided; it guesses column types and updates CSV files that do not include this metadata (by default, `overwrite = FALSE`).
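As a quick sanity check before committing a new input file, the header rules above can be sketched in a few lines of Python. This is only an illustration and is not how gcamdata's own R metadata parser works; the function names `read_header` and `check_header` are hypothetical, and the treatment of multiline comments (any unrecognized `#` line following a comment is taken as a continuation) is an assumption.

```python
import csv

# Accepted header keys and their synonyms, as listed above.
SYNONYMS = {
    "file": "File",
    "title": "Title",
    "unit": "Units", "units": "Units",
    "comments": "Comments", "description": "Comments",
    "reference": "Source", "references": "Source",
    "source": "Source", "sources": "Source",
    "column types": "Column types",
}
REQUIRED = ("File", "Title", "Units", "Column types")


def read_header(lines):
    """Collect '# Key: value' lines until the first non-comment line."""
    meta, last_key = {}, None
    for line in lines:
        if not line.startswith("#"):
            break  # data starts here
        text = line[1:].strip()
        key, sep, value = text.partition(":")
        canonical = SYNONYMS.get(key.strip().lower()) if sep else None
        if canonical:
            meta[canonical] = value.strip()
            last_key = canonical
        elif last_key == "Comments" and text:
            # Assumption: an unrecognized line continues a multiline comment.
            meta["Comments"] += " " + text
    return meta


def check_header(lines):
    """Return a list of problems found in the header (empty if none)."""
    lines = list(lines)
    meta = read_header(lines)
    problems = [f"missing required header: {k}" for k in REQUIRED if not meta.get(k)]
    data = [line for line in lines if not line.startswith("#")]
    if data and meta.get("Column types"):
        # Compare the number of columns in the first data row with the
        # number of type codes in the "Column types" string.
        n_cols = len(next(csv.reader([data[0]])))
        n_types = len(meta["Column types"])
        if n_cols != n_types:
            problems.append(f"{n_cols} columns but {n_types} column types")
    return problems
```

Passing the lines of a file, for example `check_header(open("my_input.csv"))`, would return an empty list when all required entries are present and the type string matches the column count.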
If the data in this source file all have one unit, that unit should be provided as noted above. Other situations you may encounter, and the 'Units' entry to use for each, are listed below:
Unit or situation | Example 'Units' entry
----------------- | -----------------------
No units | NA or None
Unitless | Unitless
Variables with more than one unit in the same table | Mt for 'x', USD1990 for 'y'
Units given as a column (or perhaps row) in the csv table | Units-in-table
Do not guess at units! Find the original source file or ask someone if you are not certain. As a last resort, "unknown" is acceptable (but in that case be sure to open an issue).
Be careful when editing in Excel: for example, a comma within a header line will result in a tab, and the remainder of the line will not be read by the metadata parser.
Please compress any file larger than ~1 MB (giving it either a `.zip` or `.gz` extension).
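For anyone who would rather script the compression step than do it by hand, a minimal Python sketch follows. The helper name `compress_if_large` is hypothetical, and reading "~1 MB" as one million bytes is an assumption. The sketch leaves the uncompressed original in place, so it would still need to be removed before committing.

```python
import gzip
import os
import shutil

SIZE_LIMIT_BYTES = 1_000_000  # assumption: "~1 MB" taken as one million bytes


def compress_if_large(path, limit=SIZE_LIMIT_BYTES):
    """Write path + '.gz' if the file exceeds the limit; return the path to use."""
    if os.path.getsize(path) <= limit:
        return path  # small enough to keep uncompressed
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream, so large files are not loaded whole
    return gz_path
```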
There is one proprietary dataset used in gcamdata at present: the IEA's World Energy Balances. gcamdata currently uses the 2019 edition. While the "pre-built data" that comes with the package (in `R/sysdata.rda`) allows users to run gcamdata without the energy balances, certain modifications, such as re-configuring GCAM's regions, do require users to have the energy balances within the workspace. Because the dataset is proprietary, it can't be distributed; instead, users need to purchase the dataset and perform the following steps to get the file into gcamdata in the correct form. The steps are performed on a PC, starting from the data browser that comes with the World Energy Balances.
    # File: IEA_EnergyBalances_2019.csv.gz
    # Title: IEA World Energy Balances (2019 edition)
    # Units: ktoe; GWh; TJ
    # Column types: cccnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
    # ----------
    COUNTRY,FLOW,PRODUCT,1960, ...
    World,INDPROD,Hard coal (if no detail),, ...
Using a text editor, clean the CSV file (see note below):

- Delete the three empty cells in the first row, the cell “YEAR”, and the empty second row.
- Replace all “c” (confidential data) entries with “..”, which will then be replaced with zero in R.
Save it using GZ compression with the following filename: `IEA_EnergyBalances_2019.csv.gz`, and place it in the following folder in the gcamdata package: `inst/extdata/energy/`
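If a text editor's find-and-replace is impractical, the “c” replacement and the GZ save could be scripted instead. The Python sketch below is one possible approach, not part of gcamdata. The function names are hypothetical. It assumes that the file is UTF-8 encoded and that data values begin in the fourth column, after COUNTRY, FLOW, and PRODUCT as in the example header above. It does not perform the manual row and cell deletions, and it does not add the metadata header.

```python
import csv
import gzip


def replace_confidential(row, first_value_col=3):
    """Replace value cells that are exactly 'c' with '..'.

    Assumption: the first three columns are COUNTRY, FLOW, PRODUCT and
    are left untouched; only later (numeric) columns are checked.
    """
    return [
        ".." if i >= first_value_col and cell.strip() == "c" else cell
        for i, cell in enumerate(row)
    ]


def clean_and_compress(src_csv, dest_gz):
    """Stream src_csv into a gzip-compressed CSV, replacing confidential markers."""
    # Assumption: UTF-8 input; adjust the encoding if the export differs.
    with open(src_csv, newline="", encoding="utf-8") as src, \
            gzip.open(dest_gz, "wt", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst, lineterminator="\n")
        for row in csv.reader(src):
            writer.writerow(replace_confidential(row))
```

A call such as `clean_and_compress("exported.csv", "IEA_EnergyBalances_2019.csv.gz")` would write the compressed file, where `exported.csv` stands for whatever name the export was saved under.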
NOTE: NEVER use Excel to clean the CSV. Excel can only open part of the file, and the rest will be missing. If you use a PC, do not use Notepad (it takes 3 minutes per operation). Instead, use a more powerful editor, such as Sublime Text.