knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",cache = T,
  eval = FALSE
)
# load moinput package
library(moinput)
library(magclass)

What is moinput?

moinput or "model-operations input" is an R package written by the Model Operations team, in PIK's Research Domain III. It provides useful functions and a common structure to all the input data required to run models like MAgPIE and REMIND.

Objective

This document explains the general structure and workflow to writing and understanding functions in the moinput package.

Pre-requisites

Before proceeding, make sure that you have

  1. On your local machine, an input folder containing the following subfolders: cache, mappings, output, and sources (must be downloaded from the cluster). This folder contains the raw data from the source (often as .csv or .xlsx) that needs to be processed. (Note that this folder is very large. At the beginning, simply download the sources that are relevant to your work.)

  2. All R-packages developed at RD III: more info here

  3. SVN controlled version of the moinput package on your computer.

  4. Read the documentation about magclass, which explains what distinguishes magclass objects from data frames and other basic data structures, and introduces wrangling of magclass objects.

  5. Read the general structure of writing read(), convert(), and calc() functions in moinput here, and that function calls are always made using wrapper functions, e.g., readSource() and calcOutput() and not your own functions. If this still seems dense to you -don't worry!- it will be explained "in-action" later.

If you run library(moinput) and are asked to specify the location of the madrat mainfolder, point it to the input folder. To avoid writing it every time you load the package, you can specify the following in your global .Rprofile: options(MADRAT_MAINFOLDER="location/of/inputdata/folder")

Moinput workflow

Reading raw data and converting to a magclass-object

Before beginning, remember that data are moslty used as aggregated regions and not as individual countries. However, at the same time models are moving towards flexible regions, i.e., the flexibility to group countries in different regional combinations. This translates into two aspects - firstly, the convert<functionname>(), expects that all countries have some data on the variable of interest, and that there are no NAs (NA: Non-Available). Secondly, the more country specific the data source is, the better. If not, then depending upon the data set and significance, often either zero values or some weight is assigned to them.

read()

As a first example, we analyse readIRENA.R, convertIRENA.R and calcCapacity.R, functions which process historical capacity and generation data of various renewable energy technologies from the IRENA dataset to a usable form.

#' Read IRENA
#' Read-in an IRENA csv file as magclass object
#' @param subtype data subtype. Either "Capacity" or "Generation"
#' @return magpie object of the IRENA data with historical electricity renewable capacities (MW) or generation levels (GWh) 
#' @author Renato Rodrigues
#' @seealso \code{\link{readSource}}
#' @examples
#' \dontrun{ a <- readSource(type="IRENA",subtype="Capacity")
#' }
#' @importFrom reshape2 melt

 readIRENA <- function(subtype) {
   if (subtype == "Capacity") {
     #Reading renewables electricity capacity values in MW from csv
     data <- read.csv("Capacity.csv",sep=";")
   } else if (subtype== "Generation") {
     #Reading renewables electricity generation values in GWh from csv
     data <- read.csv("Generation.csv",sep=";")
   }else {
     stop("Not a valid subtype!")
   }
   # data in wide format 
   data <- melt(data, id.vars=c("Country.area", "Technology"), variable.name="years", value.name="value")  # melt requires library(reshape2)
   #replacing X by y on years preffix
   data$years <- gsub("X", "y", data$years)
   # rearrange column order to more readable format: year, country, tech, value (capacity or generation)
   data <- data[,c(3,1,2,4)]  
   # creating capacity or generation magpie object
   x <- as.magpie(data,temporal=1,spatial=2,datacol=4)
   return(x)
 }  

To run the code above, type in the following:

x <- readSource("IRENA",subtype = "Capacity",convert = FALSE)

To run the code line by line you will have to set your working directory to the folder where the input data file is stored.

The read() function should convert the raw dataset into a magpie object in as few steps as possible. All wrangling, name-changing etc. is left to subsquent functions.

Understanding the code:

  1. The dataset in its raw form is stored as CSV files in a folder called IRENA inside the sources sub-folder, i.e., the file name (readIRENA.R) and the folder name (IRENA) share the same name- IRENA. This helps the wrapper function readSource() to know where the data is located. Further, the name of the function readIRENA() is the same as the name of the file readIRENA.R. This is required for the wrapper function readSource("IRENA") to work (more about this later). To summarise, the folder name, the file name, and the function inside the filename, have a common element called IRENA, which helps all of them to work with each other.

The functionname or filename is generally named after the source from where it is taken. These could be from well-known annual publications, e.g, IEA, BP, IRENA, FAO or from a paper, e.g., Rogelj2017.

  1. The first 10 lines beginning with #' are part of the documentation^[A good documentation makes it possible for others to use and understand the functions that you write. While coding you will also use the documentation from other functions to see they work, so give back to the community and spend some time on writing it!] file, which appears when you type ?readIRENA. The content in these lines is saved as a .Rd file, that is generated using the roxygen2 package.

  2. The data is converted from a .csv file to a data frame, and finally from a data frame to a magpie object using as.magpie(). This last function is the only magpie-object specific function in this file.

  3. Since magpie objects always contains regions in the first column, years in the second column, and other dimensions in the third column, we help as.magpie() to point to where these columns are located in our file x <- as.magpie(data,temporal=1,spatial=2,datacol=4). For more information see `?as.magpie()

convert()

Now that the input data has been converted to a magpie object, our next objective should be to clean and wrangle the data to a form we would like to see; at a country-level resolution. This could include renaming variables, selecting certain years, changing units etc. All this is done using properties of magclass objects and specific helper functions. These will be introduced later.

As discussed before , convert() expects you to have numerical (non-NA) values for all (249) countries in the world. For this reason, when you run convert(), all countries not in your input database are given NA values. This appears as a warning message. To remove this warning message, countries not in the input data need to be given either some educated values or zero. This is explained later.

To run read() and convert() consecutively,the wrapper function readSource()is used.

x <- readSource("IRENA",subtype = "Capacity")

calc()

Unlike read() and convert()which are named according to their source, calc() functions are named according to the property or characteristic of the data. For the example above, the corresponding function is called calcCapacity.R. Because of this, calc() functions can include many datasets or operations. The calc() function has two main objectives- aggregating the data into REMIND/MAgPIE regions and renaming variable names to REMIND/MAgPIE conventions. The wrapper function for accessing calcCapacity.R is calcOutput("Capacity").

setConfig() and getConfig(), functions from the madrat package, allow you to change and see various settings, respectively, during your process of wrangling data. For e.g., which countries should map to which regions, should your results from a function call be stored in a cache, should the cache be accessed at all. Arguments or settings to these functions can be seen from ?setConfig(). Not all are equally important, we will see their use later.

Wrangling magclass objects

Before we consider specific examples, a basic set of useful functions are given below:

Country name to ISO3 conversion

To convert country names to ISO3 coding

madrat::toolCountry2isocode(c("Japan","Australia","Afghanistan"))

The mapping file, which `toolCountry2isocode() accesses is located in inputdata -> mapping -> cell. All countries not found in the mapping file are returned as NA. You can also create your own mapping file, place it in this folder, and within the argument type enter the name of the mapping file. If you have only a few countries not recognised by the mapping file, you can define their ISOcode explicitly.

madrat::toolCountry2isocode(c("Shire","Mordor"),mapping = c("Shire"="SHR","Mordor"="MOR"))

Creating new magpie objects

New magpie objects can be created with new.magpie(). See a few examples below:

new.magpie("GLO",years = NULL, fill= NA) # creates a region with no data on the variable "years"" and with NA as data
c <- new.magpie(c("IND","JPN"),years = c(1990,2000))

However, you will rarely use this function to create a magpie-object if your own. You might use it to create an empty magpie object with say, region and years, taken from an existing magpie object.

# If x is an existing magpie-object.
y <- new.magpie(getRegions(x),getYears(x),fill=NA)

Converting dataframe to magpie objects and vice-versa

You already saw before, how a data frame was converted to a magpie object using as.magpie(). Magpie objects can also be converted back to dataframes using as.data.frame(). This function is useful to quickly view your data using View()or to use the many functions available for dataframes but not (yet) available for magpie objects.

View(as.data.frame(x))

Filtering data

x <- readSource("IRENA",subtype = "Capacity")
head(getRegions(x[,"y2010","Solar photovoltaic"]>1000))
# OR
where(x[,"y2010","Solar photovoltaic"]>1000)
head(which(x[,"y2010","Solar photovoltaic"]>1000))
x <- x[c("CHN"),,,invert=TRUE]# exlcudes CHN
x <- x[c("CHN"),"2010",,invert=TRUE]# excludes CHN only for the year 2010

Finding certain variable names

Quite frequently, you will have to either rename variable names or perform operations (exclude, add, include,merge) on data with selected variable names. This can be achieved by using a combination of base R functions, properties of arrays on which magclass objects are based, and specific magclass functions.

# Display all variable names containing the word "Hydro" and "hydro"
getNames(x[,,c("hydro","Hydro"),pmatch=T])
x <- x[,,grep("ydro",getNames(x),fixed = FALSE)]# fixed=F or T decides how strict should be the matching

Appending/adding data to a dataset

x  <- toolCountryFill(x,fill=0)
x <- readSource("IRENA",subtype ="Capacity")
x <- time_interpolate(x,1999,integrate_interpolated_years =T,extrapolation_type = "linear" ) 

In the example above, the dataset extends only back till 2000. If you want to linearly extrapolate the data to 1999, you have to mention the year or years as an argument and the type of extrapolation.

x <- add_columns(x,addnm = "Wind_total", dim = 3.1)

The code above adds a variable name Wind_total_ to the exisiting set.

A new dimension can be created with add_dimension(). For e.g., if there are multiple data sources for exactly the same dataset (same variable name and years), and you want to distingush the two, you can add the name of the datasource as an additional dimension.

x_tmp <- add_dimension(x,dim = 3.1,add = "Source", nm="IRENA")

The argument add is the name of the new dimension, and nm is the value stored in it.

Running read() but not convert() using readSource()

Normally,readSource(), first calls read() and then follows it with convert(). If you only want the magpie object at the end of read(), use the argument convert:

x <- readSource("IRENA",subtype = "Capacity",convert = FALSE) # will run readIRENA() but not convertIRENA()

Using the cache when running functions

There are two settings that handle the cache-use: enablecache and forcecache; the former is TRUE by default and enables the use of the cache if nothing changes (with the data or functions). The latter forces the use of the cache (independent from all changes) for all or specific functions.

setConfig(enablecache = TRUE)# by default
  1. However, when you are developing a function, you might want only certain cache(s) turned ON. This can be done by specifying the functions:
setConfig(forcecache = c("calcIO-subtype_input","calcIO-subtype_output","calcIO-subtype_output_EDGE_buildings","calcIO-subtype_output_EDGE", "readRCPAviationShipping","convertRCPWaste","convertIEACHPreport"))

The cache files are stored in input folder-> -> default . Here you will see that each read(), convert() and calc(), have their own cache.

  1. When you are writing and testing a code/function for the first time, the result will automatically be stored as a cache. When you run it again, it will simply take the cache/result from the last run. What to do if you want forcecached turned ON (for other functions in your code) and yet not use the cache of your own function? You can either disable enablecache (see below) or delete the cache file (from the cache or default) sub-folder everytime it gets generated. To disable this:
setConfig(enablecache = FALSE)

full

All input files generated by moinput and fed into REMIND are can be viewed in fullREMIND.R, and similarly for MAGPIE in fullMAGPIE.R.



pik-piam/moinput documentation built on June 9, 2020, 12:23 p.m.