models2wux: Processing climate model output


Reads various climate model NetCDF outputs, processes them according to userinput, and writes the processed data to a data.frame.

The data.frame output of WUX (the WUX data frame) contains the climate change signals for user-specified periods, regions, seasons, and parameters for each of the indicated climate models as defined in userinput.

The userinput is a named list object or a file containing a named list. It passes the controlling parameters to models2wux. The file paths, file names and meta-information on the climate simulations are stored in another list called modelinput. See the "Details" section and the "Configfile userinput" and "Configfile modelinput" section for a detailed description of these two lists.


models2wux(userinput, modelinput)



The specification of e.g. the parameters, periods, aggregation statistics, seasons, subregions, and climate models to be processed. This is either a file name containing a list which will be sourced internally, or a list object.


The specifications of file paths, file names and meta-information of every single climate simulation output you have stored on your HDD. This is either a file name containing a list which will be sourced internally, or a list object.


To process a climate multimodel ensemble of your choice, models2wux needs two config files userinput and modelinput, both being named list objects or files containing a named list.

modelinput stores general information about your climate data, i.e. the locations of the NetCDF files and their filenames. It also safes certain metainformation for the specific climate simulations (e.g. a unique acronym for the simulation; the developing institution; the radiative forcing). Usually the modelinput information should be stored in a single file on your system and should be updated when new climate simulations come in. It is advisable to share this file with your collegues if you work with the same NetCDF files on a shared IT infrastructure.

userinput contains information on what you actually want models2wux to be doing for you, mainly, which climate simulations defined in modelinput should be processed and what kind of statistic should be performed. You also define the geographical regions of interest you want to investigate and what time horizon you want to regard. Here is an overview of all possible tags a userinput list contains:

parameter.names Specification of parameters to process.
reference.period Specification of the reference period.
scenario.period Specification of the scenario period.
temporal.aggregation Specification of the temporal aggregation of the climate models (e.g. monthly mean or season sum) and indicating if either time series or climate change signals should be created.
subregions Specification of subregions.
area.fraction Take parts of model-pixels according to subregion coverage.
spatial.weighting Cosine areal weighting of regular grid.
na.rm Behavior for missing values of timeslices.
plot.subregions Specifies diagnostic plotting of grid points within the subregions. Specification of output directory and filename.
climate.models Specification of climate models to be processed.

This is what models2wux is doing: First, models2wux extracts attributes set in the userinput list and loads the corresponding model information (storage paths, filenames, ...) from the modelinput list. It then retrieves the geographical boundaries of the specified regions in subregions (here the model gridfiles are introduced) and reads the specified parameter data from the NetCDF files within the boundaries of the actual subregion. Subsequently, models2wux aggregates over the time dimension by the indicated months for the specified periods and calculates either the climatological mean values of the reference and future period and the according climate change signals or time series. Next, models2wux aggregates over the spatial dimension. models2wux repeats these processing steps for each model specified in climate.models, each parameter in parameter.names, each subregion in subregions, and each period in reference.period and scenario.period, respectively. Finally, the processed data is written to a data.frame and stored to the hard disk as indicated by

For more detailed information on modelinput and userinput see the corresponding sections Configfile "modelinput" and Configfile "userinput" in this help page.


A data.frame of class c("wux.df", "data.frame") containing climate change signals for all models, subregions, and parameters specified in userinput. It also writes a csv file on your HDD.

Configfile "userinput"

Those are specifications the user provides to control models2wux.


A character vector of parameters to be processed according to the NetCDF Climate and Forecast (CF) Metadata Convention (,
e.g. parameter.names = c("air_temperature", "precipitation_amount").


A character specifying the climate change reference period defined by "from-to" ("YYYY-YYYY"),
e.g. reference.period = "1961-1990".


A character specifying the climate change future period defined by "from-to" ("YYYY-YYYY"),
e.g. scenario.period = "2021-2050".


A named list containing the n different levels of statistical aggregation where the single list elements are sequentially named by stat.level.1, stat.level.2, stat.level.3, ... , stat.level.n. Each stat.level is again a list containing three elements: period, statistic, and time.series.


A named list containing the time period of temporal aggregation. The first aggregation level (stat.level.1) refers to the number of the month in the year. All subsequent aggregation levels refer to the list names of the previous stat.level (i.e. nested structure). For example, in stat.level.1 seasons are defined via
period=list(DJF=c(12,1,2), MAM=c(3,4,5), JJA=c(6,7,8), SON=c(9,10,11)).
Winter and summer half years can then be defined in stat.level.2 referring to the list names indicated in stat.level.1:
period=list(winter=c(SON,DJF), summer=c(MAM,JJA))


A string indicating the statistic which is used to aggregate the data. The statistic can be every statistic which is known to R (e.g., mean, sum, quantile).


TRUE or FALSE indicating if time series or climatological mean values of the reference and future period and the according climate change signals are calculated.


Named list containing information for geographical regions. You can specify the boundaries by passing

All longitude coordinate values are forced to the range from -180 to 180 degrees. In case you want to define a subregion containing the (180,-180)-meridian, you should force the longitude values to the range from 0 to 360 degrees, as it could be the case for the Australasian domain. This can be done with the (currently defined only for shapefiles).


A vector of the form c(lon.west, lon.east, lat.north, lat.south).
e.g. World = c(-180, 180, 90, -90)


A named list containing the directory to the shapefiles dirname and the name of the files filename (without file extension). Optional: If no projection file is available, you can set a projection tag to
e.g. projection = "+proj=longlat +ellps=WGS84".

In case there are more regions defined in the shapefile, one can give specific names to the subregionnames tag e.g subregionnames = c("South_America", "Central_America"). However, sometimes these multiple regions form a set. Then the category.variable tag merges the subregions with the same category to a single subregion and category.label gives corresponding labels. category.label has to be a named vector, with the names being the category values from the category.variable and their values being the labels. Omitting the category.label vector when using category.variable, WUX tries to get the names of category.variable. Note that the subregionnames tag and the category.label should not be used together.

In case you want to wrap your longitudes to the 0-360-degrees grid, flag the named vector = c("my.subregion" = "360"). Example:
CORDEX = list(dirname = "/tmp/shapefiles/cordex", filename = "cordex_regions", subregionnames = c("South_America", "Central_America", "North_America", "EU.ENS", "Africa", "West_Asia", "East_Asia","Central_Asia", "Australasia", "Antarctica", "Arctic", "Mediterranean_domain"), = c('Australasia' = "360")).

NetCDF subregionfile:

A named list containing information about the NetCDF file defining the subregion by a constant value (e.g. all pixels flagged by 1 define a subregion). Names of the list have to be:

subreg.file Name of the NetCDF subregions file.
subreg.dir Path to the NetCDF subregions file.
grid.file Name of NetCDF file with longitude and latitude coordinates of the subregions file.
grid.dir Directory of grid.file. Variable name in subreg.file file defining the region.
mask.value Value of defining the region. If more regions are defined, use a vector of values to analyse a set of them.


Dealing with gridded data, subregions almost never happen do be cut out exactly the way your subregion is specified. If the centroid of a single data pixel lies within the subregion, this datapoint will be taken into analysis, else the datapoint will be considered as lying outside of the subregion and set NA. This is WUX default behavior (area.fraction = FALSE). For very small subregions and/or very course data resolution however, it can happen you get very few data points or even none at all.

However, if you want to take every data pixel which just 'touches' your subregion, use area.fraction. The pixel's centroid doesn't have to be necessarily inside the subregion to be taken into analysis then. With area.fraction = TRUE WUX does a weighted spatial average of all these pixels. The weight is the ratio of the pixel area lying within the subregion and the entire pixel area. So if one quarter of a data point is wihin the subregion (but its centroid for example is not), the data pixel value will be taken into analysis and weighted by 0.25 when averaging spatially. Pixels being covered completely in the subregion have weight 1. area.fraction is useful if you are dealing with very small subregions and/or small data resolution, resulting in just a few pixels.


When averaging data over its spatial component, the simple arithmetic mean can result in strongly biased areal estimates. The reason for this is due the geographical projection of the data. The globe has 360 longitudinal degrees and 180 degrees in latitude. The real distance (km) between latitudes remains the same on the entire globe, whereas the distances between longitudes depend on the latitude considered. One degree in longitude near equator represents much more distance (km) than one degree in Norway as the longitudes converge at the poles.

This fact has to be considered especially when dealing with global data (e.g. GCMs). GCM data is usually (within WUX so far 100%) stored on a rectangular lon-lat grid. Therefore the poles seem overproportionaly large in area. Common practice is cosine weighting of latides, resulting in smaller weights near the poles and largest weights at the equator. See for more details.

spatial.weighting = TRUE enables cosine weighting of latitudes, whereas omitting or setting FALSE results in unweighted arithmetic areal mean (default). This option is valid only for data on a regular grid.


It may happen that time slices of NetCDF data may be missing and the user does not know anything about it. Reason for these artifacts might be short time series (e.g. some models project only until 2035, so an analysis unitl 2050 would be biased) or simply missing values due to corrupt or missing NetCDF files.

If na.rm = TRUE is set in the user input, missing values are filled with NA, but the temporal statistics are calculated using the na.rm = TRUE flag. na.rm = FALSE keeps the NA values and thus leads to NA statistics.


A list containing information about diagnostic plotting of grid points within the subregions. png plots are generated showing the grid points within a subregion. The size of the drawn circles correspond to the weighting factor of area.fraction. The list contains three elements: save.subregion.plots, xlim, and ylim.


A character containing only the output path as the filenames are automatically generated via the model and subregion names. For example save.subregion.plots = "/tmp/" will save the plots in the directory /tmp/. If save.subregion.plots is not specified no plots will be drawn!


A vector containing the longitudional boundaries of the plots. For example xlim=c(10,50) draws the plot from 10 to 50 degrees East. If xlim is not specified the boundaries will be automatically generated.


A vector containing the longitudional boundaries of the plots. For example xlim=c(10,50) draws the plot from 10 to 50 degrees North. If ylim is not specified the boundaries will be automatically generated.


Factor for pointsize relative to the default.

A character containing both the output path and filename. For example = "/tmp/cmip3" will save files in the directory /tmp/ as cmip3.csv (data frame containing model climatologies), cmip3_diff.csv (data frame containing the differences of the climatologies, i.e. the climate change signals) and cmip3.Rdata (a R binary file which can be loaded into the next R session containing variables and data frames analog to the csv-files).


A character vector containing the names of the models to be processed. The names must be identical to the unique acronyms in the modelinput list. Read the next section if you want to add a model in the modelinput file.

Configfile "modelinput"

When you want to read in a new climate simulation WUX does not know so far, all you need to do is to specify this model in the modelinput list (which should be stored in a file). You don't need to write tedious input routines, WUX does that for you. The modelinput list is a named list of climate models and contains meta-information of all currently known climate models. Sometimes models indicate wrong attributes in their NetCDF files needed by modelinput. Therfore: KNOW YOUR MODEL YOU WANT TO ADD AND TAKE CARE OF THE META-INFORMATION YOU ARE INDICATING IN modelinput.

Each tag consists of a named list with the following mandatory tags (i.e. names):


Character indicating the institute which is developing the model.


Character name indicating the RCM acronym; if you are processing a gcm type "".


Character name indicating the GCM acronym.


Type of emission scenario used for the simulation.


Name of NetCDF grid file containing the lon/lat variables.


Directory of the NetCDF grid file.


Default directory of the NetCDF data files. If the files are stored not only in one directory, use the file.path.alt tag (see below).


If your files are stored not only in one directory, here you can enter a named vector of paths. If files are scattered by parameter, pass the parameter name (CF Metadata convention) as the vector name. If they are split by periods, then pass historical and scenario as vector names. If files are seperated by both period and parameter, you can use nested named lists instead of vectors.

Character vector of file names of the NetCDF data files. If there are different file names for parameters (which will be mostly the case) and/or file names in scenario- and historical period are of different nature as well, use named or nested lists as in the file.path.alt tag. You can set this tag NA if this climate model has no files. This makes sense for example for the GKSS model for global radiation, as this ENSEMBLES model does not provide this parameter. Values for this model will be NA in the WUX dataframe.

These tags are optional:


Grid resolution character.

GCM run. Default is blank "".


Default are daily time steps, type "monthly" for monthly data.


Define the NetCDF time:calender attribute by hand. This is necessary if the NetCDF file contains wrong information. You can pass 360_days, no_leap or julian.


Define the NetCDF time:units attribute by hand. E.g. days since 1950-01-06 00:00:00.


The time variable in NetCDF files is a vector of time steps relative to the "time:units" attribute with calendar according to the "time:calendar" attribute. However, there are cases where certain climate models are dealing with two calendar types at once! Yes, that's possible... For example: Data claim to have a "360 days" calendar. The "time:units" attribute is set to days since 1961-01-01 00:00:00 and the time vector looks like 365, 366, ..., 723, 724. The 365th day since 1961-01-01 is definetely not the 1st January of 1962 concerning the 360-days calendar but is correctly in terms of "julian" dates.

In such a case we would set count.first.time.value = "julian" and calendar remains 360 days. Other possibilities are count.first.time.value = "noleap" (or = "360days"). Currently this property is defined for calendar = "360 days" only, but can easily be extended to other calendars as well.


A named vector indicating parameter long- and shortname which belong together, e.g. parameters = c(air_temperature = "tas_dm", precipitation_amount = "pr_24hc"). This is important if the NetCDF internal variable name deviates from the WUX default parameter shortname:

tas for air_temperature
pr for precipitation_amount
hurs for relative_humidity
rsds for global_radiation
wss for wind_speed
ua for eastward_wind
va for northward_wind
psl for air_pressure_at_sea_level
hus for specific_humidity
hfss for surface_upward_sensible_heat_flux
tasmin for air_temperature_minimum
tasmax for air_temperature_maximum
ts for surface_temperature


This is an awesome tool (rfp).


Thomas Mendlik and Georg Heinrich

See Also

modelinput_test, userinput_CMIP5_changesignal, cmip5_2050, cmip5_2100, ensembles, ensembles_gcms


## This example shows a typical workflow for models2wux, the workhorse of
## the wux package. Going through this example step-by-step, you will
## retrieve NetCDF files of two CMIP5 simulations and aggregate them to
## an R data.frame for further analysis. 

## I) Load wux functions and example datasets...

## II) You need to obtain the climate simulations first. You can get
## started with downloading some example CMIP5 NetCDF files from the
## ESGF visiting for example or using the
## CMIP5fromESGF function. Here, we dowload two simulations "NorESM1-M" and
## "CanESM2" into your home directory "~/tmp/CMIP5/" which will be
## created automatically. You will need a valid account at any ESGF
## node for this function to run. See ?CMIP5fromESGF for further help.
## Not run: CMIP5fromESGF( = "~/tmp/CMIP5/",
                models = c("NorESM1-M", "CanESM2"),
                variables = c("tas"),
                experiments= c("historical", "rcp85"))

## End(Not run)

## III) Specify those downloaded data for models2wux. models2wux needs
## to know where the data is stored on your HDD and needs to have access
## to certain metadata of the climate simulator, which you have to
## provide as well. This information is stored in a list, which should
## be saved as ONE file somewhere on your computer. We call this
## information "modelinput". You should share this 
## file with you collegues using the same IT infrastructure to share
## synergies. You can create such a file based on the data downloade
## by "CMIP5fromESGF":
## Not run: CMIP5toModelinput(filedir = "~/tmp/CMIP5",
         = "~/modelinput.R")

## End(Not run)
## This file then would look this:

## It specifies temperature and precipitation files for the two
## simulations "NorESM1-M" and "CanESM2" (RCP8.5), stored in
## "~/tmp/CMIP5/". 

## IV) Next, you need to specify which simulations you want to read in
## with models2wux, what kind of statistics to calculate, what subregion
## to analyze, what time periods and seasons to define, and so on. This
## is done with a user input file, which cntains a list with all the
## necessary information. You typically use different userinput files
## for different analysis, whereas your modelinput should remain in ONE
## file which will be updated each time you obtain a new climate
## simulation. One example user input file, which reads in both
## simulations specified above for the Alpine domain and returns their
## projected climate change signal, could look like follows: 

## alternatively following userinput returns a timeseries of both
## models, which only differs by the "time.series" tag and differently
## specified periods: 

## V) At last you can run models2wux to obtain a data.frame of the
## specified climatic change features defined above:
## Not run: climchange.df <- models2wux(userinput = userinput_CMIP5_changesignal,
                            modelinput = modelinput_test)
## End(Not run)
## A better practice is to safe both input files containing a named 
## list each somewhere on your disk and pass the files directly to the
## models2wux function. If you  had stored the two files in your home
## directory as e.g. "~/userinput.R" and "~/modelinput.R" you can call: 
## Not run: climchange.df <- models2wux(userinput = "~/userinput.R",
                            modelinput = "~/modelinput.R")
## End(Not run)
## if you downloaded the data correctly, you should obtain a data.frame:
## Not run: 

## End(Not run)			    

## which should be identical to this example data.frame:

## Instead of calculating the climate change signals, you can also
## generate time series of the two models aggregated over the Alpine
## domain, using a different user input file:
## Not run: climchange.df <- models2wux(userinput = userinput_CMIP5_timeseries,
                            modelinput = modelinput_test)
## End(Not run)

## VI) Finally you can make all kind of analysis you are interested in,
## using either functions from wux or from any other R funtionality
summary(CMIP5_example_changesignal, parms = "delta.air_temperature")

## or plot timeseries as
## Not run: xyplot(air_temperature ~ year|season,
       groups = acronym,
       data = CMIP5_example_timeseries,
       type = c("l", "g"),
       main = "NorESM1-M and CanESM2 simulations over Alpine Region\nRCP 8.5 forcing")
## End(Not run)

Questions? Problems? Suggestions? or email at

All documentation is copyright its authors; we didn't write any of that.