Processing climate model output
Reads various climate model NetCDF outputs, processes them according to userinput, and writes the processed data to a data.frame. The output of models2wux (the WUX data frame) contains the climate change signals for user-specified periods, regions, seasons, and parameters for each of the indicated climate models.
userinput is a named list object or a file containing a named list. It passes the controlling parameters to models2wux. The file paths, file names, and meta-information on the climate simulations are stored in another list called modelinput. See the "Details" section and the "Configfile userinput" and "Configfile modelinput" sections for a detailed description of these two lists.
userinput: The specification of, e.g., the parameters, periods, aggregation statistics, seasons, subregions, and climate models to be processed. This is either a list object or the name of a file containing such a list, which will be sourced internally.
modelinput: The specification of file paths, file names, and meta-information for every single climate simulation output you have stored on your HDD. This is either a list object or the name of a file containing such a list, which will be sourced internally.
To process a climate multi-model ensemble of your choice, models2wux needs two config files, userinput and modelinput, being named list objects or files containing a named list each.
modelinput stores general information about your climate data, i.e. the locations of the NetCDF files and their filenames. It also stores certain meta-information for the specific climate simulations (e.g. a unique acronym for the simulation, the developing institution, the radiative forcing). Usually the modelinput list should be stored in a single file on your system and should be updated when new climate simulations come in. It is advisable to share this file with your colleagues if you work with the same NetCDF files on a shared IT infrastructure.
userinput contains information on what you actually want models2wux to do for you: mainly, which climate simulations defined in modelinput should be processed and what kind of statistic should be performed. You also define the geographical regions of interest and the time horizon you want to regard. Here is an overview of all possible tags a userinput list contains:
- parameter.names: Specification of parameters to process.
- reference.period: Specification of the reference period.
- scenario.period: Specification of the scenario period.
- temporal.aggregation: Specification of the temporal aggregation of the climate models (e.g. monthly mean or season sum), indicating whether time series or climate change signals should be created.
- subregions: Specification of subregions.
- area.fraction: Take parts of model-pixels according to subregion coverage.
- spatial.weighting: Cosine areal weighting of regular grids.
- na.rm: Behavior for missing values of time slices.
- plot.subregions: Diagnostic plotting of grid points within the subregions.
- save.as.data: Specification of the output directory and filename.
- climate.models: Specification of the climate models to be processed.
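Assembled from these tags, a minimal userinput sketch might look like the following. All values, paths, and model acronyms here are purely illustrative and must be adapted to your own data:

```r
## Illustrative userinput sketch -- values, paths, and model acronyms
## are made up; adapt them before use.
userinput <- list(
  parameter.names      = c("air_temperature", "precipitation_amount"),
  reference.period     = "1961-1990",
  scenario.period      = "2021-2050",
  temporal.aggregation = list(
    stat.level.1 = list(
      period      = list(DJF = c(12, 1, 2), JJA = c(6, 7, 8)),
      statistic   = "mean",
      time.series = FALSE)),
  subregions        = list(World = c(-180, 180, 90, -90)),
  area.fraction     = FALSE,
  spatial.weighting = TRUE,
  na.rm             = TRUE,
  save.as.data      = "/tmp/my_wux_analysis",
  climate.models    = c("NorESM1-M", "CanESM2")
)
```

Each of these tags is described in detail below.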
This is what models2wux does: First, it extracts the attributes set in the userinput list and loads the corresponding model information (storage paths, filenames, ...) from the modelinput list. It then retrieves the geographical boundaries of the specified regions in subregions (here the model gridfiles are introduced) and reads the specified parameter data from the NetCDF files within the boundaries of the current subregion. Next, models2wux aggregates over the time dimension by the indicated months for the specified periods and calculates either the climatological mean values of the reference and future period and the according climate change signals, or time series. Then models2wux aggregates over the spatial dimension. models2wux repeats these processing steps for each model specified in climate.models, each parameter in parameter.names, each subregion in subregions, and each aggregation level, respectively. Finally, the processed data is written to a data.frame and stored to the hard disk as indicated by save.as.data.
For more detailed information on userinput and modelinput see the corresponding sections Configfile "userinput" and Configfile "modelinput" in this help page.
Returns a data.frame of class c("wux.df", "data.frame") containing climate change signals for all models, subregions, and parameters specified in userinput. It also writes a csv file to your HDD.
These are the specifications the user provides to control models2wux:
parameter.names: A character vector of parameters to be processed according to the NetCDF Climate and Forecast (CF) Metadata Convention, e.g. parameter.names = c("air_temperature", "precipitation_amount").
reference.period: A character specifying the climate change reference period defined by "from-to" ("YYYY-YYYY"), e.g. reference.period = "1961-1990".
scenario.period: A character specifying the climate change future period defined by "from-to" ("YYYY-YYYY"), e.g. scenario.period = "2021-2050".
temporal.aggregation: A named list containing the n different levels of statistical aggregation, where the single list elements are sequentially named stat.level.1, stat.level.2, stat.level.3, ..., stat.level.n. Each stat.level is again a list containing three elements: period, statistic, and time.series.
period: A named list containing the time period of temporal aggregation. The first aggregation level (stat.level.1) refers to the numbers of the months in the year. All subsequent aggregation levels refer to the list names of the previous stat.level (i.e. a nested structure). For example, in stat.level.1 seasons are defined via period = list(DJF = c(12, 1, 2), MAM = c(3, 4, 5), JJA = c(6, 7, 8), SON = c(9, 10, 11)). Winter and summer half years can then be defined in stat.level.2 by referring to the list names indicated in stat.level.1.
statistic: A string indicating the statistic used to aggregate the data. This can be any statistic known to R (e.g. mean, sum, quantile).
time.series: Logical (TRUE/FALSE) indicating whether time series or climatological mean values of the reference and future period and the according climate change signals are calculated.
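The nesting described above can be sketched as follows. The half-year names and their groupings are invented for illustration; the point is that stat.level.2 refers to the season names defined in stat.level.1:

```r
## Sketch of two nested aggregation levels: stat.level.1 aggregates
## months to seasons, stat.level.2 merges those seasons (by the names
## defined in stat.level.1) into illustrative half years.
temporal.aggregation <- list(
  stat.level.1 = list(
    period = list(DJF = c(12, 1, 2), MAM = c(3, 4, 5),
                  JJA = c(6, 7, 8),  SON = c(9, 10, 11)),
    statistic   = "mean",
    time.series = FALSE),
  stat.level.2 = list(
    period = list(winter.half = c("SON", "DJF"),
                  summer.half = c("MAM", "JJA")),
    statistic   = "mean",
    time.series = FALSE)
)
```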
subregions: Named list containing information on geographical regions. You can specify the boundaries by passing:
- a rectangular region by hand
- a shapefile with subregions of interest
- a NetCDF file containing subregions
All longitude coordinate values are forced to the range from -180 to 180 degrees. In case you want to define a subregion containing the (180,-180) meridian, you should force the longitude values to the range from 0 to 360 degrees, as could be the case for the Australasian domain. This can be done with the wrap.to tag (currently defined only for shapefiles).
- Rectangular region: A vector of the form c(lon.west, lon.east, lat.north, lat.south), e.g. World = c(-180, 180, 90, -90).
- Shapefile: A named list containing the directory of the shapefiles (dirname) and the name of the files (filename, without file extension). Optional: if no projection file is available, you can set the projection by hand, e.g. projection = "+proj=longlat +ellps=WGS84".
In case there are more regions defined in the shapefile, one can give specific names to them via subregionnames = c("South_America", "Central_America"). However, sometimes these multiple regions form a set. Then the category.variable tag merges the subregions with the same category into a single subregion, and category.label gives corresponding labels. category.label has to be a named vector, with the names being the category values from category.variable and the values being the labels. When omitting the category.label vector while using category.variable, WUX tries to get the names from category.variable. Note that the subregionnames tag and the category.label tag should not be used together.
In case you want to wrap your longitudes to the 0-360-degrees grid, flag the named vector wrap.to, e.g. wrap.to = c("my.subregion" = "360"). Example: CORDEX = list(dirname = "/tmp/shapefiles/cordex", filename = "cordex_regions", subregionnames = c("South_America", "Central_America", "North_America", "EU.ENS", "Africa", "West_Asia", "East_Asia", "Central_Asia", "Australasia", "Antarctica", "Arctic", "Mediterranean_domain"), wrap.to = c("Australasia" = "360")).
- NetCDF subregionfile: A named list containing information about the NetCDF file defining the subregion by a constant value (e.g. all pixels flagged with 1 define a subregion). The list has to contain: the name of the NetCDF subregions file; the path to the NetCDF subregions file; the name of the NetCDF file with longitude and latitude coordinates of the subregions file; the variable name in the subreg.file defining the region (mask.name); and the value of mask.name defining the region. If more regions are defined, use a vector of values to analyse a set of them.
Dealing with gridded data, subregions almost never happen to be cut out exactly the way your subregion is specified. If the centroid of a single data pixel lies within the subregion, this data point will be taken into the analysis; otherwise the data point will be considered as lying outside of the subregion and set to NA. This is WUX default behavior (area.fraction = FALSE). For very small subregions and/or very coarse data resolution, however, it can happen that you get very few data points or even none at all.
However, if you want to take every data pixel which just 'touches' your subregion, set area.fraction = TRUE. The pixel's centroid then no longer has to be inside the subregion for the pixel to be taken into the analysis. With area.fraction = TRUE, WUX does a weighted spatial average of all these pixels. The weight is the ratio of the pixel area lying within the subregion to the entire pixel area. So if one quarter of a data pixel is within the subregion (but its centroid, for example, is not), the pixel value will be taken into the analysis and weighted by 0.25 when averaging spatially. Pixels covered completely by the subregion have weight 1. Thus area.fraction is useful if you are dealing with very small subregions and/or coarse data resolution, resulting in just a few pixels.
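The weighting arithmetic can be illustrated in plain R. The numbers are made up and this is not WUX internals, just the weighted average the text describes:

```r
## Three pixels touching a subregion: their values and the fraction of
## each pixel's area lying inside the subregion (the weights).
pixel.values   <- c(10.2, 11.0, 9.6)   # e.g. temperature per pixel
coverage.ratio <- c(1.00, 0.25, 0.60)  # area inside subregion / pixel area
## Weighted spatial average, as done with area.fraction = TRUE:
weighted.mean(pixel.values, w = coverage.ratio)
```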
When averaging data over its spatial component, the simple arithmetic mean can result in strongly biased areal estimates. The reason for this is the geographical projection of the data: the globe has 360 degrees in longitude and 180 degrees in latitude. The real distance (km) between latitudes remains the same over the entire globe, whereas the distance between longitudes depends on the latitude considered. One degree of longitude near the equator represents a much larger distance (km) than one degree in Norway, as the longitudes converge at the poles.
This fact has to be considered especially when dealing with global data (e.g. GCMs). GCM data is usually (within WUX, so far always) stored on a rectangular lon-lat grid, so the poles appear disproportionately large in area. Common practice is cosine weighting of latitudes, resulting in smaller weights near the poles and the largest weights at the equator. See http://www.grassaf.org/general-documents/gsr/gsr_10.pdf for more details.
Setting spatial.weighting = TRUE enables cosine weighting of latitudes, whereas omitting it or setting it to FALSE results in an unweighted arithmetic areal mean (default). This option is valid only for data on a regular grid.
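The cosine weighting described here can be sketched in plain R (made-up values, not WUX internals):

```r
## Cosine areal weighting on a regular lon-lat grid: pixels near the
## poles get smaller weights because their grid cells cover less area.
lat     <- c(0, 30, 60, 80)        # pixel latitudes in degrees
values  <- c(25, 18, 5, -10)       # e.g. temperature per pixel
weights <- cos(lat * pi / 180)     # cos(latitude) areal weights
weighted.mean(values, w = weights) # cosine-weighted areal mean
```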
It may happen that time slices of the NetCDF data are missing without the user knowing about it. Reasons for these artifacts might be short time series (e.g. some models project only until 2035, so an analysis until 2050 would be biased) or simply missing values due to corrupt or missing NetCDF files. If na.rm = TRUE is set in the user input, missing values are filled with NA, but the temporal statistics are calculated using na.rm = TRUE. Setting na.rm = FALSE keeps the NA values and thus leads to NA statistics.
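A small illustration of the difference (made-up numbers):

```r
## A season with one missing time step: na.rm decides whether the
## temporal statistic survives the gap.
x <- c(4.1, NA, 3.7, 4.5)
mean(x)                # NA: the gap propagates into the statistic
mean(x, na.rm = TRUE)  # approx. 4.1: statistic over remaining values
```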
plot.subregions: A list containing information about diagnostic plotting of grid points within the subregions. If specified, png plots are generated showing the grid points within a subregion; the size of the drawn circles corresponds to the weighting factor of the pixels. The list contains the following elements:
save.subregion.plots: A character containing only the output path, as the filenames are automatically generated from the model and subregion names. For example, save.subregion.plots = "/tmp/" will save the plots in the directory /tmp/. If save.subregion.plots is not specified, no plots will be drawn!
xlim: A vector containing the longitudinal boundaries of the plots. For example, xlim = c(10, 50) draws the plot from 10 to 50 degrees East. If xlim is not specified, the boundaries will be generated automatically.
ylim: A vector containing the latitudinal boundaries of the plots. For example, ylim = c(10, 50) draws the plot from 10 to 50 degrees North. If ylim is not specified, the boundaries will be generated automatically.
Factor for point size relative to the default.
save.as.data: A character containing both the output path and the filename. For example, save.as.data = "/tmp/cmip3" will save the following files in the directory /tmp: a csv file containing the model climatologies, a csv file containing the differences of the climatologies (i.e. the climate change signals), and cmip3.Rdata (an R binary file which can be loaded into the next R session, containing data frames analog to the csv files, among them wux.data.diff).
climate.models: A character vector containing the names of the models to be processed. The names must be identical to the unique acronyms in the modelinput list. Read the next section if you want to add a model to the modelinput list.
When you want to read in a new climate simulation WUX does not know yet, all you need to do is specify this model in the modelinput list (which should be stored in a file). You don't need to write tedious input routines; WUX does that for you. The modelinput list is a named list of climate models and contains meta-information on all currently known climate models. Sometimes models indicate wrong attributes in their NetCDF files which are needed by models2wux and have to be corrected in modelinput. Therefore: KNOW THE MODEL YOU WANT TO ADD AND TAKE CARE OF THE META-INFORMATION YOU ARE INDICATING.
Each model entry consists of a named list with the following mandatory tags (i.e. names):
Character indicating the institute developing the model.
Character indicating the RCM acronym; if you are processing a GCM, type "".
Character indicating the GCM acronym.
Type of emission scenario used for the simulation.
Name of the NetCDF grid file containing the lon/lat variables.
Directory of the NetCDF grid file.
Default directory of the NetCDF data files. If the files are not all stored in one directory, use the alternative path tag below.
If your files are not all stored in one directory, here you can enter a named vector of paths. If files are scattered by parameter, pass the parameter names (CF Metadata Convention) as the vector names. If they are split by period, pass the period (e.g. scenario) as the vector names. If files are separated by both period and parameter, you can use nested named lists instead of vectors.
Character vector of file names of the NetCDF data files. If there are different file names for different parameters (which will mostly be the case) and/or the file names in the scenario and historical periods differ as well, use named vectors or nested lists as for the paths above. You can set this tag to NA if this climate model has no corresponding files. This makes sense, for example, for the GKSS model for global radiation, as this ENSEMBLES model does not provide this parameter. Values for this model will be NA in the WUX data frame.
These tags are optional:
Grid resolution character.
GCM run. Default is blank "".
Default are daily time steps; type "monthly" for monthly data.
Define the NetCDF time:calendar attribute by hand. This is necessary if the NetCDF file contains wrong information.
Define the NetCDF time:units attribute by hand, e.g. days since 1950-01-06 00:00:00.
The time variable in NetCDF files is a vector of time steps relative to the "time:units" attribute, with the calendar according to the "time:calendar" attribute. However, there are cases where certain climate models deal with two calendar types at once! Yes, that's possible... For example: the data claim to have a "360 days" calendar, the "time:units" attribute is set to days since 1961-01-01 00:00:00, and the time vector looks like 365, 366, ..., 723, 724. The 365th day since 1961-01-01 is definitely not the 1st of January 1962 in terms of the 360-days calendar, but it is in terms of "julian" dates. In such a case we would set count.first.time.value = "julian", while the calendar for counting the subsequent time steps remains "360 days". Other possibilities are count.first.time.value = "noleap" (or "360days"). Currently this property is defined for time:calendar = "360 days" only, but can easily be extended to other calendars as well.
parameters: A named vector indicating the parameter long and short names which belong together, e.g. parameters = c(air_temperature = "tas_dm", precipitation_amount = "pr_24hc"). This is important if the NetCDF-internal variable name deviates from the WUX default naming.
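Since the tag names above are described but not all spelled out here, the following sketch uses hypothetical names matching those descriptions; check data(modelinput_test) for the exact spelling. All paths and acronyms are illustrative:

```r
## Hypothetical modelinput entry -- tag names follow the descriptions
## above but must be verified against data(modelinput_test); paths and
## acronyms are made up.
modelinput <- list(
  MyGCM_rcp85 = list(
    institute         = "MyInstitute",
    rcm               = "",              # GCM simulation, so no RCM acronym
    gcm               = "MyGCM",
    emission.scenario = "rcp85",
    gridfile.filename = "mygcm_grid.nc",
    gridfile.dirname  = "/data/mygcm/",
    file.path.default = "/data/mygcm/rcp85/",
    file.name         = c(air_temperature = "tas_mygcm_rcp85.nc")
  )
)
```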
## This example shows a typical workflow for models2wux, the workhorse of
## the wux package. Going through this example step by step, you will
## retrieve NetCDF files of two CMIP5 simulations and aggregate them to
## an R data.frame for further analysis.

## I) Load wux functions and example datasets...
library("wux")

## II) You need to obtain the climate simulations first. You can get
## started with downloading some example CMIP5 NetCDF files from the
## ESGF, visiting for example http://pcmdi9.llnl.gov or using the
## CMIP5fromESGF function. Here, we download two simulations, "NorESM1-M"
## and "CanESM2", into your home directory "~/tmp/CMIP5/", which will be
## created automatically. You will need a valid account at any ESGF
## node for this function to run. See ?CMIP5fromESGF for further help.
## Not run:
CMIP5fromESGF(save.to = "~/tmp/CMIP5/",
              models = c("NorESM1-M", "CanESM2"),
              variables = c("tas"),
              experiments = c("historical", "rcp85"))
## End(Not run)

## III) Specify those downloaded data for models2wux. models2wux needs
## to know where the data is stored on your HDD and needs to have access
## to certain metadata of the climate simulator, which you have to
## provide as well. This information is stored in a list, which should
## be saved as ONE file somewhere on your computer. We call this
## information "modelinput". You should share this file with your
## colleagues using the same IT infrastructure to share synergies.
## You can create such a file based on the data downloaded by
## "CMIP5fromESGF":
## Not run:
CMIP5toModelinput(filedir = "~/tmp/CMIP5",
                  save.to = "~/modelinput.R")
## End(Not run)

## This file then would look like this:
data(modelinput_test)
## It specifies temperature and precipitation files for the two
## simulations "NorESM1-M" and "CanESM2" (RCP8.5), stored in
## "~/tmp/CMIP5/".
str(modelinput_test)

## IV) Next, you need to specify which simulations you want to read in
## with models2wux, what kind of statistics to calculate, what subregion
## to analyze, what time periods and seasons to define, and so on. This
## is done with a user input file, which contains a list with all the
## necessary information. You typically use different userinput files
## for different analyses, whereas your modelinput should remain in ONE
## file which will be updated each time you obtain a new climate
## simulation. One example user input file, which reads in both
## simulations specified above for the Alpine domain and returns their
## projected climate change signal, could look as follows:
data(userinput_CMIP5_changesignal)
str(userinput_CMIP5_changesignal)

## Alternatively, the following userinput returns a time series of both
## models, which only differs by the "time.series" tag and differently
## specified periods:
data(userinput_CMIP5_timeseries)
str(userinput_CMIP5_timeseries)

## V) At last you can run models2wux to obtain a data.frame of the
## specified climatic change features defined above:
## Not run:
climchange.df <- models2wux(userinput = userinput_CMIP5_changesignal,
                            modelinput = modelinput_test)
## End(Not run)

## A better practice is to save both input files, containing a named
## list each, somewhere on your disk and pass the files directly to the
## models2wux function. If you had stored the two files in your home
## directory as e.g. "~/userinput.R" and "~/modelinput.R" you can call:
## Not run:
climchange.df <- models2wux(userinput = "~/userinput.R",
                            modelinput = "~/modelinput.R")
## End(Not run)

## If you downloaded the data correctly, you should obtain a data.frame:
## Not run:
climchange.df
## End(Not run)
## which should be identical to this example data.frame:
data(CMIP5_example_changesignal)
CMIP5_example_changesignal

## Instead of calculating the climate change signals, you can also
## generate time series of the two models aggregated over the Alpine
## domain, using a different user input file:
## Not run:
climchange.df <- models2wux(userinput = userinput_CMIP5_timeseries,
                            modelinput = modelinput_test)
## End(Not run)

## VI) Finally you can do any kind of analysis you are interested in,
## using either functions from wux or any other R functionality:
summary(CMIP5_example_changesignal, parms = "delta.air_temperature")
## or plot time series as
require(lattice)
data(CMIP5_example_timeseries)
## Not run:
xyplot(air_temperature ~ year | season, groups = acronym,
       data = CMIP5_example_timeseries, type = c("l", "g"),
       main = "NorESM1-M and CanESM2 simulations over Alpine Region\nRCP 8.5 forcing")
## End(Not run)