loadECOMS: Remote access to climate databases stored at the ECOMS-UDG



Description

A simple interface for accessing and retrieving dimensional slices of the various climate databases stored at the ECOMS User Data Gateway (forecasts, gridded observations and reanalyses).


Usage

loadECOMS(dataset, var, dictionary = TRUE, members = NULL, lonLim = NULL,
          latLim = NULL, season = NULL, years = NULL, leadMonth = 1,
          time = "none", aggr.d = "none", aggr.m = "none")



Arguments

dataset: A character string indicating the database to be accessed (partial matching is enabled). Currently accepted values are "System4_seasonal_15", "System4_seasonal_51", "System4_annual_15", "CFSv2_seasonal" and "SMHI-EC-EARTH_EUPORIAS" for hindcasts, "WFDEI" for the WATCH Forcing Dataset based on ERA-Interim (gridded observations), and "ERA_interim" and "NCEP_reanalysis1" for the ECMWF ERA-Interim and NCEP/NCAR reanalyses, respectively. See details on available datasets.


var: Variable code (character string). This is the name of the variable according to the standard naming defined in the package vocabulary (see the next argument). For variables with vertical levels, the vertical level is appended to the variable name after the “@” symbol (e.g. var = "z@700" for geopotential at the 700 mb isobaric pressure level). It is also possible to enter the variable name as originally coded in the dataset to skip data homogenization, although this is not recommended (see the next argument). See details on available variables.


dictionary: A logical flag indicating whether a dictionary is used for variable homogenization. The default (strongly recommended) is TRUE, meaning that the function internally performs the homogenization steps needed to return the standard variables defined in the vocabulary (e.g. variable transformation, deaccumulation...). See details on variable homogenization.


members: A vector of integers indicating the members to be loaded. Defaults to NULL, which loads the default members (see details for the particularities of the CFSv2 dataset). For instance, members=1:5 retrieves the first five members of the hindcast. Discontinuous member selection (e.g. members=c(1,5,7)) is allowed. This argument is ignored if the requested dataset is not a forecast or the requested variable is static (e.g. orography).


lonLim: A vector of length 2 with the minimum and maximum longitude coordinates, in decimal degrees, of the selected bounding box. For single-point queries, a single numeric value with the longitude coordinate. If NULL (default), the whole longitudinal range is selected (note that this may lead to a large output object). See details on the definition of spatial domains.


latLim: Same as lonLim, but for the selection of the latitudinal range.


season: An integer vector specifying the desired season (in months, January = 1, ..., December = 12). One to several contiguous months can be given. For full-year selections (not possible for all datasets, e.g. seasonal forecasts), set season = 1:12. This argument is ignored if the requested variable is static (e.g. orography). See details on the definition of temporal slices.


years: Optional vector of years to select. Defaults to all available years. This argument is ignored if the requested variable is static (e.g. orography). See details on the definition of temporal slices.


leadMonth: Integer value indicating the forecast lead time, in months, relative to the first month of the season. Note that leadMonth=1 for season=1 (January) corresponds to the December initialization. Defaults to 1 (i.e., a one-month lead forecast). This argument is ignored if the dataset is not a forecast or the requested variable is static (e.g. orography); in the first case, a message is printed on screen if its value is different from NULL. See details on initialization times.


time: A character string indicating the temporal filtering/aggregation of the output data. Defaults to "none", which returns the original time series as stored in the dataset. For sub-daily variables, instantaneous data at selected verification times can be filtered using one of the character strings "00", "06", "12" and "18". If daily aggregated data are required, use "DD". This argument is ignored if the requested variable is static (e.g. orography). See details for time aggregation options.


aggr.d: A character string indicating the temporal aggregation function to be applied in case of daily aggregation of sub-daily data (when time = "DD" and the original data are sub-daily; otherwise ignored). Currently accepted values are "none" (default), "mean", "min", "max" and "sum". See details for time aggregation options.


aggr.m: Same as argument aggr.d, but for monthly aggregation of the data. For sub-daily data, the daily aggregation (aggr.d) must also be specified. See details for time aggregation options.
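The "name@level" convention of the var argument can be illustrated with a small base R sketch. split_var_level() is a hypothetical helper written only for this example. It is not part of the package, and loadECOMS may parse var differently internally.

```r
# Hypothetical helper (not part of ecomsUDG.Raccess): split a var string of the
# form "name@level" (e.g. "z@700") into its variable name and vertical level.
split_var_level <- function(var) {
  stopifnot(is.character(var), length(var) == 1L)
  has_level <- grepl("@", var, fixed = TRUE)
  name <- sub("@.*$", "", var)  # everything before the first "@"
  if (!has_level) {
    # Single-level variable (e.g. "tas"): no vertical level
    return(list(name = name, level = NULL))
  }
  level <- suppressWarnings(as.numeric(sub("^[^@]*@", "", var)))
  if (is.na(level)) stop("Malformed vertical level in: ", var)
  list(name = name, level = level)
}

split_var_level("z@700")  # returns list(name = "z", level = 700)
split_var_level("tas")    # returns list(name = "tas", level = NULL)
```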


Details

Available datasets

The values of the argument dataset are consistent with the nomenclature of the reference table containing a summary of all available datasets and variables: http://meteo.unican.es/trac/wiki/udg/ecoms/dataserver/catalog. Currently, there are 5 different seasonal to annual hindcasts, two reanalysis products and one observational gridded dataset available at ECOMS-UDG. All of them are accessed through the common interface loadECOMS, so the applicable arguments vary slightly between datasets. For instance, the arguments members and leadMonth do not apply to observations/reanalyses, and are therefore ignored if their value is not NULL. The output structure varies accordingly: forecast data include the initialization dates and the names of the chosen members, while this information is not included for other types of gridded data.
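The partial matching of dataset names and the dropping of forecast-only arguments for observations/reanalyses can be sketched in base R as follows. resolve_dataset() and check_args() are hypothetical helpers. Using match.arg() for the partial matching is an assumption about one possible implementation, not a description of how loadECOMS does it. The dataset list is copied from the dataset argument and may change as new datasets are added.

```r
# Accepted dataset names, as listed under the dataset argument
ecoms_datasets <- c("System4_seasonal_15", "System4_seasonal_51",
                    "System4_annual_15", "CFSv2_seasonal",
                    "SMHI-EC-EARTH_EUPORIAS", "WFDEI",
                    "ERA_interim", "NCEP_reanalysis1")
ecoms_forecasts <- ecoms_datasets[1:5]  # the five hindcasts

# Hypothetical: match.arg() performs partial matching and stops with an
# error on ambiguous (e.g. "System4") or unknown names
resolve_dataset <- function(dataset) match.arg(dataset, ecoms_datasets)

# Hypothetical: ignore members/leadMonth (with a message) for non-forecast data
check_args <- function(dataset, members = NULL, leadMonth = NULL) {
  dataset <- resolve_dataset(dataset)
  if (!(dataset %in% ecoms_forecasts)) {
    if (!is.null(members)) message("'members' ignored for ", dataset)
    if (!is.null(leadMonth)) message("'leadMonth' ignored for ", dataset)
    members <- NULL
    leadMonth <- NULL
  }
  list(dataset = dataset, members = members, leadMonth = leadMonth)
}

resolve_dataset("CFS")                      # "CFSv2_seasonal"
check_args("ERA", members = 1:5)$members    # NULL, after a message
```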

Available variables

For the possible values that the argument var can take for each dataset at ECOMS-UDG, check the first column of the variables table, which is continuously updated as new variables are made available. The table also contains further details on the native temporal aggregation/resolution of each variable and the available vertical levels.

Variable homogenization

The different nature of the various databases, models and variables, and the idiosyncratic naming and storage conventions often applied by the different modelling centres, make it necessary to homogenize the datasets beforehand in order to implement a truly user-friendly toolbox for data access. This package achieves this aim by defining a vocabulary common to all climate datasets. The particular variables of each dataset are translated (and transformed if necessary) to the standard variables by means of a dictionary, controlled by the argument dictionary. In essence, the ‘dictionary’ is a csv file specific to each individual dataset, containing the information needed to perform the unit conversions that match the standard variable definitions contained in the vocabulary. This feature is described in more detail in the data homogenization section of the ECOMS UDG wiki.
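To illustrate the idea of a dictionary-driven translation, the following base R sketch maps a standard variable name to a dataset-specific one and applies a linear unit conversion. The column names (identifier, short_name, scale, offset), the raw names "T2"/"TP" and the conversions are all hypothetical. They are not taken from the actual ECOMS-UDG dictionary files, whose layout may differ.

```r
# HYPOTHETICAL dictionary layout (the real ECOMS-UDG csv files may differ):
# one row per standard variable, with its dataset-specific name and a linear
# conversion of the form  standard = raw * scale + offset
dic <- data.frame(
  identifier = c("tas", "tp"),   # standard (vocabulary) names
  short_name = c("T2", "TP"),    # hypothetical original names in the dataset
  scale      = c(1, 1000),       # e.g. metres -> millimetres for precipitation
  offset     = c(-273.15, 0),    # e.g. kelvin -> degrees Celsius for temperature
  stringsAsFactors = FALSE
)

homogenize <- function(raw, var, dic) {
  row <- dic[dic$identifier == var, , drop = FALSE]
  if (nrow(row) != 1L) stop("No unique dictionary entry for: ", var)
  # Only linear unit conversions are sketched here; steps such as
  # deaccumulation would need additional logic
  list(original_name = row$short_name, values = raw * row$scale + row$offset)
}

homogenize(c(273.15, 283.15), "tas", dic)$values  # approximately 0 and 10
```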

Ensemble member definition

In the case of the CFSv2 reforecast, there are four initializations (4 cycles) every 5th day, each running for 9 months (see CFSv2 members for more detailed information on how the members are constructed). Thus, the lagged-time configuration of members results in a different number of possible members depending on the initialization chosen (more precisely, 24 members, except for the November initializations, which have 28). This theoretical configuration has been slightly modified to avoid some missing initializations in the original dataset (see details in the previous link). For better comparability with its ECMWF counterpart (the System4 seasonal forecast of 15 members), loadECOMS defines by default an ensemble of 15 members for each lead month and forecast season in the case of CFSv2, although it is possible to request all available members for that particular lead month. This ensures that all default members belong to the initializations of the preceding month or of the first days of the current month. In addition, due to the lagged runtime configuration of the CFSv2 ensemble members, some of the first days of the period may be missing in lead month 0 requests, as only the days common to all requested members are returned.
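The "common days only" rule for lagged members can be illustrated with a short base R sketch. The initialization dates below are invented for illustration and are not actual CFSv2 runtimes. The sketch also assumes that each member provides forecasts from its initialization date onward. common_days() is a hypothetical helper.

```r
# Keep only the days that are present in every member
common_days <- function(member_days) {
  Reduce(function(a, b) a[a %in% b], member_days)
}

# Hypothetical lead month 0 request for January 2001 with three lagged members
season_days <- seq(as.Date("2001-01-01"), as.Date("2001-01-31"), by = "day")
inits <- as.Date(c("2000-12-27", "2001-01-01", "2001-01-06"))  # illustrative only
# Assumption: each member provides forecasts from its initialization date onward
member_days <- lapply(inits, function(d) season_days[season_days >= d])

common <- common_days(member_days)
range(common)  # 2001-01-06 to 2001-01-31: the first five days are lost
```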

Definition of spatial domains

Regarding the selection of the spatial domain, it is possible to select the whole spatial domain of the dataset (currently global for all available hindcasts) by setting lonLim=NULL and latLim=NULL. More often, rectangular domains are defined by the minimum and maximum coordinates in longitude and latitude (for instance, lonLim=c(-10,10) and latLim=c(35,45) define a rectangular window centered on the Iberian Peninsula), or single grid-cell values are requested (for instance, lonLim=-3.21 and latLim=41.087 retrieve the data at the grid point closest to the coordinate -3.21E, 41.087N). In these last two cases, the function finds the grid points nearest (in Euclidean distance) to the coordinates introduced. (NOTE: The single-point option is currently disabled for the NCEP dataset, which only accepts rectangular domain selections.)
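The nearest-grid-point logic described above can be sketched in base R for a regular longitude/latitude grid. The grid, nearest_gridpoint() and window_indices() are hypothetical and serve only to illustrate the selection rule. The sketch does not handle longitude wrap-around (e.g. 0-360 versus -180-180 conventions), which a real implementation would need.

```r
# Index of the grid coordinate closest to a given value
nearest_index <- function(coords, value) which.min(abs(coords - value))

# Single-point query: grid point with the smallest Euclidean distance
nearest_gridpoint <- function(lons, lats, lon, lat) {
  grid <- expand.grid(lon = lons, lat = lats)
  d2 <- (grid$lon - lon)^2 + (grid$lat - lat)^2  # squared distance, same argmin
  unlist(grid[which.min(d2), ])
}

# Rectangular query: all grid indices between the points nearest to the limits
window_indices <- function(coords, lim) {
  ends <- sort(c(nearest_index(coords, lim[1]), nearest_index(coords, lim[2])))
  seq(ends[1], ends[2])
}

lons <- seq(-20, 20, by = 2.5)  # hypothetical regular 2.5-degree grid
lats <- seq(30, 50, by = 2.5)
nearest_gridpoint(lons, lats, lon = -3.21, lat = 41.087)  # lon -2.5, lat 40
lons[window_indices(lons, c(-10, 10))]                    # -10, -7.5, ..., 10
```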

The value returned in xyCoords varies accordingly, and it fits the common data structure of many R plotting functions (see xy.coords for more details).

The spatial definition of the data is associated with a specific coordinate reference system via the ‘proj4string’ slot of xyCoords, thus enabling the direct application of geospatial operations such as projection transformations, spatial overlay methods, etc., with the appropriate R methods.

Definition of temporal slices

The function has been implemented to access seasonal slices, as determined by the season argument. Seasons can be defined in several ways: a single month (e.g. season=1 for January), a standard season (e.g. season=c(1,2,3) for JFM, or season=c(12,1,2) for DJF), or any period of consecutive months (e.g. season=1:6 for the first half of the year). Seasons are returned for a given period of years (defined by the years argument, e.g. years=1981:2000) with a homogeneous forecast lead time (given by the leadMonth argument, e.g. leadMonth=1 for a one-month lead time) with respect to the first month of the selected season. For example, season=c(1,2,3) for years=1995:2000 and leadMonth=1 returns the following series: JFM 1995 from the December 1994 runtime forecast, ..., JFM 2000 from the December 1999 runtime forecast. Note that it is also possible to work with year-crossing seasons, such as DJF. In this case, season=c(12,1,2) for years=1995:2000 and leadMonth=1 returns the following series: DJF 1994/1995 (from the November 1994 runtime forecast), ..., DJF 1999/2000 (from the November 1999 runtime forecast).
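The relationship between season, years and leadMonth in the examples above reduces to simple month arithmetic. The following base R sketch reproduces it. init_month() is a hypothetical helper, not a package function. It assumes, as in the DJF example, that for year-crossing seasons the year refers to the later of the two years.

```r
# Hypothetical helper: initialization year/month implied by season, year, leadMonth
init_month <- function(season, year, leadMonth) {
  crosses <- any(diff(season) < 0)               # e.g. c(12, 1, 2)
  first_year <- if (crosses) year - 1 else year  # DJF 1995 starts in Dec 1994
  idx <- first_year * 12 + (season[1] - 1) - leadMonth  # months since year 0
  c(year = idx %/% 12, month = idx %% 12 + 1)
}

init_month(c(1, 2, 3), 1995, leadMonth = 1)   # year 1994, month 12 (December)
init_month(c(12, 1, 2), 1995, leadMonth = 1)  # year 1994, month 11 (November)
```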

If the whole year/forecast extent is needed (instead of a particular season), the argument season can be omitted. In this case, its default value NULL is equivalent to setting season = 1:12 or, for seasonal forecasts, season = 1:n, where n is the number of forecast months remaining after the given lead month. The same applies to the argument years: all available years are returned when it is omitted.

Note that some forecasts (e.g. System4) do not provide data for the first forecast time of precipitation. Thus, for lead month 0 queries, the data for this particular dataset begin on the second day of the month.

Initialization times

The characteristics of the InitializationDates output vary depending on the dataset. In the case of models that have simultaneous initializations for different members (e.g. System4), the output is just a vector of initialization times (one per year selected). Unlike the simultaneous initializations scheme, the lagged runtime configuration of members used by some other models (e.g. CFSv2) results in different initialization times for the same forecast times of different members. In this case, the InitializationDates are included in a list whose elements are named as the corresponding member.
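Since InitializationDates is either a single time vector or a list named by member, code consuming it can branch on its type. The sketch below uses invented dates and member names ("Member_1", "Member_2"), and member_inits() is a hypothetical helper. POSIXlt objects are excluded from the list test because they are internally lists.

```r
# Hypothetical helper: initialization times for one member, whatever the layout
member_inits <- function(init_dates, member) {
  lagged <- is.list(init_dates) && !inherits(init_dates, "POSIXlt")
  if (lagged) init_dates[[member]] else init_dates
}

# Simultaneous initializations (System4-like): one time per selected year
s4 <- as.POSIXct(c("1994-12-01", "1995-12-01"), tz = "UTC")
# Lagged initializations (CFSv2-like): one vector per member, named after it
cfs <- list(
  Member_1 = as.POSIXct(c("1994-11-21", "1995-11-21"), tz = "UTC"),
  Member_2 = as.POSIXct(c("1994-11-26", "1995-11-26"), tz = "UTC")
)

member_inits(s4, "Member_2")   # same vector for every member
member_inits(cfs, "Member_2")  # member-specific initialization times
```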

By default, 15 members are returned for the CFSv2 hindcast, for better comparability with the 15 members returned by ECMWF's System4 seasonal hindcast. However, note that up to 24 members can be obtained from the CFSv2 hindcast (i.e., members=1:24), and 28 in the case of the November initializations, although some modifications to this initial configuration have been introduced to avoid errors stemming from missing initializations for some years in the original database. See the CFSv2 member definition at the ECOMS-UDG wiki for more details on the lagged runtime configuration of the CFSv2 hindcast and the approach of the ecomsUDG.Raccess package.

Temporal filtering / aggregation

The argument time controls the temporal filtering/aggregation options that may apply to a variable. Daily and monthly aggregates are obtained as follows, depending on the native temporal resolution of the variable:

  1. For variables that are already stored as daily means in the dataset, both "DD" and "none" return the required daily output.

  2. In case of 6-hourly data, if "DD" is chosen, the function computes the daily value using the aggregation function indicated in the argument aggr.d, printing an information message on screen. This function is normally "mean", providing daily averages. However, if the variable is a 6-h flux (e.g. precipitation or radiation: var = "tp", "rsds" or "rlds"), the aggregation function should probably be "sum" (i.e., it returns the daily accumulated value). Similarly, if the variable is a daily maximum/minimum (i.e., var = "tasmax"/var = "tasmin"), the corresponding function (aggr.d = "max" or aggr.d = "min") can be applied to the 6-h outputs on a daily basis to obtain absolute maximum/minimum daily values.

  3. In case of 12-hourly data, daily mean data ("DD") can be computed, but the function gives a warning, as it is generally not recommended to compute daily means from just two values. It is nevertheless allowed, since it is a necessary step prior to monthly aggregation. For 12-h data, values of the time argument other than "none", "DD", "00" or "12" are not accepted.

  4. Monthly aggregations work similarly to daily ones. Bear in mind that, for sub-daily variables, the daily aggregation (aggr.d) must also be indicated, since it is applied before the monthly aggregation function (aggr.m).
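The two-step aggregation (sub-daily to daily with aggr.d, then daily to monthly with aggr.m) can be mimicked in base R with tapply(). aggregate_time() is a hypothetical helper and the 6-hourly series is invented. It only mirrors the logic described above and does not reproduce the messages or warnings that loadECOMS prints.

```r
# Hypothetical helper: aggregate values by calendar day or month with a
# function name such as "mean", "min", "max" or "sum" ("none" returns x as is)
aggregate_time <- function(x, times, period = c("day", "month"), fun = "mean") {
  period <- match.arg(period)
  if (fun == "none") return(x)
  key <- format(times, if (period == "day") "%Y-%m-%d" else "%Y-%m")
  res <- tapply(x, key, match.fun(fun))
  out <- as.vector(res)
  names(out) <- names(res)  # keep the day/month labels
  out
}

# Two days of hypothetical 6-hourly data (UTC)
times <- seq(as.POSIXct("2000-01-01 00:00", tz = "UTC"), by = "6 hours",
             length.out = 8)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)

daily_mean <- aggregate_time(x, times, "day", "mean")  # 2.5 6.5
daily_sum  <- aggregate_time(x, times, "day", "sum")   # 10 26 (flux-like)
# Monthly step applied to the daily values, as required for sub-daily data
monthly <- aggregate_time(daily_mean, as.Date(names(daily_mean)), "month", "mean")
monthly  # 4.5, labelled "2000-01"
```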


Value

A list with the following elements providing the necessary information for data representation and analysis:


Variable: A list with two elements, and some other attributes including units and temporal aggregation details:


Data: An N-dimensional array. The number of dimensions (N) depends on the type of request, given that dimensions of length one are dropped. Thus, N can range from 4 (several members for a rectangular domain, with several values along the member, time, latitude and longitude dimensions) down to 1 (atomic vector) for single-point and single-member selections, for which only the time dimension is required. The dimensions are labelled by the “dimnames” attribute, and are always arranged in canonical order (i.e.: [member, time, level, lat, lon]).


xyCoords: A list with x and y components, as required by many standard mapping functions in R (see e.g. image). In addition, the attribute projection provides geo-referencing information for more advanced spatial operations/conversions, in the form of a character string following the PROJ.4 specifications.


Dates: A list with two POSIXct time elements of the same length as the ‘time’ dimension in Data, defining the time boundaries of the time axis coordinates in the interval [start, end), or, if the loaded field is static, a character string indicating it. See details.


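InitializationDates: A POSIXct time object corresponding to the initialization times selected (for datasets with lagged member initializations, such as CFSv2, a list of such objects named after the corresponding members). Only for forecast datasets. NA for static variables (e.g. orography). See details.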


Members: A character vector with the names of the ensemble members returned, in the same order as arranged in the Data array. Only for forecast datasets. NA for static variables (e.g. orography). See details.

Additionally, there are three global attributes with metadata ("dataset", "source" and "URL"), providing information on the dataset loaded, its source (the ECOMS UDG) and the URL for reference.
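The rule that length-one dimensions are dropped while the remaining ones keep the canonical [member, time, level, lat, lon] order can be reproduced with base R's drop(). drop_singletons() is a hypothetical helper. It tracks dimension names in a separate vector rather than through the attributes used by the package.

```r
# Canonical dimension order of Data, as stated in the description of Data
canonical <- c("member", "time", "level", "lat", "lon")

# Hypothetical helper: drop length-one dimensions and report which remain
drop_singletons <- function(arr, dim_names = canonical) {
  stopifnot(length(dim(arr)) == length(dim_names))
  keep <- dim(arr) > 1L
  list(data = drop(arr), dimensions = dim_names[keep])
}

# One member, 3 times, one level, 2 latitudes, 4 longitudes
full <- array(seq_len(24), dim = c(1, 3, 1, 2, 4))
res <- drop_singletons(full)
dim(res$data)    # 3 2 4
res$dimensions   # "time" "lat" "lon"

# Single point and single member: only the time dimension survives
pt <- drop_singletons(array(1:10, dim = c(1, 10, 1, 1, 1)))
pt$dimensions    # "time"
```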


Author(s)

Santander Meteorology Group




Examples

## Not run: 
# Go to <http://meteo.unican.es/trac/wiki/udg/ecoms/RPackage/examples>

## End(Not run)

SantanderMetGroup/ecomsUDG.Raccess documentation built on May 9, 2019, 12:41 p.m.