loadSeasonalForecast: Load a seasonal forecast

View source: R/loadSeasonalForecast.R

loadSeasonalForecast    R Documentation

Load a seasonal forecast

Description

Load a user-defined spatio-temporal slice from a seasonal forecast

Usage

loadSeasonalForecast(
  dataset,
  var,
  dictionary = FALSE,
  members = NULL,
  lonLim = NULL,
  latLim = NULL,
  season = NULL,
  years = NULL,
  leadMonth = 1,
  time = "none",
  aggr.d = "none",
  aggr.m = "none"
)

Arguments

dataset

A character string indicating the database to be accessed. This is usually a path to a local file or a URL pointing to a netCDF or NcML file in the case of netCDF and/or gridded datasets. For station data in standard ASCII format, this is the path to the directory the dataset lives in.

var

Variable code (character string). This is the name of the variable according to the R standard naming (see the next argument). For variables with vertical levels, the level is specified after the variable name, separated by the “@” symbol (e.g. var = "z@700" for geopotential height at the 700 mb isobaric pressure level). It is also possible to enter the variable name as originally coded in the dataset to skip data homogenization.

dictionary

Defaults to FALSE. If TRUE, a dictionary is used and the .dic file is expected in the same path as the dataset. If the .dic file is stored elsewhere, the argument is the full path to the .dic file (including the extension, e.g.: "/path/to/the/dictionary_file.dic"). This is the case, for instance, when the dataset is stored at a remote URL and a dictionary for that particular dataset is stored locally. If FALSE, no variable homogenization takes place, and the raw variable, as originally stored in the dataset, is returned. See details for dictionary specification.

members

A vector of integers indicating the members to be loaded. Defaults to NULL, which loads all available members. For instance, members=1:5 will retrieve the first five members of the dataset. Discontinuous member selections (e.g. members=c(1,5,7)) are allowed. If the requested dataset has no ensemble axis or the requested variable is static (e.g. orography), the argument is ignored.

lonLim

Vector of length 2, with the minimum and maximum longitude coordinates, in decimal degrees, of the selected bounding box. For single-point queries, a numeric value with the longitude coordinate. If NULL (default), the whole longitudinal range is selected (note that this may lead to a large output object size).

latLim

Same as lonLim, but for the selection of the latitudinal range.

season

An integer vector specifying the desired season (in months, January = 1, ..., December = 12). Options range from a single month to several contiguous months. Defaults to NULL, indicating a full year selection (same as season = 1:12).

years

Optional vector of years to select. The default (NULL) selects all available years. If the requested variable is static (e.g. orography), the argument is ignored.

leadMonth

Integer value indicating the lead forecast time, relative to the first month of season. Note that leadMonth=1 for season=1 (January) corresponds to the December initialization. Defaults to 1 (i.e., a one-month lead forecast).

time

A character vector indicating the temporal filtering/aggregation of the output data. Defaults to "none", which returns the original time series as stored in the dataset. For sub-daily variables, instantaneous data at selected verification times can be filtered using one of the character strings "00", "03", "06", "09", "12", "15", "18" and "21", when applicable. If daily aggregated data are required, use "DD". If the requested variable is static (e.g. orography), the argument is ignored. See the next arguments for time aggregation options.

aggr.d

Character string. Aggregation function used to compute daily data from sub-daily data. Currently accepted values are "none", "mean", "min", "max" and "sum".

aggr.m

Same as aggr.d, but indicating the aggregation function used to compute monthly data from daily data. If aggr.m = "none" (the default), no monthly aggregation is undertaken.
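
The "name@level" convention described for the var argument can be illustrated with a minimal base R sketch. The helper parse_var below is hypothetical and is not part of loadeR; it only shows how a string such as "z@700" splits into a variable name and a numeric level.

```r
# Hypothetical helper (not part of loadeR): split "name@level" into its parts.
parse_var <- function(var) {
  parts <- strsplit(var, "@", fixed = TRUE)[[1]]
  list(
    varName = parts[1],
    # 2D variables carry no level, so NULL is returned in that case.
    level = if (length(parts) > 1) as.numeric(parts[2]) else NULL
  )
}

parse_var("z@700")  # varName "z", level 700
parse_var("tas")    # varName "tas", level NULL
```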

Value

A list with the following components providing the necessary information for data representation and analysis.

  • Variable: A list with two elements, and some other attributes including units and temporal aggregation details:

    • varName: A character string indicating the variable returned. Same as the value provided for argument var.

    • level: A numeric value indicating the vertical level of the variable (NULL for 2D variables).

  • Data: An N-dimensional array. The number of dimensions (N) depends on the type of request given that dimensions of length one are dropped. Thus, N can take values from 4 (several members for a rectangular domain with different values for longitude, latitude, ensemble and time dimensions) to 1 (atomic vector), for single-point and single-member selections, for which only the time dimension is required. The dimensions are labelled by the “dimnames” attribute, and are always arranged in canonical order (i.e.: [member, time, level, lat, lon]).

  • xyCoords: A list with x and y components, as required by many standard mapping functions in R (see e.g. image). In addition, the attribute projection provides projection information details.

  • Dates: A list with two POSIXct time elements of the same length as the ‘time’ dimension in Data, defining the time boundaries of the time axis coordinates in the interval [start, end), or if the loaded field is static, a character string indicating it. See details.

  • InitializationDates: A POSIXct time object corresponding to the initialization times selected. Only for forecast datasets. NA for static variables (e.g. orography). See details.

  • Members: A character vector with the names of the ensemble members returned, in the same order as arranged in the Data array. Only for forecast datasets. NA for static variables (e.g. orography). See details.

Additionally, there are three global attributes with metadata: "dataset", which is always present, and "source" and "URL", which are added for datasets from the User Data Gateway.
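
The dropping of length-one dimensions described for the Data component can be sketched with a plain base R array. The array below is a made-up stand-in with assumed dimensions [member, time, lat, lon]; it is not an object returned by the function.

```r
# Made-up 4D array standing in for Data: 2 members, 3 times, 4 lats, 5 lons.
full <- array(seq_len(2 * 3 * 4 * 5), dim = c(2, 3, 4, 5))
length(dim(full))          # 4 dimensions for a multi-member rectangular domain

# Single grid point, all members: the lat and lon dimensions are dropped.
sub <- full[, , 2, 3]
dim(sub)                   # member x time matrix

# Single grid point and single member: only the time dimension remains.
single <- full[1, , 2, 3]
is.null(dim(single))       # TRUE, i.e. an atomic vector
```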

Variable harmonization

The different nature of the various databases, models and variables, and the idiosyncratic naming and storage conventions often applied by the different modelling centres, make a prior harmonization across datasets necessary in order to implement a truly user-friendly toolbox for data access. This package achieves this aim by defining a vocabulary common to all climate datasets. The particular variables of each dataset are translated (and transformed if necessary) to the standard variables by means of a dictionary, provided by the argument dictionary. In essence, the ‘dictionary’ is a csv file specific to each individual dataset, containing the information needed to perform the unit conversions that match the standard variable definitions contained in the vocabulary (see C4R.vocabulary). This feature is described in more detail in the loadeR wiki.

Temporal filtering and aggregation

The argument time controls the temporal filtering/aggregation options that may apply for a variable. Daily mean data can be obtained in two different ways:

  1. For variables that are already stored as daily means in the dataset, both "DD" and "none" return the required daily output.

  2. In the case of subdaily data, if "DD" is chosen, the function computes the daily value using the aggregation function indicated in the argument aggr.d, printing an information message on screen. This function is normally "mean", providing daily averages, although if the variable is a flux (e.g. precipitation or radiation; var = "tp", "rsds" or "rlds" using the standard UDG naming), the aggregation function may be "sum" (i.e., it will return the daily accumulated value). In the same way, if the variable is a daily maximum/minimum (i.e., var = "tasmax" / var = "tasmin"), the corresponding function (aggr.d = "max" or aggr.d = "min") could be applied to the subdaily outputs on a daily basis to obtain absolute maximum/minimum daily values.
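
The daily aggregation of subdaily data described in the second case can be sketched in base R as follows. This is only an illustration under stated assumptions: the 6-hourly values are made up, and aggregate_daily is a hypothetical helper, not the routine used internally by the function.

```r
# Toy 6-hourly series covering two days (values are made up for illustration).
times <- seq(as.POSIXct("2000-01-01 00:00", tz = "UTC"),
             by = "6 hours", length.out = 8)
tas <- c(2, 4, 9, 5, 1, 3, 8, 6)

# Hypothetical helper: aggregate subdaily values to one value per day,
# using one of the aggregation functions accepted by aggr.d.
aggregate_daily <- function(values, times,
                            aggr.d = c("mean", "min", "max", "sum")) {
  aggr.d <- match.arg(aggr.d)
  day <- format(times, "%Y-%m-%d", tz = "UTC")
  tapply(values, day, match.fun(aggr.d))
}

aggregate_daily(tas, times, "mean")  # daily averages
aggregate_daily(tas, times, "max")   # absolute daily maxima
aggregate_daily(tas, times, "sum")   # daily accumulated values (fluxes)
```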

Geolocation parameters

Regarding the selection of the spatial domain, it is possible to select the whole spatial domain of the dataset by setting the arguments lonLim=NULL and latLim=NULL. More often, rectangular domains are defined by the minimum and maximum coordinates in longitude and latitude (for instance, lonLim=c(-10,10) and latLim=c(35,45) indicates a rectangular window centered on the Iberian Peninsula), or single grid-cell values are requested (for instance, lonLim=-3.21 and latLim=41.087 for retrieving the data in the closest grid point to the point coordinate -3.21E, 41.087N). In the last two cases, the function operates by finding the nearest (Euclidean distance) grid points to the coordinates introduced.
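
The nearest grid-point search for single-point selections can be sketched as follows. The grid below is made up and nearest_cell is a hypothetical helper; the sketch only assumes, as stated above, that proximity is measured as Euclidean distance in coordinate space.

```r
# Toy regular grid (coordinates are made up for illustration).
lon <- seq(-10, 10, by = 0.75)
lat <- seq(35, 45, by = 0.75)

# Hypothetical helper: coordinates of the grid cell closest to point (x, y).
nearest_cell <- function(lon, lat, x, y) {
  grid <- expand.grid(lon = lon, lat = lat)
  d <- sqrt((grid$lon - x)^2 + (grid$lat - y)^2)  # Euclidean distance
  grid[which.min(d), ]
}

nearest_cell(lon, lat, x = -3.21, y = 41.087)
```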

In the case of station data (loadStationData), the logic is the same, taking into account that in the case of rectangular domains, all stations falling inside that window will be loaded. For single-point selections, the closest station will be chosen, and an on-screen note will report the distance from the selected point to the chosen station.

In the case of irregular grids (e.g. the typical RCM rotated pole projections), the regular coordinates are included in the x and y elements of the xyCoords list, while the corresponding geographical coordinates are stored as two matrices in the lon and lat elements.

Deaccumulation

In the case of variables that are deaccumulated (e.g. precipitation amount and radiation in System4 models), note that the original forecast dates correspond to the start of each verification step. Thus, the first value is always zero, and the variable accumulates from then on. The deaccumulation routine computes a lagged difference between forecast dates (R function diff) to provide the deaccumulated series, so the first value is always lost. To avoid systematically losing the first day, when a deaccumulable variable is requested the function internally loads the previous time step (e.g., for daily precipitation with season = c(12,1,2), the forecast date 30 November is also loaded, and the first value of the series, 1 December, is the difference between 1 December and 30 November in the model). As a result, in leadMonth = 0 requests the first day of the series is lost, because there is no previous forecast time in the initialization from which to start deaccumulating.
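
A minimal sketch of the lagged difference with base R diff is shown below. The accumulated values are made up, and the series is assumed to start at the extra time step (30 November) loaded before the requested season.

```r
# Made-up accumulated precipitation, starting one step before the season.
accum <- c(41.0, 43.5, 43.5, 48.0, 51.2)
dates <- as.Date("2000-11-30") + 0:4

# The lagged difference yields one value fewer than the accumulated series,
# which is why the extra previous time step is needed.
deaccum <- diff(accum)
names(deaccum) <- format(dates[-1])
deaccum
```

The first deaccumulated value is labelled 1 December because it is the difference between the 1 December and 30 November accumulations.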

Definition of temporal slices

The function has been implemented to access seasonal slices, as determined by the season argument. Seasons can be defined in several ways: a single month (e.g. season=1 for January), a standard season (e.g. season=c(1,2,3) for JFM, or season=c(12,1,2) for DJF), or any period of consecutive months (e.g. season=c(1:6) for the first half of the year). Seasons are returned for a given year period (defined by the years argument, e.g. years=1981:2000) with a homogeneous forecast lead time (as given by the leadMonth argument; e.g. leadMonth=1 for one-month lead time) with respect to the first month of the selected season. For example, season=c(1,2,3) for years=1995:2000 and leadMonth=1 will return the following series: JFM 1995 from the December 1994 runtime forecast, ..., JFM 2000 from the December 1999 runtime forecast.

Year-crossing seasons

It is possible to work with year-crossing seasons, such as DJF. In this case, season=c(12,1,2) for years=1995:2000 and leadMonth=1 will return the following series: DJF 1994/1995 (from the November 1994 runtime forecast), ..., DJF 1999/2000 (from the November 1999 runtime forecast).
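
The relationship between season, years and leadMonth in the JFM and DJF examples can be reproduced with a short base R sketch. The helper init_dates is hypothetical and not part of loadeR. It assumes that, for year-crossing seasons, the years argument refers to the year in which the season ends, and it sets the day of the month to 1 purely for illustration (actual initialization days depend on the model).

```r
# Hypothetical helper: initialization month implied by season, years, leadMonth.
init_dates <- function(season, years, leadMonth = 1) {
  # A decreasing step in the month sequence (e.g. 12 -> 1) marks a year crossing.
  year_crossing <- any(diff(season) < 0)
  start_year <- if (year_crossing) years - 1 else years
  # Count months from year 0 so that subtracting the lead handles year changes.
  m <- start_year * 12 + (season[1] - 1) - leadMonth
  as.Date(sprintf("%04d-%02d-01", m %/% 12, m %% 12 + 1))
}

init_dates(c(1, 2, 3), 1995:2000, leadMonth = 1)   # December 1994, ..., December 1999
init_dates(c(12, 1, 2), 1995:2000, leadMonth = 1)  # November 1994, ..., November 1999
```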

Full initialization length

In case the whole year/forecast extent is needed (instead of a particular season), the argument season can be omitted. In this case, its default value is NULL, equivalent to setting season = 1:12 or, in the case of seasonal forecasts, season = 1:n, where n is the number of forecast months remaining after the given lead month. The same applies to the argument years: all available years are returned when it is omitted.

Initialization times

The characteristics of the InitializationDates output vary depending on the dataset. In the case of models that have simultaneous initializations for different members (e.g. System4), the output is just a vector of initialization times (one per year selected). In contrast, the lagged runtime configuration of members used by some other models (e.g. CFSv2) results in different initialization times for the same forecast times of different members. In this case, the InitializationDates are included in a list whose elements are named after the corresponding members.
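
Code that consumes InitializationDates may need to handle both shapes. The sketch below uses mock objects with made-up dates and member names, and a hypothetical helper init_by_member that always returns a per-member list; none of these names come from loadeR.

```r
# Mock objects mimicking the two shapes described above (dates are made up).
init_simultaneous <- as.POSIXct(c("1994-12-01", "1995-12-01"), tz = "UTC")
init_lagged <- list(
  Member_1 = as.POSIXct(c("1994-11-01", "1995-11-01"), tz = "UTC"),
  Member_2 = as.POSIXct(c("1994-11-06", "1995-11-06"), tz = "UTC")
)

# Hypothetical helper: return one vector of initialization times per member.
init_by_member <- function(init, members) {
  if (is.list(init)) {
    init[members]                                        # lagged: already per member
  } else {
    setNames(rep(list(init), length(members)), members)  # simultaneous: replicate
  }
}

init_by_member(init_simultaneous, c("Member_1", "Member_2"))
init_by_member(init_lagged, "Member_2")
```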

Author(s)

J. Bedia


SantanderMetGroup/loadeR documentation built on July 4, 2023, 4:29 a.m.