loadGridData: Load a grid from a gridded dataset
In SantanderMetGroup/loadeR: A climate4R package for data access <http://meteo.unican.es/climate4r>

Description

Load a user-defined spatio-temporal slice from a gridded dataset

Usage

loadGridData(
  dataset,
  var,
  dictionary = FALSE,
  lonLim = NULL,
  latLim = NULL,
  season = NULL,
  years = NULL,
  members = NULL,
  time = "none",
  aggr.d = "none",
  aggr.m = "none",
  condition = NULL,
  threshold = NULL,
  spatialTolerance = NULL
)

Arguments

`dataset`	A character string indicating the database to be accessed. This is usually a path to a local file or a URL pointing to a netCDF or NcML file in the case of netCDF and/or gridded datasets. For station data in standard ASCII format, this is the path to the directory the dataset lives in.
`var`	Variable code (character string). This is the name of the variable according to the climate4R standard naming (see the next argument). For variables with vertical levels, the vertical level is appended to the variable name after the “@” symbol (e.g. `var = "z@700"` for geopotential height on the 700 mb isobaric pressure level). It is also possible to enter the variable name as originally coded in the dataset to skip data homogenization.
`dictionary`	Defaults to FALSE. If TRUE, a dictionary is used, and the .dic file is expected in the same path as the dataset. If the .dic file is stored elsewhere, the argument is the full path to the .dic file (including the extension, e.g. `"/path/to/the/dictionary_file.dic"`). This is the case, for instance, when the dataset is stored at a remote URL and a locally stored dictionary is available for that particular dataset. If FALSE, no variable homogenization takes place, and the raw variable, as originally stored in the dataset, is returned. See details for dictionary specification.
`lonLim`	Vector of length = 2, with minimum and maximum longitude coordinates, in decimal degrees, of the bounding box selected. For single-point queries, a numeric value with the longitude coordinate. If `NULL` (default), the whole longitudinal range is selected (Note that this may lead to a large output object size).
`latLim`	Same as `lonLim`, but for the selection of the latitudinal range.
`season`	An integer vector specifying the desired season (in months, January = 1, ..., December = 12). Options include one to several (contiguous) months. Defaults to `NULL`, indicating a full-year selection (same as `season = 1:12`).
`years`	Optional vector of years to select. Defaults (`NULL`) to all available years. Ignored if the requested variable is static (e.g. orography).
`members`	A vector of integers indicating the members to be loaded. Defaults to `NULL`, which loads all available members if the dataset contains members (i.e. if an Ensemble Axis is defined). For instance, `members = 1:5` retrieves the first five members of the dataset. Note that, unlike `loadSeasonalForecast`, discontinuous member selections (e.g. `members = c(1,5,7)`) are NOT allowed. Ignored if the requested dataset has no Ensemble Axis (or is a static field, e.g. orography).
`time`	A character vector indicating the temporal filtering/aggregation of the output data. Defaults to `"none"`, which returns the original time series as stored in the dataset. For sub-daily variables, instantaneous data at selected verification times can be filtered using one of the character strings `"00"`, `"03"`, `"06"`, `"09"`, `"12"`, `"15"`, `"18"` and `"21"`, when applicable. If daily aggregated data are required, use `"DD"`. Ignored if the requested variable is static (e.g. orography). See the next arguments for time aggregation options.
`aggr.d`	Character string. Aggregation function used to compute daily data from sub-daily data. Currently accepted values are `"none"`, `"mean"`, `"min"`, `"max"` and `"sum"`.
`aggr.m`	Same as `aggr.d`, but indicating the aggregation function used to compute monthly from daily data. If `aggr.m = "none"` (the default), no monthly aggregation is undertaken.
`condition`	Optional, only needed if absolute/relative frequencies are required. Inequality operator applied with respect to the given `threshold`: `"GT"` = greater than the value of `threshold`, `"GE"` = greater than or equal to, `"LT"` = less than, `"LE"` = less than or equal to.
`threshold`	Optional, only needed if absolute/relative frequencies are required. A numeric value defining the threshold used by `condition` (the previous argument).
`spatialTolerance`	Numeric. The use of this argument is NOT RECOMMENDED. Distance (in grid coordinate units) beyond the `lonLim` and `latLim` ranges that is tolerated when retrieving data.
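
The `condition`/`threshold` pair turns a continuous variable into exceedance counts before monthly aggregation. The base R sketch below is not loadeR's internal code; it assumes that each daily value is first converted to a 0/1 indicator using the operator named by `condition`, after which `aggr.m = "sum"` gives the number of days per month meeting the condition (absolute frequency) and `aggr.m = "mean"` gives the proportion of such days (relative frequency). The helper `exceedance_indicator()` and the sample data are hypothetical.

```r
# Hypothetical helper: map loadeR-style condition codes to R comparison operators
exceedance_indicator <- function(x, threshold, condition = c("GT", "GE", "LT", "LE")) {
  condition <- match.arg(condition)
  op <- switch(condition,
               GT = `>`,   # strictly greater than
               GE = `>=`,  # greater than or equal to
               LT = `<`,   # strictly less than
               LE = `<=`)  # less than or equal to
  as.integer(op(x, threshold))
}

# Made-up daily maximum temperatures (degrees C) for June and July
tx <- c(24, 26, 25, 30, 22, 27, 25.5, 26)
month <- factor(c(6, 6, 6, 6, 7, 7, 7, 7))

# 0/1 indicator of days strictly above 25 degrees (a value of exactly 25 does not count)
ind <- exceedance_indicator(tx, threshold = 25, condition = "GT")

# Analogue of aggr.m = "sum": absolute frequency (days above 25 per month)
abs.freq <- tapply(ind, month, sum)
# Analogue of aggr.m = "mean": relative frequency (proportion of days above 25)
rel.freq <- tapply(ind, month, mean)
print(abs.freq)
print(rel.freq)
```

Note how a value exactly equal to the threshold is excluded by `"GT"` but would be counted by `"GE"`.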

Value

A list with the following elements providing the necessary information for data representation and analysis:

Variable

A list with three elements:

varName: A character string indicating the variable returned. Same as the value provided for argument var.
isStandard: Logical value indicating whether the variable returned is standard or not (i.e., whether the dictionary has been used or not).
level: A numeric value indicating the vertical level of the variable (NULL for 2D variables).

`Data`	An N-dimensional array. The number of dimensions (N) depends on the type of request, given that dimensions of length one are dropped. Thus, N can range from 4 (several members for a rectangular domain, with longitude, latitude, member and time dimensions) to 1 (an atomic vector) for single-point, single-member selections, for which only the time dimension is required. The dimensions are labelled by the “dimnames” attribute. Note that the order of the dimensions is not fixed.
`xyCoords`	A list with `x` and `y` components, as required by many standard mapping functions in R (see `xy.coords`). In addition, the attribute `projection` provides geo-referencing information as stored in the original dataset.
`Dates`	A list with two `POSIXct` time elements of the same length as the ‘time’ dimension in `Data`, defining the time boundaries of the time axis coordinates in the interval [start, end); if the loaded field is static, a character string indicating so instead.
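
The statement that N shrinks from 4 to 1 as length-one dimensions are dropped can be illustrated with base R's `drop()`. This is only an illustration of the array semantics, not loadeR's code; the 4-D array and its dimension order (member, time, lat, lon) are hypothetical, and in practice the dimension labels of the returned object should be checked rather than assuming a fixed order.

```r
# Hypothetical 4-D array; dimension order assumed here: member, time, lat, lon
full <- array(seq_len(2 * 3 * 4 * 5), dim = c(2, 3, 4, 5))
length(dim(full))  # 4 dimensions

# A single-member, single-point request leaves member, lat and lon with length one
one.point <- full[1, , 1, 1, drop = FALSE]
dim(one.point)     # 1 3 1 1

# Dropping the length-one dimensions collapses the array to a plain time series
series <- drop(one.point)
is.null(dim(series))  # TRUE: an atomic vector of length 3
```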

Variable harmonization

The different nature of the various databases, models and variables, together with the idiosyncratic naming and storage conventions often applied by the different modelling centres, makes prior harmonization across datasets necessary in order to implement a truly user-friendly toolbox for data access. This package achieves this aim by defining a common vocabulary for all climate datasets. The particular variables of each dataset are translated (and transformed if necessary) to the standard variables by means of a dictionary, provided by the argument dictionary. In essence, the ‘dictionary’ is a csv file particular to each individual dataset, containing the information needed to perform the unit conversions that match the standard variable definitions contained in the vocabulary (see C4R.vocabulary). This feature is described in more detail in the loadeR wiki.
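
As a rough illustration of what a dictionary entry does, the sketch below translates the raw NCEP variable `T` (stored in Kelvin, as in the examples further down) to the standard name `ta` in degrees Celsius using a linear conversion of the form `standard = raw * scale + offset`. The one-row data frame only imitates a dictionary entry: its column names (`identifier`, `short_name`, `scale`, `offset`), the linear rule and the helper `harmonize()` are assumptions for illustration, so consult the loadeR wiki for the actual .dic format.

```r
# Hypothetical dictionary row mapping the raw name 'T' to the standard name 'ta'
dic <- data.frame(identifier = "T", short_name = "ta",
                  scale = 1, offset = -273.15,
                  stringsAsFactors = FALSE)

# Raw values as stored in the dataset (Kelvin)
raw <- c(273.15, 283.15, 293.65)

# Assumed linear harmonization rule: standard = raw * scale + offset
harmonize <- function(x, entry) x * entry$scale + entry$offset

entry <- dic[dic$identifier == "T", ]
std <- harmonize(raw, entry)
print(entry$short_name)  # standard variable name
print(std)               # values now in degrees Celsius
```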

Temporal filtering and aggregation

The argument time controls the temporal filtering/aggregation options that may apply to a variable. Daily mean data can be obtained in two different ways:

For variables already stored as daily means in the dataset, both "DD" and "none" return the required daily output.
In case of sub-daily data, if "DD" is chosen, the function computes the daily value using the aggregation function indicated in the argument aggr.d, printing an information message on screen. This function is normally "mean", providing daily averages, although if the variable is a flux (e.g. precipitation or radiation: var = "tp", "rsds" or "rlds" using the standard UDG naming), the aggregation function may be "sum" (i.e., it returns the daily accumulated value). Similarly, if the variable is a daily maximum/minimum (i.e., var = "tasmax" / var = "tasmin"), the corresponding function (aggr.d = "max" or aggr.d = "min") can be applied to the sub-daily outputs on a daily basis to obtain absolute maximum/minimum daily values.
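
The choice of aggr.d for sub-daily data can be mimicked in base R by grouping time steps by calendar day. The sketch below uses made-up 6-hourly series and `tapply()`; it only illustrates the effect of "mean", "max" and "sum" and is not how loadeR performs the aggregation internally.

```r
# Made-up 6-hourly series over two days (00, 06, 12, 18 UTC)
times <- seq(as.POSIXct("2001-06-01 00:00", tz = "UTC"),
             by = "6 hours", length.out = 8)
tas <- c(15, 18, 24, 20, 16, 19, 26, 21)          # temperature, degrees C
tp  <- c(0.0, 1.2, 0.0, 0.8, 2.0, 0.0, 0.5, 0.0)  # precipitation, mm per 6 h

# Calendar day of each time step (in UTC)
day <- format(times, "%Y-%m-%d", tz = "UTC")

# Analogue of aggr.d = "mean": daily average of an instantaneous variable
tas.mean <- tapply(tas, day, mean)
# Analogue of aggr.d = "max": absolute daily maximum (cf. tasmax)
tas.max  <- tapply(tas, day, max)
# Analogue of aggr.d = "sum": daily accumulation of a flux such as precipitation
tp.sum   <- tapply(tp, day, sum)
```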

Geolocation parameters

Regarding the selection of the spatial domain, the whole spatial domain of the dataset can be selected by setting lonLim=NULL and latLim=NULL. More often, rectangular domains are defined by the minimum and maximum coordinates in longitude and latitude (for instance, lonLim=c(-10,10) and latLim=c(35,45) define a rectangular window centered on the Iberian Peninsula), or single grid-cell values are requested (for instance, lonLim=-3.21 and latLim=41.087 retrieve the data at the grid point closest to the point coordinate 3.21W, 41.087N). In the last two cases, the function operates by finding the nearest grid points (by Euclidean distance) to the coordinates introduced.
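
Both kinds of spatial selection can be sketched on a small hypothetical regular grid: a nearest-grid-point search by Euclidean distance in coordinate units, and a range test for rectangular windows. The 2.5-degree grid and the helpers `nearest_gridpoint()` and `in_range()` are illustrative assumptions; in particular, loadeR's exact handling of window edges (and of `spatialTolerance`) may differ.

```r
# Hypothetical regular 2.5-degree grid covering part of Iberia
lons <- seq(-10, 5, by = 2.5)
lats <- seq(35, 45, by = 2.5)

# Single-point query: nearest grid point by Euclidean distance in coordinate units
nearest_gridpoint <- function(lon, lat, lons, lats) {
  grid <- expand.grid(lon = lons, lat = lats)
  d <- sqrt((grid$lon - lon)^2 + (grid$lat - lat)^2)
  grid[which.min(d), ]
}

# Rectangular query: indices of grid coordinates within [min, max] (edges inclusive)
in_range <- function(coords, lim) which(coords >= lim[1] & coords <= lim[2])

p <- nearest_gridpoint(-3.21, 41.087, lons, lats)
print(p)  # lon -2.5, lat 40

lon.idx <- in_range(lons, c(-10, 0))
lat.idx <- in_range(lats, c(36, 44))
```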

In the case of station data (loadStationData), the logic is the same, bearing in mind that for rectangular domains all stations falling inside the window are loaded. For single-point selections, the closest station is chosen, and an on-screen note reports the distance from the selected point to the chosen station.
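
For stations, the same two rules can be sketched with a hypothetical metadata table: keep every station inside the window, or pick the closest one and report the distance. The station IDs and coordinates are made up, and the distance here is plain Euclidean distance in degrees, an assumption that may not match the measure loadStationData actually reports.

```r
# Hypothetical station metadata (IDs and coordinates are made up)
stations <- data.frame(id  = c("S1", "S2", "S3", "S4"),
                       lon = c(-3.7, -8.6, 2.2, 10.1),
                       lat = c(40.4, 42.9, 41.4, 46.0),
                       stringsAsFactors = FALSE)

lonLim <- c(-10, 5)
latLim <- c(35, 44)

# Rectangular window: keep every station whose coordinates fall inside it
inside <- with(stations, lon >= lonLim[1] & lon <= lonLim[2] &
                         lat >= latLim[1] & lat <= latLim[2])
print(stations$id[inside])

# Single point: pick the closest station and report the distance (degrees, assumed)
pt <- c(-3.21, 41.087)
d <- sqrt((stations$lon - pt[1])^2 + (stations$lat - pt[2])^2)
closest <- which.min(d)
message(sprintf("Closest station %s is %.2f degrees away",
                stations$id[closest], d[closest]))
```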

In case of irregular grids (e.g. the typical RCM rotated pole projections), the regular coordinates are included in the x and y elements of the xyCoords list, while the corresponding geographical coordinates are stored as two matrices in the lon and lat elements.
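
On an irregular grid, a nearest-cell search has to use the 2-D geographical coordinate matrices rather than the 1-D x/y vectors. The sketch below builds small hypothetical lon/lat matrices (a simple shear standing in for a real rotated-pole transformation) and uses `which.min()` with `arrayInd()` to recover the row and column of the closest cell. The matrix orientation (rows along y, columns along x) is an assumption made for this illustration.

```r
# Hypothetical native (regular) coordinates of a 3 x 4 grid
x <- c(-1, 0, 1, 2)  # columns
y <- c(-1, 0, 1)     # rows

# Made-up geographical coordinates of each cell, stored as matrices
# (rows along y, columns along x); a shear stands in for a real projection
lon <- outer(y, x, function(yy, xx) xx + yy * 0.3)
lat <- outer(y, x, function(yy, xx) 40 + yy - xx * 0.2)

# Nearest cell to a geographical point, searched over the 2-D matrices
tlon <- 0.5
tlat <- 40.1
d <- sqrt((lon - tlon)^2 + (lat - tlat)^2)
rc <- arrayInd(which.min(d), dim(d))  # row (y index) and column (x index)
nearest <- c(x = x[rc[2]], y = y[rc[1]])
print(nearest)
```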

Author(s)

J. Bedia, S. Herrera, M. Iturbide, J.M. Gutierrez

Examples

## Not run: 
# Download dataset
dir.create("mydirectory")
download.file("http://meteo.unican.es/work/loadeR/data/Iberia_NCEP.tar.gz", 
              destfile = "mydirectory/Iberia_NCEP.tar.gz", mode = "wb")
# Extract files from the tar.gz file
untar("mydirectory/Iberia_NCEP.tar.gz", exdir = "mydirectory")
# First, the path to the ncml file is defined:
ncep <- "mydirectory/Iberia_NCEP/Iberia_NCEP.ncml"
# Load air temperature at the 850 millibar isobaric pressure level from the downloaded
# NCEP dataset, for the Iberian Peninsula in summer (JJA):
grid <- loadGridData(ncep, var = "ta@850", dictionary = TRUE, lonLim = c(-10,5),
   latLim = c(35.5, 44.5), season = 6:8, years = 1981:2010)
str(grid)   
# Calculation of monthly mean temperature:
grid.mm <- loadGridData(ncep, var = "ta@850", dictionary = TRUE, lonLim = c(-10,5),
                         latLim = c(35.5, 44.5), season = 6:8,
                         years = 1981:2010, aggr.m = "mean")
str(grid.mm)

# Same but using the original variable (not harmonized via dictionary):
di <- dataInventory(ncep)
names(di)
climate4R.UDG::C4R.vocabulary()
# Variable is named 'T', instead of the standard name 'ta' in the vocabulary
# Vertical level is indicated using the '@' symbol:
non.standard.grid <- loadGridData(ncep, var = "T@850", dictionary = FALSE, lonLim = c(-10,5),
                                  latLim = c(35.5, 44.5), season = 6:8, 
                                  years = 1981:2010, aggr.m = "mean")
str(non.standard.grid$Variable)
# Note the units are now in Kelvin, as originally stored
## Example of data load from a remote repository via OPeNDAP (NASA dataserver)
ds <- "http://dataserver3.nccs.nasa.gov/thredds/dodsC/bypass/NEX-GDDP/bcsd/rcp85/r1i1p1/tasmax/MIROC-ESM.ncml"
# Monthly mean maximum summer 2m temperature at 12:00 UTC over the Iberian Peninsula:
# (CMIP5 MIROC-ESM model, RCP 8.5)
tasmax <- loadGridData(dataset = ds,
                       var = "tasmax",
                       lonLim = c(-10,5),
                       latLim = c(35,44),
                       season = 6:8,
                       years = 2021,
                       time = "12",
                       aggr.m = "mean")


## Example loading frequency data based on threshold exceedances:
# This example uses remote data from the E-OBS dataset, served by KNMI's OPeNDAP server.
# Note that the URL of the dataset is not persistent, and changes with updated versions,
# so this example is not guaranteed to work in the future. You can check the latest dataset
# version and its corresponding URL at the following link:
# <http://www.ecad.eu/download/ensembles/download.php>
require(transformeR)
require(visualizeR)
ds <- "http://opendap.knmi.nl/knmi/thredds/dodsC/e-obs_0.50regular/tx_0.50deg_reg_v16.0.nc"
# The following call will load the annual number of days above 25 degrees C
# (Number of summer days, 'SU' according to ETCCDI/CRD climate change indices
# <http://etccdi.pacificclimate.org/list_27_indices.shtml>)
tx25 <- loadGridData(dataset = ds,
                    var = "tx",
                    lonLim = c(-10,5),
                    latLim = c(35,44),
                    season = 1:12,
                    years = 1981:1990,
                    aggr.m = "sum", 
                    threshold = 25,
                    condition = "GT")

# Note the use of threshold = 25 and condition = "GT" (i.e. strictly Greater Than).
# In the next lines the data are annually aggregated and the corresponding climatological
# map is displayed:

tx25.annual <- aggregateGrid(tx25, aggr.y = list(FUN = "sum"))
spatialPlot(climatology(tx25.annual), main = "Number of Summer Days 1981-1990")

# Note that the relative frequency (i.e., proportion of days instead of absolute frequency),
# can be obtained by just indicating the argument 'aggr.m = "mean"'

## End(Not run)

SantanderMetGroup/loadeR documentation built on June 7, 2024, 8:16 p.m.