Load | R Documentation
This function loads monthly or daily data from a set of specified
experimental datasets together with data that date-corresponds from a set
of specified observational datasets. See parameters 'storefreq',
'sampleperiod', 'exp' and 'obs'.
A set of starting dates is specified through the parameter 'sdates'. Data of
each starting date is loaded for each model.
Load() arranges the data in two arrays with a similar format, both with the
following dimensions:
1. The number of experimental datasets determined by the user through the argument 'exp' (for the experimental data array) or the number of observational datasets available for validation (for the observational array) determined as well by the user through the argument 'obs'.
2. The greatest number of members across all experiments (in the experimental data array) or across all observational datasets (in the observational data array).
3. The number of starting dates determined by the user through the 'sdates' argument.
4. The greatest number of lead-times.
5. The number of latitudes of the selected zone.
6. The number of longitudes of the selected zone.
Dimensions 5 and 6 are optional and their presence depends on the type of
the specified variable (global mean or 2-dimensional) and on the selected
output type (area averaged time series, latitude averaged time series,
longitude averaged time series or 2-dimensional time series).
In the case of loading an area average the dimensions of the arrays will be
only the first 4.
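Once data has been loaded, these dimensions can be checked directly on the returned arrays. A minimal sketch, assuming a previous Load() call like the ones in the examples below (the dimension lengths shown are illustrative only):

```r
# Not run: inspect the shape of the experimental array returned by Load().
dim(sampleData$mod)
# e.g. dataset member sdate ftime
#            1      3     5    60   (area-averaged series: 4 dimensions only)
attr(sampleData$mod, 'dimensions')
# e.g. "dataset" "member" "sdate" "ftime"
```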
Only a specified variable is loaded from each experiment at each starting
date. See parameter 'var'.
Afterwards, observational data that matches every starting date and lead-time
of every experimental dataset is fetched in the file system (so, if two
predictions at two different start dates overlap, some observational values
will be loaded and kept in memory more than once).
If no data is found in the file system for an experimental or observational
array point it is filled with an NA value.
If the specified output is 2-dimensional or latitude- or longitude-averaged
time series all the data is interpolated into a common grid. If the
specified output type is area averaged time series the data is averaged on
the individual grid of each dataset but can also be averaged after
interpolating into a common grid. See parameters 'grid' and 'method'.
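For instance, 'grid' and 'method' can be given together to force interpolation onto a common grid even when computing area averages. A sketch, assuming 'exp', 'obs' and 'startDates' are defined as in the examples below:

```r
# Not run: interpolate all datasets onto a regular 1-degree grid
# (CDO grid name 'r360x180') with bilinear interpolation before
# computing the area averages.
data <- Load('tos', list(exp), list(obs), startDates,
             output = 'areave', grid = 'r360x180', method = 'bilinear')
```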
Once the two arrays are filled by calling this function, other functions in
the s2dv package that receive as inputs data formatted in this data
structure can be executed (e.g. Clim() to compute climatologies, Ano() to
compute anomalies, ...).
Load() has many additional parameters to disable values and trim dimensions
of the selected variable; masks can even be applied to 2-dimensional
variables.
See parameters 'nmember', 'nmemberobs', 'nleadtime', 'leadtimemin',
'leadtimemax', 'sampleperiod', 'lonmin', 'lonmax', 'latmin', 'latmax',
'maskmod', 'maskobs', 'varmin', 'varmax'.
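As an illustration, several of these trimming parameters can be combined in a single call. A sketch only, assuming 'exp' and 'startDates' are defined as in the examples below:

```r
# Not run: trim members and lead-times, and disable out-of-range values.
data <- Load('tos', list(exp), NULL, startDates,
             nmember = 3,                       # first 3 members only
             leadtimemin = 2, leadtimemax = 12, # keep lead-times 2 to 12
             varmin = -1e19, varmax = 1e19)     # values outside become NA
```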
The parameters 'exp' and 'obs' can take various forms. The most direct form
is a list of lists, where each sub-list has the component 'path' associated
to a character string with a pattern of the path to the files of a dataset
to be loaded. These patterns can contain wildcards and tags that will be
replaced automatically by Load() with the specified starting dates, member
numbers, variable name, etc. See parameters 'exp' and 'obs' for details.
Only NetCDF files are supported. OPeNDAP URLs to NetCDF files are also
supported.
Load() can load 2-dimensional or global mean variables in any of the
following formats:
experiments:
file per ensemble per starting date (YYYY, MM and DD somewhere in the path)
file per member per starting date
(YYYY, MM, DD and MemberNumber somewhere in the path. Ensemble
experiments with different numbers of members can be loaded in
a single Load() call.)
(YYYY, MM and DD specify the starting dates of the predictions)
observations:
file per ensemble per month (YYYY and MM somewhere in the path)
file per member per month (YYYY, MM and MemberNumber somewhere in the path, obs with different numbers of members supported)
file per dataset (No constraints in the path but the time axes in the file have to be properly defined)
(YYYY and MM correspond to the actual month data in the file)
In all the formats the data can be stored in a daily or monthly frequency,
or a multiple of these (see parameters 'storefreq' and 'sampleperiod').
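For example, from daily stored data one value per week could be loaded within the first 90 lead-times. A sketch, assuming the datasets in 'exp' hold daily data and 'exp' and 'startDates' are defined as in the examples below:

```r
# Not run: from daily data, keep every 7th lead-time between days 1 and 90.
data <- Load('tas', list(exp), NULL, startDates,
             storefreq = 'daily',
             leadtimemin = 1, leadtimemax = 90, sampleperiod = 7)
```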
All the data files must contain the target variable defined over time and
potentially over members, latitude and longitude dimensions in any order,
time being the record dimension.
In the case of a two-dimensional variable, the variables longitude and
latitude must be defined inside the data file too and must have the same
names as the dimension for longitudes and latitudes respectively.
The names of these dimensions (and longitude and latitude variables) and the
name for the members dimension are expected to be 'longitude', 'latitude'
and 'ensemble' respectively. However, these names can be adjusted with the
parameter 'dimnames' or can be configured in the configuration file (read
below in parameters 'exp' and 'obs', or see ?ConfigFileOpen for more
information).
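If a dataset uses different names, they can be remapped through 'dimnames'. A sketch; the dimension names 'lon', 'lat' and 'realization' are assumed here for illustration and must match the actual names inside your files:

```r
# Not run: remap non-default dimension names found in the NetCDF files.
data <- Load('tas', list(exp), NULL, startDates,
             dimnames = list(lon = 'lon', lat = 'lat',
                             member = 'realization'))
```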
All the data files are expected to have numeric values representable with
32 bits. Be aware when choosing the fill values or infinite values in the
datasets to load.
The Load() function returns a named list following a structure similar to
the one used in the package 'downscaleR'.
The components are the following:
'mod' is the array that contains the experimental data. It has the attribute 'dimensions' associated to a vector of strings with the labels of each dimension of the array, in order.
'obs' is the array that contains the observational data. It has the attribute 'dimensions' associated to a vector of strings with the labels of each dimension of the array, in order.
'lat' and 'lon' are the latitudes and longitudes of the grid into
which the data is interpolated (0 if the loaded variable is a global
mean or the output is an area average).
Both have the attribute 'cdo_grid_des' associated with a character
string with the name of the common grid of the data, following the CDO
naming conventions for grids.
The attribute 'projection' is kept for compatibility with 'downscaleR'.
'Variable' has the following components:
'varName', with the short name of the loaded variable as specified in the parameter 'var'.
'level', with information on the pressure level of the variable. It is kept NULL for now.
And the following attributes:
'is_standard', kept for compatibility with 'downscaleR', tells if a dataset has been homogenized to standards with 'downscaleR' catalogs.
'units', a character string with the units of measure of the variable, as found in the source files.
'longname', a character string with the long name of the variable, as found in the source files.
'daily_agg_cellfun', 'monthly_agg_cellfun', 'verification_time', kept for compatibility with 'downscaleR'.
'Datasets' has the following components:
'exp', a named list where the names are the identifying character strings of each experiment in 'exp', each associated to a list with the following components:
'members', a list with the names of the members of the dataset.
'source', a path or URL to the source of the dataset.
'obs', similar to 'exp' but for observational datasets.
'Dates', with the following components:
'start', an array of dimensions (sdate, time) with the POSIX initial date of each forecast time of each starting date.
'end', an array of dimensions (sdate, time) with the POSIX final date of each forecast time of each starting date.
'InitializationDates', a vector of starting dates as specified in 'sdates', in POSIX format.
'when', a time stamp of the date the Load() call to obtain the data was issued.
'source_files', a vector of character strings with complete paths to all the found files involved in the Load() call.
'not_found_files', a vector of character strings with complete paths to the files involved in the Load() call that were not found.
Load(
var,
exp = NULL,
obs = NULL,
sdates,
nmember = NULL,
nmemberobs = NULL,
nleadtime = NULL,
leadtimemin = 1,
leadtimemax = NULL,
storefreq = "monthly",
sampleperiod = 1,
lonmin = 0,
lonmax = 360,
latmin = -90,
latmax = 90,
output = "areave",
method = "conservative",
grid = NULL,
maskmod = vector("list", 15),
maskobs = vector("list", 15),
configfile = NULL,
varmin = NULL,
varmax = NULL,
silent = FALSE,
nprocs = NULL,
dimnames = NULL,
remapcells = 2,
path_glob_permissive = "partial"
)
var |
Short name of the variable to load. It should coincide with the
variable name inside the data files. |
exp |
Parameter to specify which experimental datasets to load data
from.
The tag $START_DATES$ will be replaced with all the starting dates
specified in 'sdates'. $YEAR$, $MONTH$ and $DAY$ will take a value for each
iteration over 'sdates'; these are simply the same as $START_DATE$ but
split in parts.
list(
  list(
    name = 'experimentA',
    path = file.path('/path/to/$DATASET_NAME$/$STORE_FREQ$',
                     '$VAR_NAME$$SUFFIX$',
                     '$VAR_NAME$_$START_DATE$.nc'),
    nc_var_name = '$VAR_NAME$',
    suffix = '_3hourly',
    var_min = '-1e19',
    var_max = '1e19'
  )
)
This will make |
obs |
Argument with the same format as parameter 'exp'. See details on
parameter 'exp'. |
sdates |
Vector of starting dates of the experimental runs to be loaded
following the pattern 'YYYYMMDD'. |
nmember |
Vector with the numbers of members to load from the specified
experimental datasets in 'exp'. |
nmemberobs |
Vector with the numbers of members to load from the
specified observational datasets in 'obs'. |
nleadtime |
Deprecated. See parameter 'leadtimemax'. |
leadtimemin |
Only lead-times greater than or equal to 'leadtimemin' are loaded. Takes by default the value 1. |
leadtimemax |
Only lead-times less than or equal to 'leadtimemax' are loaded.
Takes by default the number of lead-times of the first experimental
dataset in 'exp'. |
storefreq |
Frequency at which the data to be loaded is stored in the
file system. Can take values 'monthly' or 'daily'. |
sampleperiod |
To load only a subset between 'leadtimemin' and
'leadtimemax' with the period of subsampling 'sampleperiod'. |
lonmin |
If a 2-dimensional variable is loaded, values at longitudes
lower than 'lonmin' aren't loaded. |
lonmax |
If a 2-dimensional variable is loaded, values at longitudes
higher than 'lonmax' aren't loaded. |
latmin |
If a 2-dimensional variable is loaded, values at latitudes
lower than 'latmin' aren't loaded. |
latmax |
If a 2-dimensional variable is loaded, values at latitudes
higher than 'latmax' aren't loaded. |
output |
This parameter determines the format in which the data is
arranged in the output arrays.
Takes by default the value 'areave'. If the variable specified in 'var' is
a global mean, this parameter is forced to 'areave'. |
method |
This parameter determines the interpolation method to be used
when regridding data (see 'output'). Can take values 'bilinear', 'bicubic',
'conservative', 'distance-weighted'. |
grid |
A common grid can be specified through the parameter 'grid' when
loading 2-dimensional data. Data is then interpolated onto this grid
whichever 'output' type is specified. If the selected output type is
'areave' and a 'grid' is specified, the area averages are calculated after
interpolating to the specified grid. |
maskmod |
List of masks to be applied to the data of each experimental
dataset respectively, if a 2-dimensional variable is specified in 'var'. |
maskobs |
See help on parameter 'maskmod'. |
configfile |
Path to the s2dv configuration file from which
to retrieve information on location in file system (and other) of datasets. |
varmin |
Loaded experimental and observational data values smaller
than 'varmin' will be disabled (replaced by NA values). |
varmax |
Loaded experimental and observational data values greater
than 'varmax' will be disabled (replaced by NA values). |
silent |
Parameter to show (FALSE) or hide (TRUE) information messages. |
nprocs |
Number of parallel processes created to perform the fetch
and computation of data. |
dimnames |
Named list where the name of each element is a generic
name of the expected dimensions inside the NetCDF files. These generic
names are 'lon', 'lat' and 'member'. 'time' is not needed because it is
detected automatically by discarding the other dimensions. |
remapcells |
When loading a 2-dimensional variable, spatial subsets can
be requested via |
path_glob_permissive |
In some cases, when specifying a path pattern
(either in the parameters 'exp'/'obs' or in a configuration file) one can
specify path patterns that contain shell globbing expressions. Too much
freedom in putting globbing expressions in the path patterns can be
dangerous and make |
The two output matrices have between 2 and 6 dimensions:
Number of experimental/observational datasets.
Number of members.
Number of startdates.
Number of leadtimes.
Number of latitudes (optional).
Number of longitudes (optional).
but the two matrices have the same number of dimensions and only the first two dimensions can have different lengths depending on the input arguments. For a detailed explanation of the process, read the documentation attached to the package or check the comments in the code.
Load() returns a named list following a structure similar to the one used
in the package 'downscaleR'.
The components are the following:
'mod' is the array that contains the experimental data. It has the
attribute 'dimensions' associated to a vector of strings with the
labels of each dimension of the array, in order. The order of the
latitudes is always forced to be from 90 to -90 whereas the order of
the longitudes is kept as in the original files (if possible).
Longitude values provided in 'lon' that are lower than 0 have 360
added to them (but are still kept in the original order). In some
cases, however, if multiple data sets are loaded in longitude-latitude
mode, the longitudes (and also the data arrays in 'mod' and 'obs') are
re-ordered afterwards by Load() to range from 0 to 360; a warning is
given in such cases. The longitude and latitude of the center of the
grid cell that corresponds to the value [j, i] in 'mod' (along the
dimensions latitude and longitude, respectively) can be found in the
outputs 'lon'[i] and 'lat'[j].
'obs' is the array that contains the observational data. The same documentation of parameter 'mod' applies to this parameter.
'lat' and 'lon' are the latitudes and longitudes of the centers of
the cells of the grid the data is interpolated into (0 if the loaded
variable is a global mean or the output is an area average).
Both have the attribute 'cdo_grid_des' associated with a character
string with the name of the common grid of the data, following the CDO
naming conventions for grids.
'lon' has the attributes 'first_lon' and 'last_lon', with the first
and last longitude values found in the region defined by 'lonmin' and
'lonmax'. 'lat' has also the equivalent attributes 'first_lat' and
'last_lat'.
'lon' has also the attribute 'data_across_gw' which tells whether the
requested region via 'lonmin', 'lonmax', 'latmin', 'latmax' goes across
the Greenwich meridian. As explained in the documentation of the
parameter 'mod', the loaded data array is kept in the same order as in
the original files when possible: this means that, in some cases, even
if the data goes across the Greenwich, the data array may not go
across the Greenwich. The attribute 'array_across_gw' tells whether
the array actually goes across the Greenwich. E.g: The longitudes in
the data files are defined to be from 0 to 360. The requested
longitudes are from -80 to 40. The original order is kept, hence the
longitudes in the array will be ordered as follows:
0, ..., 40, 280, ..., 360. In that case, 'data_across_gw' will be TRUE
and 'array_across_gw' will be FALSE.
The attribute 'projection' is kept for compatibility with 'downscaleR'.
'Variable' has the following components:
'varName', with the short name of the loaded variable as specified in the parameter 'var'.
'level', with information on the pressure level of the variable. It is kept NULL for now.
And the following attributes:
'is_standard', kept for compatibility with 'downscaleR', tells if a dataset has been homogenized to standards with 'downscaleR' catalogs.
'units', a character string with the units of measure of the variable, as found in the source files.
'longname', a character string with the long name of the variable, as found in the source files.
'daily_agg_cellfun', 'monthly_agg_cellfun', 'verification_time', kept for compatibility with 'downscaleR'.
'Datasets' has the following components:
'exp', a named list where the names are the identifying character strings of each experiment in 'exp', each associated to a list with the following components:
'members', a list with the names of the members of the dataset.
'source', a path or URL to the source of the dataset.
'obs', similar to 'exp' but for observational datasets.
'Dates', with the following components:
'start', an array of dimensions (sdate, time) with the POSIX initial date of each forecast time of each starting date.
'end', an array of dimensions (sdate, time) with the POSIX final date of each forecast time of each starting date.
'InitializationDates', a vector of starting dates as specified in 'sdates', in POSIX format.
'when', a time stamp of the date the Load() call to obtain the data was issued.
'source_files', a vector of character strings with complete paths to all the found files involved in the Load() call.
'not_found_files', a vector of character strings with complete paths to the files involved in the Load() call that were not found.
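The Greenwich-related attributes described above can be inspected on the returned 'lon' component. A sketch, assuming a previous 2-dimensional Load() call over a region crossing the Greenwich meridian:

```r
# Not run: check whether the requested region and the stored array
# actually cross the Greenwich meridian.
attr(sampleData$lon, 'data_across_gw')   # TRUE if the region crosses it
attr(sampleData$lon, 'array_across_gw')  # TRUE only if the array does too
```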
# Let's assume we want to perform verification with data of a variable
# called 'tos' from a model called 'model' and observed data coming from
# an observational dataset called 'observation'.
#
# The model was run in the context of an experiment named 'experiment'.
# It simulated from 1st November in 1985, 1990, 1995, 2000 and 2005 for a
# period of 5 years time from each starting date. 5 different sets of
# initial conditions were used so an ensemble of 5 members was generated
# for each starting date.
# The model generated values for the variables 'tos' and 'tas' in a
# 3-hourly frequency but, after some initial post-processing, it was
# averaged over every month.
# The resulting monthly average series were stored in a file for each
# starting date for each variable with the data of the 5 ensemble members.
# The resulting directory tree was the following:
# model
# |--> experiment
# |--> monthly_mean
# |--> tos_3hourly
# | |--> tos_19851101.nc
# | |--> tos_19901101.nc
# | .
# | .
# | |--> tos_20051101.nc
# |--> tas_3hourly
# |--> tas_19851101.nc
# |--> tas_19901101.nc
# .
# .
# |--> tas_20051101.nc
#
# The observation recorded values of 'tos' and 'tas' at each day of the
# month over that period but was also averaged over months and stored in
# a file per month. The directory tree was the following:
# observation
# |--> monthly_mean
# |--> tos
# | |--> tos_198511.nc
# | |--> tos_198512.nc
# | |--> tos_198601.nc
# | .
# | .
# | |--> tos_201010.nc
# |--> tas
# |--> tas_198511.nc
# |--> tas_198512.nc
# |--> tas_198601.nc
# .
# .
# |--> tas_201010.nc
#
# The model data is stored in a file-per-startdate fashion and the
# observational data is stored in a file-per-month, and both are stored in
# a monthly frequency. The file format is NetCDF.
# Hence all the data is supported by Load() (see details and other supported
# conventions in ?Load) but first we need to configure it properly.
#
# These data files are included in the package (in the 'sample_data' folder),
# only for the variable 'tos'. They have been interpolated to a very low
# resolution grid so as to keep the package size acceptable for CRAN.
# The original grid names (following CDO conventions) for experimental and
# observational data were 't106grid' and 'r180x89' respectively. The final
# resolutions are 'r20x10' and 'r16x8' respectively.
# The experimental data comes from the decadal climate prediction experiment
# run at IC3 in the context of the CMIP5 project. Its name within IC3 local
# database is 'i00k'.
# The observational dataset used for verification is the 'ERSST'
# observational dataset.
#
# The next two examples are equivalent and show how to load the variable
# 'tos' from these sample datasets, the first providing lists of lists to
# the parameters 'exp' and 'obs' (see documentation on these parameters) and
# the second providing vectors of character strings, hence using a
# configuration file.
#
# The code is not run because it dispatches system calls to 'cdo' which is
# not allowed in the examples as per CRAN policies. You can run it on your
# system though.
# Instead, the code in 'dontshow' is run, which loads the equivalent
# already processed data in R.
#
# Example 1: Providing lists of lists to 'exp' and 'obs':
#
## Not run:
data_path <- system.file('sample_data', package = 's2dv')
exp <- list(
name = 'experiment',
path = file.path(data_path, 'model/$EXP_NAME$/monthly_mean',
'$VAR_NAME$_3hourly/$VAR_NAME$_$START_DATES$.nc')
)
obs <- list(
name = 'observation',
path = file.path(data_path, 'observation/$OBS_NAME$/monthly_mean',
'$VAR_NAME$/$VAR_NAME$_$YEAR$$MONTH$.nc')
)
# Now we are ready to use Load().
startDates <- c('19851101', '19901101', '19951101', '20001101', '20051101')
sampleData <- Load('tos', list(exp), list(obs), startDates,
output = 'areave', latmin = 27, latmax = 48,
lonmin = -12, lonmax = 40)
## End(Not run)
#
# Example 2: Providing lists of lists to 'exp' and 'obs', as in Example 1,
# but using the $STORE_FREQ$ and per-start-date $START_DATE$ tags in the
# path patterns.
#
#
## Not run:
data_path <- system.file('sample_data', package = 's2dv')
expA <- list(name = 'experiment', path = file.path(data_path,
'model/$EXP_NAME$/$STORE_FREQ$_mean/$VAR_NAME$_3hourly',
'$VAR_NAME$_$START_DATE$.nc'))
obsX <- list(name = 'observation', path = file.path(data_path,
'$OBS_NAME$/$STORE_FREQ$_mean/$VAR_NAME$',
'$VAR_NAME$_$YEAR$$MONTH$.nc'))
# Now we are ready to use Load().
startDates <- c('19851101', '19901101', '19951101', '20001101', '20051101')
sampleData <- Load('tos', list(expA), list(obsX), startDates,
output = 'areave', latmin = 27, latmax = 48,
lonmin = -12, lonmax = 40)
#
# Example 3: Providing vectors of character strings to 'exp' and 'obs' and
# using a configuration file.
# The configuration file 'sample.conf' that we will create in the example
# has the proper entries to load these (see ?LoadConfigFile for details on
# writing a configuration file).
#
configfile <- paste0(tempdir(), '/sample.conf')
ConfigFileCreate(configfile, confirm = FALSE)
c <- ConfigFileOpen(configfile)
c <- ConfigEditDefinition(c, 'DEFAULT_VAR_MIN', '-1e19', confirm = FALSE)
c <- ConfigEditDefinition(c, 'DEFAULT_VAR_MAX', '1e19', confirm = FALSE)
data_path <- system.file('sample_data', package = 's2dv')
exp_data_path <- paste0(data_path, '/model/$EXP_NAME$/')
obs_data_path <- paste0(data_path, '/$OBS_NAME$/')
c <- ConfigAddEntry(c, 'experiments', dataset_name = 'experiment',
var_name = 'tos', main_path = exp_data_path,
file_path = '$STORE_FREQ$_mean/$VAR_NAME$_3hourly/$VAR_NAME$_$START_DATE$.nc')
c <- ConfigAddEntry(c, 'observations', dataset_name = 'observation',
var_name = 'tos', main_path = obs_data_path,
file_path = '$STORE_FREQ$_mean/$VAR_NAME$/$VAR_NAME$_$YEAR$$MONTH$.nc')
ConfigFileSave(c, configfile, confirm = FALSE)
# Now we are ready to use Load().
startDates <- c('19851101', '19901101', '19951101', '20001101', '20051101')
sampleData <- Load('tos', c('experiment'), c('observation'), startDates,
output = 'areave', latmin = 27, latmax = 48,
lonmin = -12, lonmax = 40, configfile = configfile)
## End(Not run)