| Load | R Documentation | 
This function loads monthly or daily data from a set of specified 
experimental datasets, together with the date-corresponding data from a set 
of specified observational datasets. See parameters 'storefreq', 
'sampleperiod', 'exp' and 'obs'.
A set of starting dates is specified through the parameter 'sdates'. Data of 
each starting date is loaded for each model.
Load() arranges the data in two arrays with a similar format, both 
with the following dimensions:
1. The number of experimental datasets determined by the user through the argument 'exp' (for the experimental data array) or the number of observational datasets available for validation (for the observational array) determined as well by the user through the argument 'obs'.
2. The greatest number of members across all experiments (in the experimental data array) or across all observational datasets (in the observational data array).
3. The number of starting dates determined by the user through the 'sdates' argument.
4. The greatest number of lead-times.
5. The number of latitudes of the selected zone.
6. The number of longitudes of the selected zone.
Dimensions 5 and 6 are optional and their presence depends on the type of 
the specified variable (global mean or 2-dimensional) and on the selected 
output type (area averaged time series, latitude averaged time series, 
longitude averaged time series or 2-dimensional time series).
In the case of loading an area average the dimensions of the arrays will be 
only the first 4.
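As a rough illustration of the layout described above, the following base R sketch builds empty arrays with the same dimension order. All sizes are hypothetical and no data is read; it only shows how the 6-dimensional and the area-averaged (4-dimensional) cases differ, and how slots for missing members stay as NA.

```r
# Hypothetical sizes, chosen only for illustration.
n_exp <- 2          # experimental datasets
n_obs <- 1          # observational datasets
n_member_exp <- 5   # greatest number of members across experiments
n_member_obs <- 1   # greatest number of members across observations
n_sdates <- 5
n_ltime <- 60
n_lat <- 10
n_lon <- 20

# 2-dimensional output: all six dimensions are present.
mod_2d <- array(NA_real_,
                dim = c(n_exp, n_member_exp, n_sdates, n_ltime, n_lat, n_lon))
obs_2d <- array(NA_real_,
                dim = c(n_obs, n_member_obs, n_sdates, n_ltime, n_lat, n_lon))

# Area-averaged output: only the first four dimensions remain.
mod_ave <- array(NA_real_, dim = c(n_exp, n_member_exp, n_sdates, n_ltime))

# Suppose the second experiment only has 3 members: member slots 4 and 5
# of that experiment are never filled and therefore remain NA.
mod_ave[1, , , ] <- 0
mod_ave[2, 1:3, , ] <- 0
```

Only the first two dimensions differ between the experimental and the observational array in this sketch; the remaining ones have the same lengths.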
Only a specified variable is loaded from each experiment at each starting 
date. See parameter 'var'.
Afterwards, observational data that matches every starting date and lead-time 
of every experimental dataset is fetched in the file system (so, if two 
predictions at two different start dates overlap, some observational values 
will be loaded and kept in memory more than once).
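The overlap mentioned above can be made concrete with a small base R sketch. It assumes monthly data and two hypothetical start dates with 12 lead times each; the helper `forecast_months()` is introduced here for illustration only and is not part of the package.

```r
# Calendar months ("YYYYMM") covered by a forecast that starts on `sdate`
# (format 'YYYYMMDD') and spans `n_ltime` monthly lead times.
forecast_months <- function(sdate, n_ltime) {
  start <- as.Date(sdate, format = "%Y%m%d")
  format(seq(start, by = "month", length.out = n_ltime), "%Y%m")
}

# Two hypothetical start dates whose forecast periods overlap.
sdates <- c("19851101", "19860501")
months <- lapply(sdates, forecast_months, n_ltime = 12)
names(months) <- sdates

# Observational months needed by both start dates: these would appear
# twice in the observational array, once per start date.
shared <- intersect(months[[1]], months[[2]])
```

Here `shared` holds the months that both forecasts cover, which is the portion of observational data kept in memory more than once.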
If no data is found in the file system for an experimental or observational 
array point it is filled with an NA value.
If the specified output is a 2-dimensional or a latitude- or 
longitude-averaged time series, all the data is interpolated onto a common 
grid. If the specified output type is an area-averaged time series, the data 
is averaged on the individual grid of each dataset but can also be averaged 
after interpolating onto a common grid. See parameters 'grid' and 'method'.
Once the two arrays are filled by calling this function, other functions in 
the s2dverification package that receive as inputs data formatted in this 
data structure can be executed (e.g: Clim() to compute climatologies, 
Ano() to compute anomalies, ...).
Load() has many additional parameters to disable values and trim dimensions 
of the selected variable. Masks can also be applied to 2-dimensional variables. 
See parameters 'nmember', 'nmemberobs', 'nleadtime', 'leadtimemin', 
'leadtimemax', 'sampleperiod', 'lonmin', 'lonmax', 'latmin', 'latmax', 
'maskmod', 'maskobs', 'varmin', 'varmax'.
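The effect of two of these groups of parameters can be sketched in base R. This is an illustration of the documented behaviour under assumed values, not the code Load() runs internally: values outside ['varmin', 'varmax'] become NA, and lead-times are assumed to be taken every 'sampleperiod' steps between 'leadtimemin' and 'leadtimemax'.

```r
# Hypothetical data in which +/-1e20 stands in for a fill value.
values <- c(271.3, 1e20, 285.0, -1e20, 290.2)
varmin <- -1e19
varmax <- 1e19

# Disable (set to NA) everything outside the accepted range.
values[values < varmin | values > varmax] <- NA

# Assumed lead-time selection: every `sampleperiod`-th lead-time
# from `leadtimemin` up to `leadtimemax`.
leadtimemin <- 1
leadtimemax <- 12
sampleperiod <- 3
selected_leadtimes <- seq(leadtimemin, leadtimemax, by = sampleperiod)
```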
The parameters 'exp' and 'obs' can take various forms. The most direct form 
is a list of lists, where each sub-list has the component 'path' associated 
to a character string with a pattern of the path to the files of a dataset 
to be loaded. These patterns can contain wildcards and tags that will be 
replaced automatically by Load() with the specified starting dates, 
member numbers, variable name, etc.
See parameter 'exp' or 'obs' for details.
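To make the idea of tags in path patterns more tangible, here is a minimal base R sketch of how a pattern could be expanded for each starting date. The helper `expand_path()` and the '/path/to' prefix are hypothetical; Load() performs this replacement internally and may do so differently.

```r
# Replace every $TAG$ in `pattern` with the matching value in `tags`.
# Hypothetical helper, for illustration only.
expand_path <- function(pattern, tags) {
  for (tag in names(tags)) {
    pattern <- gsub(paste0("$", tag, "$"), tags[[tag]], pattern, fixed = TRUE)
  }
  pattern
}

pattern <- file.path("/path/to/$EXP_NAME$/$STORE_FREQ$_mean",
                     "$VAR_NAME$_3hourly",
                     "$VAR_NAME$_$START_DATE$.nc")

sdates <- c("19851101", "19901101")

# One concrete path per starting date.
paths <- vapply(sdates, function(sdate) {
  expand_path(pattern, list(EXP_NAME = "experiment",
                            STORE_FREQ = "monthly",
                            VAR_NAME = "tos",
                            START_DATE = sdate))
}, character(1), USE.NAMES = FALSE)
```

Using `fixed = TRUE` avoids treating the `$` characters of the tags as regular expression anchors.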
Only NetCDF files are supported. OPeNDAP URLs to NetCDF files are also 
supported.
Load() can load 2-dimensional or global mean variables in any of the 
following formats:
experiments:
file per ensemble per starting date (YYYY, MM and DD somewhere in the path)
file per member per starting date 
(YYYY, MM, DD and MemberNumber somewhere in the path. Ensemble 
experiments with different numbers of members can be loaded in 
a single Load() call.)
(YYYY, MM and DD specify the starting dates of the predictions)
observations:
file per ensemble per month (YYYY and MM somewhere in the path)
file per member per month (YYYY, MM and MemberNumber somewhere in the path, obs with different numbers of members supported)
file per dataset (No constraints in the path but the time axes in the file have to be properly defined)
(YYYY and MM correspond to the actual month data in the file)
In all the formats the data can be stored in a daily or monthly frequency, 
or a multiple of these (see parameters 'storefreq' and 'sampleperiod').
All the data files must contain the target variable defined over time and 
potentially over members, latitude and longitude dimensions in any order, 
time being the record dimension.
In the case of a two-dimensional variable, the variables longitude and 
latitude must be defined inside the data file too and must have the same 
names as the dimension for longitudes and latitudes respectively.
The names of these dimensions (and longitude and latitude variables) and the 
name for the members dimension are expected to be 'longitude', 'latitude' 
and 'ensemble' respectively. However, these names can be adjusted with the 
parameter 'dimnames' or can be configured in the configuration file (read 
below in parameters 'exp', 'obs' or see ?ConfigFileOpen 
for more information).
All the data files are expected to have numeric values representable with 
32 bits. Take this into account when choosing the fill values or infinite 
values in the datasets to load.
The Load() function returns a named list following a structure similar to 
the one used in the package 'downscaleR'.
The components are the following:
'mod' is the array that contains the experimental data. It has the attribute 'dimensions' associated to a vector of strings with the labels of each dimension of the array, in order.
'obs' is the array that contains the observational data. It has the attribute 'dimensions' associated to a vector of strings with the labels of each dimension of the array, in order.
'lat' and 'lon' are the latitudes and longitudes of the grid into 
which the data is interpolated (0 if the loaded variable is a global 
mean or the output is an area average).
Both have the attribute 'cdo_grid_des' associated with a character
string with the name of the common grid of the data, following the CDO 
naming conventions for grids.
The attribute 'projection' is kept for compatibility with 'downscaleR'.
'Variable' has the following components:
'varName', with the short name of the loaded variable as specified in the parameter 'var'.
'level', with information on the pressure level of the variable. It is kept as NULL for now.
And the following attributes:
'is_standard', kept for compatibility with 'downscaleR', tells if a dataset has been homogenized to standards with 'downscaleR' catalogs.
'units', a character string with the units of measure of the variable, as found in the source files.
'longname', a character string with the long name of the variable, as found in the source files.
'daily_agg_cellfun', 'monthly_agg_cellfun', 'verification_time', kept for compatibility with 'downscaleR'.
'Datasets' has the following components:
'exp', a named list where the names are the identifying character strings of each experiment in 'exp', each associated to a list with the following components:
'members', a list with the names of the members of the dataset.
'source', a path or URL to the source of the dataset.
'obs', similar to 'exp' but for observational datasets.
'Dates', with the following components:
'start', an array of dimensions (sdate, time) with the POSIX initial date of each forecast time of each starting date.
'end', an array of dimensions (sdate, time) with the POSIX final date of each forecast time of each starting date.
'InitializationDates', a vector of starting dates as specified in 'sdates', in POSIX format.
'when', a time stamp of the date the Load() call to obtain 
the data was issued.
'source_files', a vector of character strings with complete paths 
to all the found files involved in the Load() call.
'not_found_files', a vector of character strings with complete 
paths to not found files involved in the Load() call.
Load(
  var,
  exp = NULL,
  obs = NULL,
  sdates,
  nmember = NULL,
  nmemberobs = NULL,
  nleadtime = NULL,
  leadtimemin = 1,
  leadtimemax = NULL,
  storefreq = "monthly",
  sampleperiod = 1,
  lonmin = 0,
  lonmax = 360,
  latmin = -90,
  latmax = 90,
  output = "areave",
  method = "conservative",
  grid = NULL,
  maskmod = vector("list", 15),
  maskobs = vector("list", 15),
  configfile = NULL,
  varmin = NULL,
  varmax = NULL,
  silent = FALSE,
  nprocs = NULL,
  dimnames = NULL,
  remapcells = 2,
  path_glob_permissive = "partial"
)
| var | Short name of the variable to load. It should coincide with the 
variable name inside the data files. | 
| exp | Parameter to specify which experimental datasets to load data 
from. 
 The tag $START_DATE$ will be replaced with each of the starting dates 
specified in 'sdates'. $YEAR$, $MONTH$ and $DAY$ will take a value for each 
iteration over 'sdates'; they are simply the same as $START_DATE$ but 
split into parts. 
list(
  list(
    name = 'experimentA',
    path = file.path('/path/to/$DATASET_NAME$/$STORE_FREQ$',
                     '$VAR_NAME$$SUFFIX$',
                     '$VAR_NAME$_$START_DATE$.nc'),
    nc_var_name = '$VAR_NAME$',
    suffix = '_3hourly',
    var_min = '-1e19',
    var_max = '1e19'
  )
)
This will make  | 
| obs | Argument with the same format as parameter 'exp'. See details on 
parameter 'exp'. | 
| sdates | Vector of starting dates of the experimental runs to be loaded 
following the pattern 'YYYYMMDD'. | 
| nmember | Vector with the numbers of members to load from the specified 
experimental datasets in 'exp'. | 
| nmemberobs | Vector with the numbers of members to load from the 
specified observational datasets in 'obs'. | 
| nleadtime | Deprecated. See parameter 'leadtimemax'. | 
| leadtimemin | Only lead-times higher or equal to 'leadtimemin' are loaded. Takes by default value 1. | 
| leadtimemax | Only lead-times lower or equal to 'leadtimemax' are loaded. 
Takes by default the number of lead-times of the first experimental 
dataset in 'exp'. | 
| storefreq | Frequency at which the data to be loaded is stored in the 
file system. Can take values 'monthly' or 'daily'. | 
| sampleperiod | To load only a subset between 'leadtimemin' and 
'leadtimemax' with the period of subsampling 'sampleperiod'. | 
| lonmin | If a 2-dimensional variable is loaded, values at longitudes 
lower than 'lonmin' aren't loaded. | 
| lonmax | If a 2-dimensional variable is loaded, values at longitudes 
higher than 'lonmax' aren't loaded. | 
| latmin | If a 2-dimensional variable is loaded, values at latitudes 
lower than 'latmin' aren't loaded. | 
| latmax | If a 2-dimensional variable is loaded, values at latitudes 
higher than 'latmax' aren't loaded. | 
| output | This parameter determines the format in which the data is 
arranged in the output arrays. 
 Takes by default the value 'areave'. If the variable specified in 'var' is 
a global mean, this parameter is forced to 'areave'. | 
| method | This parameter determines the interpolation method to be used 
when regridding data (see 'output'). Can take values 'bilinear', 'bicubic', 
'conservative', 'distance-weighted'. | 
| grid | A common grid can be specified through the parameter 'grid' when 
loading 2-dimensional data. Data is then interpolated onto this grid 
whichever 'output' type is specified. If the selected output type is 
'areave' and a 'grid' is specified, the area averages are calculated after 
interpolating to the specified grid. | 
| maskmod | List of masks to be applied to the data of each experimental 
dataset respectively, if a 2-dimensional variable is specified in 'var'. | 
| maskobs | See help on parameter 'maskmod'. | 
| configfile | Path to the s2dverification configuration file from which 
to retrieve information on location in file system (and other) of datasets. | 
| varmin | Loaded experimental and observational data values smaller 
than 'varmin' will be disabled (replaced by NA values). | 
| varmax | Loaded experimental and observational data values greater 
than 'varmax' will be disabled (replaced by NA values). | 
| silent | Parameter to show (FALSE) or hide (TRUE) information messages. | 
| nprocs | Number of parallel processes created to perform the fetch 
and computation of data. | 
| dimnames | Named list where the name of each element is a generic 
name of the expected dimensions inside the NetCDF files. These generic 
names are 'lon', 'lat' and 'member'. 'time' is not needed because it's 
detected automatically by elimination. | 
| remapcells | When loading a 2-dimensional variable, spatial subsets can 
be requested via  | 
| path_glob_permissive | In some cases, when specifying a path pattern 
(either in the parameters 'exp'/'obs' or in a configuration file) one can 
specify path patterns that contain shell globbing expressions. Too much 
freedom in putting globbing expressions in the path patterns can be 
dangerous and make  | 
The two output matrices have between 2 and 6 dimensions:
Number of experimental/observational datasets.
Number of members.
Number of startdates.
Number of leadtimes.
Number of latitudes (optional).
Number of longitudes (optional).
but the two matrices have the same number of dimensions and only the first two dimensions can have different lengths depending on the input arguments. For a detailed explanation of the process, read the documentation attached to the package or check the comments in the code.
Load() returns a named list following a structure similar to the 
one used in the package 'downscaleR'.
The components are the following:
'mod' is the array that contains the experimental data. It has the 
attribute 'dimensions' associated to a vector of strings with the 
labels of each dimension of the array, in order. The order of the 
latitudes is always forced to be from 90 to -90 whereas the order of 
the longitudes is kept as in the original files (if possible). 
Longitude values in 'lon' lower than 0 have 360 added to them 
(but are still kept in the original order). In some cases, however, if 
multiple datasets are loaded in longitude-latitude mode, the 
longitudes (and also the data arrays in 'mod' and 'obs') are 
re-ordered afterwards by Load() to range from 0 to 360; a 
warning is given in such cases. The longitude and latitude of the 
center of the grid cell that corresponds to the value [j, i] in 'mod' 
(along the dimensions latitude and longitude, respectively) can be 
found in the outputs lon[i] and lat[j].
'obs' is the array that contains the observational data. The documentation of the component 'mod' also applies to this component.
'lat' and 'lon' are the latitudes and longitudes of the centers of 
the cells of the grid the data is interpolated into (0 if the loaded 
variable is a global mean or the output is an area average).
Both have the attribute 'cdo_grid_des' associated with a character 
string with the name of the common grid of the data, following the CDO 
naming conventions for grids.
'lon' has the attributes 'first_lon' and 'last_lon', with the first 
and last longitude values found in the region defined by 'lonmin' and 
'lonmax'. 'lat' has also the equivalent attributes 'first_lat' and 
'last_lat'.
'lon' also has the attribute 'data_across_gw', which tells whether the 
region requested via 'lonmin', 'lonmax', 'latmin', 'latmax' goes across 
the Greenwich meridian. As explained in the documentation of the 
component 'mod', the loaded data array is kept in the same order as in 
the original files when possible: this means that, in some cases, even 
if the data goes across the Greenwich meridian, the data array may not 
go across it. The attribute 'array_across_gw' tells whether 
the array actually goes across the Greenwich meridian. E.g.: the 
longitudes in the data files are defined to be from 0 to 360. The requested 
longitudes are from -80 to 40. The original order is kept, hence the 
longitudes in the array will be ordered as follows: 
0, ..., 40, 280, ..., 360. In that case, 'data_across_gw' will be TRUE 
and 'array_across_gw' will be FALSE.
The attribute 'projection' is kept for compatibility with 'downscaleR'.
'Variable' has the following components:
'varName', with the short name of the loaded variable as specified in the parameter 'var'.
'level', with information on the pressure level of the variable. It is kept as NULL for now.
And the following attributes:
'is_standard', kept for compatibility with 'downscaleR', tells if a dataset has been homogenized to standards with 'downscaleR' catalogs.
'units', a character string with the units of measure of the variable, as found in the source files.
'longname', a character string with the long name of the variable, as found in the source files.
'daily_agg_cellfun', 'monthly_agg_cellfun', 'verification_time', kept for compatibility with 'downscaleR'.
'Datasets' has the following components:
'exp', a named list where the names are the identifying character strings of each experiment in 'exp', each associated to a list with the following components:
'members', a list with the names of the members of the dataset.
'source', a path or URL to the source of the dataset.
'obs', similar to 'exp' but for observational datasets.
'Dates', with the following components:
'start', an array of dimensions (sdate, time) with the POSIX initial date of each forecast time of each starting date.
'end', an array of dimensions (sdate, time) with the POSIX final date of each forecast time of each starting date.
'InitializationDates', a vector of starting dates as specified in 'sdates', in POSIX format.
'when', a time stamp of the date the Load() call to obtain 
the data was issued.
'source_files', a vector of character strings with complete paths 
to all the found files involved in the Load() call.
'not_found_files', a vector of character strings with complete 
paths to not found files involved in the Load() call.
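The 'data_across_gw' and 'array_across_gw' example given for the 'lon' component can be reproduced with a short base R sketch. The file longitudes (a hypothetical 10-degree grid) and the selection logic are assumptions made for illustration; they mimic the documented outcome rather than the internal implementation of Load().

```r
# Hypothetical longitudes as stored in a data file, from 0 up to (not
# including) 360 in steps of 10 degrees.
file_lon <- seq(0, 350, by = 10)

# Requested region, as in the example above.
lonmin <- -80
lonmax <- 40

# Express the request in the 0-360 convention of the file.
lo <- lonmin %% 360   # 280
hi <- lonmax %% 360   # 40

# The requested region wraps around the Greenwich meridian when its
# western bound ends up east of its eastern bound.
data_across_gw <- lo > hi

keep <- if (data_across_gw) {
  file_lon >= lo | file_lon <= hi
} else {
  file_lon >= lo & file_lon <= hi
}

# The original file order is kept: 0, ..., 40, 280, ..., 350.
lon <- file_lon[keep]

# The array itself crosses the meridian only if the longitudes wrap
# (decrease) somewhere along the array, which is not the case here.
array_across_gw <- any(diff(lon) < 0)
```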
History:
0.1  -  2011-03  (V. Guemas)  -  Original code
1.0  -  2013-09  (N. Manubens)  -  Formatting to CRAN
1.2  -  2015-02  (N. Manubens)  -  Generalisation + parallelisation
1.3  -  2015-07  (N. Manubens)  -  Improvements related to configuration file mechanism
1.4  -  2016-01  (N. Manubens)  -  Added subsetting capabilities
# Let's assume we want to perform verification with data of a variable
# called 'tos' from a model called 'model' and observed data coming from 
# an observational dataset called 'observation'.
#
# The model was run in the context of an experiment named 'experiment'. 
# It simulated from 1st November in 1985, 1990, 1995, 2000 and 2005 for a 
# period of 5 years time from each starting date. 5 different sets of 
# initial conditions were used so an ensemble of 5 members was generated 
# for each starting date.
# The model generated values for the variables 'tos' and 'tas' in a 
# 3-hourly frequency but, after some initial post-processing, it was 
# averaged over every month.
# The resulting monthly average series were stored in a file for each 
# starting date for each variable with the data of the 5 ensemble members.
# The resulting directory tree was the following:
#   model
#    |--> experiment
#          |--> monthly_mean
#                |--> tos_3hourly
#                |     |--> tos_19851101.nc
#                |     |--> tos_19901101.nc
#                |               .
#                |               .
#                |     |--> tos_20051101.nc 
#                |--> tas_3hourly
#                      |--> tas_19851101.nc
#                      |--> tas_19901101.nc
#                                .
#                                .
#                      |--> tas_20051101.nc
# 
# The observation recorded values of 'tos' and 'tas' at each day of the 
# month over that period but was also averaged over months and stored in 
# a file per month. The directory tree was the following:
#   observation
#    |--> monthly_mean
#          |--> tos
#          |     |--> tos_198511.nc
#          |     |--> tos_198512.nc
#          |     |--> tos_198601.nc
#          |               .
#          |               .
#          |     |--> tos_201010.nc
#          |--> tas
#                |--> tas_198511.nc
#                |--> tas_198512.nc
#                |--> tas_198601.nc
#                          .
#                          .
#                |--> tas_201010.nc
#
# The model data is stored in a file-per-startdate fashion and the
# observational data is stored in a file-per-month, and both are stored in 
# a monthly frequency. The file format is NetCDF.
# Hence all the data is supported by Load() (see details and other supported 
# conventions in ?Load) but first we need to configure it properly.
#
# These data files are included in the package (in the 'sample_data' folder),
# only for the variable 'tos'. They have been interpolated to a very low 
# resolution grid so as to make it on CRAN.
# The original grid names (following CDO conventions) for experimental and 
# observational data were 't106grid' and 'r180x89' respectively. The final
# resolutions are 'r20x10' and 'r16x8' respectively. 
# The experimental data comes from the decadal climate prediction experiment 
# run at IC3 in the context of the CMIP5 project. Its name within IC3 local 
# database is 'i00k'. 
# The observational dataset used for verification is the 'ERSST' 
# observational dataset.
#
# The next two examples are equivalent and show how to load the variable 
# 'tos' from these sample datasets, the first providing lists of lists to 
# the parameters 'exp' and 'obs' (see documentation on these parameters) and 
# the second providing vectors of character strings, hence using a 
# configuration file.
#
# The code is not run because it dispatches system calls to 'cdo' which is 
# not allowed in the examples as per CRAN policies. You can run it on your 
# system though. 
# Instead, the code in 'dontshow' is run, which loads the equivalent
# already processed data in R.
#
# Example 1: Providing lists of lists to 'exp' and 'obs':
#
 ## Not run: 
data_path <- system.file('sample_data', package = 's2dverification')
exp <- list(
        name = 'experiment',
        path = file.path(data_path, 'model/$EXP_NAME$/monthly_mean',
                         '$VAR_NAME$_3hourly/$VAR_NAME$_$START_DATE$.nc')
      )
obs <- list(
        name = 'observation',
        path = file.path(data_path, '$OBS_NAME$/monthly_mean',
                         '$VAR_NAME$/$VAR_NAME$_$YEAR$$MONTH$.nc')
      )
# Now we are ready to use Load().
startDates <- c('19851101', '19901101', '19951101', '20001101', '20051101')
sampleData <- Load('tos', list(exp), list(obs), startDates,
                  output = 'areave', latmin = 27, latmax = 48, 
                  lonmin = -12, lonmax = 40)
 
## End(Not run)
#
# Example 1 (variant): Providing the same lists of lists to 'exp' and 'obs',
#                      but using the $STORE_FREQ$ tag in the path patterns
#                      instead of a hard-coded storage frequency.
#
 ## Not run: 
data_path <- system.file('sample_data', package = 's2dverification')
expA <- list(name = 'experiment', path = file.path(data_path, 
            'model/$EXP_NAME$/$STORE_FREQ$_mean/$VAR_NAME$_3hourly',
            '$VAR_NAME$_$START_DATE$.nc'))
obsX <- list(name = 'observation', path = file.path(data_path,
            '$OBS_NAME$/$STORE_FREQ$_mean/$VAR_NAME$',
            '$VAR_NAME$_$YEAR$$MONTH$.nc'))
# Now we are ready to use Load().
startDates <- c('19851101', '19901101', '19951101', '20001101', '20051101')
sampleData <- Load('tos', list(expA), list(obsX), startDates,
                  output = 'areave', latmin = 27, latmax = 48, 
                  lonmin = -12, lonmax = 40)
#
# Example 2: providing character strings in 'exp' and 'obs', and providing
# a configuration file.
# The configuration file 'sample.conf' that we will create in the example 
# has the proper entries to load these (see ?LoadConfigFile for details on 
# writing a configuration file). 
#
configfile <- paste0(tempdir(), '/sample.conf')
ConfigFileCreate(configfile, confirm = FALSE)
c <- ConfigFileOpen(configfile)
c <- ConfigEditDefinition(c, 'DEFAULT_VAR_MIN', '-1e19', confirm = FALSE)
c <- ConfigEditDefinition(c, 'DEFAULT_VAR_MAX', '1e19', confirm = FALSE)
data_path <- system.file('sample_data', package = 's2dverification')
exp_data_path <- paste0(data_path, '/model/$EXP_NAME$/')
obs_data_path <- paste0(data_path, '/$OBS_NAME$/')
c <- ConfigAddEntry(c, 'experiments', dataset_name = 'experiment', 
    var_name = 'tos', main_path = exp_data_path,
    file_path = '$STORE_FREQ$_mean/$VAR_NAME$_3hourly/$VAR_NAME$_$START_DATE$.nc')
c <- ConfigAddEntry(c, 'observations', dataset_name = 'observation', 
    var_name = 'tos', main_path = obs_data_path,
    file_path = '$STORE_FREQ$_mean/$VAR_NAME$/$VAR_NAME$_$YEAR$$MONTH$.nc')
ConfigFileSave(c, configfile, confirm = FALSE)
# Now we are ready to use Load().
startDates <- c('19851101', '19901101', '19951101', '20001101', '20051101')
sampleData <- Load('tos', c('experiment'), c('observation'), startDates, 
                  output = 'areave', latmin = 27, latmax = 48, 
                  lonmin = -12, lonmax = 40, configfile = configfile)
 
## End(Not run)
  