LoadDataFile: Load Data From File Into Environment

.LoadDataFile    R Documentation

Load Data From File Into Environment

Description

This function receives a 'work piece', a named list which contains information on a data file to be loaded.
It can be run in 'explore' mode or in 'load' mode.
When running in 'explore' mode, the metadata of the file is read and, among other information (see Value), the lengths of the dimensions in the file are returned in a named list:

  • 'member': Number of members

  • 'time': Number of lead times

  • 'lon': Number of longitudes in the file

  • 'lat': Number of latitudes in the file

When running in 'load' mode, it loads the data, performs any computations requested through additional parameters in the 'work piece' (such as interpolation or slicing), and finally stores the result in the shared-memory matrix pointed to by the 'out_pointer' parameter of the 'work piece'.
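To illustrate what the 'slicing' step amounts to, here is a minimal sketch in plain R; it is not the package's implementation. The helper slice_field() is hypothetical and assumes a field ordered c(lon, lat, time), longitudes and limits expressed in the same convention, and lon_min <= lon_max. The real function may additionally reorder longitudes or handle wrap-around.

```r
# Hypothetical helper, not part of s2dverification: keep only the grid points
# inside lon_limits/lat_limits and the requested lead times.
# Assumptions: the field is ordered c(lon, lat, time), longitudes and limits
# use the same convention, and lon_min <= lon_max (no date-line wrap-around).
slice_field <- function(field, lons, lats, lon_limits, lat_limits, leadtimes) {
  lon_idx <- which(lons >= lon_limits[1] & lons <= lon_limits[2])
  lat_idx <- which(lats >= lat_limits[1] & lats <= lat_limits[2])
  # drop = FALSE keeps all three dimensions even if one has length 1
  field[lon_idx, lat_idx, leadtimes, drop = FALSE]
}

# Usage with a toy 8 x 5 x 6 field and the limits used in the Examples section
lons <- seq(-20, 50, by = 10)   # 8 longitudes
lats <- seq(20, 60, by = 10)    # 5 latitudes
field <- array(seq_len(8 * 5 * 6), dim = c(8, 5, 6))
sub <- slice_field(field, lons, lats, c(-12, 40), c(27, 48), leadtimes = 1:3)
dim(sub)   # 6 2 3
```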

Usage

.LoadDataFile(work_piece, explore_dims = FALSE, silent = FALSE)

Arguments

work_piece

Named list with information on the file to load or explore, plus additional parameters.
The required elements of the work piece are:

  • In explore mode:

    • dataset_type: 'exp'/'obs'

    • filename: full path to the data file

    • dimnames: named list with names 'lon', 'lat' and 'member', whose values are the corresponding dimension names actually used inside the NetCDF files.

    • namevar: name of the variable in the NetCDF file

    • grid: common grid to interpolate into, or NULL if no interpolation is required

    • remap: interpolation method ('remapbil'/'remapdis'/'remapcon'/'remapbic')

    • lon_limits: c(lon_min, lon_max)

    • lat_limits: c(lat_min, lat_max)

    • is_file_per_member: TRUE/FALSE

    • is_file_per_dataset: TRUE/FALSE

    • is_single_dataset: TRUE/FALSE

  • In load mode:

    • dataset_type: 'exp'/'obs'

    • filename: full path to the data file

    • dimnames: named list with names 'lon', 'lat' and 'member', whose values are the corresponding dimension names actually used inside the NetCDF files.

    • namevar: name of the variable in the NetCDF file

    • is_2d_var: TRUE/FALSE

    • grid: common grid to interpolate into, or NULL if no interpolation is required

    • remap: interpolation method ('remapbil'/'remapdis'/'remapcon'/'remapbic')

    • lon_limits: c(lon_min, lon_max)

    • lat_limits: c(lat_min, lat_max)

    • is_file_per_dataset: TRUE/FALSE

    • startdates: if filename points to a file containing a whole dataset, the start dates to load must be specified, e.g. c('sdate1', 'sdate2', ...)

    • out_pointer: big.matrix descriptor pointing to the array (transformed into a matrix) in which to store the data

    • dims: named list with the dimension sizes of the original array in which the data is stored. Names must be c(['lon', ]['lat', ]'time', 'member', 'sdates', 'dataset'), where the bracketed 'lon' and 'lat' are optional.

    • indices: vector of starting positions, one per dimension in 'dims', at which to store the data in the original array. The first two indices ('lon', 'lat') can be omitted.

    • nmember: number of members expected to be loaded

    • mask: complete (untrimmed and, if needed, interpolated) mask to activate/deactivate data points, with dimensions c('lon', 'lat'); see ?Load for more details.

    • leadtimes: vector of time indices to be loaded from the file

    • var_limits: c(var_min, var_max)

    • is_single_dataset: TRUE/FALSE, whether the user asked for data from a single dataset only. If so, the data is not re-interpolated when the first longitude of its grid is not 0; otherwise it must be re-interpolated to ensure that all data is properly aligned in longitude.
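
The 'dims' and 'indices' elements describe where a chunk of data lands in the original array. The sketch below shows that bookkeeping under one stated assumption: the array is laid out in R's column-major order, the same order as.vector() uses. How the array is actually transformed into the out_pointer matrix is not specified on this page, and linear_position() is a hypothetical helper.

```r
# Hypothetical helper, not part of s2dverification: 1-based position, in
# column-major order (R's native array layout), of the element at `indices`
# in an array whose dimension sizes are given by the named list `dims`.
# Assumption: the target array is flattened the same way as.vector() does it.
linear_position <- function(dims, indices) {
  sizes <- unlist(dims)
  stopifnot(length(indices) == length(sizes),
            all(indices >= 1), all(indices <= sizes))
  # Stride of each dimension = product of the sizes of all earlier dimensions
  strides <- cumprod(c(1, head(sizes, -1)))
  sum((indices - 1) * strides) + 1
}

# Usage: a 2D-less layout c('time', 'member', 'sdates', 'dataset')
dims <- list(time = 3, member = 2, sdates = 4, dataset = 1)
indices <- c(time = 2, member = 1, sdates = 3, dataset = 1)
arr <- array(seq_len(24), dim = c(3, 2, 4, 1))
pos <- linear_position(dims, indices)
pos                                      # 14
as.vector(arr)[pos] == arr[2, 1, 3, 1]   # TRUE
```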

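The 'mask' and 'var_limits' elements both disable data points. The following is a minimal sketch of how that could look in plain R, assuming (as for the masks of Load()) that a mask value of 0/FALSE deactivates a point, and assuming that values outside c(var_min, var_max) are disabled too; both are replaced by NA. apply_mask_and_limits() is hypothetical and is not the package's code.

```r
# Hypothetical helper, not part of s2dverification. Assumptions: mask
# entries equal to 0/FALSE deactivate a point, and values outside
# c(var_min, var_max) are disabled; both are replaced by NA.
apply_mask_and_limits <- function(field, mask, var_limits) {
  # The mask must match the first two (lon, lat) dimensions of the field
  stopifnot(identical(dim(mask), dim(field)[1:2]))
  # array() recycles the lon x lat mask over every remaining (e.g. time) slice
  off <- array(!as.logical(mask), dim = dim(field))
  field[off] <- NA
  field[!is.na(field) & (field < var_limits[1] | field > var_limits[2])] <- NA
  field
}

field <- array(as.numeric(1:12), dim = c(2, 3, 2))   # lon x lat x time
mask <- matrix(c(1, 0, 1, 1, 1, 1), nrow = 2)        # disables lon 2, lat 1
out <- apply_mask_and_limits(field, mask, var_limits = c(2, 10))
which(!is.na(out))   # 3 4 5 6 7 9 10
```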
explore_dims

Run in dimension explore mode (TRUE) or in load and calculation mode (FALSE).
Defaults to FALSE (calculation mode).

silent

Whether to suppress (TRUE) or print (FALSE) explanatory messages.
Warning messages are still displayed when explanatory messages are suppressed.
Defaults to FALSE (verbose mode).
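The 'silent' convention can be pictured with the small sketch below: explanatory messages go through message() only when silent = FALSE, while warnings are raised regardless. The report() helper and its is_warning argument are hypothetical illustrations, not part of the package.

```r
# Hypothetical helper, not part of s2dverification: explanatory messages are
# only emitted when silent = FALSE, whereas warnings ignore `silent`.
report <- function(msg, silent = FALSE, is_warning = FALSE) {
  if (is_warning) {
    warning(msg, call. = FALSE)
  } else if (!silent) {
    message(msg)
  }
  invisible(NULL)
}

report("Exploring dimensions...", silent = FALSE)   # printed
report("Exploring dimensions...", silent = TRUE)    # suppressed
```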

remapcells

Width, in number of grid cells, of the area surrounding the requested subset that is taken into account for the interpolation. See parameter remapcells of Load().
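One way to picture 'remapcells' is as a margin added around the index range of the requested subset before interpolating. The sketch below is an assumption-laden illustration: widen_range() is hypothetical, clamps to the grid bounds, and deliberately ignores longitude wrap-around, which the real function may treat differently.

```r
# Hypothetical helper, not part of s2dverification: widen the index range
# [first, last] of the requested subset by `remapcells` cells on each side,
# clamped to the grid size `n`. Longitude wrap-around is ignored here.
widen_range <- function(first, last, remapcells, n) {
  seq.int(max(1, first - remapcells), min(n, last + remapcells))
}

widen_range(first = 10, last = 20, remapcells = 2, n = 100)   # 8 to 22
widen_range(first = 2, last = 5, remapcells = 2, n = 6)       # 1 to 6 (clamped)
```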

Value

When called in 'explore' mode, a named list is returned with:

  • dims: list with the lengths found for members, lead times, latitudes and longitudes, the latter two already trimmed and reordered if needed. The names are 'member', 'time', 'lon', 'lat'. If the file belongs to a file-per-member dataset, the member count is the number of files matching the filename with its $MEMBER_NUMBER$ part replaced by an asterisk (the presumed number of members). This detection method has known issues; see the documentation of parameters 'nmember' and 'nmemberobs' in Load(). When the specified file path is a URL, the returned number of members is NULL.

  • is_2d_var: Boolean indicating whether the variable found is 2-dimensional (TRUE) or a global mean (FALSE).

  • grid: character string with the name of the grid of the file, following the CDO grid naming conventions.

  • var_long_name: character string with the long name of the variable. If not available, the short name is returned.

  • units: character string with the units of the variable.
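The file-per-member member detection described under 'dims' can be sketched with base R alone. count_member_files() is a hypothetical helper that replaces the literal $MEMBER_NUMBER$ tag with '*' and counts glob matches via Sys.glob(); it is not the package's code, and it does not handle the URL case, for which NULL is returned instead.

```r
# Hypothetical helper, not part of s2dverification: replace the literal
# $MEMBER_NUMBER$ tag with '*' and count the files matching the glob.
count_member_files <- function(filename) {
  pattern <- gsub("$MEMBER_NUMBER$", "*", filename, fixed = TRUE)
  length(Sys.glob(pattern))
}

# Usage: three fake member files in a temporary directory
demo_dir <- file.path(tempdir(), "members_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, sprintf("tos_19901101_%d.nc", 1:3))))
count_member_files(file.path(demo_dir, "tos_19901101_$MEMBER_NUMBER$.nc"))   # 3
```

The sketch also hints at why such detection can be unreliable: any other file matching the pattern (for instance a leftover tos_19901101_old.nc) would be counted as a member too.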

When called in 'load' (calculation) mode, the path or URL of the file is returned if the file was found, and NULL otherwise.

Author(s)

History:
0.1 - 2015-01 (N. Manubens) - First version

Examples

  ## Not run: 
data <- s2dverification:::.LoadDataFile(list(dataset_type = 'exp', 
        filename = system.file('sample_data/model/experiment/monthly_mean', 
                               'tos_3hourly/tos_19901101.nc', 
                               package = 's2dverification'),
        namevar = 'tos', lon_limits = c(-12, 40), 
        lat_limits = c(27, 48), is_file_per_member = TRUE, 
        dimnames = list(lon = 'longitude', lat = 'latitude', 
        member = 'ensemble')), explore_dims = TRUE, silent = FALSE)
  
## End(Not run)

s2dverification documentation built on April 20, 2022, 9:06 a.m.