extract: Extract predictions from FVCOM arrays

View source: R/extract.R

extractR Documentation

Extract predictions from FVCOM arrays

Description

This function automates the extraction of FVCOM predictions for a variable on specified dates/times/layers/mesh cells from FVCOM arrays. The function implements important checks, accounts for any corrupt files and can load FVCOM arrays in parallel. FVCOM predictions are returned as a dataframe because these form the backbone of data analysis in R and are easily converted to other objects (e.g. matrices, lists) where necessary.

Usage

extract(
  dat,
  match_hour = data.frame(hour = 0:23, index = 1:24),
  match_layer = NULL,
  match_mesh = NULL,
  corrupt = NULL,
  read_fvcom = function(con) R.matlab::readMat(con)$data,
  dir2load,
  extension = ".mat",
  cl = NULL,
  pass2varlist = NULL,
  verbose = TRUE
)

Arguments

dat

A dataframe which defines the FVCOM array date names, hours, layers (if applicable) and mesh cell IDs for which FVCOM predictions should be extracted from FVCOM arrays. Columns should be named 'date_name', 'hour', 'layer' and 'mesh_ID' respectively.

match_hour

A dataframe with two integer columns named 'hour' and 'index' which defines the index in FVCOM arrays (i.e. the row) which corresponds to each hour. The default dataframe is usually appropriate, if FVCOM arrays have not been subsetted. However, if FVCOM arrays have been subsetted (e.g. by selecting rows corresponding to hours 12 and 13), then rows 1 and 2 in FVCOM arrays now represent hours 12 and 13, not hours 0 and 1. match_hour provides the link which ensures that data for specified hours (e.g. hours 12 and 13) are correctly extracted from an FVCOM array (see Examples). All FVCOM arrays in dir2load are assumed to have the same structure.

match_layer

A dataframe with two integer columns named 'layer' and 'index' which defines the index in FVCOM arrays (i.e. the column) which corresponds to each layer. This is only necessary if you are working with 3-dimensional fields and if you are working with subsetted of FVCOM arrays: if FVCOM arrays have been subsetted (e.g. by selecting layers corresponding to columns 2 and 3), then columns 1 and 2 in FVCOM arrays now represent layers 2 and 3, not layers 1 and 2. match_layer provides this link which ensures that FVCOM predictions are extracted correctly (see Examples). All FVCOM arrays in dir2load are assumed to have the same structure.

match_mesh

(optional) A dataframe with two columns named 'mesh' and 'index' which defines the index in FVCOM arrays (columns or sheets for 2-dimensional and 3-dimensional arrays respectively) which corresponds to each mesh cell. This only needs to be provided if you are working with a subset of FVCOM arrays: in this situation, mesh IDs 5, 6, 7, for example, may not correspond to index 5, 6, 7 in FVCOM arrays. match_mesh provides the link which ensures that FVCOM predictions are extracted correctly (see Examples). All FVCOM arrays in dir2load are assumed to have the same structure.

corrupt

A vector of numbers, representing FVCOM date names, which define corrupt files. These will not be loaded.

read_fvcom

A function which is used to load files (and, if necessary, extract the FVCOM array from the loaded object). The function should take a single single argument, the file connection, as input, and load the file. The default is function(con) = R.matlab::readMat(con)$data but other functions may be required depending on the format of FVCOM files (e.g. readRDS).

dir2load

A string which defines the directory from which to load FVCOM arrays. In this directory, FVCOM file names are assumed to follow the standard naming convention (i.e., yymmdd; see date_name). All files with the pattern *extension (see extension) are assumed to be FVCOM arrays.

extension

A string which defines the extension of the FVCOM arrays. The default is ".mat".

cl

(optional) A cluster objected created by the parallel package. If supplied, the algorithm is implemented in parallel. Note that the connection with the cluster is stopped within the function.

pass2varlist

A list containing the names of exported objects. This may be required if cl is supplied. This is passed to the varlist argument of clusterExport. Exported objects must be located in the global environment.

verbose

A logical input which defines whether or not to display messages to the console detailing function progress.

Details

The function is designed to extract FVCOM predictions for only one variable at a time. This is because FVCOM outputs different variables should be located in different folders and may differ in dimension (2-dimensional versus 3-dimensional variables) or file type (usually files are .mat but fvcom.tbx can compute and save outputs as .rds files). To implement the function for multiple variables, implement the function sequentially or in parallel for each variable. Non integer hours or layers are not currently implemented. If you require predictions for non integer hours or layers, include each the lower and upper hour or layer in dat and then interpolate predictions after these are provided. It is important to emphasise that FVCOM outputs are saved in multiple files (one for each day) by necessity: they contain a lot of data. Therefore, there are memory limitations which constrain the amount of data that can be extracted and returned by extract. If you need to extract large numbers of predictions to compute summary statistics or plots, use explore_field_2d which implements an iterative approach which avoids this issue by loading files iteratively and only saving outputs computed from each file, rather than all the predictions for each file. In other cases, extract can be implemented iteratively, with the outputs saved after every iteration in separate files.

Value

The function returns a dataframe which includes FVCOM predictions for specified dates/hours/layers and mesh cells.

Author(s)

Edward Lavender

Examples

#### Example (1) Implement extract() for a 2-dimensional variable
# Define dataframe specifying the dates/hours and locations for which we want predictions:
timestamp <- as.POSIXct(c("2016-03-01", "2016-03-02"), tz = "UTC")
dat <- data.frame(timestamp = timestamp)
dat$date_name <- date_name(dat$timestamp)
dat$hour <- lubridate::hour(dat$timestamp)
dat$mesh_ID <- dat_nodexy[1:nrow(dat), "node_id"]
# Define match dataframes to provide the link between dat and WeStCOMS arrays:
match_hour <- data.frame(hour = 0:1, index = 1:2)
match_mesh <- data.frame(mesh = dat_nodexy$node_id, index = 1:length(dat_nodexy$node_id))
# Define path
path <- system.file("WeStCOMS_files/tidal_elevation",
                    package = "fvcom.tbx", mustWork = TRUE)
path <- paste0(path, "/")
# Implement extract() for a 2-dimensional variable
ext <- extract(dat = dat,
               match_hour = match_hour,
               match_mesh = match_mesh,
               dir2load = path,
               extension = ".mat",
               verbose = TRUE)

#### Example (2) Implement extract for a 3-dimensional variable
# Define dat$layer
dat$layer <- 1
# Define match_layer if necessary
# Define path
path <- system.file("WeStCOMS_files/temp",
                    package = "fvcom.tbx", mustWork = TRUE)
path <- paste0(path, "/")
# Implement extract()
ext <- extract(dat = dat,
               match_hour = match_hour,
               match_mesh = match_mesh,
               dir2load = path,
               extension = ".mat",
               verbose = TRUE)

#### Example (3) Implement extract() in parallel
ext <- extract(dat = dat,
               match_hour = match_hour,
               match_mesh = match_mesh,
               dir2load = path,
               extension = ".mat",
               verbose = TRUE,
               cl = parallel::makeCluster(2L))


edwardlavender/fvcom.tbx documentation built on Nov. 26, 2022, 10:28 p.m.