validate | R Documentation |
For specified location(s), layer(s) and timestamp(s), this function pairs observations with corresponding FVCOM predictions. Locations and observations can be derived from different datasets. This means that animal movement datasets, in which locations and observations may be derived from different electronic tags, can be used for validation. The function uses nearest neighbour interpolation to extract FVCOM predictions for the nearest hour, at the specified layer and at the nearest mesh cell for which predictions are available relative to user inputs. Paired observations and predictions can be used to examine model skill (i.e., prediction accuracy).
validate( dat_obs1, dat_obs2 = NULL, threshold_match_gap = NULL, mesh, match_hour = data.frame(hour = 0:23, index = 1:24), match_layer = NULL, match_mesh = NULL, extension = ".mat", corrupt = NULL, read_fvcom = function(con) R.matlab::readMat(con)$data, dir2load, cl = NULL, pass2varlist = NULL, verbose = TRUE )
dat_obs1 |
A dataframe which contains the location(s), layer(s) and timestamp(s) at which environmental observations have been made. These should be defined in columns named 'lat' and 'long', 'location', 'layer' and 'timestamp' respectively. A column 'key' may also need to be included (see Details). Location(s) are assumed to be in World Geodetic System format (i.e. WGS 84). Observations can either be located in this dataframe, in a column called 'obs', or in a separate dataframe, ( |
dat_obs2 |
A dataframe which contains the environmental observations which will be used to validate model predictions. This must contain timestamps ('timestamp'), observations ('obs'). A column 'key' may also need to be included (see Details). Note that observations can be provided in |
threshold_match_gap |
A numeric input which defines the duration (s) between the timestamp of a known location and that of a corresponding observation before/after which these timestamps in |
mesh |
A |
match_hour |
A dataframe with two integer columns named 'hour' and 'index' which defines the index in FVCOM arrays (i.e. the row) which corresponds to each hour (see |
match_layer |
A dataframe with two integer columns named 'layer' and 'index' which defines the index in FVCOM arrays (i.e. the column) which corresponds to each layer n (see |
match_mesh |
(optional) A dataframe with two columns named 'mesh' and 'index' which defines the index in FVCOM arrays (columns or sheets for 2d and 3d arrays respectively) which corresponds to each mesh cell (see |
extension |
A string which defines the extension of the FVCOM arrays (see |
corrupt |
A vector of numbers, representing WeStCOMS date names, which define corrupt files (see |
read_fvcom |
A function which is used to load files (see |
dir2load |
A string which defines the directory from which to load FVCOM arrays containing predictions (see |
cl |
(optional) A cluster objected created by the parallel package (see |
pass2varlist |
A list containing the names of exported objects. This may be required if |
verbose |
A logical input which defines whether or not to display messages to the console detailing function progress. |
To use this function, the user must supply a dataframe (dat_obs1
) which contains the locations(s) , layer(s) and time(s) (in a column named 'timestamp'). These columns must be named 'long' and 'lat', 'layer' and 'timestamp' respectively. A column, 'key', may also need to be included (see below). Location(s) are assumed to be in World Geodetic System format (i.e. WGS 84). Observations can either be located in this dataframe, in a column called 'obs', or in a separate dataframe (dat_obs2
) with columns 'timestamp', 'obs' and 'key'. One situation where this latter option is useful is for animal movement data when the location of the animal is known from one tag type (e.g. passive acoustic telemetry) and observations are recorded by another tag type (e.g. an archival tag), possibly at a different resolution. In this scenario, both dataframes should contain a column, 'key', which connects the factor level(s) (e.g. individuals) for which locations observations have been made in the first dataframe with the factor level(s) for which environmental observations (which will be used to validate the model) have been made. In this case, a nearest neighbour matching approach is used to add observations into the first dataframe (dat_obs1
) from the second dataframe (dat_obs2
) by selecting those that occurred closest in time to the timestamps stipulated in the first dataframe (this is why is is important to have a 'key' to distinguish among factor levels). The function uses a mesh supplied by the user to determine the FVCOM mesh nodes/elements within which each location lies (i.e. nearest neighbour interpolation). With this information, the function loads in each FVCOM file from a user-defined directory, extracts the relevant model predictions from this file, and then returns the original dataframe with predictions added. FVCOM files can be loaded in parallel via cl
and pass2varlist
arguments.
The function returns a dataframe containing the columns in dat_obs1
with some additions. If dat_obs2
is used to add observations to this dataframe, the dataframe returned also contains a column with environmental observations, 'obs', the timestamps at which these were made ('timestamp_obs'), extracted from dat_obs2
, and the difference (seconds) between the timestamps of known locations and the timestamps at which observations were made ('difftime'). 'mesh_ID' defines the node (or element) within which each observation occurred and 'index_row', 'index_mesh' (and, if applicable, 'index_layer') provide a reference to the cells in the FVCOM arrays from which model predictions were extracted. 'wc' contains the predicted conditions from WeStCOMS and 'diff' contains the difference between observed and predicted conditions (i.e. 'obs' - 'wc').
Edward Lavender
########################################### #### Example (1): Temperature observations at specified locations and times #### Hypothetical scenario # Imagine we have recorded surface temperature observations # ... with standard probes in the following locations and at the specified times. #### Define dataframe: # First, define imaginary timestamps and locations at which we made observations set.seed(1) timestamp <- as.POSIXct(c("2016-03-01", "2016-03-02"), tz = "UTC") xy_node <- sp::coordinates(dat_mesh_around_nodes) xy_sel <- sample(1:nrow(xy_node), size = length(timestamp)) long <- as.numeric(xy_node[xy_sel, 1]) lat <- as.numeric(xy_node[xy_sel, 2]) dat_obs1 <- data.frame(timestamp = timestamp, long = long, lat = lat, layer = 1) # Second, add imaginary observations to the dataframe: dat_obs1$obs <- stats::runif(nrow(dat_obs1), 7, 10) #### Define match dataframes to provide the link between dat_obs1 and the WeStCOMS arrays: match_hour <- data.frame(hour = 0:1, index = 1:2) match_layer <- data.frame(layer = 1:2, index = 1:2) match_mesh <- data.frame(mesh = dat_nodexy$node_id, index = 1:length(dat_nodexy$node_id)) #### Define mesh CRS: proj <- sp::CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0") raster::crs(dat_mesh_around_nodes) <- proj #### Define path from which to load predictions: path <- system.file("WeStCOMS_files/temp", package = "fvcom.tbx", mustWork = TRUE) path <- paste0(path, "/") #### Implement validation: validation <- validate(dat_obs1 = dat_obs1, dat_obs2 = NULL, mesh = dat_mesh_around_nodes, match_hour = match_hour, match_layer = match_layer, match_mesh = match_mesh, dir2load = path, extension = ".mat", corrupt = NULL, cl = NULL, pass2varlist = NULL, verbose = TRUE ) # Examine output: validation #### Check that each node IDs was correctly identified given supplied coordinates: # Compare selected nodes from which coordinates were inputted # ... with the nodes identified by the function: as.character(dat_mesh_around_nodes$ID)[xy_sel]; validation$mesh_ID # We can see we have identified the correct position in the array corresponding to each node: cbind(validation[, c("index_mesh", "mesh_ID")], match_mesh[validation$index_mesh, ]) ### We can see that the data for each node has been extracted correctly: mapply(list.files(path, full.names = TRUE), split(validation, f = validation$date_name), FUN = function(con, df){ # Load in file sample <- R.matlab::readMat(con) sample <- sample$data # Extract data (include df$layer here since 3d field): sample <- sample[df$index_hour, df$index_layer, df$index_mesh] # Check identical: print(identical(sample, df$wc)) }) ########################################### #### Example (2): A validation dataset from animal movement data #### Hypothetical scenario # Imagine that our validation dataset comes from animal movement data # ... in which locations are known from one dataset and observations have been made # ... and stored in another dataset. #### Define dat_obs1 # Define some timestamps (on 00:00 hours to match sample data in package): timestamp <- as.POSIXct(c("2016-03-01 00:00:32", "2016-03-02 00:00:30"), tz = "UTC") # Define locations long <- as.numeric(xy_node[xy_sel, 1]) lat <- as.numeric(xy_node[xy_sel, 2]) # Define dataframe dat_obs1 <- data.frame(timestamp = timestamp, long = long, lat = lat, layer = 1, key = 1) #### Define dat_obs2 # Imagine we have collected observations every 2 minutes timestamp_obs <- seq(as.POSIXct("2016-03-01", tz = "UTC"), as.POSIXct("2016-03-03", tz = "UTC"), by = "2 mins") dat_obs2 <- data.frame(timestamp = timestamp_obs, key = 1, obs = stats::runif(length(timestamp_obs), 7, 10)) #### Implement validation: validation <- validate(dat_obs1 = dat_obs1, dat_obs2 = dat_obs2, mesh = dat_mesh_around_nodes, match_hour = match_hour, match_layer = NULL, match_mesh = match_mesh, dir2load = path, extension = ".mat", corrupt = NULL, cl = NULL, pass2varlist = NULL, verbose = TRUE ) #### Examine output: validation #### We can see that the algorithm has identified observations # ... at the nearest point in time to known locations validation[, c("timestamp", "timestamp_obs", "obs")] cbind(validation$obs, dat_obs2[dat_obs2$timestamp %in% validation$timestamp_obs, "obs"]) # The algorithm then uses this information to extract outputs from WeStCOMS.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.