get.process.chunks: Get File Information to Allow Processing of Subsets


View source: R/get_process_chunks.R

Description

Get information about the climate data files in a folder, and split them into region-by-latitude chunks so that the (very large) files can be processed one subsection at a time.

Usage

get.process.chunks(defaults, save.output = FALSE, search.dir = character(),
  show.messages = TRUE)

Arguments

defaults

the output from set.defaults. The defaults used are filevar and, if search.dir=character() (the default), mod.data.dir. This function also supports subsets by latitude or longitude, if defaults$lat.clip / defaults$lon.clip exist (as c(min,max), see below).

save.output

whether to save the file information as a file in the search directory (as "process_inputs.RData"), by default FALSE

search.dir

by default, get.process.chunks searches in the mod.data.dir in defaults. If a different directory should be examined, use this to set the path.

show.messages

whether to show some useful descriptions of the search procedure (by default TRUE)

Value

a list giving, for each region-by-latitude chunk: the subset suffix (reg), if available; the filename (fn); the latitude and longitude coordinates of the pixels in the subset (lat, a single element, and lon); the global pixel id (the variable global_loc if it exists in the NetCDF file; otherwise a new one is created, counting pixels up across files in alphabetical order); the local id (local_idxs, the linear index of each pixel within the file); the number of experiment runs in the file (the variable run in the NetCDF file; 1 otherwise); and the within-file indices along each location dimension (dim_idxs, either just location or lon x lat).

Expected File Structure

In general, most common forms of climate file structures are supported, especially the CMIP5 structure (for best results, filenames should still be in CMIP5 format with an optional "_[ ]" suffix for regional subsets, etc. - see the set.defaults documentation for more info). Variables can either be on a lon x lat grid or stored by linear location. Files can either contain all runs of a model or can be saved by run. Files can either contain the whole timeframe of a model run or be split up in consecutive temporal chunks. Furthermore:

filename

the code searches for NetCDF files using the search string "[defaults$filevar]_day_.*nc" (by default; this can be changed by setting defaults$search.str). Make sure no other NetCDF files with that pattern are present in the search directory (by default defaults$mod.data.dir).

variable setup

Currently, the code expects the primary variable to have either a location dimension (giving the linear index of a location) or a lon x lat grid. These dimensions are all identified by name, and the search terms used can be set in defaults$varnames. Out of the box, for example, the package supports "lat", "latitude", "Latitude", and "latitude_1" as possible names for the "lat" dimension.

locations

The code expects there to be two location variables, lat and lon (CMIP5 syntax), giving the lat/lon location of every pixel in the file. The names of those variables can be any of the alternatives given by defaults$varnames - e.g. out of the box, the code also checks for "latitude", "longitude", etc. See set.defaults for information on adding naming conventions.

multiple runs

If there are multiple runs in the file, there should be a run variable/dimension in the file giving the run id as an integer.
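
The filename search and the name-based identification of dimensions described above can be illustrated with a minimal base-R sketch. It is not the package's actual implementation: the filenames, the value of filevar, and the variable names in file.vars are hypothetical, and only the search string and the list of "lat" alternatives are taken from the text above.

```r
# Hypothetical stand-in for defaults$filevar
filevar <- "tas"

# Default search string as described above: "[defaults$filevar]_day_.*nc"
search.str <- paste0(filevar, "_day_.*nc")

# Hypothetical directory listing (in practice this could come from list.files())
files <- c(
  "tas_day_CCSM4_rcp85_r1i1p1_20060101-20551231.nc",
  "tas_day_CCSM4_rcp85_r1i1p1_20560101-21001231.nc",
  "pr_day_CCSM4_rcp85_r1i1p1_20060101-21001231.nc",
  "tas_Amon_CCSM4_rcp85_r1i1p1_200601-210012.nc"
)

# Keep only the files that match the search string
matched <- grep(search.str, files, value = TRUE)

# Alternatives for the "lat" dimension listed in the documentation
lat.names <- c("lat", "latitude", "Latitude", "latitude_1")

# Hypothetical variable names found in one NetCDF file
file.vars <- c("time", "latitude", "longitude", "tas")

# The first alternative present in the file identifies the latitude variable
lat.var <- intersect(lat.names, file.vars)[1]

print(matched)
print(lat.var)
```

Because the search string is an unanchored regular expression, any other NetCDF file in the directory whose name contains the same pattern would also be picked up, which is why the search directory should hold only the intended files.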

Subsetting by Latitude / Longitude Bounds

If defaults$lat.clip and/or defaults$lon.clip exist, only information and file locations of pixels within those lat/lon bounds are returned. $lat.clip and $lon.clip should be vectors of the form c(min,max), e.g. defaults$lat.clip=c(23,52), defaults$lon.clip=c(-125,-65) for a box around the continental USA. Longitude coordinates can be entered in either a [-180, 180] or a [0, 360] format, regardless of the format of the data being loaded; the two are matched in format before subsetting. The global.loc.idx (used for filenames) is unaffected by the subsetting, meaning that the indices are still counted with respect to the full lat/lon universe. This allows an initial subset to be expanded without issues with output file structures.
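
As a rough illustration of this subsetting logic, the base-R sketch below clips a small set of pixels to the continental-USA box from the example. It is an assumption about how such matching could be done, not the package's own code: the helper to.180 and the pixel table are hypothetical, and the simple range test does not handle boxes that cross the date line.

```r
# Hypothetical helper: express longitudes in the [-180, 180) convention
to.180 <- function(lon) ((lon + 180) %% 360) - 180

# Clip bounds from the example above, entered in [-180, 180] format
lat.clip <- c(23, 52)
lon.clip <- c(-125, -65)

# Hypothetical pixels whose longitudes are stored in [0, 360] format
pixels <- data.frame(
  global_loc = 1:5,
  lat = c(10, 30, 45, 50, 60),
  lon = c(250, 260, 10, 290, 270)
)

# Match formats before subsetting: convert data and bounds to one convention
lon.180 <- to.180(pixels$lon)
lon.bounds <- to.180(lon.clip)

keep <- pixels$lat >= lat.clip[1] & pixels$lat <= lat.clip[2] &
  lon.180 >= lon.bounds[1] & lon.180 <= lon.bounds[2]

# Rows keep their original global ids rather than being renumbered
subset.pixels <- pixels[keep, ]
print(subset.pixels)
```

The retained rows carry their original global ids, mirroring the point that the global index is counted against the full lat/lon universe rather than the clipped subset.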


ks905383/quantproj documentation built on Nov. 1, 2020, 9:12 p.m.