View source: R/get_process_chunks.R
Get information about the climate data files in a folder, and split them up by region-by-latitude chunk to allow for processing subsections of the (very large) files at a time.
get.process.chunks(defaults, save.output = FALSE, search.dir = character(),
  show.messages = TRUE)
defaults: the output from …
save.output: whether to save the file information as a file in the search directory (as "…")
search.dir: by default, …
show.messages: whether to show some useful descriptions of the search procedure (by default …)
a list giving, for each region-by-latitude chunk: the subset suffix (reg), if available; the filename (fn); the latitude and longitude coordinates of the pixels in the subset (lat, one element; and lon); the global pixel id (the variable global_loc if it exists in the NetCDF file; otherwise a new one is created, counting pixels up by files alphabetically); the local id (local_idxs, the linear index within the file of each pixel); the number of experiment runs in the file (the variable run in the NetCDF file; 1 otherwise); and the within-file indices along each location dimension, either just location or lon x lat (dim_idxs).
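As a rough illustration of the return value described above, the following sketch builds one mock chunk by hand for a made-up 3 x 2 (lon x lat) grid. Everything here is an assumption for illustration: the filename, the coordinate values, the element name nruns, the shape of dim_idxs, and the column-major formula used for the linear index are not confirmed by this page and may differ from what get.process.chunks actually returns.

```r
# Hypothetical 3 x 2 (lon x lat) grid; all values below are made up.
lon_vals <- c(-100, -97.5, -95)
lat_vals <- c(40, 42.5)
n_lon <- length(lon_vals)

# Build the chunk for the second latitude band.
i_lat <- 2L
i_lon <- seq_len(n_lon)

# Column-major linear index of each pixel in a lon x lat array (assumed layout).
lin_idx <- (i_lat - 1L) * n_lon + i_lon

chunk <- list(
  reg        = "",                    # no regional subset suffix in this mock
  fn         = "tas_day_example.nc",  # made-up filename
  lat        = lat_vals[i_lat],       # one element per chunk
  lon        = lon_vals,
  global_loc = lin_idx,               # assumed: no global_loc in file, so counted up
  local_idxs = lin_idx,               # linear index within the file
  nruns      = 1L,                    # hypothetical element name; no run variable
  dim_idxs   = list(lon = i_lon, lat = i_lat)
)

str(chunk)
```

With a single file, the counted-up global id and the local index coincide; with several files they would diverge, since the global id keeps counting across files.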
In general, most common forms of climate file structures are supported, especially the CMIP5 structure (for best results, filenames should still be in CMIP5 format with an optional "_[ ]" suffix for regional subsets, etc. - see the set.defaults documentation for more info). Variables can either be on a lon x lat grid or stored by linear location. Files can either contain all runs of a model or can be saved by run. Files can either contain the whole timeframe of a model run or be split up in consecutive temporal chunks. Furthermore:
The code searches for NetCDF files using the search string "[defaults$filevar]_day_.*nc" (by default; this can be changed by setting defaults$search.str). Make sure no other NetCDF files with that pattern are present in the search directory (by default defaults$mod.data.dir).
Currently, the code expects the primary variable to have either a location dimension (giving the linear index of a location) or a lon x lat grid. These are all identified by name; the search terms used can be set in defaults$varnames. Out of the box, the package supports, for example, "lat", "latitude", "Latitude", and "latitude_1" as possible names for the "lat" dimension.
The code expects there to be two location variables, lat and lon (CMIP5 syntax), giving the lat/lon location of every pixel in the file. The names of those variables can be any of the alternatives given by defaults$varnames - e.g. out of the box, the code also checks for "latitude", "longitude", etc. See set.defaults for information on adding naming conventions.
If there are multiple runs in the file, there should be a run variable/dimension in the file giving the run id as an integer.
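The file search and the name matching described above can be sketched in base R as follows. This is only an illustration of the idea, not the package's code: the filenames are made up, filevar = "tas" is an arbitrary choice, and the list layout used for the aliases is an assumption (the actual structure of defaults$varnames is not shown on this page).

```r
# Assumed value of defaults$filevar, for illustration only.
filevar <- "tas"

# Default search string, built as "[defaults$filevar]_day_.*nc".
search_str <- paste0(filevar, "_day_.*nc")

# Made-up directory listing.
files <- c(
  "tas_day_CCSM4_rcp85_r1i1p1_20060101-21001231.nc",
  "pr_day_CCSM4_rcp85_r1i1p1_20060101-21001231.nc",
  "tas_Amon_CCSM4_rcp85_r1i1p1_200601-210012.nc",
  "notes.txt"
)

# Keep only the files whose names match the search string.
matched <- files[grepl(search_str, files)]

# Hypothetical alias table; the lat aliases are the ones listed above.
aliases <- list(
  lat = c("lat", "latitude", "Latitude", "latitude_1"),
  lon = c("lon", "longitude")
)

# Dimension names as they might appear in a file.
file_dims <- c("time", "Latitude", "longitude")

# Pick whichever alias is actually present in the file.
lat_name <- intersect(aliases$lat, file_dims)
lon_name <- intersect(aliases$lon, file_dims)
```

Because the pattern is matched against every file in the directory, a stray file that happens to fit it would be picked up too, which is why the search directory should contain no other NetCDF files with that pattern.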
If defaults$lat.clip and/or defaults$lon.clip exist, only information and file locations of pixels within those lat/lon bounds are returned. $lat.clip and $lon.clip should be vectors of the form c(min,max), e.g. defaults$lat.clip=c(23,52), defaults$lon.clip=c(-125,-65) for a box around the continental USA. Longitude coordinates can be entered in either a [-180, 180] or a [0, 360] format, regardless of the format of the data being loaded - they'll be matched in format before subsetting. The global.loc.idx (used for filenames) is unaffected by the subsetting, meaning that the indices are still counted with regard to the full lat/lon universe. This allows an initial subset to be expanded without issues with output file structures.
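The clipping behaviour described above can be sketched like this. It is a minimal illustration under stated assumptions, not the package's implementation: the pixel coordinates are made up, the helper to_180 is hypothetical, and converting everything to [-180, 180] is just one way of matching formats (the package may convert in the other direction).

```r
# Clip bounds as in the example above (continental USA).
lat.clip <- c(23, 52)
lon.clip <- c(-125, -65)

# Made-up pixels, with longitude stored in [0, 360] format.
lon <- c(10, 235, 250, 300)
lat <- c(60, 30, 45, 40)

# Global ids are assigned over the full set of pixels, before clipping.
global_id <- seq_along(lon)

# Hypothetical helper: map longitudes from [0, 360] to [-180, 180].
to_180 <- function(x) ifelse(x > 180, x - 360, x)

# Bring data and bounds into the same format before comparing.
lon_180  <- to_180(lon)
clip_180 <- sort(to_180(lon.clip))

in_box <- lat >= lat.clip[1] & lat <= lat.clip[2] &
  lon_180 >= clip_180[1] & lon_180 <= clip_180[2]

# Ids of the retained pixels keep their original, full-grid numbering.
kept_ids <- global_id[in_box]
```

Here the second and third pixels fall inside the box and keep ids 2 and 3 rather than being renumbered from 1, which is what lets a subset be widened later without the ids shifting.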