summary_database: Summarize CMIP6 model output file status

Description

summary_database() scans the specified directory and returns a data.table() summarizing which CMIP6 files are available, checked against the output file index loaded using load_cmip6_index().

Usage

summary_database(
  dir,
  by = c("activity", "experiment", "variant", "frequency", "variable", "source",
    "resolution"),
  mult = c("skip", "latest"),
  append = FALSE,
  recursive = FALSE,
  update = FALSE,
  warning = TRUE
)

Arguments

dir

A single string indicating the directory where CMIP6 model output NetCDF files are stored.

by

The grouping columns used to summarize the database status. Should be a subset of:

  • "experiment": root experiment identifiers

  • "source": model identifiers

  • "variable": variable identifiers

  • "activity": activity identifiers

  • "frequency": sampling frequency

  • "variant": variant label

  • "resolution": approximate horizontal resolution

mult

Action to take when multiple files match the same case in the CMIP6 index. If "latest", the file with the latest modification time will be used. If "skip", all matched files will be skipped and the case will be kept as unmatched. Default: "skip".
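The difference between the two mult options can be sketched in base R. This is only an illustration of the rule described above, not the package's internal code: pick_match() and the file names are hypothetical, and the modification times are set by hand.

```r
# Two temporary files standing in for duplicate NetCDF matches
# (hypothetical names, not real CMIP6 outputs).
files <- file.path(tempdir(), c("tas_old.nc", "tas_new.nc"))
invisible(file.create(files))
Sys.setFileTime(files[1], as.POSIXct("2020-01-01", tz = "UTC"))
Sys.setFileTime(files[2], as.POSIXct("2021-01-01", tz = "UTC"))

# Hypothetical helper mimicking the documented behaviour of `mult`.
pick_match <- function(paths, mult = c("skip", "latest")) {
  mult <- match.arg(mult)
  # A single match (or none) needs no disambiguation.
  if (length(paths) <= 1L) return(paths)
  # "skip": drop every candidate so the case stays unmatched.
  if (mult == "skip") return(character(0))
  # "latest": keep the file with the most recent modification time.
  paths[which.max(as.numeric(file.info(paths)$mtime))]
}

latest  <- pick_match(files, "latest")
skipped <- pick_match(files, "skip")
```

With "latest" exactly one of the duplicates is kept, while "skip" returns nothing, which is why such cases remain unmatched in the summary.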

append

If TRUE, the status of CMIP6 files will only be updated if they are not found in the previous summary. This is useful if CMIP6 files are stored in different directories. Default: FALSE.

recursive

If TRUE, scan recursively into directories. Default: FALSE.

update

If TRUE, the output file index will be updated based on the matched NetCDF files in the specified directory. If FALSE, only the currently loaded index will be updated, and the actual index database file saved in get_data_dir() will remain unchanged. Default: FALSE.

warning

If TRUE, warning messages will be shown when multiple files match the same case. Default: TRUE.

Details

summary_database() uses future.apply underneath. You can use your preferred future backend to speed up data extraction in parallel. By default, summary_database() uses the future::sequential backend, which processes files one at a time.
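A minimal sketch of switching backends with the future package is given here. The worker count is an arbitrary choice for illustration, and the summary_database() call is commented out because it needs a loaded index and a real directory; "path/to/cmip6" is a placeholder.

```r
library(future)

# Switch from the default sequential backend to parallel background
# R sessions. plan() returns the previous plan so it can be restored.
old_plan <- plan(multisession, workers = 2)

# Number of workers the current backend provides.
n_workers <- nbrOfWorkers()

# Placeholder call: requires epwshiftr, an index loaded with
# load_cmip6_index(), and a directory containing NetCDF files.
# sm <- epwshiftr::summary_database("path/to/cmip6", by = "experiment")

# Restore the previous backend when finished.
plan(old_plan)
```

Restoring the earlier plan keeps the backend change local to this task, so later code in the same session is not run in parallel unexpectedly.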

Value

A data.table::data.table() containing corresponding grouping columns plus:

Column          Type            Description
datetime_start  POSIXct         Start date and time of simulation
datetime_end    POSIXct         End date and time of simulation
file_num        Integer         Total number of files per group
file_size       Units (Mbytes)  Approximate total size of files
dl_num          Integer         Total number of files downloaded
dl_percent      Units (%)       Total percentage of files downloaded
dl_size         Units (Mbytes)  Total size of files downloaded

In addition, an attribute not_matched is added to the returned data.table::data.table(). It contains metadata for those CMIP6 output files that are not covered by the current CMIP6 output file index.
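The not_matched attribute can be read with base R's attr(). The sketch below uses a small stand-in object rather than a real summary: the column names and values are made up for illustration, and a plain data.frame is used in place of a data.table to keep the example dependency-free.

```r
# Stand-in for the object returned by summary_database(); the real
# result is a data.table with more columns. All values are made up.
sm <- data.frame(experiment = "ssp585", file_num = 10L, dl_num = 4L)
attr(sm, "not_matched") <- data.frame(
  file_path = c("unknown_a.nc", "unknown_b.nc")
)

# Retrieve the metadata of files not covered by the index.
unmatched <- attr(sm, "not_matched")
n_unmatched <- nrow(unmatched)
```

Checking this attribute after a scan is a quick way to spot files in the directory that the loaded index does not describe.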

For the meaning of grouping columns, see init_cmip6_index().

Examples

## Not run: 
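# `dir` has no default in the usage above, so a directory must be supplied.
# "path/to/cmip6" is a placeholder; replace it with your own directory.
summary_database("path/to/cmip6")

summary_database("path/to/cmip6", by = "experiment")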

## End(Not run)

epwshiftr documentation built on May 26, 2021, 5:08 p.m.