knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(epwshiftr)

# show verbose messages
options(epwshiftr.verbose = TRUE)

Controlled Vocabularies (CVs) and Data Request

All CMIP6 outputs are written in NetCDF files in conformance with the CMIP standards. The Controlled Vocabularies (CVs) and Data Request play a key role in ensuring uniformity in the description of data sets across all models.

The CMIP6 CVs gives a well-defined set of global attributes that are recorded in each CMIP6 model output, providing information necessary for interpreting the data. The data of CVs for use in CMIP6 is stored as JSON files in a GitHub Repo.

The CMIP6 Data Request defines all the quantities from CMIP6 simulations that should be archived. This includes both quantities of general interest needed from most of the CMIP6-endorsed model intercomparison projects (MIPs) and quantities that are more specialized and only of interest to a single endorsed MIP. The Data Request data is stored a Microsoft Excel file (CMIP6_MIP_tables.xlsx) in a Subversion repo.

For more information, please see:

Introduce the Cmip6Dict class

{epwshiftr} provides a Cmip6Dict class to help you fetch, parse and store CMIP6 CVs and Data Request.

Create a Cmip6Dict object

You can create a new Cmip6Dict object using the cmip6_dict() function. It takes no argument and simply returns an empty Cmip6Dict object.

It is an R6 object with reference semantics. All methods in an R6 object can be invoked using the $method_name() style.

dict <- cmip6_dict()
dict

You can use $is_empty() to check if there are any contents in current Cmip6Dict.

dict$is_empty()

Fetch data of CVs and Data Request

After created, you can use $build() to fetch all data of CMIP6 CVs and Data Request.

The GitHub RESTful APIs are used to to fetch the latest tag of CVs and Data Request. You can specify your GitHub token by specifying the token argument in $build(). By default, it will uses GITHUB_PAT or GITHUB_TOKEN environment variable if exists.

Currently, all supported CVs are:

  1. drs: Data Reference Syntax (DRS).
  2. activity_id: Activity identifiers, e.g. HighResMIP, ScenarioMIP.
  3. experiment_id: Root experiment identifiers, e.g. historical, ssp126.
  4. frequency: Sampling frequencies, e.g. mon, day.
  5. grid_label: Grid identifiers, e.g. gn, gr.
  6. institution_id: Institution identifiers, e.g. AWI, CAS.
  7. nominal_resolution: Approximate horizontal resolutions, e.g. 50 km, 100 km.
  8. realm: Realms where variables are defined, e.g. atmos, ocean.
  9. required_global_attributes: Names of required global attributes.
  10. source_id: Model identifiers, e.g. GFDL-CM2-1, ACCESS-CM2.
  11. source_type: Model configurations, e.g. AGCM, OGCM.
  12. sub_experiment_id: Description of sub-experiment, e.g. none, s1960.
  13. table_id: Table identifiers, e.g. Amon, Oday.
dict$build()
dict$is_empty()

You can get the version of the CV collection and Data Request using $version() method. It returns a list of two numeric_versions giving you the current version of fetched CV collection (cvs) and Data Request (dreq):

dict$version()

The last built time can be retrieved using the $built_time() method:

last_built <- dict$built_time()
last_built

The last modified time of the CV collection and each CV can be retrieved using the $timestamp() method:

dict$timestamp()

Extract CV and Data Request data

The CV and Data Request data can be extracted using the $get(type) method, where type is the data type of interest. For Data Request, you can use "dreq" as the type name.

The returned data type are:

  1. For type "drs", "activity_id","frequency","grid_label","institution_id","source_type"and"sub_experiment_id"`, a [list].

  2. For type "experiment_id", "source_id" and "dreq", a [data.table].

  3. For "nominal_resolution", "required_global_attributes" and "table_id", a [character] vector.

# Data Request
dict$get("dreq")

# DRS
dict$get("drs")

# activity_id
dict$get("activity_id")

# experiment_id
dict$get("experiment_id")

# frequency
dict$get("frequency")

# grid_label
dict$get("grid_label")

# institution_id
dict$get("institution_id")

# nominal_resolution
dict$get("nominal_resolution")

# realm
dict$get("realm")

# required_global_attributes
dict$get("required_global_attributes")

# source_id
dict$get("source_id")

# source_type
dict$get("source_type")

# sub_experiment_id
dict$get("sub_experiment_id")

# table_id
dict$get("table_id")

Store and reuse the Cmip6Dict object

$save() method stores all the core data of current Cmip6Dict object into an RDS file named CMIP6DICT in the specified folder. It returns the full path of the saved RDS file.

dict$save(tempdir())

This file can be reloaded via $load() method to restore the last state of current Cmip6Dict object. It reads the RDS file named CMIP6DICT and loads the CVs and Data Request data.

new_dict <- cmip6_dict()
new_dict$load(tempdir())


hongyuanjia/epwshiftr documentation built on March 14, 2024, 9:17 a.m.