In dataobservatory-eu/dataobservatory: Tidy and Documented Datasets


dataobservatory

The goal of dataobservatory is to facilitate the automated documentation of datasets and the automated recording of their descriptive and administrative (statistical processing) metadata. It also helps record information about the computational environment to increase reproducibility.

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("dataobservatory-eu/dataobservatory")

The dataset Class

The dataset S3 class is an extension of the data frame and tibble classes. It carries metadata attributes that facilitate the automated documentation of the dataset, and it has its own print and summary methods.

library(dataobservatory)
data("small_population")
small_population_dataset <- dataset(x = small_population,
                                    dataset_code = "small_population_total",
                                    dataset_title = "Population of Small European Countries",
                                    freq = "A",
                                    unit = "NR",
                                    unit_name = "number")

small_population_dataset

summary(small_population_dataset)
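
The idea of a data frame subclass that carries its own metadata can be illustrated in base R. The following is only a minimal sketch and not the package's implementation: the class name toy_dataset, the constructor new_toy_dataset() and the example values are hypothetical.

```r
# Hypothetical constructor: NOT the dataobservatory implementation.
new_toy_dataset <- function(x, dataset_code, dataset_title, unit) {
  stopifnot(is.data.frame(x))
  # Store the metadata as attributes on the data frame itself
  attr(x, "dataset_code") <- dataset_code
  attr(x, "dataset_title") <- dataset_title
  attr(x, "unit") <- unit
  # Prepend the new class so that data frame methods remain available
  class(x) <- c("toy_dataset", class(x))
  x
}

# A print method that shows the title and code before the data
print.toy_dataset <- function(x, ...) {
  cat(attr(x, "dataset_title"), " [", attr(x, "dataset_code"), "]\n", sep = "")
  NextMethod()
  invisible(x)
}

toy <- new_toy_dataset(
  data.frame(geo = c("AD", "LI"), value = c(1, 2)),  # made-up values
  dataset_code = "toy_total",
  dataset_title = "Toy Example",
  unit = "NR"
)

print(toy)
```

Because "data.frame" stays in the class vector, functions written for data frames should continue to work on the object, while the extra attributes travel with it.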

small_population_datacite <- datacite_dataset(
  dataset = small_population_dataset,
  Subject = "Demography",
  Creator = "Joe, Doe")

Descriptive Metadata

The datacite class (see ?datacite) is a modification of a data frame (tibble) object that creates all the mandatory and recommended fields of the DataCite metadata schema for a dataset. It also covers all the properties of the more general Dublin Core standard, although in some cases the property name differs and follows the DataCite naming convention.

The descriptive metadata can be added with the datacite() constructor (see ?datacite) or the datacite_dataset() helper function. For details, read the DataCite Descriptive Metadata vignette article.

The datacite class can be automatically connected to many scientific repositories, including Zenodo. In later versions, this will enable the user to upload the newly created dataset (or dataset version) and receive a digital object identifier (DOI), or a versioned DOI, for it.

See more about the metadata concepts applied in the FAIR Data and the Added Value of Rich Metadata chapter of the Automated Observatory Contributors’ Handbook.

print(small_population_datacite)
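
To make the notion of mandatory fields concrete, the sketch below checks a plain list of metadata against the mandatory DataCite properties. It is an illustration only: the property list is assumed from the DataCite 4.x schema, and the helper missing_mandatory() is hypothetical and not part of the package.

```r
# Mandatory DataCite properties (assumed from the DataCite 4.x schema).
datacite_mandatory <- c("Identifier", "Creator", "Title",
                        "Publisher", "PublicationYear", "ResourceType")

# Hypothetical helper: names of mandatory properties that are missing or empty
missing_mandatory <- function(metadata) {
  present <- vapply(
    datacite_mandatory,
    function(p) {
      value <- metadata[[p]]
      !is.null(value) && !is.na(value) && nzchar(as.character(value))
    },
    logical(1)
  )
  datacite_mandatory[!present]
}

# A partially filled record, reusing the values from the example above
record <- list(Title = "Population of Small European Countries",
               Creator = "Joe, Doe",
               Subject = "Demography")

missing_mandatory(record)
```

A check of this kind shows which fields still need a value before a record could be considered complete; the datacite class is described above as creating these fields itself.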

Administrative Metadata

The statistical processing information can be added with the codebook class, which is not yet fully implemented. Read The codebook Class vignette article.

The codebook S3 class (not yet fully documented, and does not yet have an independent constructor) records the statistical processing metadata of a dataset.

It contains a full codebook following the SDMX statistical metadata codelist standards. Furthermore, it records the session information of all processing steps, and adds the R packages or software code that generated the results to the descriptive metadata.

For example, the annual observations follow the SDMX Code List for Frequency 2.1 (CL_FREQ) definition, and they can also be translated to the ISO 8601 time metadata standard.

add_frequency("A", "list")
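
The translation from a frequency code to an ISO 8601 duration can be pictured as a simple lookup. The sketch below is an assumption about how such a mapping might look, not the code behind add_frequency(); the names freq_to_iso8601 and translate_frequency() are hypothetical, and only a subset of the CL_FREQ codes is covered.

```r
# Assumed lookup table: SDMX CL_FREQ code -> ISO 8601 duration.
# Illustrative only; not taken from the dataobservatory source.
freq_to_iso8601 <- c(
  A = "P1Y",  # annual
  S = "P6M",  # half-yearly (semester)
  Q = "P3M",  # quarterly
  M = "P1M",  # monthly
  W = "P1W",  # weekly
  D = "P1D"   # daily
)

# Hypothetical helper: translate one or more codes, failing on unknown ones
translate_frequency <- function(freq) {
  iso <- unname(freq_to_iso8601[freq])
  if (anyNA(iso)) {
    stop("Unknown frequency code: ", freq[is.na(iso)][1])
  }
  iso
}

translate_frequency("A")
```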

add_sessioninfo()
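
Recording the computational environment alongside the data can be sketched with utils::sessionInfo() from base R. This is a hedged illustration rather than the behaviour of add_sessioninfo(): the helper attach_session_info() and the attribute name session_info are hypothetical.

```r
# Hypothetical helper: NOT the package's add_sessioninfo().
attach_session_info <- function(x) {
  si <- utils::sessionInfo()
  # Keep a small, serialisable subset of the session information
  attr(x, "session_info") <- list(
    r_version = si$R.version$version.string,
    platform = si$platform,
    recorded_at = Sys.time()
  )
  x
}

df <- attach_session_info(data.frame(value = 1:3))
attr(df, "session_info")$r_version
```

Storing the information as an attribute keeps it attached to the object it describes, which is the same pattern the metadata attributes of the dataset class follow.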

Contributor Code of Conduct

Please note that the dataobservatory project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

dataobservatory-eu/dataobservatory documentation built on Jan. 7, 2022, 8:55 p.m.
