
The dataset package helps you create semantically rich,
machine-readable, and interoperable datasets in R. It introduces
S3 classes that extend data frames, vectors, and bibliographic entries
with formal metadata structures inspired by established metadata standards.
The goal is to preserve metadata when reusing statistical and repository datasets, improve interoperability, and make it easy to turn tidy data frames into web-ready, publishable datasets that comply with ISO and W3C standards.
You can install the latest released version of dataset from
CRAN with:
install.packages("dataset")
To install the development version from GitHub with pak or remotes:
# install.packages("pak")
pak::pak("dataobservatory-eu/dataset")
# install.packages("remotes")
remotes::install_github("dataobservatory-eu/dataset")
library(dataset)
df <- dataset_df(
  country = defined(
    c("AD", "LI"),
    label = "Country",
    namespace = "https://www.geonames.org/countries/$1/"
  ),
  gdp = defined(
    c(3897, 7365),
    label = "GDP",
    unit = "million euros"
  ),
  dataset_bibentry = dublincore(
    title = "GDP Dataset",
    creator = person("Jane", "Doe", role = "aut"),
    publisher = "Small Repository"
  )
)
print(df)
#> Doe (2025): GDP Dataset [dataset]
#> rowid country gdp
#> <chr> <chr> <dbl>
#> 1 obs1 AD 3897
#> 2 obs2 LI 7365
Export as RDF triples:
dataset_to_triples(df, format = "nt")
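For readers unfamiliar with the N-Triples ("nt") serialization, the base R sketch below shows the general shape of one statement: a subject, a predicate, and a typed literal, terminated by a period. It does not reproduce the output of dataset_to_triples(). The helper `nt_literal()` and the `http://example.com/` URIs are hypothetical and used only for illustration.

```r
# Hypothetical helper, not part of the dataset package:
# format one N-Triples statement whose object is a typed literal.
nt_literal <- function(subject, predicate, value, datatype) {
  sprintf('<%s> <%s> "%s"^^<%s> .', subject, predicate, value, datatype)
}

# Assumed, illustrative URIs; a real export would use the dataset's own identifiers.
triple <- nt_literal(
  subject   = "http://example.com/dataset#obs1",
  predicate = "http://example.com/dataset#gdp",
  value     = 3897,
  datatype  = "http://www.w3.org/2001/XMLSchema#decimal"
)
cat(triple, "\n")
```

Each observation and variable pair becomes one such line, which is why the format is easy to stream and to merge with other RDF data.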
Retrieve the automatically recorded provenance:
provenance(df)
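To see why dedicated classes help preserve metadata, consider this base R sketch. It attaches a label and a unit to a plain vector as attributes and then shows that ordinary subsetting drops them. It illustrates the general problem only and is not how the package is implemented; the helper `with_metadata()` is hypothetical.

```r
# Hypothetical helper, not part of the dataset package:
# attach a label and a unit to a plain vector as attributes.
with_metadata <- function(x, label, unit = NULL) {
  attr(x, "label") <- label
  attr(x, "unit") <- unit
  x
}

gdp <- with_metadata(c(3897, 7365), label = "GDP", unit = "million euros")

# The attributes survive assignment into a data frame column.
df_plain <- data.frame(country = c("AD", "LI"))
df_plain$gdp <- gdp
attr(df_plain$gdp, "label")
attr(df_plain$gdp, "unit")

# Plain attributes are fragile: subsetting the vector discards them.
is.null(attributes(df_plain$gdp[1]))
```

A vector class with its own subsetting and printing methods can carry such metadata through these operations instead of losing it.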
We welcome contributions and discussion!
This project follows the rOpenSci Code of Conduct. By participating, you are expected to uphold these guidelines.