Getting started with rocrateR

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(rocrateR)

Introduction

Reproducible research requires more than sharing files. We also need structured metadata describing:

What is an RO-Crate?

An RO-Crate is:

The metadata describes all files and their relationships using a graph model.

RO-Crate Structure

Example:

my_crate/
├── ro-crate-metadata.json
├── data/
│   └── results.csv
└── analysis.R
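For the example layout above, the metadata file could describe the two files roughly as in the following base-R sketch. It is built by hand as a nested list; the property names and identifiers follow the RO-Crate 1.2 specification, the object name `example_metadata` is hypothetical, and the object returned by rocrateR::rocrate() may differ in detail. Properties that the specification expects on the root dataset (such as name and license) are left out for brevity.

```r
# Hand-built sketch of the metadata graph for the example layout.
# `example_metadata` is a hypothetical name used only for illustration.
example_metadata <- list(
  `@context` = "https://w3id.org/ro/crate/1.2/context",
  `@graph` = list(
    # metadata descriptor: describes the metadata file itself
    list(
      `@id` = "ro-crate-metadata.json",
      `@type` = "CreativeWork",
      about = list(`@id` = "./"),
      conformsTo = list(`@id` = "https://w3id.org/ro/crate/1.2")
    ),
    # root data entity: the crate directory, linking to its files
    list(
      `@id` = "./",
      `@type` = "Dataset",
      hasPart = list(
        list(`@id` = "data/results.csv"),
        list(`@id` = "analysis.R")
      )
    ),
    # one data entity per file
    list(`@id` = "data/results.csv", `@type` = "File"),
    list(`@id` = "analysis.R", `@type` = "File")
  )
)

# every node in the graph is identified by its @id
entity_ids <- vapply(example_metadata[["@graph"]], \(e) e[["@id"]], character(1))
entity_ids
```

Relationships are expressed by reference: the root dataset does not embed the file entities, it points at them through their @id values.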

1. Functions Overview

| Function | Purpose |
|-----------|----------|
| rocrate() | Create an empty or initialized RO-Crate |
| entity() | Define a new entity (Person, Dataset, etc.) |
| add_entity() / add_entities() | Add entities to a crate. Note that add_entities() is now deprecated and add_entity() is preferred. |
| get_entity() | Retrieve entities by @id or @type |
| remove_entity() / remove_entities() | Remove one or more entities. Note that remove_entities() is now deprecated and remove_entity() is preferred. |
| load_rocrate() | Higher level function that loads an RO-Crate from metadata file, crate directory or BagIt archive |
| write_rocrate() | Save RO-Crate to disk |
| bag_rocrate() / is_rocrate_bag() / unbag_rocrate() | Bagging and unbagging RO-Crates |
| validate_rocrate() | Validate RO-Crate and generate report |

2. First RO-Crate

The following command creates an RO-Crate Metadata descriptor (ro-crate-metadata.json). This should be stored inside the root (./) of your RO-Crate.

# library(rocrateR)
my_first_ro_crate <- rocrateR::rocrate()

This object is a list with the basic components of an RO-Crate. It can be visualised in the console as follows:

my_first_ro_crate

This object can be saved to disk using the following command:

my_first_ro_crate |>
  rocrateR::write_rocrate("/path/to/ro-crate/ro-crate-metadata.json")

For example, using a temporary directory:

tmp <- file.path(tempdir(), "ro-crate-metadata.json")
my_first_ro_crate |>
  rocrateR::write_rocrate(tmp)

# read the saved file back as lines of text
readLines(tmp)

# delete temporary file
unlink(tmp)

3. Including additional entities

In the previous section we created a very basic RO-Crate with the rocrateR::rocrate() function; in practice, you will usually want to include additional entities in your RO-Crate. Every entity must contain at least two components: @id and @type (see https://w3id.org/ro/crate/1.2/ for details).

For example, a contextual entity can be defined as follows:

# create entity for an organisation
organisation_uol <- rocrateR::entity(
  id = "https://ror.org/04xs57h96",
  type = "Organization",
  name = "University of Liverpool",
  url = "http://www.liv.ac.uk"
)

# create an entity for a person
person_rvd <- rocrateR::entity(
  id = "https://orcid.org/0000-0001-5036-8661",
  type = "Person",
  name = "Roberto Villegas-Diaz"
)

These entities can be attached to an RO-Crate using the rocrateR::add_entity() function:

my_second_ro_crate <- rocrateR::rocrate() |>
  rocrateR::add_entity(person_rvd) |>
  rocrateR::add_entity_value(
    id = "./", 
    key = "author", 
    value = list(`@id` = person_rvd$`@id`)
  ) |>
  rocrateR::add_entity(organisation_uol) |>
  rocrateR::add_entity_value(
    id = "https://orcid.org/0000-0001-5036-8661",
    key = "affiliation",
    value = list(`@id` = organisation_uol$`@id`)
  )

Alternatively, the same result can be achieved with the following code:

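my_second_ro_crate <- rocrateR::rocrate(person_rvd, organisation_uol) |>
  rocrateR::add_entity_value(id = "./", key = "author", value = list(`@id` = person_rvd$`@id`)) |>
  # link the person to the organisation, as in the previous example
  rocrateR::add_entity_value(id = person_rvd$`@id`, key = "affiliation", value = list(`@id` = organisation_uol$`@id`))
my_second_ro_crate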
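Because links such as author and affiliation are plain @id references, it can be useful to confirm that every reference points at an entity that actually exists in the crate. The sketch below does this in base R on a hand-built graph of plain lists that mirrors the entities above. It does not use a rocrateR object, since the internal structure of that object is not assumed here, and it only handles single, non-nested references.

```r
# Hand-built graph mirroring the entities above (plain lists, not a rocrateR object).
graph <- list(
  list(`@id` = "./", `@type` = "Dataset",
       author = list(`@id` = "https://orcid.org/0000-0001-5036-8661")),
  list(`@id` = "https://orcid.org/0000-0001-5036-8661", `@type` = "Person",
       affiliation = list(`@id` = "https://ror.org/04xs57h96")),
  list(`@id` = "https://ror.org/04xs57h96", `@type` = "Organization")
)

# collect the @id of every entity defined in the graph
defined_ids <- vapply(graph, \(e) e[["@id"]], character(1))

# collect every @id that is referenced from a property value
# (illustrative logic only; handles single, non-nested references)
referenced_ids <- unlist(lapply(graph, function(e) {
  props <- e[setdiff(names(e), c("@id", "@type"))]
  refs <- Filter(\(v) is.list(v) && !is.null(v[["@id"]]), props)
  vapply(refs, \(v) v[["@id"]], character(1))
}), use.names = FALSE)

# references that do not resolve to an entity in the graph
dangling <- setdiff(referenced_ids, defined_ids)
dangling
```

An empty result means every reference resolves. Removing the Organization entity from `graph` would make its identifier show up in `dangling`.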

4. Wrangle RO-Crate

Previously, we covered how to include additional entities. Other valid operations are extracting entities (rocrateR::get_entity()) and removing them (rocrateR::remove_entity()).

4.1. Set up

# create basic RO-Crate
basic_ro_crate <- rocrateR::rocrate()

# create some entities for a project and datasets
dataset_entities <- seq_len(2) |>
  lapply(\(x) rocrateR::entity(x, type = "Dataset", name = paste0("Data ", x)))
project_entity <- rocrateR::entity(
  "#proj101", 
  type = "Project", 
  name = "Project 101",
  hasPart = dataset_entities |>
      lapply(\(x) list(`@id` = x[["@id"]]))
  )

# add project and entities to the RO-Crate
basic_ro_crate <- basic_ro_crate |>
  rocrateR::add_entity(project_entity) |>
  # note: `rocrateR::add_entities()` takes a list of entities; it is deprecated in favour of `rocrateR::add_entity()`
  rocrateR::add_entities(dataset_entities)

basic_ro_crate

4.2. Extract entity

We can extract entities via the @id, @type or both:

4.2.1. Extract using @id

basic_ro_crate_project <- basic_ro_crate |>
  rocrateR::get_entity(id = "#proj101")

basic_ro_crate_project

4.2.2. Extract using @type

basic_ro_crate_datasets <- basic_ro_crate |>
  rocrateR::get_entity(type = "Dataset")

basic_ro_crate_datasets

4.2.3. Extract using @id and @type

basic_ro_crate_dataset_root <- basic_ro_crate |>
  rocrateR::get_entity(id = "./", type = "Dataset")

basic_ro_crate_dataset_root
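To illustrate how the two criteria might combine, here is a base-R sketch that filters a hand-built list of entities by @id, by @type, or by both. The helper `find_entities()` is hypothetical and is not how rocrateR::get_entity() is implemented; it simply assumes that an entity must satisfy every criterion that is supplied.

```r
# Hypothetical helper, not part of rocrateR: keep entities whose @id and/or
# @type match; a NULL criterion is ignored.
find_entities <- function(graph, id = NULL, type = NULL) {
  Filter(function(e) {
    (is.null(id) || e[["@id"]] %in% id) &&
      (is.null(type) || any(e[["@type"]] %in% type))
  }, graph)
}

# plain-list stand-in for the entities created in the set up
graph <- list(
  list(`@id` = "./", `@type` = "Dataset"),
  list(`@id` = "#proj101", `@type` = "Project"),
  list(`@id` = "1", `@type` = "Dataset"),
  list(`@id` = "2", `@type` = "Dataset")
)

n_datasets <- length(find_entities(graph, type = "Dataset"))             # 3
n_root <- length(find_entities(graph, id = "./", type = "Dataset"))      # 1
```

Filtering by @type alone also matches the root data entity, which is itself a Dataset; adding the @id narrows the result to a single entity.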

4.3. Remove entity

Similarly, we can remove entities from an RO-Crate:

4.3.1. Remove using scalar @id

basic_ro_crate_alt <- basic_ro_crate |>
  rocrateR::remove_entity("#proj101")

4.3.2. Remove using entity object

basic_ro_crate_alt <- basic_ro_crate |>
  rocrateR::remove_entity(project_entity)

4.3.3. Remove multiple entities

basic_ro_crate_alt <- basic_ro_crate |>
  rocrateR::remove_entity(dataset_entities)

5. Create an RO-Crate Bag

Here we will explore the BagIt file packaging format, which is the recommended format for bagging RO-Crates. BagIt is described in RFC 8493:

[BagIt is] … a set of hierarchical file layout conventions for storage and transfer of arbitrary digital content. A "bag" has just enough structure to enclose descriptive metadata "tags" and a file "payload" but does not require knowledge of the payload’s internal semantics. This BagIt format is suitable for reliable storage and transfer.

In this package, the function rocrateR::bag_rocrate() takes either a path pointing to the root of an RO-Crate (which must contain at least an RO-Crate metadata descriptor file, ro-crate-metadata.json) or an RO-Crate object created with rocrateR::rocrate() (and alternatives), as shown in section 2.

For more details, run the following command:

?rocrateR::bag_rocrate
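To give a feel for the structure RFC 8493 describes, the following base-R sketch assembles a minimal bag by hand: a bag declaration (bagit.txt), a payload directory (data/), and a payload manifest with one checksum per file. It is only an illustration of the layout and makes no claim about what rocrateR::bag_rocrate() does internally. MD5 is used because tools::md5sum() ships with R; the RFC recommends stronger algorithms such as SHA-512. The directory name and the placeholder file contents are assumptions.

```r
# Build a minimal BagIt-style directory by hand (illustration only).
bag_dir <- file.path(tempdir(), "example-bag")
dir.create(file.path(bag_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# payload: files live under data/ (placeholder content, not a valid crate)
payload <- file.path(bag_dir, "data", "ro-crate-metadata.json")
writeLines("{}", payload)

# bag declaration (a tag file)
writeLines(
  c("BagIt-Version: 1.0", "Tag-File-Character-Encoding: UTF-8"),
  file.path(bag_dir, "bagit.txt")
)

# payload manifest: one "<checksum> <path>" line per payload file
checksum <- unname(tools::md5sum(payload))
writeLines(
  paste(checksum, "data/ro-crate-metadata.json"),
  file.path(bag_dir, "manifest-md5.txt")
)

# list the resulting files, relative to the bag root
bag_files <- list.files(bag_dir, recursive = TRUE)
bag_files

# clean up
unlink(bag_dir, recursive = TRUE)
```

The tag files sit at the top level of the bag, while everything being preserved, here the RO-Crate itself, sits under data/.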

5.1. rocrateR::bag_rocrate()

Here we will create an RO-Crate bag inside a temporary directory:

# create basic RO-Crate
basic_ro_crate <- rocrateR::rocrate()

# create temporary directory
tmp_dir <- file.path(tempdir(), paste0("rocrate-", digest::digest(basename(tempfile()))))
dir.create(tmp_dir, showWarnings = FALSE, recursive = TRUE)

# then, we can create the RO-Crate bag
path_to_rocrate_bag <- basic_ro_crate |>
  rocrateR::bag_rocrate(path = tmp_dir)

5.2. rocrateR::is_rocrate_bag()

We can use the function rocrateR::is_rocrate_bag() to verify that a given path points to a ZIP file or a directory with a valid RO-Crate bag. The expected files are

path_to_rocrate_bag |>
  rocrateR::is_rocrate_bag()

The RO-Crate can then be loaded from the bag and displayed:

path_to_rocrate_bag |>
  rocrateR::load_rocrate()

5.3. rocrateR::unbag_rocrate()

We can explore the contents of the RO-Crate bag with the following commands:

# list files without unzipping
unzip(path_to_rocrate_bag, list = TRUE)
# extract files in temporary directory
path_to_rocrate_bag_contents <- path_to_rocrate_bag |>
  rocrateR::unbag_rocrate(output = file.path(tmp_dir, "ROC"))

# create tree with the files
fs::dir_tree(path_to_rocrate_bag_contents)
# delete temporary directory
unlink(tmp_dir, recursive = TRUE, force = TRUE)

6. Validation

Advanced validation using the Python rocrate-validator is optional and requires {reticulate}.
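Since this dependency is optional, a script can check for it before attempting advanced validation. The sketch below uses requireNamespace() and reticulate::py_module_available(); the module name rocrate_validator is the one imported in the appendix, while the variable names and messages are illustrative. Note that the check looks at whichever Python installation {reticulate} discovers, so a dedicated virtual environment would need to be activated first.

```r
# Check whether the optional validation stack is available before using it.
# tryCatch() guards against a missing or misconfigured Python installation.
has_reticulate <- requireNamespace("reticulate", quietly = TRUE)
has_validator <- has_reticulate && tryCatch(
  isTRUE(reticulate::py_module_available("rocrate_validator")),
  error = function(e) FALSE
)

if (has_validator) {
  message("rocrate_validator is available; advanced validation can be used.")
} else {
  message("rocrate_validator not found; skipping advanced validation.")
}
```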

Appendix

A1. Advanced Validation (experimental)

As you develop your RO-Crates, you might want to validate them. A few validators are available online (some of which are listed at https://www.researchobject.org/ro-crate/tools). Here we will explore the Python package rocrate-validator. For installation details, please visit https://github.com/crs4/rocrate-validator.

⚠ The validation workflow depends on Python’s rocrate-validator. Ensure you have a working Python installation and {reticulate} configured correctly (reticulate::py_config()). On Windows, you may need to restart R after installation.

A1.1. Install {reticulate}

pak::pkg_install("reticulate")

A1.2. Install rocrate-validator

# the validator is published on PyPI as "roc-validator"
reticulate::py_install("roc-validator", envname = "rocrateR")

A1.3. Create example RO-Crate and validate it

basic_ro_crate <- rocrateR::rocrate()

# store crate inside temporary directory
tmp <- file.path(tempdir(), "ro-crate-metadata.json")
basic_ro_crate |>
  rocrateR::write_rocrate(tmp)
# wrap crate into zip file (expected by validator)
# paste0() avoids the space that paste() would insert before ".zip"
tmp_zip <- paste0(tmp, ".zip")
zip(tmp_zip, tmp)

# validate (note the name of the module: rocrate_validator)
reticulate::use_virtualenv("rocrateR")
rocrate_validator <- reticulate::import("rocrate_validator")
status <- rocrate_validator$utils$validate_rocrate_uri(tmp_zip)

if (status) {
  message("RO-Crate is valid!")
} else {
  message("RO-Crate is invalid!")
}

# delete temporary files
unlink(tmp)
unlink(tmp_zip)


rocrateR documentation built on April 9, 2026, 1:06 a.m.