knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(rocrateR)
Reproducible research requires more than sharing files. We also need structured metadata describing:
{rocrateR} lets you create and manage RO-Crates directly from R.
An RO-Crate is:
The metadata describes all files and their relationships using a graph model.
Example:
my_crate/
├── ro-crate-metadata.json
├── data/
│   └── results.csv
└── analysis.R
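To make the graph model concrete, the sketch below writes out what the metadata for the example layout might look like, using plain R lists and base R only. It is an illustration under the assumption of the usual RO-Crate 1.2 layout (a metadata descriptor, a root dataset `./` and one entity per file); it is not output produced by {rocrateR}, and the variable names are made up for this example.

```r
# Entities for the example crate, written by hand as plain R lists
metadata <- list(
  `@context` = "https://w3id.org/ro/crate/1.2/context",
  `@graph` = list(
    # the metadata descriptor points at the root dataset
    list(
      `@id` = "ro-crate-metadata.json",
      `@type` = "CreativeWork",
      conformsTo = list(`@id` = "https://w3id.org/ro/crate/1.2"),
      about = list(`@id` = "./")
    ),
    # the root dataset lists its files by reference (@id), not by value
    list(
      `@id` = "./",
      `@type` = "Dataset",
      hasPart = list(
        list(`@id` = "data/results.csv"),
        list(`@id` = "analysis.R")
      )
    ),
    list(`@id` = "data/results.csv", `@type` = "File"),
    list(`@id` = "analysis.R", `@type` = "File")
  )
)

# every entity is identified by its @id
graph_ids <- vapply(metadata[["@graph"]], function(e) e[["@id"]], character(1))

# follow the hasPart references of the root dataset
root <- metadata[["@graph"]][[which(graph_ids == "./")]]
part_ids <- vapply(root$hasPart, function(p) p[["@id"]], character(1))
print(part_ids)
```

Relationships are expressed as `list("@id" = ...)` references rather than nested objects, which is why the graph stays flat however many entities are linked.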
| Function | Purpose |
|-----------|----------|
| rocrate() | Create an empty or initialized RO-Crate |
| entity() | Define a new entity (Person, Dataset, etc.) |
| add_entity() / add_entities() | Add entities to a crate. Note that add_entities() is now deprecated and add_entity() is preferred. |
| get_entity() | Retrieve entities by @id or @type |
| remove_entity() / remove_entities() | Remove one or more entities. Note that remove_entities() is now deprecated and remove_entity() is preferred. |
| load_rocrate() | Higher level function that loads an RO-Crate from metadata file, crate directory or BagIt archive |
| write_rocrate() | Save RO-Crate to disk |
| bag_rocrate() / is_rocrate_bag() / unbag_rocrate() | Bagging and unbagging RO-Crates |
| validate_rocrate() | Validate RO-Crate and generate report |
The following command creates an RO-Crate Metadata descriptor (ro-crate-metadata.json). This should be stored inside the root (./) of your RO-Crate.
# library(rocrateR)
my_first_ro_crate <- rocrateR::rocrate()
This object is a list with the basic components of an RO-Crate. It can be visualised in the console as follows:
my_first_ro_crate
This object can be saved to disk using the following command:
my_first_ro_crate |> rocrateR::write_rocrate("/path/to/ro-crate/ro-crate-metadata.json")
For example, using a temporary directory:
tmp <- file.path(tempdir(), "ro-crate-metadata.json")
my_first_ro_crate |>
  rocrateR::write_rocrate(tmp)

# read the file back as lines of text
readLines(tmp)

# delete temporary file
unlink(tmp)
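To check what was written, the file can also be parsed back into a plain R list. The sketch below assumes that the {jsonlite} package is installed (it is not otherwise used in this vignette) and that the descriptor has a top-level `@graph` array, as the RO-Crate specification requires.

```r
# write a fresh crate to a temporary file
tmp <- file.path(tempdir(), "ro-crate-metadata.json")
rocrateR::rocrate() |>
  rocrateR::write_rocrate(tmp)

# parse the JSON-LD without simplification, so each entity stays a list
crate_json <- jsonlite::read_json(tmp, simplifyVector = FALSE)

# collect the @id of every entity in the graph
entity_ids <- vapply(
  crate_json[["@graph"]],
  function(e) as.character(unlist(e[["@id"]]))[1],
  character(1)
)
print(entity_ids)

# delete temporary file
unlink(tmp)
```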
In the previous section we created a very basic RO-Crate with the rocrateR::rocrate() function; however, you are likely to include additional entities in your RO-Crate. Entities must contain at least two components, @id and @type (see https://w3id.org/ro/crate/1.2/ for details).
For example, a contextual entity can be defined as follows:
# create entity for an organisation
organisation_uol <- rocrateR::entity(
  id = "https://ror.org/04xs57h96",
  type = "Organization",
  name = "University of Liverpool",
  url = "http://www.liv.ac.uk"
)

# create an entity for a person
person_rvd <- rocrateR::entity(
  id = "https://orcid.org/0000-0001-5036-8661",
  type = "Person",
  name = "Roberto Villegas-Diaz"
)
These entities can be attached to an RO-Crate using the rocrateR::add_entity() function:
my_second_ro_crate <- rocrateR::rocrate() |>
  rocrateR::add_entity(person_rvd) |>
  rocrateR::add_entity_value(
    id = "./",
    key = "author",
    value = list(`@id` = person_rvd$`@id`)
  ) |>
  rocrateR::add_entity(organisation_uol) |>
  rocrateR::add_entity_value(
    id = "https://orcid.org/0000-0001-5036-8661",
    key = "affiliation",
    value = list(`@id` = organisation_uol$`@id`)
  )
Alternatively, the same result can be achieved with the following code:
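my_second_ro_crate <- rocrateR::rocrate(person_rvd, organisation_uol) |>
  rocrateR::add_entity_value(id = "./", key = "author", value = list(`@id` = person_rvd$`@id`)) |>
  # link the person to the organisation, as in the previous example
  rocrateR::add_entity_value(
    id = person_rvd$`@id`,
    key = "affiliation",
    value = list(`@id` = organisation_uol$`@id`)
  )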
my_second_ro_crate
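Because entities refer to each other only by `@id`, it is easy to link to an entity that was never added to the crate. The base R sketch below shows one way to spot such references on a hand-written graph. The helper `collect_refs()` is hypothetical and not part of {rocrateR}, and the organisation entity is left out of the graph on purpose.

```r
# Hand-written graph: the Person links to an Organization that is NOT in the graph
graph <- list(
  list(
    `@id` = "./",
    `@type` = "Dataset",
    author = list(`@id` = "https://orcid.org/0000-0001-5036-8661")
  ),
  list(
    `@id` = "https://orcid.org/0000-0001-5036-8661",
    `@type` = "Person",
    affiliation = list(`@id` = "https://ror.org/04xs57h96")
  )
)

# Hypothetical helper: a reference is a list whose only element is `@id`
collect_refs <- function(x) {
  if (!is.list(x)) {
    return(character(0))
  }
  is_ref <- !is.null(names(x)) && length(x) == 1 && "@id" %in% names(x)
  own <- if (is_ref) x[["@id"]] else character(0)
  c(own, unlist(lapply(x, collect_refs), use.names = FALSE))
}

ids <- vapply(graph, function(e) e[["@id"]], character(1))

# references that do not resolve to an entity in the graph
dangling <- setdiff(collect_refs(graph), ids)
print(dangling)
```

An unresolved `@id` is not always an error, since a reference may point to an external web resource, but it is a useful prompt to check whether an `add_entity()` call is missing.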
Previously, we covered how to include additional entities. Other valid
operations are to extract (rocrateR::get_entity()) and remove
(rocrateR::remove_entity()) entities.
# create basic RO-Crate
basic_ro_crate <- rocrateR::rocrate()

# create some entities for a project and datasets
dataset_entities <- seq_len(2) |>
  lapply(\(x) rocrateR::entity(x, type = "Dataset", name = paste0("Data ", x)))

project_entity <- rocrateR::entity(
  "#proj101",
  type = "Project",
  name = "Project 101",
  hasPart = dataset_entities |>
    lapply(\(x) list(`@id` = x[["@id"]]))
)

# add project and entities to the RO-Crate
basic_ro_crate <- basic_ro_crate |>
  rocrateR::add_entity(project_entity) |>
  # note that here we are using both `rocrateR::add_entity` and `rocrateR::add_entities`
  rocrateR::add_entities(dataset_entities)

basic_ro_crate
We can extract entities via the @id, @type or both:
@id
basic_ro_crate_project <- basic_ro_crate |>
  rocrateR::get_entity(id = "#proj101")
basic_ro_crate_project

@type
basic_ro_crate_datasets <- basic_ro_crate |>
  rocrateR::get_entity(type = "Dataset")
basic_ro_crate_datasets

@id and @type
basic_ro_crate_dataset_root <- basic_ro_crate |>
  rocrateR::get_entity(id = "./", type = "Dataset")
basic_ro_crate_dataset_root
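Conceptually, selecting by @id, @type or both is a filter over the entities in the crate's graph. The base R sketch below illustrates the idea on a hand-written list. The function `select_entities()` is hypothetical and only mirrors the behaviour described above; it is not how rocrateR::get_entity() is implemented.

```r
# Hand-written graph for illustration
graph <- list(
  list(`@id` = "./", `@type` = "Dataset"),
  list(`@id` = "#proj101", `@type` = "Project"),
  list(`@id` = "1", `@type` = "Dataset"),
  # an entity may carry more than one @type
  list(`@id` = "analysis.R", `@type` = c("File", "SoftwareSourceCode"))
)

# Hypothetical helper: keep entities matching the given @id and/or @type
select_entities <- function(graph, id = NULL, type = NULL) {
  keep <- function(e) {
    id_ok <- is.null(id) || e[["@id"]] %in% id
    type_ok <- is.null(type) || any(e[["@type"]] %in% type)
    id_ok && type_ok
  }
  Filter(keep, graph)
}

datasets <- select_entities(graph, type = "Dataset")
root <- select_entities(graph, id = "./", type = "Dataset")
files <- select_entities(graph, type = "File")
```

Matching @type with `%in%` rather than `==` keeps the filter correct for entities that declare several types.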
Similarly, we can remove entities from an RO-Crate:
@id
basic_ro_crate_alt <- basic_ro_crate |>
  rocrateR::remove_entity("#proj101")

entity object
basic_ro_crate_alt <- basic_ro_crate |>
  rocrateR::remove_entity(project_entity)

# here a list of entity objects is passed
basic_ro_crate_alt <- basic_ro_crate |>
  rocrateR::remove_entity(dataset_entities)
Here we will explore the BagIt file packaging format, which is the recommended format for bagging RO-Crates. BagIt is described in RFC 8493:
[BagIt is] … a set of hierarchical file layout conventions for storage and transfer of arbitrary digital content. A "bag" has just enough structure to enclose descriptive metadata "tags" and a file "payload" but does not require knowledge of the payload’s internal semantics. This BagIt format is suitable for reliable storage and transfer.
In this package, the function rocrateR::bag_rocrate will take either a path
pointing to the root of an RO-Crate (must have at least an RO-Crate metadata
descriptor file, ro-crate-metadata.json) or an RO-Crate object created with
rocrateR::rocrate (and alternatives), as shown in step 1.
For more details, run the following command:
?rocrateR::bag_rocrate
rocrateR::bag_rocrate()
Here we will create an RO-Crate bag inside a temporary directory:
# create basic RO-Crate
basic_ro_crate <- rocrateR::rocrate()

# create temporary directory
tmp_dir <- file.path(tempdir(), paste0("rocrate-", digest::digest(basename(tempfile()))))
dir.create(tmp_dir, showWarnings = FALSE, recursive = TRUE)

# then, we can create the RO-Crate bag
path_to_rocrate_bag <- basic_ro_crate |>
  rocrateR::bag_rocrate(path = tmp_dir)
rocrateR::is_rocrate_bag()
We can use the function rocrateR::is_rocrate_bag() to verify that a given path
points to a ZIP file or a directory with a valid RO-Crate bag. The expected
files are:
- bagit.txt with the BagIt definition
- data directory with the payload of the RO-Crate
- manifest-[algorithm].txt with the checksum for each file inside the data directory

path_to_rocrate_bag |>
  rocrateR::is_rocrate_bag()
The RO-Crate can then be loaded and displayed:
path_to_rocrate_bag |> rocrateR::load_rocrate()
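To show what the expected files of a bag mean in practice, the sketch below builds a tiny bag by hand and checks it with base R only. It assumes an MD5 manifest and the `checksum path` line format from RFC 8493. The helpers `check_bag_layout()` and `verify_md5_manifest()` are hypothetical and are not the checks performed by rocrateR::is_rocrate_bag().

```r
# Build a minimal bag by hand (illustration only)
bag_dir <- file.path(tempdir(), "example-bag")
dir.create(file.path(bag_dir, "data"), recursive = TRUE, showWarnings = FALSE)
writeLines(
  c("BagIt-Version: 1.0", "Tag-File-Character-Encoding: UTF-8"),
  file.path(bag_dir, "bagit.txt")
)
writeLines("{}", file.path(bag_dir, "data", "ro-crate-metadata.json"))

# one manifest line per payload file: "<checksum> <path relative to the bag>"
payload <- list.files(file.path(bag_dir, "data"), recursive = TRUE)
checksums <- tools::md5sum(file.path(bag_dir, "data", payload))
writeLines(
  paste(unname(checksums), file.path("data", payload)),
  file.path(bag_dir, "manifest-md5.txt")
)

# Hypothetical helper: are the three expected components present?
check_bag_layout <- function(path) {
  manifests <- list.files(path, pattern = "^manifest-[A-Za-z0-9]+\\.txt$")
  file.exists(file.path(path, "bagit.txt")) &&
    dir.exists(file.path(path, "data")) &&
    length(manifests) > 0
}

# Hypothetical helper: do the payload files match the recorded MD5 checksums?
verify_md5_manifest <- function(path) {
  lines <- readLines(file.path(path, "manifest-md5.txt"))
  # split each line at the first run of whitespace
  parts <- regmatches(lines, regexpr("\\s+", lines), invert = TRUE)
  expected <- vapply(parts, `[`, character(1), 1)
  files <- vapply(parts, `[`, character(1), 2)
  actual <- unname(tools::md5sum(file.path(path, files)))
  all(actual == expected)
}

layout_ok <- check_bag_layout(bag_dir)
checksums_ok <- verify_md5_manifest(bag_dir)

# delete the example bag
unlink(bag_dir, recursive = TRUE)
```

The manifest is what makes a bag suitable for transfer: if any payload file changes after bagging, its checksum no longer matches the recorded value.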
rocrateR::unbag_rocrate()
We can explore the contents of the RO-Crate bag with the following commands:
# list files without unzipping
unzip(path_to_rocrate_bag, list = TRUE)

# extract files in temporary directory
path_to_rocrate_bag_contents <- path_to_rocrate_bag |>
  rocrateR::unbag_rocrate(output = file.path(tmp_dir, "ROC"))

# create tree with the files
fs::dir_tree(path_to_rocrate_bag_contents)

# delete temporary directory
unlink(tmp_dir, recursive = TRUE, force = TRUE)
Advanced validation using the Python rocrate-validator is optional and requires {reticulate}.
As you develop your RO-Crates, you might want to validate them. There are a few validators online (some of which can be found at https://www.researchobject.org/ro-crate/tools); here we will explore the Python package rocrate-validator. For installation details, please visit https://github.com/crs4/rocrate-validator.
⚠ The validation workflow depends on Python’s rocrate-validator. Ensure you have a working Python installation and {reticulate} configured correctly (reticulate::py_config()). On Windows, you may need to restart R after installation.
{reticulate}
pak::pkg_install("reticulate")

rocrate-validator
# the package is installed under the name "roc-validator"
reticulate::py_install("roc-validator", envname = "rocrateR")
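Since validation is optional, it can help to check that the Python side is ready before calling the validator. The sketch below is one possible guard. The function `validator_available()` is hypothetical and not part of {rocrateR}, and it assumes the virtual environment is named "rocrateR" as in the installation step above.

```r
# Hypothetical guard: TRUE only if {reticulate}, the virtual environment
# and the Python module rocrate_validator are all available
validator_available <- function(envname = "rocrateR") {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    return(FALSE)
  }
  isTRUE(tryCatch(
    {
      if (!reticulate::virtualenv_exists(envname)) {
        FALSE
      } else {
        reticulate::use_virtualenv(envname, required = TRUE)
        reticulate::py_module_available("rocrate_validator")
      }
    },
    # any configuration problem is treated as "not available"
    error = function(e) FALSE
  ))
}

ok <- validator_available()
if (ok) {
  message("rocrate-validator is available.")
} else {
  message("rocrate-validator is not available; skipping validation.")
}
```

Returning `FALSE` instead of raising an error lets scripts and vignettes skip the validation step on machines without a configured Python.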
basic_ro_crate <- rocrateR::rocrate()

# store crate inside temporary directory
tmp <- file.path(tempdir(), "ro-crate-metadata.json")
basic_ro_crate |>
  rocrateR::write_rocrate(tmp)

# wrap crate into zip file (expected by validator)
# paste0() is used because paste() would insert a space before ".zip"
tmp_zip <- paste0(tmp, ".zip")
zip(tmp_zip, tmp)

# validate (note the name of the module: rocrate_validator)
reticulate::use_virtualenv("rocrateR")
rocrate_validator <- reticulate::import("rocrate_validator")
status <- rocrate_validator$utils$validate_rocrate_uri(tmp_zip)

if (status) {
  message("RO-Crate is valid!")
} else {
  message("RO-Crate is invalid!")
}

# delete temporary files
unlink(tmp)
unlink(tmp_zip)