knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(rfair)
This vignette describes what rfair measures and how, in enough detail to
interpret and reproduce its scores. For a quick tour see
vignette("rfair"); for the reuse/sensitivity extensions see
vignette("beyond-fuji").
The FAIR principles (Wilkinson et al. 2016) state that research data should be Findable, Accessible, Interoperable, and Reusable. They are aspirational; to assess a real data object you need measurable indicators.
The FAIRsFAIR project turned the principles into a concrete, testable metric set, and the F-UJI tool (Devaraju & Huber, PANGAEA) implemented an automated assessment service for them. F-UJI is a Python web service: you send it a persistent identifier (PID) and it returns per-metric scores.
rfair is a native R reimplementation of the F-UJI metrics (version 0.8).
It performs the whole assessment in R, with no external server, so assessments
are scriptable, reproducible, and embeddable in R pipelines. The original
rfair package (v1) was only an HTTP client for an F-UJI server; this version
(v2) is the engine itself.
A single call to assess_fair() runs this pipeline:
identifier
   │  id_parse()                scheme detection + normalization + resolver URL
   ▼
resolution                      content-negotiated GET, follow redirects -> landing page
   │  resolve_landing_page()
   ▼
harvesting                      a sequence of collectors, in priority order:
   │  collect_html_meta()       embedded JSON-LD (schema.org), Dublin Core,
   │                            OpenGraph, Highwire meta tags
   │  collect_signposting()     HTTP Link header + <link rel> typed links
   │  collect_datacite()        DataCite JSON via content negotiation
   │  collect_xml()             DataCite XML, Dublin Core, MODS, EML, ISO19139
   │  collect_rdf()             JSON-LD (native) and Turtle/RDF-XML (via rdflib)
   │  collect_github()          GitHub repository + codemeta.json + CITATION.cff
   │  harvest_data()            HEAD on data links for MIME type and size
   ▼
mapping + merging               each source is mapped to one reference schema and
   │  merge_metadata()          merged (first-non-empty for scalars; union for
   │                            lists; longer-but-similar replacement)
   ▼
evaluation                      one evaluator per metric inspects the merged metadata
   │  run_evaluators()          and the resolved identifier, scoring each test
   ▼
scoring                         per-test scores -> per-metric -> F/A/I/R -> overall
   │  get_assessment_summary()
   ▼
fair_assessment                 tidy S3 object (print / summary / as.data.frame /
                                as_fuji_json / as_rdf)
id_parse() recognizes DOIs, Handles, ARKs, URNs, UUIDs, identifiers.org
PIDs, w3id, and plain URLs, normalizes them, and constructs a resolver URL.
Persistence is inferred from the scheme.
id_parse("https://doi.org/10.5281/zenodo.8347772")[c("preferred_schema", "is_persistent", "identifier_url")]
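To make the idea of scheme detection concrete, here is a minimal base-R sketch. It is not rfair's id_parse(): the function name detect_scheme(), the simplified regular expressions, the returned field names, and the choice to treat UUIDs as non-persistent are all assumptions made for illustration, and it covers only three of the schemes listed above.

```r
# Hypothetical helper (NOT rfair's id_parse()): classify an identifier by scheme
# using deliberately simplified patterns.
detect_scheme <- function(x) {
  x <- trimws(x)
  # Strip a doi.org resolver prefix so bare and URL-form DOIs are treated alike.
  bare <- sub("^https?://(dx\\.)?doi\\.org/", "", x, ignore.case = TRUE)
  if (grepl("^10\\.[0-9]{4,9}/[^[:space:]]+$", bare)) {
    # DOIs are case-insensitive, so lower-casing is used as the normal form here.
    return(list(scheme = "doi", is_persistent = TRUE,
                url = paste0("https://doi.org/", tolower(bare))))
  }
  uuid <- "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
  if (grepl(uuid, x, ignore.case = TRUE)) {
    # Assumption: a UUID is unique but has no resolver, so it is not persistent.
    return(list(scheme = "uuid", is_persistent = FALSE, url = NA_character_))
  }
  if (grepl("^https?://", x, ignore.case = TRUE)) {
    return(list(scheme = "url", is_persistent = FALSE, url = x))
  }
  list(scheme = "unknown", is_persistent = FALSE, url = NA_character_)
}

detect_scheme("10.5281/ZENODO.8347772")$url
```

The order of the checks matters: the DOI test runs before the generic URL test, otherwise a doi.org link would be classified as a plain URL and persistence would be lost.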
Different repositories expose metadata in different ways. rfair asks for several
representations of the same object via HTTP content negotiation (the Accept
header) and scrapes the landing page, then merges everything into a single
reference schema (~30 elements: creator, title, publisher,
publication_date, license, access_level, object_content_identifier,
related_resources, ...). When two sources disagree, scalars keep the first
non-empty value (replaced only by a longer, sufficiently similar string), and
list-valued elements are unioned.
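The merge rules above can be sketched in base R as follows. This is an illustration under stated assumptions, not rfair's merge_metadata(): the helper names, the choice of which fields are list-valued, the edit-distance similarity measure, and the 0.5 threshold are all hypothetical.

```r
# Hypothetical helpers illustrating the merge rules (NOT rfair's merge_metadata()).
is_empty <- function(x) {
  is.null(x) || length(x) == 0 || all(is.na(x)) || all(!nzchar(as.character(x)))
}

# Scalar rule: keep the first non-empty value, unless the candidate is longer
# and still similar enough (assumed measure: normalized edit similarity).
merge_scalar <- function(current, candidate, threshold = 0.5) {
  if (is_empty(candidate)) return(current)
  if (is_empty(current)) return(candidate)
  dist <- utils::adist(current, candidate, ignore.case = TRUE)[1, 1]
  similarity <- 1 - dist / max(nchar(current), nchar(candidate))
  if (nchar(candidate) > nchar(current) && similarity >= threshold) candidate else current
}

# List rule: union of the values seen so far.
merge_list <- function(current, candidate) unique(c(current, candidate))

# Fold the sources together in priority order (first source wins ties).
merge_records <- function(records, list_fields = c("creator", "keywords")) {
  merged <- list()
  for (rec in records) {
    for (field in names(rec)) {
      merged[[field]] <- if (field %in% list_fields) {
        merge_list(merged[[field]], rec[[field]])
      } else {
        merge_scalar(merged[[field]], rec[[field]])
      }
    }
  }
  merged
}

records <- list(
  html     = list(title = "Ocean temperature", creator = "A. Smith", publisher = ""),
  datacite = list(title = "Ocean temperature profiles",
                  creator = c("A. Smith", "B. Jones"), publisher = "PANGAEA"),
  og       = list(title = "Totally different")
)
merged <- merge_records(records)
str(merged)
```

In this made-up input the second title replaces the first because it is longer and similar, the third is ignored because it is not longer, the empty publisher is skipped, and the creator lists are unioned.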
Metrics are data-driven: their definitions, tests, scores, and maturity levels come from the bundled FAIRsFAIR YAML, not from hard-coded R logic.
rfair_metric_versions() # bundled metric versions
# v0.8 has 17 metrics across F/A/I/R (one row each):
nrow(as.data.frame(assess_fair("https://doi.org/10.5281/zenodo.8347772", resolve = FALSE)))
Each metric has one or more tests. A test contributes a score and a maturity level (a CMMI level 0–3: incomplete, initial, moderate, advanced) when it passes. Metrics use one of two scoring mechanisms:
The criterium engine (criterium_engine.R) builds each metric's result from the
YAML and lets evaluators mark tests passed; as_fuji_json() then emits a payload
matching the upstream F-UJI FAIRResults schema.
| | metric | what rfair checks |
|---|---|---|
| F | F1-01MD | identifier follows a unique scheme (URI/URN/UUID/HASH/PID) |
| | F1-02MD | identifier is persistent and registered (resolves) |
| | F2-01M | core descriptive metadata present (creator, title, id, date, publisher, type, summary, keywords) |
| | F3-01M | metadata links to the downloadable data content |
| | F4-01M | metadata offered in a search-engine-ingestible way (embedded JSON-LD / meta tags) |
| A | A1-01M | access level / rights are stated in metadata |
| | A1-02MD | metadata and data are retrievable via their identifiers |
| | A1.1-01MD | identifiers use a standardized communication protocol (http/https/ftp) |
| | A1.2-01MD | the protocol supports authentication where needed |
| I | I1-01M | metadata uses a formal, machine-readable representation (JSON-LD/RDF/XML) |
| | I2-01M | metadata uses terms from registered semantic vocabularies |
| | I3-01M | qualified references to related entities (with relation types) |
| R | R1-01M | metadata describes the data content (type, format/size) |
| | R1.1-01M | a machine-readable license is present and SPDX/CC-recognized |
| | R1.2-01M | provenance information (creators, dates, contributors) |
| | R1.3-01M | a community-/discipline-endorsed metadata standard is used |
| | R1.3-02D | data is in a recommended (scientific/open/long-term) file format |
A category's score is the sum of earned points divided by the sum of available points across its metrics; the overall FAIR score is the same ratio taken across all 17 metrics, and the overall maturity is the (clamped) mean of the per-category maturities.
# the canonical principle definitions these metrics map to
fair_principles("I")[, c("id", "definition")]
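The aggregation of per-metric results into category and overall scores can be worked through on a small example. The numbers below are invented and cover only six metrics, the per-category maturities are taken as given, and "clamped" is assumed here to mean rounding the mean and limiting it to the 0 to 3 range; rfair's own get_assessment_summary() may differ in those details.

```r
# Made-up per-metric results (illustrative only, not real rfair output).
metrics <- data.frame(
  category = c("F", "F", "A", "I", "R", "R"),
  earned   = c(1, 2, 1, 2, 0, 1),
  total    = c(1, 2, 2, 4, 1, 2)
)

# Category score: sum of earned over sum of total within each category.
cat_scores <- aggregate(cbind(earned, total) ~ category, data = metrics, FUN = sum)
cat_scores$percent <- 100 * cat_scores$earned / cat_scores$total

# Overall score: the same ratio across all metrics (not a mean of percentages).
overall_percent <- 100 * sum(metrics$earned) / sum(metrics$total)

# Per-category maturities are supplied by hand for this sketch.
cat_maturity <- c(F = 3, A = 1, I = 2, R = 1)

# Assumption: "clamped" = round the mean, then keep it within 0..3.
overall_maturity <- min(3, max(0, round(mean(cat_maturity))))

cat_scores
overall_percent
overall_maturity
```

Summing earned and total points before dividing means metrics with more available points weigh more heavily than they would in a plain average of per-metric percentages.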
For software objects, rfair also bundles the FRSM (FAIR for Research Software)
metric set; select it with metric_version = "0.7_software". The GitHub
harvester inspects the repository file tree for signals (a license file, tests,
CI workflows, dependency manifests, a registry DOI, a release version,
contributors) and the 17 FRSM evaluators score from them. FRSM scoring is
heuristic and not yet validated against an upstream software-FAIR reference.
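As a rough illustration of file-tree signals, the sketch below checks a vector of repository paths against a few patterns. The function detect_repo_signals() and its patterns are hypothetical and are not rfair's rules; it covers only signals visible in file names (license file, tests, CI workflows, dependency manifests), whereas a registry DOI, release version, or contributors would have to come from other repository metadata.

```r
# Hypothetical signal detector (NOT rfair's GitHub harvester); patterns are
# illustrative and intentionally incomplete.
detect_repo_signals <- function(paths) {
  has <- function(pattern) any(grepl(pattern, paths, ignore.case = TRUE))
  c(
    license      = has("^(LICEN[CS]E|COPYING)(\\.[a-z]+)?$"),
    tests        = has("^tests?/"),
    ci           = has("^\\.github/workflows/.+\\.ya?ml$"),
    dependencies = has("^(DESCRIPTION|requirements\\.txt|pyproject\\.toml|package\\.json)$")
  )
}

# A made-up repository file tree, given as paths relative to the repo root.
tree <- c(
  "DESCRIPTION", "LICENSE.md", "R/assess.R",
  "tests/testthat/test-assess.R", ".github/workflows/check.yaml"
)
signals <- detect_repo_signals(tree)
signals
```

Each signal is a plain logical, which keeps the later scoring step simple: an evaluator only needs to look up the signals relevant to its metric.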
Because rfair reimplements an existing scoring engine, it includes a
non-CRAN conformance harness. tests/conformance/run.R runs identifiers through
both rfair and a locally run, version-matched F-UJI server and compares
per-metric earned scores. A manual run on 2026-06-16 against F-UJI 4.0.0
(metrics v0.8) measured 94.1% on a Zenodo DOI (16/17 metrics exact) and
85.3% across PANGAEA and Dryad; the consistent divergence was the data
file-format metric (F-UJI uses Tika content detection where rfair uses an HTTP
HEAD). This reference-server comparison is not reproduced by CI yet. A separate
harness (tests/conformance/parity.R) compares the R engine with the browser
TypeScript engine on registry-derivable metrics after the webapp branch is
checked out alongside the package.
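The agreement figures above are consistent with counting the share of metrics whose earned scores match exactly. A minimal sketch of that calculation follows, assuming both engines' results are available as named numeric vectors; the agreement() helper and the score values are invented for illustration and are not the contents of tests/conformance/run.R.

```r
# Hypothetical comparison helper: share of metrics with identical earned scores.
agreement <- function(a, b) {
  stopifnot(identical(names(a), names(b)))
  exact <- a == b
  list(
    percent   = round(100 * mean(exact), 1),
    exact     = sum(exact),
    differing = names(a)[!exact]
  )
}

metric_ids <- c(
  "F1-01MD", "F1-02MD", "F2-01M", "F3-01M", "F4-01M",
  "A1-01M", "A1-02MD", "A1.1-01MD", "A1.2-01MD",
  "I1-01M", "I2-01M", "I3-01M",
  "R1-01M", "R1.1-01M", "R1.2-01M", "R1.3-01M", "R1.3-02D"
)

# Made-up scores: identical everywhere except the data file-format metric.
rfair_scores <- setNames(rep(1, length(metric_ids)), metric_ids)
fuji_scores <- rfair_scores
fuji_scores["R1.3-02D"] <- 0

res <- agreement(rfair_scores, fuji_scores)
res$percent
res$differing
```

With one of 17 metrics differing, the helper reports 16 exact matches, which rounds to 94.1%.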
rfair adds checks that automated FAIR tools usually miss, motivated by peer
review of a COVID-19 FAIR study: license reusability (not just presence) with
the (Re)usable Data Project taxonomy, controlled-access/sensitive-data flagging,
identifier hygiene, and the FAIR-TLC (Traceable, Licensed, Connected)
extension. See vignette("beyond-fuji").
Turtle/RDF-XML harvesting and as_rdf() Turtle output need the optional
rdflib package (system librdf); without it those paths are skipped.