How rfair works: methodology and architecture
In rfair: Assess the FAIRness of Research Data Objects and Software

knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(rfair)

This vignette describes what rfair measures and how, in enough detail to interpret and reproduce its scores. For a quick tour see vignette("rfair"); for the reuse/sensitivity extensions see vignette("beyond-fuji").

1. Background: FAIR, the FAIRsFAIR metrics, and F-UJI

The FAIR principles (Wilkinson et al. 2016) state that research data should be Findable, Accessible, Interoperable, and Reusable. They are aspirational; to assess a real data object you need measurable indicators.

The FAIRsFAIR project turned the principles into a concrete, testable metric set, and the F-UJI tool (Devaraju & Huber, PANGAEA) implemented an automated assessment service for them. F-UJI is a Python web service: you send it a persistent identifier (PID) and it returns per-metric scores.

rfair is a native R reimplementation of F-UJI's automated assessment of the FAIRsFAIR metrics (version 0.8). It runs the whole assessment in R, with no external server, so assessments are scriptable, reproducible, and embeddable in R pipelines. The original rfair package (v1) was only an HTTP client for an F-UJI server; this version (v2) is the assessment engine itself.

2. The assessment pipeline

A single call to assess_fair() runs this pipeline:

identifier
   │  id_parse()            scheme detection + normalization + resolver URL
   ▼
resolution                  content-negotiated GET, follow redirects -> landing page
   │  resolve_landing_page()
   ▼
harvesting                  a sequence of collectors, in priority order:
   │   collect_html_meta()      embedded JSON-LD (schema.org), Dublin Core,
   │                            OpenGraph, Highwire meta tags
   │   collect_signposting()    HTTP Link header + <link rel> typed links
   │   collect_datacite()       DataCite JSON via content negotiation
   │   collect_xml()            DataCite XML, Dublin Core, MODS, EML, ISO19139
   │   collect_rdf()            JSON-LD (native) and Turtle/RDF-XML (via rdflib)
   │   collect_github()         GitHub repository + codemeta.json + CITATION.cff
   │   harvest_data()           HEAD on data links for MIME type and size
   ▼
mapping + merging           each source is mapped to one reference schema and
   │  merge_metadata()         merged (first-non-empty for scalars; union for
   │                           lists; longer-but-similar replacement)
   ▼
evaluation                  one evaluator per metric inspects the merged metadata
   │  run_evaluators()         and the resolved identifier, scoring each test
   ▼
scoring                     per-test scores -> per-metric -> F/A/I/R -> overall
   │  get_assessment_summary()
   ▼
fair_assessment             tidy S3 object (print / summary / as.data.frame /
                            as_fuji_json / as_rdf)
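As a rough illustration of the harvesting stage, the following minimal base-R sketch runs a list of collectors in priority order and keeps whatever each one returns. The collector functions and run_collectors() are hypothetical stand-ins, not rfair's collect_*() functions, and skipping a collector that errors (rather than aborting the assessment) is an assumption about how such a stage can be structured.

```r
# Hypothetical collectors: each takes a URL and returns a named list of
# metadata fields, or throws an error. These are NOT rfair's collect_*().
collect_a <- function(url) list(title = "Ocean temps", creator = "Doe, J.")
collect_b <- function(url) stop("endpoint unreachable")
collect_c <- function(url) list(title = "Ocean temperatures", license = "CC-BY-4.0")

# Run collectors in the order given (assumed to be priority order) and keep
# the non-NULL results; an erroring collector is skipped, not fatal.
run_collectors <- function(url, collectors) {
  out <- list()
  for (name in names(collectors)) {
    res <- tryCatch(collectors[[name]](url), error = function(e) NULL)
    if (!is.null(res)) out[[name]] <- res
  }
  out
}

sources <- run_collectors(
  "https://example.org/record/1",
  list(html = collect_a, datacite = collect_b, xml = collect_c)
)
names(sources)  # c("html", "xml"): the failing "datacite" collector is dropped
```

Keeping the sources as an ordered named list preserves the priority information that a later merge step needs for "first non-empty wins" decisions.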

Identifier handling

id_parse() recognizes DOIs, Handles, ARKs, URNs, UUIDs, identifiers.org PIDs, w3id, and plain URLs, normalizes them, and constructs a resolver URL. Persistence is inferred from the scheme.

id_parse("https://doi.org/10.5281/zenodo.8347772")[c("preferred_schema", "is_persistent", "identifier_url")]
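To make scheme detection concrete, here is a deliberately simplified, hypothetical detector built on base-R regular expressions. It is not id_parse(): the patterns are assumptions, and they cover far fewer schemes and normalization cases than the list above.

```r
# Hypothetical, simplified scheme detection; not rfair's id_parse().
detect_scheme <- function(id) {
  id <- trimws(id)
  # Drop a doi.org resolver prefix so bare and URL-form DOIs match alike
  bare <- sub("^https?://(dx\\.)?doi\\.org/", "", id, ignore.case = TRUE)
  uuid <- "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
  if (grepl("^10\\.[0-9]{4,9}/\\S+$", bare)) return("doi")
  if (grepl("^ark:/?[0-9]+/", id, ignore.case = TRUE)) return("ark")
  if (grepl("^urn:[a-z0-9][a-z0-9-]*:", id, ignore.case = TRUE)) return("urn")
  if (grepl(uuid, id, ignore.case = TRUE)) return("uuid")
  if (grepl("^https?://hdl\\.handle\\.net/", id, ignore.case = TRUE)) return("handle")
  if (grepl("^https?://", id, ignore.case = TRUE)) return("url")
  "unknown"
}

ids <- c("https://doi.org/10.5281/zenodo.8347772",
         "ark:/13030/tf5p30086k",
         "https://hdl.handle.net/20.500.12345/678",
         "https://example.org/data")
vapply(ids, detect_scheme, character(1), USE.NAMES = FALSE)
# c("doi", "ark", "handle", "url")
```

The order of the checks matters: the generic http(s) URL test comes last so that resolver URLs for DOIs and Handles are classified by their scheme first.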

Harvesting and content negotiation

Different repositories expose metadata in different ways. rfair therefore requests several representations of the same object via HTTP content negotiation (the Accept header), scrapes the landing page, and maps everything into a single reference schema of about 30 elements (creator, title, publisher, publication_date, license, access_level, object_content_identifier, related_resources, ...). When sources disagree, a scalar element keeps the first non-empty value, which is replaced only by a later value that is longer and sufficiently similar; list-valued elements are unioned.
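The merge rules can be sketched in a few lines of base R. This is a hypothetical merge_sketch(), not merge_metadata(): the choice of which fields are list-valued, the normalized Levenshtein similarity from utils::adist(), and the 0.8 threshold are all assumptions made for illustration.

```r
# Assumed similarity: 1 - (edit distance / length of the longer string)
similar <- function(a, b, threshold = 0.8) {
  d <- utils::adist(tolower(a), tolower(b))[1, 1]
  1 - d / max(nchar(a), nchar(b)) >= threshold
}

# Hypothetical merge over sources given in priority order.
merge_sketch <- function(sources, list_fields = c("creator", "keywords")) {
  merged <- list()
  for (src in sources) {
    for (field in names(src)) {
      value <- src[[field]]
      if (length(value) == 0 || all(!nzchar(value))) next  # skip empty values
      if (field %in% list_fields) {
        merged[[field]] <- union(merged[[field]], value)     # lists: union
      } else if (is.null(merged[[field]])) {
        merged[[field]] <- value                             # scalars: first non-empty
      } else if (nchar(value) > nchar(merged[[field]]) &&
                 similar(value, merged[[field]])) {
        merged[[field]] <- value                             # longer but similar
      }
    }
  }
  merged
}

sources <- list(
  html     = list(title = "Ocean temperature record", creator = "Doe, J.", publisher = ""),
  datacite = list(title = "Ocean temperature records",
                  creator = c("Doe, J.", "Roe, K."), publisher = "Zenodo"),
  xml      = list(title = "Something else entirely and much longer")
)
m <- merge_sketch(sources)
m$title      # "Ocean temperature records": longer and similar, so it replaces
m$creator    # c("Doe, J.", "Roe, K."): union across sources
m$publisher  # "Zenodo": the empty string from html is skipped
```

The third title is longer but dissimilar, so it does not displace the earlier value; the length-plus-similarity guard is what keeps an unrelated string from overwriting a good one.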

The metric model

Metrics are data-driven: their definitions, tests, scores, and maturity levels come from the bundled FAIRsFAIR YAML, not from hard-coded R logic.

rfair_metric_versions()      # bundled metric versions
# v0.8 has 17 metrics across F/A/I/R (one row each):
nrow(as.data.frame(assess_fair("https://doi.org/10.5281/zenodo.8347772", resolve = FALSE)))

Each metric has one or more tests. A test contributes a score and a maturity level (a CMMI level 0–3: incomplete, initial, moderate, advanced) when it passes. Metrics use one of two scoring mechanisms:

cumulative — passed tests' scores add up;
alternative — tests are alternative routes to the same points (the earned score is capped at the metric total).
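The two scoring mechanisms can be illustrated with a metric definition held as a plain R list, shaped like what a parsed YAML entry might look like. The field names, the example scores, and the rule that a metric's maturity is the highest level among its passed tests are assumptions for this sketch; they are not the bundled FAIRsFAIR YAML schema or the criterium engine's code.

```r
# Hypothetical metric definition; field names are illustrative only.
metric <- list(
  id = "X-01M", total = 2, mechanism = "alternative",
  tests = list(
    list(id = "X-01M-1", score = 2, maturity = 3),
    list(id = "X-01M-2", score = 1, maturity = 1)
  )
)
maturity_labels <- c("incomplete", "initial", "moderate", "advanced")  # levels 0-3

score_metric <- function(metric, passed_ids) {
  passed <- Filter(function(t) t$id %in% passed_ids, metric$tests)
  earned <- sum(vapply(passed, function(t) t$score, numeric(1)))
  # "alternative": tests are different routes to the same points, so cap
  if (metric$mechanism == "alternative") earned <- min(earned, metric$total)
  # Assumption: metric maturity is the highest level among passed tests
  maturity <- if (length(passed)) max(vapply(passed, function(t) t$maturity, numeric(1))) else 0
  list(earned = earned, total = metric$total,
       maturity = maturity, label = maturity_labels[maturity + 1])
}

score_metric(metric, c("X-01M-1", "X-01M-2"))$earned     # 2: 2 + 1 capped at total 2
cumulative <- modifyList(metric, list(mechanism = "cumulative", total = 3))
score_metric(cumulative, c("X-01M-1", "X-01M-2"))$earned  # 3: scores add up
score_metric(metric, character(0))$label                  # "incomplete"
```

Because the definition is data, changing a metric's tests or mechanism means editing the list (or YAML), not the scoring function.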

The criterium engine (criterium_engine.R) builds each metric's result from the YAML and lets evaluators mark tests passed; as_fuji_json() then emits a payload matching the upstream F-UJI FAIRResults schema.

3. What each FAIR category measures (v0.8)

| category | metric | what rfair checks |
|---|---|---|
| F | F1-01MD | identifier follows a unique scheme (URI/URN/UUID/HASH/PID) |
| | F1-02MD | identifier is persistent and registered (resolves) |
| | F2-01M | core descriptive metadata present (creator, title, id, date, publisher, type, summary, keywords) |
| | F3-01M | metadata links to the downloadable data content |
| | F4-01M | metadata offered in a search-engine-ingestible way (embedded JSON-LD / meta tags) |
| A | A1-01M | access level / rights are stated in metadata |
| | A1-02MD | metadata and data are retrievable via their identifiers |
| | A1.1-01MD | identifiers use a standardized communication protocol (http/https/ftp) |
| | A1.2-01MD | the protocol supports authentication where needed |
| I | I1-01M | metadata uses a formal, machine-readable representation (JSON-LD/RDF/XML) |
| | I2-01M | metadata uses terms from registered semantic vocabularies |
| | I3-01M | qualified references to related entities (with relation types) |
| R | R1-01M | metadata describes the data content (type, format/size) |
| | R1.1-01M | a machine-readable license is present and SPDX/CC-recognized |
| | R1.2-01M | provenance information (creators, dates, contributors) |
| | R1.3-01M | a community-/discipline-endorsed metadata standard is used |
| | R1.3-02D | data is in a recommended (scientific/open/long-term) file format |

A category's score is the points earned across its metrics divided by the points available across them; the overall FAIR score is computed the same way across all 17 metrics. The overall maturity is the clamped mean of the per-category maturities.
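A small worked example of this aggregation, using made-up per-metric results (not real assessment output). The per-category maturities are given directly as assumed values, and clamping to the 0-3 CMMI range is an assumption about what "clamped" means here.

```r
# Illustrative per-metric results; numbers are invented for the example.
res <- data.frame(
  category = c("F", "F", "A", "I", "R"),
  earned   = c(1, 2, 1, 0, 2),
  total    = c(1, 2, 2, 2, 3)
)

# Category score: sum of earned points over sum of available points
cat_score <- tapply(res$earned, res$category, sum) /
  tapply(res$total, res$category, sum)
round(100 * cat_score, 1)   # A 50, F 100, I 0, R 66.7 (percent)

# Overall score: the same ratio across every metric
overall <- sum(res$earned) / sum(res$total)   # 6 / 10 = 0.6

# Overall maturity: mean of per-category maturities (assumed values),
# clamped to the assumed 0-3 range
cat_maturity <- c(F = 3, A = 1, I = 0, R = 2)
overall_maturity <- min(max(mean(cat_maturity), 0), 3)   # 1.5
```

Note that a ratio of sums weights metrics by their available points, which is not the same as averaging the per-category percentages (that would give about 0.54 here instead of 0.6).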

# the canonical principle definitions these metrics map to
fair_principles("I")[, c("id", "definition")]

4. Software FAIR (FRSM)

For software objects, rfair also bundles the FRSM (FAIR for Research Software) metric set; select it with metric_version = "0.7_software". The GitHub harvester inspects the repository file tree for signals (a license file, tests, CI workflows, dependency manifests, a registry DOI, a release version, contributors) and the 17 FRSM evaluators score from them. FRSM scoring is heuristic and not yet validated against an upstream software-FAIR reference.
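Detecting repository signals from a file tree can be as simple as matching path patterns. The sketch below is hypothetical: frsm_signals() and its regular expressions are assumptions chosen for illustration and are not the rules rfair's GitHub harvester applies.

```r
# Hypothetical signal detection over repository file paths.
frsm_signals <- function(paths) {
  has <- function(pattern) any(grepl(pattern, paths, ignore.case = TRUE))
  c(
    license      = has("^(LICEN[CS]E|COPYING)(\\.[a-z]+)?$"),
    tests        = has("^tests?/"),
    ci           = has("^\\.github/workflows/.+\\.ya?ml$"),
    dependencies = has("^(DESCRIPTION|requirements\\.txt|package\\.json|pyproject\\.toml)$"),
    citation     = has("^(CITATION\\.cff|codemeta\\.json)$")
  )
}

paths <- c("DESCRIPTION", "LICENSE.md", "R/assess.R", "tests/testthat.R",
           ".github/workflows/R-CMD-check.yaml")
frsm_signals(paths)
# license, tests, ci, dependencies are TRUE; citation is FALSE
```

File-presence checks like these are cheap but only heuristic (a LICENSE file does not guarantee a valid license), which is consistent with the caveat that FRSM scoring is not yet validated.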

5. Fidelity to F-UJI

Because rfair reimplements an existing scoring engine, it includes a non-CRAN conformance harness. tests/conformance/run.R runs identifiers through both rfair and a locally run, version-matched F-UJI server and compares per-metric earned scores. A manual run on 2026-06-16 against F-UJI 4.0.0 (metrics v0.8) measured 94.1% agreement on a Zenodo DOI (16 of 17 metrics matched exactly) and 85.3% across PANGAEA and Dryad objects. The recurring divergence was the data file-format metric (R1.3-02D): F-UJI detects content types with Tika, whereas rfair uses the MIME type from an HTTP HEAD request. This reference-server comparison is not yet run in CI. A separate harness, tests/conformance/parity.R, compares the R engine with the browser TypeScript engine on registry-derivable metrics; it requires the webapp branch to be checked out alongside the package.
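The Zenodo figure is consistent with the share of metrics whose earned scores match exactly (16/17 is about 94.1%). Under that assumption, the comparison reduces to the sketch below; exact_agreement() is a hypothetical helper, not the harness's code, and the scores are invented.

```r
# Share of metrics whose earned scores match exactly between two engines.
exact_agreement <- function(a, b) {
  stopifnot(setequal(names(a), names(b)))  # same metric IDs on both sides
  mean(a[names(b)] == b)                   # align by name, then compare
}

# Illustrative only: 17 metrics, one disagreement
ids <- paste0("m", 1:17)
rfair_earned <- setNames(rep(1, 17), ids)
fuji_earned  <- replace(rfair_earned, "m17", 0)
round(100 * exact_agreement(rfair_earned, fuji_earned), 1)  # 94.1
```

Aligning by metric ID rather than by position guards against the two engines reporting metrics in different orders.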

6. Beyond F-UJI

rfair adds checks that automated FAIR tools usually miss, motivated by peer review of a COVID-19 FAIR study: license reusability (not just presence) with the (Re)usable Data Project taxonomy, controlled-access/sensitive-data flagging, identifier hygiene, and the FAIR-TLC (Traceable, Licensed, Connected) extension. See vignette("beyond-fuji").

7. Limitations

Because of browser CORS restrictions, the browser app can query registries only: it cannot harvest landing pages, so some metrics score lower there than in the R engine.
I2-01M (semantic vocabularies) scores 0 for objects whose metadata uses only default namespaces (dc/schema.org/DataCite); this matches F-UJI.
RDF Turtle/RDF-XML harvesting and as_rdf() Turtle output need the optional rdflib package (system librdf); without it those paths are skipped.
Live scores depend on the object's current metadata and on third-party services (DataCite, Crossref, GitHub) being reachable.
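Skipping an optional dependency is typically done with requireNamespace(). The sketch below shows the pattern with hypothetical helper names (rdf_available(), parse_turtle_if_possible()); only requireNamespace() and rdflib::rdf_parse() are real functions, and this is not rfair's internal code.

```r
# Hypothetical helpers illustrating an optional-dependency guard.
rdf_available <- function() requireNamespace("rdflib", quietly = TRUE)

parse_turtle_if_possible <- function(turtle_text) {
  if (!rdf_available()) {
    message("rdflib not installed; skipping Turtle harvesting")
    return(NULL)  # caller treats NULL as "no RDF metadata harvested"
  }
  # Write the document to a temporary file and parse it as Turtle
  path <- tempfile(fileext = ".ttl")
  on.exit(unlink(path), add = TRUE)
  writeLines(turtle_text, path)
  rdflib::rdf_parse(path, format = "turtle")
}

res <- parse_turtle_if_possible(
  "<http://ex.org/a> <http://ex.org/b> <http://ex.org/c> ."
)
```

Returning NULL instead of raising an error lets the rest of the pipeline continue, which matches the behavior described above of skipping those paths when rdflib is missing.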

References

Wilkinson et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data. doi:10.1038/sdata.2016.18
Devaraju & Huber. F-UJI. https://github.com/pangaea-data-publisher/fuji
FAIRsFAIR metrics. doi:10.5281/zenodo.15045911
Carbon et al. (2019). (Re)usable data licensing. PLOS ONE. doi:10.1371/journal.pone.0213090

rfair documentation built on July 1, 2026, 5:07 p.m.