write_prov: Write a provenance trace into JSON-LD

View source: R/prov.R

write_prov R Documentation

Write a provenance trace into JSON-LD

Description

Write a provenance trace into JSON-LD

Usage

write_prov(
  data_in = NULL,
  code = NULL,
  data_out = NULL,
  meta = NULL,
  creator = NULL,
  title = NULL,
  description = NULL,
  issued = as.character(Sys.Date()),
  license = "https://creativecommons.org/publicdomain/zero/1.0/legalcode",
  provdb = "prov.json",
  append = TRUE,
  schema = c("http://schema.org", "http://www.w3.org/ns/dcat"),
  embed_actions = is.null(code),
  ...
)

Arguments

data_in

path or URI for input data

code

path or URI for code

data_out

path or URI to output data

meta

path or URI to metadata describing the workflow

creator

URI, list node, or text for creator

title

Dataset title, character string

description

Dataset description, character string

issued

publication date, as Date or character object

license

URL to a copyright license

provdb

path to output JSON file, default "prov.json"

append

Should we append to an existing JSON file or overwrite it?

schema

Use schema.org or DCAT2 schema? See details.

embed_actions

Should we include a schema:Action node in the output? Defaults to is.null(code).

...

additional named elements passed to Dataset
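
The issued argument accepts either a Date or a character value, and its default in Usage is as.character(Sys.Date()). As a small base-R sketch of what that looks like in practice (the fixed date below is made up for illustration), both forms reduce to a YYYY-MM-DD string:

```r
## Default from the Usage section: today's date as a character string
issued_default <- as.character(Sys.Date())

## A fixed publication date (hypothetical value, for illustration only)
issued_fixed <- as.character(as.Date("2020-01-31"))

## Both are plain character strings in YYYY-MM-DD form
print(issued_default)
print(issued_fixed)
```

Either value could then be supplied as issued = issued_fixed in a call to write_prov().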

Details

If creator, title, and description are all empty, write_prov serializes only a graph of distribution (data download) elements, not a Dataset.

Additional elements passed through ... must be explicitly namespaced, e.g. dcat:version, when using the DCAT2 schema. When using schema.org, elements must be in the schema.org namespace.

Provenance can be expressed either purely in schema.org or in DCAT2, which draws on terms from the DCTERMS, PROV, DCAT2, and CITO ontologies. DCAT2 is the more expressive of the two for provenance. Note also that DCAT2, unlike schema.org, can explicitly encode compression and metadata file relationships.
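
As a rough sketch of the namespacing rule for elements passed through ..., the snippet below builds a hypothetical extra element, checks that its name carries a prefix, and then passes it along with the DCAT schema. Several things here are assumptions rather than documented behaviour: is_namespaced() is an illustrative helper, not part of the package; the "1.0" version value and the title are made up; and selecting DCAT2 by passing the second URL from the schema default is inferred from the Usage section. The call itself only runs if the prov package is installed.

```r
## Hypothetical extra metadata; the name carries an explicit "dcat:" prefix
extra <- list("dcat:version" = "1.0")

## Illustrative helper (not part of prov): TRUE for names of the form prefix:term
is_namespaced <- function(x) grepl("^[A-Za-z][A-Za-z0-9]*:.+", names(x))
stopifnot(all(is_namespaced(extra)))

if (requireNamespace("prov", quietly = TRUE)) {
  ## Temp files for illustration only, as in the Examples section
  provdb <- tempfile(fileext = ".json")
  input_data <- tempfile(fileext = ".csv")
  output_data <- tempfile(fileext = ".csv")
  code <- tempfile(fileext = ".R")
  write.csv(mtcars, input_data)
  write.csv(coef(lm(mpg ~ disp, data = mtcars)), output_data)
  writeLines("out <- lm(mpg ~ disp, data = mtcars)", code)

  ## Assumption: the DCAT URL from the default `schema` vector selects DCAT2
  args <- c(
    list(data_in = input_data, code = code, data_out = output_data,
         title = "example dataset with provenance",
         provdb = provdb, append = FALSE,
         schema = "http://www.w3.org/ns/dcat"),
    extra
  )
  do.call(prov::write_prov, args)
}
```

Using do.call() with a named list is one convenient way to pass names that contain a colon; writing "dcat:version" = "1.0" directly in the call should be equivalent.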

Examples

 
## Use temp files for illustration only
provdb <- tempfile(fileext = ".json")
input_data <- tempfile(fileext = ".csv")
output_data <- tempfile(fileext = ".csv")
code <- tempfile(fileext = ".R")

## A minimal workflow: 
write.csv(mtcars, input_data)
out <- lm(mpg ~ disp, data = mtcars)
write.csv(out$coefficients, output_data)

# really this would already exist...
writeLines("out <- lm(mpg ~ disp, data = mtcars)", code)

## And here we go: 
write_prov(input_data, code, output_data, provdb = provdb,
           append = FALSE)
 
## Include a title to group these into a Dataset:
write_prov(input_data, code, output_data, provdb = provdb,
           title = "example dataset with provenance", append = FALSE)
           

cboettig/prov documentation built on Feb. 12, 2023, 5:54 p.m.