describeWorkflow | R Documentation |
Add information about the relationships among DataObject members
in a DataPackage, retrospectively describing the way in which derived data were
created from source data using a processing program such as an R script. These provenance
relationships allow the derived data to be understood sufficiently for users
to be able to reproduce the computations that created the derived data, and to
trace lineage of the derived data objects. The method describeWorkflow
will add provenance relationships between a script that was executed, the files
that it used as sources, and the derived files that it generated.
describeWorkflow(x, ...)

## S4 method for signature 'DataPackage'
describeWorkflow(
  x,
  sources = list(),
  program = NA_character_,
  derivations = list(),
  insertDerivations = TRUE,
  ...
)
x |
The DataPackage |
... |
Additional parameters |
sources |
A list of DataObjects for files that were read by the program. Alternatively, a list of DataObject identifiers can be specified as a list of character strings. |
program |
The DataObject created for the program such as an R script. Alternatively the DataObject identifier can be specified. |
derivations |
A list of DataObjects for files that were generated by the program. Alternatively, a list of DataObject identifiers can be specified as a list of character strings. |
insertDerivations |
A logical value |
This method operates on a DataPackage that has had DataObjects for the script, data sources (inputs), and data derivations (outputs) previously added to it, or can reference identifiers for objects that exist in other DataPackage instances. This allows a user to create a standalone package that contains all of its source, script, and derived data, or a set of data packages that are chained together via a set of derivation relationships between the members of those packages.
Provenance relationships are described following the ProvONE data model, which can be viewed at https://purl.dataone.org/provone-v1-dev. In particular, the following relationships are inserted (among others):
prov:used
indicates which source data was used by a program execution
prov:wasGeneratedBy
indicates which derived data was created by a program execution
prov:wasDerivedFrom
indicates the source data from which derived data were created using the program
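The direction of these three relationships can be sketched as a small table of subject/predicate/object statements. This is only an illustration in base R, not the structure that datapack stores internally: the identifiers are hypothetical, and the generation relationship is written with its W3C PROV name, prov:wasGeneratedBy.

```r
# Hypothetical identifiers, invented for illustration only
exec_id    <- "urn:uuid:execution-1"        # the program execution
source_id  <- "urn:uuid:binary-csv"         # a source (input) file
derived_id <- "urn:uuid:gre-predicted-png"  # a derived (output) file

# One row per provenance statement: subject, predicate, object
triples <- data.frame(
  subject   = c(exec_id, derived_id, derived_id),
  predicate = c("prov:used", "prov:wasGeneratedBy", "prov:wasDerivedFrom"),
  object    = c(source_id, exec_id, source_id),
  stringsAsFactors = FALSE
)

# Trace lineage: which source was the derived object created from?
lineage <- triples$object[triples$subject == derived_id &
                          triples$predicate == "prov:wasDerivedFrom"]

print(triples)
print(lineage)
```

Reading the rows in order: the execution used the source, the derived file was generated by the execution, and the derived file was derived from the source.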
The R 'recordr' package for run-time recording of provenance relationships.
library(datapack)
dp <- new("DataPackage")

# Add the script to the DataPackage
progFile <- system.file("./extdata/pkg-example/logit-regression-example.R", package="datapack")
progObj <- new("DataObject", format="application/R", filename=progFile)
dp <- addMember(dp, progObj)

# Add a script input to the DataPackage
inFile <- system.file("./extdata/pkg-example/binary.csv", package="datapack")
inObj <- new("DataObject", format="text/csv", filename=inFile)
dp <- addMember(dp, inObj)

# Add a script output to the DataPackage
outFile <- system.file("./extdata/pkg-example/gre-predicted.png", package="datapack")
outObj <- new("DataObject", format="image/png", filename=outFile)
dp <- addMember(dp, outObj)

# Add the provenance relationships, linking the input and output to the script execution
# Note: 'sources' and 'derivations' can also be lists of DataObjects or DataObject identifiers
dp <- describeWorkflow(dp, sources = inObj, program = progObj, derivations = outObj)

# View the results
utils::head(getRelationships(dp))