The goal of tricordr is to integrate process provenance and instrumentation into pipelines of tidyverse primitives.
You can install the released version of tricordr from CRAN with:
install.packages("tricordr")
And the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("bvancil/tricordr")
Warning: None of this actually works yet.
Eventually, we want full provenance tracking. For now, the example below sketches the interface we're aiming for.
We'll start by creating some test data.
library('dplyr')
library('tibble')
library('tricordr')

# Here's some test data
data_size <- 100L
test_data <- tibble::tibble(
  x = base::sample(c(0L, 1L), data_size, replace = TRUE),
  y = base::sample(base::seq(0L, 2L), data_size, replace = TRUE)
)
In our data pipeline, we want to track what happens to our data. For instance, we might want to add another variable and transform the others.
final_data <- test_data %>%
  dplyr::mutate(z = x * y, y1 = y, y = x, x = 2L - y1)
What happened? It will be tricky to figure out later.
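To see why this is hard to reconstruct, it may help to trace the same call on a tiny fixed input. dplyr::mutate evaluates its arguments in order, so each expression sees the columns created or overwritten by the ones before it. The tibble `small` and its values are made up for illustration and are not part of the README's test data.

```r
library(dplyr)
library(tibble)

# Hypothetical three-row input, chosen so each step is easy to follow by hand.
small <- tibble::tibble(
  x = c(0L, 1L, 1L),
  y = c(2L, 0L, 1L)
)

small_result <- small %>%
  dplyr::mutate(
    z  = x * y,   # uses the original x and y
    y1 = y,       # copies the original y before it is overwritten
    y  = x,       # y now holds the original x
    x  = 2L - y1  # x now holds 2 minus the original y
  )

print(small_result)
```

Nothing in `small_result` itself records that `y` used to be `x`, or that `x` was derived from the old `y`. That is the information a provenance record would need to keep.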
Instead, we can use tricordr to decorate the dplyr::mutate function so that we keep track.
test_provenance <- tricordr::Provenance$new()
test <- test_provenance$wrap_operations()

# Now we change `dplyr::mutate` to `test$mutate`
final_data <- test_data %>%
  test$mutate(z = x * y, y1 = y, y = x, x = 2L - y1)

print(test_provenance)
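Since tricordr itself does not work yet, here is a minimal sketch of the decorator idea using only dplyr and base R. It is not tricordr's implementation: `make_logged`, `provenance_log`, and the fields stored in each log entry are hypothetical names chosen for this illustration, and `deparse1` assumes R 4.0.0 or later.

```r
library(dplyr)
library(tibble)

# Hypothetical helper (not part of tricordr): wrap `fn` so that every call
# appends a record to `log_env$entries` before returning fn's result.
make_logged <- function(fn, fn_name, log_env) {
  force(fn)
  force(fn_name)
  function(.data, ...) {
    # Capture the arguments as the caller wrote them, without evaluating them.
    exprs <- vapply(
      as.list(substitute(list(...)))[-1],
      deparse1,
      character(1)
    )
    # Forward everything to the wrapped function unchanged.
    result <- fn(.data, ...)
    log_env$entries[[length(log_env$entries) + 1L]] <- list(
      operation   = fn_name,
      expressions = exprs,
      cols_before = names(.data),
      cols_after  = names(result)
    )
    result
  }
}

# An environment is used so the log can be updated in place from the wrapper.
provenance_log <- new.env()
provenance_log$entries <- list()

logged_mutate <- make_logged(dplyr::mutate, "mutate", provenance_log)

small <- tibble::tibble(x = c(0L, 1L), y = c(2L, 1L))
out <- small %>%
  logged_mutate(z = x * y, y1 = y, y = x, x = 2L - y1)

str(provenance_log$entries[[1]])
```

The wrapper returns exactly what dplyr::mutate returns, so it can replace the original call in a pipeline. The record it keeps is only the expression text and the column names, which falls well short of full provenance.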
We can create different provenances for different pipelines and combine them later.
Please note that the 'tricordr' project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.