suppressPackageStartupMessages({
  library(kableExtra)  # kable(), kable_styling() and the %>% pipe
  library(rtrackr)
  library(networkD3)
})
rtrackr provides data logging for every record in a dataset throughout the processing chain. In most cases, when a record is altered or one record is divided into multiple records, rtrackr simply assigns a new trackr_id and logs the change.
When data is summarised, on the other hand (multiple records become a single record), rtrackr needs to record the trackr_ids of all parent records. trackr_summarise() provides a convenient way to summarise data without losing the information in the trackr_id column.
trackr_summarise() works by combining the trackr_ids of all parent records into a single value in the new row, separated by ", ". The same convention can be followed when combining records manually outside of R.
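As a minimal illustration of the ", " convention described above, the base R sketch below joins a set of parent ids into one string. The ids are invented for this example and do not follow the exact format rtrackr generates.

```r
# Hypothetical trackr_ids of three parent records that are merged into one
parent_ids <- c("id_001", "id_002", "id_003")

# Join them into a single ", "-separated value for the new summarised row
combined <- paste(parent_ids, collapse = ", ")
combined
# "id_001, id_002, id_003"
```

Because the separator is fixed, anyone combining records by hand (in R or elsewhere) can produce a value in the same shape.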
To demonstrate the use of trackr_summarise() in a data processing chain, we will use a simple workflow. Continuing from Getting Started, we create a new dataset and log a new processing timepoint with trackr_new().
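# Directory where rtrackr writes its log files
trackr_dir <- '~/Documents/trackr_dir'
df <- data.frame(a = c('a', 'b', 'c'), b = c(1, 2, 3))
# Start tracking: assign trackr_ids and log the first timepoint
df <- trackr_new(df, trackr_dir = trackr_dir, suppress_success = TRUE)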
kable(df) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Next, we bind the dataset to a copy of itself in which b is incremented by 1, and log the result as a new timepoint.
# Stack the original records on a modified copy (b + 1)
df <- rbind(df, df %>% dplyr::mutate(b = b + 1))
df <- trackr_timepoint(df, trackr_dir = trackr_dir, timepoint_message = 'Merged dataframes', suppress_success = TRUE)
kable(df) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
trackr_summarise() is a simple wrapper around dplyr::summarise() and accepts the same arguments.
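To show what such a wrapper could look like, here is a minimal sketch assuming only that the data carries a trackr_id column. The function name summarise_keep_ids() is hypothetical and this is not the rtrackr source: it passes the user's arguments straight to dplyr::summarise() and adds one more summary that collapses each group's trackr_ids with ", ".

```r
library(dplyr)

# Hypothetical stand-in for trackr_summarise(); NOT the rtrackr implementation.
summarise_keep_ids <- function(.data, ...) {
  dplyr::summarise(
    .data,
    ...,  # the caller's own summaries, e.g. n = n()
    # Collapse the parent ids of each group into one ", "-separated string
    trackr_id = paste(trackr_id, collapse = ", ")
  )
}

# Toy data with invented ids
toy <- data.frame(
  a = c("a", "b", "a", "b"),
  trackr_id = c("id1", "id2", "id3", "id4"),
  stringsAsFactors = FALSE
)
out <- toy %>% group_by(a) %>% summarise_keep_ids(n = n())
out
```

Each group of two rows becomes one row whose trackr_id lists both parents. In the workflow here, the real trackr_summarise() is used instead.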
df <- df %>% dplyr::group_by(a) %>% trackr_summarise(n = dplyr::n())
kable(df) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
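Because the parent ids in each summarised row are joined with ", ", they can be split apart again. The base R sketch below uses an invented table shaped like a summarised result (column a, a count n, and a combined trackr_id) to recover the parents of each row.

```r
# Hypothetical summarised rows; the ids are invented for illustration
summarised <- data.frame(
  a = c("a", "b", "c"),
  n = c(2L, 2L, 2L),
  trackr_id = c("p1, p4", "p2, p5", "p3, p6"),
  stringsAsFactors = FALSE
)

# Split each combined value back into a character vector of parent ids
parents <- strsplit(summarised$trackr_id, ", ", fixed = TRUE)
names(parents) <- summarised$a

# The number of parents per row should match the count from the summary
lengths(parents)
```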
Now, we can log a new timepoint with trackr_timepoint().
df <- trackr_timepoint(df, trackr_dir = trackr_dir, timepoint_message = 'Summarised dataframes', suppress_success = TRUE)
kable(df) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
We will make and log one more change, to better visualize the effect of the summarise operation.
# Add 100 to every count so the change is easy to spot in the lineage
df <- df %>% dplyr::mutate(n = n + 100)
df <- trackr_timepoint(df, trackr_dir = trackr_dir, timepoint_message = 'Added 100', suppress_success = TRUE)
kable(df) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
To visualize the summarise operation for a single record, we build its lineage with trackr_lineage() and plot it with trackr_network(). See Getting Started for more information.
# Trace the first record of the final dataset
target_id <- df$trackr_id[1]
trackr_lineage(target_id, trackr_dir)
# Plot the lineage file saved in trackr_dir
lineage_fn <- paste0(trackr_dir, '/', target_id, '_lineage.json')
trackr_network(lineage_fn)
clean_trackr_dir(trackr_dir)
Article by Hamish Gibbs, `r Sys.time()`. To report a problem with this package, please create an issue on GitHub.