Daff

knitr::opts_chunk$set(echo = FALSE)
library(magrittr)

Who am I (Edwin)

\begin{picture}(0,0) \put(280,0){\includegraphics[width=1cm]{fig/cbs}} \end{picture}

Stepping in for Greg Warnes, coauthor

What is daff?

Short version

Daff is a diff for data.frames

And now for the long version...

Diff?

diff

daff

\begin{picture}(0,0) \put(250,-70){\includegraphics[width=2cm]{fig/utility.png}} \end{picture}

Why, oh why?

\begin{picture}(0,0) \put(120,-60){\includegraphics[width=5cm]{fig/why.png}} \end{picture}

Use case: data update

\begin{picture}(0,0) \put(120,-10){\includegraphics[width=2cm]{fig/update}} \end{picture}

Raw data update

You get an updated raw data file: what are the changes?
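A minimal sketch of this use case, assuming both versions of the raw file are CSV files that can be read with `read.csv`. The file names, column names and values are made up for illustration; only `diff_data` and `patch_data` come from daff.

```r
library(daff)

# Hypothetical old and new versions of a raw data file (illustration only)
old_file <- tempfile(fileext = ".csv")
new_file <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, value = c(10, 20, 30)), old_file, row.names = FALSE)
write.csv(data.frame(id = 1:3, value = c(10, 25, 30)), new_file, row.names = FALSE)

old_data <- read.csv(old_file)
new_data <- read.csv(new_file)

# What changed between the two deliveries?
changes <- diff_data(old_data, new_data)
print(changes)

# Replaying the changes on the old data should give the new values
patched <- patch_data(old_data, changes)
```

Printing `changes` shows the difference in daff's diff format, so the update can be reviewed before the new file replaces the old one.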

Use case: manual editing

Manual editing

\begin{picture}(0,80) \put(120,0){\includegraphics[height=0.3\textheight]{fig/blackbox}} \end{picture}

- Compare the input and output
- Make the manual step reproducible, so that all process steps can be re-executed:

- data + changes = new data
- in `diff` parlance: version1 + patch = version2

Daff protocol

Highlighter diff format

Detecting changes

daff detects the following changes:

- a value was changed
- a row was added or deleted
- a column was added or removed

diff_data: value was changed

library(daff)
x         <- data.frame(A=1, B=  1)
x_changed <- data.frame(A=1, B=100)
patch <- diff_data(x, x_changed)
print(patch)

patch_data: apply the change

Replay the change on the original data:

x
patch_data(x, patch)
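One way to check that the patch carries the whole change is to compare the patched data with the changed data. This is a minimal sketch reusing the example values from the slides. The `as.numeric(as.character(...))` conversion is a defensive assumption, in case the patched column does not come back as numeric.

```r
library(daff)

x         <- data.frame(A = 1, B =   1)
x_changed <- data.frame(A = 1, B = 100)

patch <- diff_data(x, x_changed)   # record the change
x_new <- patch_data(x, patch)      # replay it on the original data

# Defensive conversion (an assumption): compare values, not column types
b_new <- as.numeric(as.character(x_new$B))
b_new == x_changed$B
```

The original `x` is left untouched; `patch_data` returns a new data.frame.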

diff_data: row was added

x         <- data.frame(A=1  , B=1)
x_changed <- data.frame(A=1:2, B=1:2)
diff_data(x,x_changed)

diff_data: row was deleted

x         <- data.frame(A=1:2, B=1:2)
x_changed <- data.frame(A=1  , B=1)
diff_data(x,x_changed)

diff_data: column was added

x         <- data.frame(A=1, B=1)
x_changed <- data.frame(A=1, B=1, C=1)
diff_data(x,x_changed)

diff_data: column was removed

x         <- data.frame(A=1, B=1, C=1)
x_changed <- data.frame(A=1, B=1)
diff_data(x,x_changed)

diff_data options

diff_data( data_ref, data
         , always_show_header       = TRUE
         , always_show_order        = FALSE
         , columns_to_ignore        = c()
         , count_like_a_spreadsheet = TRUE
         , ids                      = c()
         , ignore_whitespace        = FALSE
         , never_show_order         = FALSE
         , ordered                  = TRUE
         , ... 
         )
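A hedged sketch of how a few of these options might be combined. The data, the key column `id` and the noise column `loaded_at` are hypothetical. The intent is to match rows on `id`, to leave `loaded_at` out of the comparison, and not to treat row order as meaningful.

```r
library(daff)

# Hypothetical data: 'id' is a key, 'loaded_at' changes on every delivery
ref <- data.frame(id = c(1, 2, 3), score = c(10, 20, 30),
                  loaded_at = c("mon", "mon", "mon"))
new <- data.frame(id = c(3, 1, 2), score = c(30, 10, 25),
                  loaded_at = c("tue", "tue", "tue"))

d <- diff_data( ref, new
              , ids               = "id"         # columns that identify a row
              , columns_to_ignore = "loaded_at"  # leave this column out
              , ordered           = FALSE        # do not treat the data as ordered
              )
print(d)
```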

differs_from

x_changed %>% 
  differs_from(x)

# same as

diff_data(x, x_changed)

Merging

x   <- data.frame(A =   1, B=  1)
# two changes were made in parallel
x_a <- data.frame(A = 100, B=  1)
x_b <- data.frame(A =   1, B=100)
merge_data(x, x_a, x_b)
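The merge above succeeds because the two changes touch different cells. When both versions change the same cell, the merge cannot decide between them. The sketch below follows the example in the `merge_data` help page and uses `which_conflicts` to locate the affected rows; the use of `iris` is only for illustration.

```r
library(daff)

parent <- a <- b <- iris[1:3, ]
a[1, 1] <- 10   # first editor changes a cell
b[1, 1] <- 11   # second editor changes the same cell

merged <- merge_data(parent, a, b)
print(merged)   # the conflicting cell is flagged in the result

# Which rows contain a conflict?
conflicts <- which_conflicts(merged)
conflicts
```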

Reading and writing table diffs

x         <- data.frame(A = 1, B =   1)
x_changed <- data.frame(A = 1, B = 100)
# write diff to disk
diff_data(x, x_changed) %>% 
  write_diff("diff.csv")

# and read it again from disk
read_diff("diff.csv") %>% 
  patch_data(x, .)

Render diff

x         <- data.frame(A = 1:2, B = 1:2)
x_changed <- data.frame(         B = 2  , C = 1)

x_changed %>% 
  differs_from(x) %>% 
  render_diff(use.DataTables = FALSE)

\begin{center} \includegraphics[width=7cm]{fig/daff.png} \end{center}
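The rendered diff can also be written to a file without opening a viewer, for example to attach it to a report. This is a minimal sketch, assuming the `file` and `view` arguments of `render_diff`; the output path is a temporary file chosen for illustration.

```r
library(daff)

x         <- data.frame(A = 1:2, B = 1:2)
x_changed <- data.frame(B = 2, C = 1)

out_file <- tempfile(fileext = ".html")   # hypothetical output location

# view = FALSE: write the HTML without opening a viewer
render_diff(diff_data(x, x_changed), file = out_file, view = FALSE)

file.exists(out_file)
```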

Implementation

Other R libs

diffobj

Other R library:

\Large{Thank you for your attention!}

Interested?

install.packages("daff")

or visit:



daff documentation built on Oct. 9, 2023, 1:06 a.m.