```r
knitr::opts_chunk$set(echo = FALSE)
library(magrittr)
```
\begin{picture}(0,0) \put(280,0){\includegraphics[width=1cm]{fig/cbs}} \end{picture}
Stepping in for Greg Warnes, coauthor
`daff` is a `diff` for `data.frame`s:

- `diff_data`, `differs_from`
- `write_diff`, `read_diff`
- `patch_data`, `merge_data`
- `render_diff`
And now for the long version...
`diff` checks lines:
\begin{picture}(0,0) \put(250,-70){\includegraphics[width=2cm]{fig/utility.png}} \end{picture}
`daff` compares records and columns of a `data.frame`:
\begin{picture}(0,0) \put(120,-60){\includegraphics[width=5cm]{fig/why.png}} \end{picture}

\begin{picture}(0,0) \put(120,-10){\includegraphics[width=2cm]{fig/update}} \end{picture}
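The line-versus-cell distinction can be made concrete with a small base-R sketch. It is only an illustration of the idea, not how `daff` works internally: it writes two versions of a table as CSV text (roughly what a line-based `diff` sees) and then compares them cell by cell, assuming both versions have the same shape and row order.

```r
# Illustration only (not daff internals): line view vs cell view of one edit
old <- data.frame(id = 1:3, A = c(1, 2, 3), B = c("x", "y", "z"),
                  stringsAsFactors = FALSE)
new <- old
new$A[2] <- 20  # a single cell is edited

# Line view: write both tables as CSV text, as a line-based diff would see them
old_lines <- capture.output(write.csv(old, row.names = FALSE))
new_lines <- capture.output(write.csv(new, row.names = FALSE))
changed_lines <- setdiff(new_lines, old_lines)
changed_lines  # the whole record for id 2 is reported as changed

# Cell view: compare cell by cell (same shape and row order assumed)
changed_cells <- which(old != new, arr.ind = TRUE)
changed_cells  # row 2, column 2 ("A")
```

The line view only says that the record changed; the cell view pinpoints the column, which is the kind of information a `data.frame` diff reports.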
You get an updated raw data file: what are the changes?
\begin{picture}(0,80) \put(120,0){\includegraphics[height=0.3\textheight]{fig/blackbox}} \end{picture}

- Compare the input and output.
- Make the manual step reproducible, so that all process steps can be re-executed:
    - data + changes = new data
    - in `diff` parlance: version1 + patch = version2
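The "data + changes = new data" idea can be illustrated with a toy patch format. The helper `apply_changes()` and its row/column/value layout are hypothetical, made up for this sketch; `daff` creates its patches with `diff_data` and applies them with `patch_data`, using its own format.

```r
# Hypothetical helper, for illustration only (not part of daff):
# a patch is stored as a table of cell edits and replayed on the data.
apply_changes <- function(data, changes) {
  for (i in seq_len(nrow(changes))) {
    # overwrite one cell: row index and column name come from the patch
    data[changes$row[i], changes$column[i]] <- changes$value[i]
  }
  data
}

version1 <- data.frame(A = c(1, 2), B = c(1, 2))
patch    <- data.frame(row = 2, column = "B", value = 100,
                       stringsAsFactors = FALSE)

# version1 + patch = version2
version2 <- apply_changes(version1, patch)
version2
```

This toy format only covers edits to existing cells; added or removed rows and columns need a richer format, such as the one `diff_data` produces.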
`daff` detects the following changes:

- changed values
- added and deleted rows
- added and removed columns
- type change of a column (partially)
`daff` supports it, but the highlighter format does not.

`diff_data`: value was changed

```r
library(daff)
x <- data.frame(A = 1, B = 1)
x_changed <- data.frame(A = 1, B = 100)
patch <- diff_data(x, x_changed)
print(patch)
```
`patch_data`: apply the change

```r
x
```

"Replay" the change on the original data:

```r
patch_data(x, patch)
```
`diff_data`: row was added

```r
x <- data.frame(A = 1, B = 1)
x_changed <- data.frame(A = 1:2, B = 1:2)
diff_data(x, x_changed)
```

`diff_data`: row was deleted

```r
x <- data.frame(A = 1:2, B = 1:2)
x_changed <- data.frame(A = 1, B = 1)
diff_data(x, x_changed)
```

`diff_data`: column was added

```r
x <- data.frame(A = 1, B = 1)
x_changed <- data.frame(A = 1, B = 1, C = 1)
diff_data(x, x_changed)
```

`diff_data`: column was removed

```r
x <- data.frame(A = 1, B = 1, C = 1)
x_changed <- data.frame(A = 1, B = 1)
diff_data(x, x_changed)
```
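For the column cases, a first approximation in base R is to compare column names. This sketch only looks at names and is not a description of how `daff` matches columns, which may be more elaborate.

```r
# Illustrative only: detect added and removed columns by name
x         <- data.frame(A = 1, B = 1, C = 1)
x_changed <- data.frame(A = 1, B = 1, D = 1)

added   <- setdiff(names(x_changed), names(x))  # columns only in the new version
removed <- setdiff(names(x), names(x_changed))  # columns only in the old version
added    # "D"
removed  # "C"
```

With a name-only check, a renamed column (here `C` to `D`) shows up as one removal plus one addition.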
```r
diff_data(
  data_ref,
  data,
  always_show_header = TRUE,
  always_show_order = FALSE,
  columns_to_ignore = c(),
  count_like_a_spreadsheet = TRUE,
  ids = c(),
  ignore_whitespace = FALSE,
  never_show_order = FALSE,
  ordered = TRUE,
  ...
)
```
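The `ids` argument is meant for key columns that identify a row. The base-R sketch below shows why keys matter when rows are reordered: a naive comparison by row position flags every row, while aligning the rows on the key first isolates the single edited value. It is a conceptual illustration using `merge()`, not a reproduction of what `diff_data` does internally.

```r
# Two versions of the same table: rows reordered, and one value edited (id 2)
x <- data.frame(id = c(1, 2, 3), value = c(10, 20, 30))
y <- data.frame(id = c(3, 1, 2), value = c(30, 10, 25))

# Naive comparison by row position: the reordering makes every row differ
by_position <- which(x$value != y$value)

# Keyed comparison: align rows on "id" first, then compare the values
m <- merge(x, y, by = "id", suffixes = c("_ref", "_new"))
by_key <- m$id[m$value_ref != m$value_new]

by_position  # 1 2 3
by_key       # 2
```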
`differs_from` is `diff_data` with its arguments reversed, which reads naturally in a pipe:

```r
x_changed %>% differs_from(x)  # same as diff_data(x, x_changed)
```
`merge_data`: merge two `data.frame`s derived from a common parent.

```r
x <- data.frame(A = 1, B = 1)
# two changes were made in parallel
x_a <- data.frame(A = 100, B = 1)
x_b <- data.frame(A = 1, B = 100)
merge_data(x, x_a, x_b)
```
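The rule behind a three-way merge can be stated per cell: take whichever side changed relative to the parent, and flag a conflict when both sides changed the same cell to different values. The function `merge3()` below is a hypothetical base-R sketch of that rule for numeric tables with identical shape, column names and row order. It is not `daff`'s implementation, and stopping on a conflict is a simplification that is not meant to reflect how `merge_data` handles conflicts.

```r
# Hypothetical sketch, not daff's implementation: cell-wise three-way merge.
# Assumes numeric columns, no NAs, and identical shape, names and row order.
merge3 <- function(parent, a, b) {
  out <- parent
  for (col in names(parent)) {
    a_changed <- a[[col]] != parent[[col]]
    b_changed <- b[[col]] != parent[[col]]
    # both sides edited the same cell to different values: give up
    if (any(a_changed & b_changed & a[[col]] != b[[col]])) {
      stop("conflict in column ", col)
    }
    # take a's value where a changed, otherwise b's (equal to parent if b is unchanged)
    out[[col]] <- ifelse(a_changed, a[[col]], b[[col]])
  }
  out
}

x   <- data.frame(A = 1,   B = 1)
x_a <- data.frame(A = 100, B = 1)
x_b <- data.frame(A = 1,   B = 100)
merge3(x, x_a, x_b)  # both edits kept: A = 100, B = 100
```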
`write_diff`, `read_diff`: store a diff on disk and apply it later.

```r
x <- data.frame(A = 1, B = 1)
x_changed <- data.frame(A = 1, B = 100)
# write diff to disk
diff_data(x, x_changed) %>% write_diff("diff.csv")
# and read it again from disk, then apply it to x
read_diff("diff.csv") %>% patch_data(x, .)
```
`render_diff`: show a diff as an HTML page.

```r
x <- data.frame(A = 1:2, B = 1:2)
x_changed <- data.frame(B = 2, C = 1)
x_changed %>%
  differs_from(x) %>%
  render_diff(use.DataTable = FALSE)
```
\begin{center} \includegraphics[width=7cm]{fig/daff.png} \end{center}
The package wraps `daff.js`, by Paul Fitzpatrick (@fitzyfitzyfitzy), and uses the R package `V8`, by Jeroen Ooms (@opencpu), to run `daff.js`.

With `V8`, any JS library can be run from R!
Other R library:

- `diffobj`: very good general-purpose diff for all R objects. It has a `diffCsv` function, but it is more limited than `daff`.
- `daff` is specialized in `data.frame`s: it supports `id` (key) columns.

{\Large Thank you for your attention!}
Interested? Run `install.packages("daff")` or visit: