library(objectdiff)

Diffing Arbitrary R objects

The objectdiff package creates a function that transforms one object into another with a minimal memory footprint. This is useful when, for example, you perform a lot of data manipulation on data.frames in very small, modular steps and want to be able to go back to a previous state. Consider the following example.

iris2 <- iris
iris2[1, 1] <- 1
patch1 <- objectdiff(iris, iris2)
iris3 <- iris2
iris3[1, 1] <- 2
patch2 <- objectdiff(iris2, iris3)

The functions patch1 and patch2 are closures that record the minimum amount of information necessary to turn the original iris object into iris2, and then iris2 into iris3. Instead of storing all three objects, which would be memory-prohibitive (especially on larger data sets), we can store iris along with its patches. This is similar in spirit to how many source control systems, such as Git, track changes.
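To get a rough feel for the potential savings, the sketch below uses base R only and compares a full modified copy of iris with the bare information needed to describe the change. The `change_record` list is a hypothetical stand-in for what a patch has to remember; it is not how objectdiff stores its data, and a real closure also carries some overhead of its own.

```r
# Base R only. `change_record` is a hypothetical illustration of the
# information a patch must keep; it is not objectdiff's internal format.
full_copy <- iris
full_copy[1, 1] <- 1

# The same change described as "which cell, and what new value".
change_record <- list(row = 1L, col = 1L, value = 1)

# Compare the two sizes in bytes.
print(object.size(full_copy))
print(object.size(change_record))
```

The gap widens as the data set grows, because the size of the change record depends on how much changed rather than on the size of the whole object.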

We can now restore iris3 given that we have iris:

print(identical(iris3, patch2(patch1(iris))))
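To make the idea of a patch closure more concrete, here is a toy sketch in base R. It is not the package's implementation: `toy_diff` is a hypothetical helper that only handles atomic vectors of equal length and ignores attributes such as names. It records just the positions that changed and their new values.

```r
# Hypothetical helper, not part of objectdiff: build a patch closure for
# two atomic vectors of equal length by recording only what changed.
toy_diff <- function(old, new) {
  stopifnot(is.atomic(old), is.atomic(new), length(old) == length(new))
  # Element-wise comparison with identical(), so NA values are handled.
  same <- vapply(seq_along(old),
                 function(i) identical(old[[i]], new[[i]]),
                 logical(1))
  changed <- which(!same)
  values <- new[changed]
  # The returned closure keeps `changed` and `values`, not `old` or `new`.
  function(x) {
    x[changed] <- values
    x
  }
}

x <- c(5.1, 4.9, 4.7, 4.6)
y <- x
y[1] <- 1
toy_patch <- toy_diff(x, y)
print(identical(toy_patch(x), y))
```

Because the closure only captures the changed positions and values, its size depends on how much differs between the two vectors, not on their length.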

The objectdiff function

In general, any two R objects can be "diffed" against each other using the objectdiff function that this package provides. Good performance can be expected for atomic vectors, lists (including data.frames), and recursive combinations of these (e.g., very heterogeneous lists), but for more complex transformations there may be no memory savings. For example, diffing a character vector against a function is nonsensical and will produce a patch that completely forgets about the character vector and simply produces the function.

patch <- objectdiff(c(TRUE, FALSE), identity)
print(c(all.equal(patch(1), identity), all.equal(patch(iris), identity),
        all.equal(patch("basically anything becomes identity"), identity)))

In any case, objectdiff always produces a valid patch: applied to the first object it was diffed against, the patch is guaranteed to reproduce the second.
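The fallback case, where the two objects have nothing useful in common, can be pictured as a patch that ignores its input and returns the stored target. The sketch below illustrates that idea in base R; `trivial_patch` is a hypothetical name, and this is an assumption about the general shape of such a patch rather than the package's actual code.

```r
# Hypothetical illustration: a patch that ignores its input entirely and
# always returns the stored target object.
trivial_patch <- function(new) {
  force(new)  # evaluate the target now so the closure captures its value
  function(old) new
}

fallback <- trivial_patch(identity)
print(identical(fallback(c(TRUE, FALSE)), identity))
print(identical(fallback("anything else"), identity))
```

Such a patch saves no memory, since it stores the whole target, but it still satisfies the guarantee stated above.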



robertzk/objectdiff documentation built on May 27, 2019, 10:35 a.m.