Here is an example of how `wrapr::unpack` differs from `zeallot::%<-%`. Consider splitting data into three parts: `train`, `calibration`, and `test`.
We can do this with `wrapr::unpack` as follows, using either the `<-` assignment notation or the `:=` pseudo-assignment notation.
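```r
library(wrapr)

d <- data.frame(x = 1:100)
d$group <- sample(
  c('train', 'calibration', 'test'),
  size = nrow(d),
  replace = TRUE,
  prob = c(0.7, 0.2, 0.1))

# assign each element of split()'s named list to the variable of the same name
unpack[train, calibration, test] <- split(d, d$group)
# equivalently, with the := pseudo-assignment notation:
# unpack[train, calibration, test] := split(d, d$group)
```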
Because `unpack` works by name, we always have `train$group == "train"`, `calibration$group == "calibration"`, and `test$group == "test"`, as long as all the groups are non-empty.
```r
unique(train$group)
unique(calibration$group)
unique(test$group)
```
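The same name-matching idea can be sketched in base R. The helper `unpack_by_name()` is hypothetical and is not wrapr's implementation. It copies each requested element of a named list into an environment, and it stops if any requested name is missing. The small data frame `d_small` is also made up for illustration.

```r
# Hypothetical helper, not wrapr's implementation: assign elements of a named
# list to variables of the same name, stopping if a requested name is absent.
unpack_by_name <- function(values, targets, envir = parent.frame()) {
  missing_names <- setdiff(targets, names(values))
  if (length(missing_names) > 0) {
    stop("names not found in value: ", paste(missing_names, collapse = ", "))
  }
  for (nm in targets) {
    assign(nm, values[[nm]], envir = envir)
  }
  invisible(NULL)
}

# Illustrative data: split() returns its elements in sorted-name order,
# which differs from the order of the targets below.
d_small <- data.frame(
  x = 1:6,
  group = c("train", "test", "calibration", "train", "train", "calibration"),
  stringsAsFactors = FALSE)

env <- new.env()  # keep the results out of the global environment
unpack_by_name(split(d_small, d_small$group),
               c("train", "calibration", "test"),
               envir = env)
unique(env$train$group)        # "train"
unique(env$calibration$group)  # "calibration"
unique(env$test$group)         # "test"
```

The target order does not need to match the order of the list, because every lookup is done by name.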
When one of the sets is empty, `unpack` detects that the requested name is missing and signals an error.
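```r
# 'test' has probability 0, so no row is labeled 'test' and
# split() returns no 'test' element: this unpack signals an error
d$group <- sample(
  c('train', 'calibration', 'test'),
  size = nrow(d),
  replace = TRUE,
  prob = c(0.7, 0.2, 0))
unpack[train, calibration, test] <- split(d, d$group)
```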
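Here is why the empty group is reported rather than skipped. With a character grouping vector, `split()` only creates elements for values that actually occur, so an empty group has no entry at all. A name-based check can therefore detect it. The vector `groups` is a made-up example.

```r
# Made-up grouping vector in which no element is labeled "test".
groups <- c("train", "calibration", "train", "train")
parts <- split(seq_along(groups), groups)
names(parts)  # "calibration" "train"

# Comparing the requested names with the names actually present
# reveals the missing group.
setdiff(c("train", "calibration", "test"), names(parts))  # "test"
```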
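This behavior is stable. Below we repeat the random split 100 times. For each target variable (rows of `m`) we count which group labels it received (columns). With name-based unpacking, only the diagonal entries should be non-zero.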
```r
# rows: target variable; columns: group label found in that variable
m <- matrix(0, nrow = 3, ncol = 3)
rownames(m) <- c('train', 'calibration', 'test')
colnames(m) <- c('train', 'calibration', 'test')
for(i in 1:100) {
  d$group <- sample(
    c('train', 'calibration', 'test'),
    size = nrow(d),
    replace = TRUE,
    prob = c(0.7, 0.2, 0.1))
  unpack[train, calibration, test] <- split(d, d$group)
  for(nm in c('train', 'calibration', 'test')) {
    found <- unique(get(nm)$group)
    m[nm, found] <- m[nm, found] + 1
  }
}
print(m)
```
`zeallot`, on the other hand, unpacks by position: the first element of the value is assigned to the first target, and so on. Name matching is not enforced.
```r
library(zeallot)

d$group <- sample(
  c('train', 'calibration', 'test'),
  size = nrow(d),
  replace = TRUE,
  prob = c(0.7, 0.2, 0.1))
# positional: first element -> train, second -> calibration, third -> test
c(train, calibration, test) %<-% split(d, d$group)
```
Notice the groups were not unpacked into the desired target names.
```r
unique(train$group)
unique(calibration$group)
unique(test$group)
```
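The mismatch can be reproduced without zeallot. The loop below is a base-R emulation of positional assignment, not zeallot's code: element `i` of the result goes to target `i`, regardless of its name. The vector `groups` is made up for illustration.

```r
groups <- c("train", "test", "calibration", "train")
parts <- split(groups, groups)
names(parts)  # "calibration" "test" "train"

targets <- c("train", "calibration", "test")
env <- new.env()
for (i in seq_along(targets)) {
  # positional: the i-th element goes to the i-th target; names are ignored
  assign(targets[[i]], parts[[i]], envir = env)
}
unique(env$train)        # "calibration"
unique(env$calibration)  # "test"
unique(env$test)         # "train"
```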
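To unpack correctly, we have to know (or guess) the order in which `split()` returns its results. For a character grouping vector this is alphabetical order. `split()` converts the vector to a factor, and the factor's levels are sorted using the current locale's collation. However, this order can vary; for example, a grouping column that is already a factor uses its own level order.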
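A quick check shows where `split()`'s order comes from. Its element names coincide with the levels `factor()` would create, which are the sorted unique values. The vector `x` is illustrative.

```r
x <- c("train", "test", "calibration", "train")
names(split(x, x))  # "calibration" "test" "train"
levels(factor(x))   # "calibration" "test" "train"
sort(unique(x))     # same values, sorted with the current locale's collation
```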
Here we get the order right by making `group` a factor whose levels match the order of the `zeallot` targets. We could also make the unpack work by writing `c(calibration, test, train) %<-% split(d, d$group)`.
```r
d$group <- sample(
  factor(c('train', 'calibration', 'test'),
         levels = c('train', 'calibration', 'test')),
  size = nrow(d),
  replace = TRUE,
  prob = c(0.7, 0.2, 0.1))
# split() now returns elements in level order: train, calibration, test
c(train, calibration, test) %<-% split(d, d$group)
```
The groups are now unpacked into the desired target names.
```r
unique(train$group)
unique(calibration$group)
unique(test$group)
```
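When the grouping vector is a factor, `split()` follows the factor's level order. It also keeps empty levels as zero-length elements, because its `drop` argument defaults to `FALSE`. The small, made-up example below shows both effects. With a positional unpack, an empty group would therefore be assigned as an empty element rather than reported.

```r
lv <- c("train", "calibration", "test")
g <- factor(c("train", "calibration", "train"), levels = lv)
parts <- split(seq_along(g), g)
names(parts)    # "train" "calibration" "test": level order, not alphabetical
lengths(parts)  # train 2, calibration 1, test 0
```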
Our concern is that there is no guarantee in R that functions returning named lists always return the fields in the same order, especially as a package or function evolves over time. In Python, where positional unpacking is the standard, functions tend to return tuples rather than named lists. The positions are then part of the function's interface and are expected to stay stable. We feel it is more reliable (and more R-like) to unpack from named lists using names. Some related work (including `zeallot` and a package that precedes `zeallot`) can be found here.
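Here is a minimal sketch of this concern. It uses two hypothetical versions of a function, `summarize_v1()` and `summarize_v2()`, invented for illustration, where a later release reorders the fields of its named-list result. Positional access changes meaning between the versions; name-based access does not.

```r
# Hypothetical: version 2 of a function returns the same fields in a new order.
summarize_v1 <- function(x) list(mean = mean(x), sd = sd(x))
summarize_v2 <- function(x) list(sd = sd(x), mean = mean(x))

x <- c(1, 2, 3, 4)
r1 <- summarize_v1(x)
r2 <- summarize_v2(x)

r1[[1]] == r2[[1]]            # FALSE: position 1 was the mean, now it is the sd
r1[["mean"]] == r2[["mean"]]  # TRUE: lookup by name is unaffected
```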