Here is an example of how wrapr::unpack differs from zeallot::%<-%. Consider splitting data into three parts: train, calibration, and test.

We can do this with wrapr::unpack as follows (using either the <- assignment notation or the := pseudo-assignment notation).

library(wrapr)

d <- data.frame(
  x = 1:100)

d$group <- sample(
  c('train', 'calibration', 'test'), 
  size = nrow(d),
  replace = TRUE, 
  prob = c(0.7, 0.2, 0.1))

unpack[train, calibration, test] <- split(d, d$group)
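The := form mentioned above looks the same apart from the operator. The following is a minimal sketch, assuming wrapr is installed; the data frame d_demo is a hypothetical stand-in, built deterministically so that no group is empty.

```r
library(wrapr)

# Hypothetical demo data: fixed group sizes, so every group is non-empty.
d_demo <- data.frame(x = 1:10)
d_demo$group <- rep(
  c('train', 'calibration', 'test'),
  times = c(6, 3, 1))

# Same unpacking as above, written with the := pseudo-assignment notation.
unpack[train, calibration, test] := split(d_demo, d_demo$group)

# Each target holds the rows of the group that shares its name.
unique(train$group)
unique(calibration$group)
unique(test$group)
```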

As unpack works by name, we always have (as long as all the groups are non-empty): train$group == "train", calibration$group == "calibration", and test$group == "test".

unique(train$group)
unique(calibration$group)
unique(test$group)

When one of the sets is empty, unpack catches the problem and signals an error.

d$group <- sample(
  c('train', 'calibration', 'test'), 
  size = nrow(d),
  replace = TRUE, 
  prob = c(0.7, 0.2, 0))

unpack[train, calibration, test] <- split(d, d$group)
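In a script we may want to handle that error rather than stop. One possible sketch, assuming wrapr is installed, wraps the unpack in base R's tryCatch(). The data frame d_demo and the "unpack failed:" prefix are illustrative choices, and the exact wording of wrapr's message is not assumed here.

```r
library(wrapr)

# Hypothetical demo data with no 'test' rows at all.
d_demo <- data.frame(x = 1:6)
d_demo$group <- rep(c('train', 'calibration'), each = 3)

result <- tryCatch(
  {
    # split() returns only two groups, so the 'test' target has no source.
    unpack[train, calibration, test] <- split(d_demo, d_demo$group)
    "unpacked"
  },
  error = function(e) {
    # Report the failure instead of stopping the script.
    paste("unpack failed:", conditionMessage(e))
  })

print(result)
```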

We can see this behavior is stable.

m <- matrix(0, nrow = 3, ncol = 3)
rownames(m) <- c('train', 'calibration', 'test')
colnames(m) <- c('train', 'calibration', 'test')

for(i in 1:100) {
  d$group <- sample(
    c('train', 'calibration', 'test'), 
    size = nrow(d),
    replace = TRUE, 
    prob = c(0.7, 0.2, 0.1))

  unpack[train, calibration, test] <- split(d, d$group)

  for(nm in c('train', 'calibration', 'test')) {
    found <- unique(get(nm)$group)
    m[nm, found] <- m[nm, found] + 1
  }
}

print(m)

zeallot, on the other hand, unpacks by position. The first item found is assigned to the first position. Name matching is not enforced.

library(zeallot)

d$group <- sample(
  c('train', 'calibration', 'test'), 
  size = nrow(d),
  replace = TRUE, 
  prob = c(0.7, 0.2, 0.1))

c(train, calibration, test) %<-% split(d, d$group)

Notice the groups were not unpacked into the desired target names.

unique(train$group)
unique(calibration$group)
unique(test$group)

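To unpack correctly we have to successfully guess the order of the results of split(). For character vectors this appears to be alphabetic order (likely because split() converts the grouping vector to a factor, and factor levels are sorted by default). However, this order can vary, for example when the grouping column is already a factor with its own level order.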
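The ordering can be checked directly with base R by looking at the names of the list split() returns. This is a small sketch on hypothetical toy vectors (g_chr and g_fct are illustrative names, not part of the example above).

```r
# A character grouping vector: groups come back in sorted order.
g_chr <- c('train', 'test', 'calibration', 'train')
order_chr <- names(split(seq_along(g_chr), g_chr))
print(order_chr)

# The same values as a factor with explicit levels:
# groups come back in level order instead.
g_fct <- factor(g_chr, levels = c('train', 'calibration', 'test'))
order_fct <- names(split(seq_along(g_fct), g_fct))
print(order_fct)
```

The same values therefore arrive in different positions depending on how the grouping column happens to be stored, which is exactly what a positional unpack is sensitive to.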

Here we get the order right by specifying factor levels that match our zeallot unpacking (we could also get the unpack to work by writing c(calibration, test, train) %<-% split(d, d$group)).

d$group <- sample(
  factor(c('train', 'calibration', 'test'), 
         levels = c('train', 'calibration', 'test')),
  size = nrow(d),
  replace = TRUE, 
  prob = c(0.7, 0.2, 0.1))

c(train, calibration, test) %<-% split(d, d$group)

The groups are now unpacked into the desired target names.

unique(train$group)
unique(calibration$group)
unique(test$group)

Our concern is: there is no guarantee in R that functions that return named lists always return the fields in the same order (especially as a package or function evolves over time). In Python, where positional unpacking is the standard, functions tend to return tuples, not named lists, so positions are stable. We feel it is more reliable (and more R-like) to unpack from named lists using names. Some related work (including zeallot and a package that precedes zeallot) can be found here.
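To make the concern concrete, here is a base R sketch with two hypothetical versions of a function (fit_v1 and fit_v2 are made-up names) that return the same fields in a different order. Positional access changes meaning between versions, while access by name does not.

```r
# Hypothetical: the same function before and after a refactor
# that reordered the fields of its return value.
fit_v1 <- function() list(estimate = 1.5, std_error = 0.2)
fit_v2 <- function() list(std_error = 0.2, estimate = 1.5)

# Positional access silently picks up a different field.
by_position_v1 <- fit_v1()[[1]]
by_position_v2 <- fit_v2()[[1]]
print(c(by_position_v1, by_position_v2))

# Access by name is unaffected by the reordering.
by_name_v1 <- fit_v1()[['estimate']]
by_name_v2 <- fit_v2()[['estimate']]
print(c(by_name_v1, by_name_v2))
```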



WinVector/wrapr documentation built on Aug. 29, 2023, 4:51 a.m.