Multiple Assignment

wrapr now supplies a name-based multiple assignment notation for R.

In R there are many functions that return named lists or other structures keyed by names. Let's start with a simple example: base::split().

First some example data.

d <- data.frame(
  x = 1:9,
  group = c('train', 'calibrate', 'test'),
  stringsAsFactors = FALSE)

knitr::kable(d)

One way to use base::split() is to call it on a data.frame and then unpack the desired portions from the returned value.

parts <- split(d, d$group)
train_data <- parts$train
calibrate_data <- parts$calibrate
test_data <- parts$test
knitr::kable(train_data)

knitr::kable(calibrate_data)

knitr::kable(test_data)

If we use a multiple assignment notation we can collect some of these steps together, and avoid leaving a possibly large temporary variable such as parts in our environment.

Let's clear out our earlier results.

rm(list = c('train_data', 'calibrate_data', 'test_data', 'parts'))

And now let's apply split() and unpack the results in one step.

library(wrapr)

to[
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
  ] <- split(d, d$group)
knitr::kable(train_data)

knitr::kable(calibrate_data)

knitr::kable(test_data)

The semantics of []<- imply that an object named "to" is left in our workspace as a side effect. However, this object is small, and if the workspace already contains an object named to that is not of class Unpacker, the unpacking is aborted before anything is overwritten. The unpacker has two modes: unpack (a function that needs a dot in pipes) and to (an eager function factory that does not require a dot in pipes). The side effect can be avoided by using := for assignment.

rm(list = c('train_data', 'calibrate_data', 'test_data', 'to'))

to[
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
  ] := split(d, d$group)

ls()

The side effect can also be avoided by using alternate notations that are not array updates.

We will demonstrate a few of these. First is pipe to array notation.

rm(list = c('train_data', 'calibrate_data', 'test_data'))
split(d, d$group) %.>% to[
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
  ]

ls()

Note the above is the wrapr dot arrow pipe (which requires explicit dots to denote pipe targets). In this case it is dispatching on the class of the right-hand side argument to get the effect. This is a common feature of the wrapr dot arrow pipe. We could get a similar effect by using right-assignment "->" instead of the pipe.
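The right-assignment remark follows from how R parses "->": the parser rewrites value -> target into target <- value. The sketch below is a base-R check of that parsing behavior only. Nothing is evaluated, so wrapr does not need to be loaded, and d and to are just symbols here. By this equivalence one would expect the right-assignment form to behave like the array-assignment form shown earlier, including leaving to in the workspace, though that is an inference from the parse rather than something the text states.

```r
# Capture the expression without evaluating it.
expr <- quote(split(d, d$group) -> to[train_data <- train])

# The parser has already turned `->` into `<-` with the operands swapped.
head_fn <- as.character(expr[[1]])
same_as_left <- identical(
  expr,
  quote(to[train_data <- train] <- split(d, d$group))
)

print(head_fn)
print(same_as_left)
```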

We can also use a pipe function notation.

rm(list = c('train_data', 'calibrate_data', 'test_data'))
split(d, d$group) %.>% to(
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
)

ls()

Notice that piping to to() is like piping to to[]: no dot is needed.

We cannot currently use the magrittr pipe in the above, because in that case the unpacked results are lost in a temporary intermediate environment that magrittr uses during execution.
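The general mechanism behind results being "lost" can be sketched in base R: an assignment made as a side effect lands in whichever environment it is directed at, and if that environment is a short-lived intermediate one, the caller never sees the value. The helper assign_result() and both environments below are hypothetical stand-ins for illustration; this does not reproduce the internals of any pipe implementation.

```r
# Hypothetical helper: assigns into whatever environment it is given.
assign_result <- function(value, envir) {
  assign("result", value, envir = envir)
  invisible(NULL)
}

caller_env <- new.env()  # stands in for the user's workspace
temp_env <- new.env()    # stands in for a short-lived intermediate environment

# The side effect is directed at the intermediate environment...
assign_result(42, envir = temp_env)

# ...so the value is visible there, but not where the user would look for it.
found_in_caller <- exists("result", envir = caller_env, inherits = FALSE)
found_in_temp <- exists("result", envir = temp_env, inherits = FALSE)

print(found_in_caller)
print(found_in_temp)
```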

A more conventional functional form is given in unpack(). unpack() requires a dot in wrapr pipelines.

rm(list = c('train_data', 'calibrate_data', 'test_data'))
split(d, d$group) %.>% unpack(
  .,
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
)

ls()

unpack also supports the pipe-to-array and assign-to-array notations. In addition, with unpack() we can use the conventional function notation.

rm(list = c('train_data', 'calibrate_data', 'test_data'))
unpack(
  split(d, d$group),
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
)

ls()

to() cannot be directly used as a function. It is strongly suggested that the objects returned by to[], to(), and unpack[] never be stored in variables, but instead only be produced, used, and discarded. The issue is that these are objects of class "UnpackTarget" and have the unpack destination names already bound in. This means that if one of these is used in code, a user reading the code cannot tell where the side effects are going without examining the contents of the object.

The assignments in the unpacking block can be any of <-, =, :=, or even -> (though the last one assigns left to right).

rm(list = c('train_data', 'calibrate_data', 'test_data'))
unpack(
  split(d, d$group),
  train_data = train,
  calibrate_data = calibrate,
  test_data = test
)

ls()

rm(list = c('train_data', 'calibrate_data', 'test_data'))
unpack(
  split(d, d$group),
  train -> train_data,
  calibrate -> calibrate_data,
  test -> test_data
)

ls()
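
The equivalence of these assignment forms can be illustrated with a small base-R sketch that captures the unevaluated arguments and reduces each one to a destination/source pair. The function capture_mappings() is a hypothetical name used only for illustration, not part of wrapr, and this is not how wrapr is implemented. It relies on two facts about R: a = b arrives as a named argument, and b -> a is parsed as a <- b.

```r
# Hypothetical helper: turn unevaluated "dest <- source" style arguments
# into a named character vector (names = destinations, values = sources).
capture_mappings <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]
  arg_names <- names(exprs)
  if (is.null(arg_names)) {
    arg_names <- rep("", length(exprs))
  }
  mapping <- character(0)
  for (i in seq_along(exprs)) {
    e <- exprs[[i]]
    if (nzchar(arg_names[[i]])) {
      # dest = source: the destination is the argument name
      mapping[arg_names[[i]]] <- as.character(e)
    } else if (is.call(e) && as.character(e[[1]]) %in% c("<-", ":=")) {
      # dest <- source, dest := source, or source -> dest (parsed as `<-`)
      mapping[as.character(e[[2]])] <- as.character(e[[3]])
    } else if (is.name(e)) {
      # bare name: unpack to the same name
      mapping[as.character(e)] <- as.character(e)
    } else {
      stop("unsupported mapping expression")
    }
  }
  mapping
}

m <- capture_mappings(
  train_data <- train,
  calibrate_data = calibrate,
  test -> test_data,
  other := something
)
print(m)
```

All four forms reduce to the same kind of destination/source pair, which is why they can be used interchangeably in the unpacking block.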

It is a caught and signaled error to attempt to unpack an item that is not there.

rm(list = c('train_data', 'calibrate_data', 'test_data'))
unpack(
  split(d, d$group),
  train_data <- train,
  calibrate_data <- calibrate_misspelled,
  test_data <- test
)
ls()

unpack attempts to be atomic: it prefers to unpack either all values or no values.
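
One way to get this all-or-nothing behavior is to validate every requested source name before performing any assignment. The base-R sketch below shows the idea; unpack_atomic() is a hypothetical function written for this illustration, and wrapr's own checks may differ.

```r
# Hypothetical helper: check all sources first, then assign.
# `mapping` is a named character vector: names = destinations, values = sources.
unpack_atomic <- function(values, mapping, envir = parent.frame()) {
  missing_sources <- setdiff(unname(mapping), names(values))
  if (length(missing_sources) > 0) {
    # Fail before any assignment has happened.
    stop("missing source(s): ", paste(missing_sources, collapse = ", "))
  }
  for (dest in names(mapping)) {
    assign(dest, values[[mapping[[dest]]]], envir = envir)
  }
  invisible(values)
}

target_env <- new.env()
parts <- list(train = 1:3, calibrate = 4:6, test = 7:9)

# A misspelled source aborts the whole unpack; nothing is assigned.
outcome <- tryCatch(
  unpack_atomic(
    parts,
    c(train_data = "train", calibrate_data = "calibrate_misspelled"),
    envir = target_env
  ),
  error = function(e) "error"
)
after_failure <- ls(target_env)

# A valid mapping (which need not cover every slot) assigns everything requested.
unpack_atomic(parts, c(train_data = "train", test_data = "test"), envir = target_env)
after_success <- ls(target_env)

print(after_failure)
print(after_success)
```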

Also, one does not have to unpack all slots.

unpack(
  split(d, d$group),
  train_data <- train,
  test_data <- test
)

ls()

We can use a name alone as shorthand for name <- name (i.e. unpacking to the same name as in the incoming object).

rm(list = c('train_data', 'test_data'))
split(d, d$group) %.>%
  to[
     train,
     test
     ]

ls()

We can also use bquote .() notation to use variables to specify where data is coming from.

rm(list = c('train', 'test'))
train_source <- 'train'

split(d, d$group) %.>%
  to[
     train_result = .(train_source),
     test
     ]

ls()
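
For readers unfamiliar with the .() notation, base R's bquote() shows the substitution it performs: anything wrapped in .() is replaced by its value before the expression is used. The sketch below uses only base R; wrapping the mapping in list() is just a device to make it a single quotable expression and is not something wrapr requires.

```r
train_source <- 'train'

# .(train_source) is replaced by the value of train_source;
# the bare name `test` is left as an unevaluated symbol.
expr <- bquote(list(train_result = .(train_source), test))

print(expr)
# list(train_result = "train", test)
```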

In all cases the user explicitly documents the intended data sources and data destinations at the place of assignment. This means a later reader of the source code can see what the operation does without having to know the values of additional variables.

Related work includes:



wrapr documentation built on Aug. 20, 2023, 1:08 a.m.