async_work: Execute parallel job in another session
In dipterix/ravebase: Base Package for RAVE


Similar to lapply, but applies FUN to the elements of X in parallel, in separate R sessions.

async_work(
  X,
  FUN,
  ...,
  .globals = NULL,
  .name = "Untitled",
  .rs = FALSE,
  .wait = TRUE,
  .chunk_size = Inf
)

`X`	vector
`FUN`	R function
`...`	further arguments to `FUN`
`.globals`	named list of global variables to be used by `FUN`
`.name`	job or progress name
`.rs`	whether to use `'RStudio'` job scheduler
`.wait`	whether to wait for the results
`.chunk_size`	maximum chunk size per job, must be `Inf` if `.wait` is false

Unlike functions in the future package, which determine global variables automatically, async_work requires you to list every variable used by FUN in .globals. In addition, you may only assume that base packages are loaded when FUN is executed. It is therefore recommended to call functions with an explicit package prefix, for example utils::read.csv instead of read.csv. See examples for details.
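The two rules above can be illustrated without starting another session. The sketch below is only an approximation: `run_isolated` is a hypothetical helper (not part of ravebase) that re-homes a function into an environment containing nothing but the supplied globals on top of the base package, which roughly mimics what a function sees in a fresh session.

```r
a <- 1
f <- function(x, b) x + a + b                    # relies on the global `a`
g <- function(txt) utils::read.csv(text = txt)   # explicit package prefix
h <- function(txt) read.csv(text = txt)          # relies on `utils` being attached

# Hypothetical helper: evaluate FUN so that it can only see `globals`
# and the base package, not the caller's workspace or attached packages.
run_isolated <- function(FUN, ..., globals = list()) {
  environment(FUN) <- list2env(globals, parent = baseenv())
  FUN(...)
}

# Works: `a` is supplied explicitly
ok <- run_isolated(f, 1, b = 3, globals = list(a = a))

# Fails: `a` cannot be found because it was not supplied
missing_a <- tryCatch(run_isolated(f, 1, b = 3), error = function(e) e)

# Works: `utils::read.csv` is resolved through the namespace
csv_ok <- run_isolated(g, "x,y\n1,2")

# Fails: plain `read.csv` is not visible when only base is available
csv_err <- tryCatch(run_isolated(h, "x,y\n1,2"), error = function(e) e)
```

In a real async_work call, the same function `f` would be paired with `.globals = list(a = a)`, as in the examples for this function.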

The main feature of async_work is that there is no backward communication between the main process and the worker processes, so the setup time is shorter than with the future package's multiprocess strategy. It also avoids the memory leak issues caused by forked processes. It is therefore designed for jobs that write their output to disk and require little feedback. The drawback is that .globals must be specified manually, which can be inconvenient for beginners.
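A job that fits this design writes its output to disk and returns only a small value, such as a file path. The following is a minimal sketch under stated assumptions: `write_square` and `out_dir` are illustrative names, the async_work branch assumes ravebase is installed and exports async_work, and the lapply branch is a sequential stand-in so the sketch also runs without the package.

```r
# Worker: writes one file per element and returns only the file path.
# `out_dir` is not an argument, so it has to be supplied via `.globals`.
write_square <- function(i) {
  path <- file.path(out_dir, sprintf("square_%02d.rds", i))
  saveRDS(i^2, file = path)
  path
}

out_dir <- file.path(tempdir(), "async_work_demo")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

if (interactive() && requireNamespace("ravebase", quietly = TRUE)) {
  # Assumption: ravebase is installed and exports async_work
  paths <- ravebase::async_work(
    1:4, write_square,
    .globals = list(out_dir = out_dir),
    .name = "Write squares"
  )
} else {
  # Sequential stand-in with the same calling pattern
  paths <- lapply(1:4, write_square)
}

# The main session collects the results from disk afterwards
squares <- vapply(paths, readRDS, numeric(1))
```

Only base functions (file.path, sprintf, saveRDS) are used inside the worker, so no package prefix is needed there.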

If .wait is true, returns a list of the results of applying FUN to each element of X; otherwise, returns a function that can be used to track and obtain the results.

if(interactive()){
  a <- 1
  f <- function(x, b){
    Sys.sleep(1)
    list(
      result = x + a + b,
      loaded = names(utils::sessionInfo()$loaded),
      attached = search()
    )
  }

  # `a` is a "global" variable: `f` looks it up in its enclosing
  # environment, which is not available in the other session, so it
  # must be specified in `.globals`
  res <- async_work(1:10, f, b = 3, .globals = list(a = a))

  # Only base packages are attached in the worker session
  # (see the `attached` element of each result)
  res[[1]]
}