eachElem-methods: Apply a Function in Parallel over a Set of Lists and Vectors


Description

eachElem executes function fun multiple times in parallel with a varying set of arguments, and returns the results in a list. It is functionally similar to the standard R lapply function, but is more flexible in the way that the function arguments can be specified.

Usage

  ## S4 method for signature 'sleigh'
eachElem(.Object, fun, elementArgs=list(), fixedArgs=list(), 
      eo=NULL, DEBUG=FALSE)

Arguments

.Object

sleigh class object.

fun

the function to be evaluated by the sleigh. In the case of functions like +, %*%, etc., the function name must be quoted.

elementArgs

list of vectors, lists, matrices, and data frames that specify (some of) the arguments to be passed to fun. Each element should correspond to an argument of fun.

fixedArgs

list of additional arguments to be passed to fun. Each element should correspond to an argument of fun.

eo

list specifying environment options. See the section Environment Options below.

DEBUG

logical; should the browser function be called upon entry to eachElem? The default is FALSE.

Details

The eachElem function forms argument sets from objects passed in via elementArgs and fixedArgs. The elements of elementArgs specify the arguments that vary from task to task, while the elements of fixedArgs specify the arguments that do not vary from task to task. The number of tasks executed by a call to eachElem is equal to the length of the longest vector (or list, etc.) in elementArgs. If any elements of elementArgs are shorter, their values are recycled, using the standard R rules.

The elements of elementArgs may be vectors, lists, matrices, or data frames. The vectors and lists are always iterated over by element, or "cell", but matrices and data frames can also be iterated over by row or column. This is controlled by the by option, specified via the eo argument. See below for more information.

For example:

eachElem(s, '+', elementArgs=list(1:4), fixedArgs=list(100))

This will submit four tasks, since the length of 1:4 is four. The four tasks will be to add the arguments 1 and 100, 2 and 100, 3 and 100, and 4 and 100. The result is a list containing the four values 101, 102, 103, and 104.

Another way to do the same thing is with:

eachElem(s, '+', elementArgs=list(1:4, 100))

Since the second element of elementArgs is of length one, its value is recycled four times, thus specifying the same set of tasks as in the previous example. This method also has the advantage of making it easy to put fixed values before varying values, without the need for the eo$argPermute option, discussed later. For example:

eachElem(s, '-', elementArgs=list(100, 1:4))

is similar to the R statement:

100 - 1:4

Note that in simple examples like these, where the results are numeric values, the standard R unlist function can be very useful for converting the resulting list into a vector.
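As a rough local illustration of the argument sets described above, the following sketch uses the base R mapply function as a serial stand-in for eachElem. It assumes no sleigh is available and does not reflect how eachElem is implemented; with a real sleigh s, the last step would be unlist(eachElem(s, '-', elementArgs=list(100, 1:4))).

```r
# Serial base-R stand-ins for the three eachElem calls shown above.
# mapply() only mimics how the arguments are paired up for each task.

# varying 1:4 plus a fixed argument of 100
r1 <- mapply("+", 1:4, MoreArgs = list(100), SIMPLIFY = FALSE)

# the length-one element 100 is recycled four times
r2 <- mapply("+", 1:4, 100, SIMPLIFY = FALSE)

# a fixed value placed before the varying values
r3 <- mapply("-", 100, 1:4, SIMPLIFY = FALSE)

# each result is a list; unlist() converts it to a plain vector
v3 <- unlist(r3)
```

Here r1 and r2 hold the same four values, and v3 matches the result of the R statement 100 - 1:4.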

Environment Options

The eo argument is a list that can be used to specify various options. The following options are recognized:

elementFunc

The eo$elementFunc option can be used to specify a callback function that provides the varying arguments for fun in place of elementArgs (that is, you can't specify both eo$elementFunc and elementArgs). eachElem calls the eo$elementFunc function to get a list of arguments for one invocation of fun, and will keep calling it until eo$elementFunc signals that there are no more tasks to execute by calling the stop function with no arguments. eachElem appends any values specified by fixedArgs to the list returned by eo$elementFunc just as if elementArgs had been specified.

eachElem passes the number of the desired task (starting from 1) as the first argument to eo$elementFunc, and the value of the eo$by option as the second argument. Note that the use of the eo$elementFunc function is an advanced feature, but is very useful when executing a large number of tasks, or when the arguments are coming from a database query, for example. For that reason, the eo$loadFactor option should usually be used in conjunction with eo$elementFunc (see description below).
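The calling convention described above can be sketched locally. In the following illustration, collectArgSets is a hypothetical serial driver written only for this example (it is not part of nws and not how eachElem is implemented); it calls the callback with task numbers 1, 2, ... until the callback calls stop, and appends fixedArgs to each argument list.

```r
# Callback supplying three argument sets, then signalling "no more tasks"
# by calling stop() with no arguments.
elementFunc <- function(i, by) {
  if (i <= 3) list(x = i, y = i * 10) else stop()
}

# Hypothetical serial driver, for illustration only.
collectArgSets <- function(elementFunc, fixedArgs = list(), by = "row") {
  argSets <- list()
  i <- 1
  repeat {
    # an error from the callback is treated as the end-of-tasks signal
    args <- tryCatch(elementFunc(i, by), error = function(e) NULL)
    if (is.null(args)) break
    argSets[[i]] <- c(args, fixedArgs)  # fixedArgs follow the varying args
    i <- i + 1
  }
  argSets
}

argSets <- collectArgSets(elementFunc, fixedArgs = list(z = 1))
```

With a real sleigh s, the same callback would be passed as eo=list(elementFunc=elementFunc), as in the Examples section.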

accumulator

The eo$accumulator option can be used to specify a callback function that will receive the results of the task execution as soon as they are complete, rather than returning all of the task results as a list when eachElem completes. In other words, eachElem will call the eo$accumulator function with task results as soon as it receives them from the sleigh workers, rather than saving them in memory until all the tasks are complete. Note that if the tasks are chunked (using the eo$chunkSize option described below), then the eo$accumulator function will receive multiple task results, which is why the task results are always passed to the eo$accumulator function in a list.

The first argument to the eo$accumulator function is a list of results, where the length of the list is equal to eo$chunkSize. The second argument is a vector of task numbers, starting from 1, where the length of the vector is also equal to eo$chunkSize. The task numbers are very important, because the results are not guaranteed to be returned in order. eo$accumulator is another advanced feature, and like eo$elementFunc, is very useful when executing a large number of tasks. It allows you to process results as they arrive, rather than forcing you to wait until all of the tasks are complete. In conjunction with eo$elementFunc and eo$loadFactor, you can set up a pipeline, allowing you to process an unlimited number of tasks efficiently. Note that when eo$accumulator is specified, eachElem returns NULL, not the list of results, since eachElem doesn't save any of the results after passing them to the eo$accumulator function.
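Because results may arrive out of order, one possible pattern is to file each result under its task number. The sketch below is a minimal illustration under that assumption: the results environment and the simulated deliveries are invented for this example, and the accumulator is called by hand rather than by a sleigh.

```r
# Accumulator that stores each result under its task number,
# so the order of arrival does not matter.
results <- new.env()
accumulator <- function(resultList, taskVector) {
  for (k in seq_along(resultList)) {
    assign(as.character(taskVector[k]), resultList[[k]], envir = results)
  }
}

# Simulated deliveries: chunks of two results, arriving out of order.
accumulator(list(9, 16), c(3, 4))
accumulator(list(1, 4), c(1, 2))

# Reassemble the results in task order.
ordered <- unlist(mget(as.character(1:4), envir = results))
```

With a real sleigh s, the function would be supplied as eo=list(accumulator=accumulator, chunkSize=2), and eachElem itself would return NULL.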

by

The eo$by option specifies the iteration scheme to use for matrix and data frame elements in elementArgs. The default value is "row", but it can also be set to "column" or "cell". Vectors and lists in elementArgs are not affected by this option.
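To make the three iteration schemes concrete, the following base R sketch counts the tasks each setting would produce for a small matrix and emulates column-wise iteration serially. It is an illustration only; the exact form in which a row or column is passed to fun is not specified here, and the sketch assumes a plain vector.

```r
m <- matrix(1:6, nrow = 2)  # 2 rows, 3 columns

# Number of tasks each setting of eo$by would produce for m.
nTasks <- c(row = nrow(m), column = ncol(m), cell = length(m))

# Serial stand-in for:
#   eachElem(s, sum, elementArgs=list(m), eo=list(by='column'))
byColumn <- lapply(seq_len(ncol(m)), function(j) sum(m[, j]))
```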

chunkSize

The eo$chunkSize option is a tuning parameter that specifies the number of tasks that sleigh workers should allocate at a time. The default value is 1, but if the tasks are small, performance can be improved by specifying a larger value, which decreases the overhead per task.

If the fun function executes very quickly, you may not be able to keep your workers busy, giving you poor performance. In that case, consider setting the eo$chunkSize option to a large enough number to increase the effective task execution time.

loadFactor

The eo$loadFactor option is a tuning parameter that specifies the maximum number of tasks per worker that are submitted to the sleigh at the same time. If set, no more than (loadFactor * workerCount) tasks will be submitted at the same time. This helps to control the resource demands that are made on the NetWorkSpaces server, which is especially important if there are a large number of tasks. Note that this option is ignored if blocking is set to FALSE, since the two options are incompatible with each other.

If in doubt, set the eo$loadFactor option to 10. That will almost certainly avoid putting a strain on the NetWorkSpaces server, and if that isn't enough to keep your workers busy, then you should really be using the eo$chunkSize option to give the workers more to do.
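As a small worked example of the limit described above, the sketch below builds an eo list and computes the maximum number of tasks that would be outstanding at once. The worker count of 4 and the chunkSize of 100 are arbitrary values assumed for illustration.

```r
# Tuning options as they would be passed via the eo argument.
eo <- list(chunkSize = 100, loadFactor = 10)

# Assumed number of sleigh workers (hypothetical value).
workerCount <- 4

# Upper bound on tasks submitted at the same time: loadFactor * workerCount.
maxSubmitted <- eo$loadFactor * workerCount
```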

blocking

The eo$blocking option is used to indicate whether to wait for the results, or to return as soon as the tasks have been submitted. If set to FALSE, eachElem will return a sleighPending object that is used to monitor the status of the tasks, and to eventually retrieve the results. You must wait for the results to be complete before executing any further tasks on the sleigh, or an exception will be raised. The default value is TRUE.

argPermute

The eo$argPermute option is used to reorder the arguments passed to fun. It is generally only useful if the fixedArgs argument has been specified, and some of those arguments need to precede the arguments specified via elementArgs. Note that by using recycling of elements in elementArgs, the use of fixedArgs and argPermute can often be avoided entirely.

Note

If elementArgs or fixedArgs isn't a list, eachElem will automatically wrap it in a list. This is a convenience that only works for passing in a single vector or matrix, however.

If elementArgs or fixedArgs are named lists, then the names are used to map the values to the appropriate argument of fun. This can be used as another technique to avoid the use of eo$argPermute.
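The effect of naming can be illustrated locally with the base R mapply function, used here as a serial stand-in for eachElem rather than as a description of its implementation. The function subtract and its argument names a and b are invented for this example.

```r
# Hypothetical two-argument function; the names decide which value goes where.
subtract <- function(a, b) a - b

# Serial stand-in for:
#   eachElem(s, subtract, elementArgs=list(b=1:4), fixedArgs=list(a=100))
# The fixed value is matched to 'a' by name even though it is listed second.
res <- mapply(subtract, b = 1:4, MoreArgs = list(a = 100), SIMPLIFY = FALSE)
vals <- unlist(res)
```

Because the values are matched by name, the fixed argument ends up first without any reordering option.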

The elementArgs argument can be specified as a data frame. This works just like a named list, and therefore, the column names of the data frame must all correspond to arguments of fun. Note that if the data frame has many rows, the performance may not be good due to the overhead of subsetting data frames in R.

If you have a huge number of tasks, consider using the eo$elementFunc, eo$accumulator, and eo$loadFactor options.

If eo$elementFunc returns a value that isn't a list, eachElem will automatically wrap that value in a list.

The eo$elementFunc function doesn't have to define a second formal argument (the by argument) if it's not needed.

The eo$accumulator function doesn't have to define a second formal argument (the taskVector argument) if it's not needed. Just remember that the results are not guaranteed to come back in order.

See Also

eachWorker, sleighPending

Examples

  ## Not run: 
# create a sleigh
s <- sleigh()

# compute the mean of each list element
x <- list(a=1:10, beta=exp(-3:3), logic=c(TRUE,FALSE,FALSE,TRUE))
eachElem(s, mean, list(x))

# median and quartiles for each list element
eachElem(s, quantile, elementArgs=list(x), fixedArgs=list(probs=1:3/4))

# use eo$elementFunc to supply 100 random values and eo$accumulator to
# receive the results
elementFunc <- function(i, by) {
  if (i <= 100) list(i=i, x=runif(1)) else stop()
}
accumulator <- function(resultList, taskVector) {
  if (resultList[[1]][[1]] != taskVector[1]) stop('assertion failure')
  cat(paste(resultList[[1]], collapse=' '), '\n')
}
eo <- list(elementFunc=elementFunc, accumulator=accumulator)
eachElem(s, function(i, x) list(i=i, x=x, xsq=x*x), eo=eo)
  
## End(Not run)

nws documentation built on May 2, 2019, 8:51 a.m.