`eachElem` executes function `fun` multiple times in parallel with a varying set of arguments, and returns the results in a list. It is functionally similar to the standard R `lapply` function, but is more flexible in the way that the function arguments can be specified.
| Argument | Description |
|---|---|
| `.Object` | sleigh class object. |
| `fun` | the function to be evaluated by the sleigh. In the case of functions like |
| `elementArgs` | list of vectors, lists, matrices, and data frames that specify (some of) the arguments to be passed to `fun`. |
| `fixedArgs` | list of additional arguments to be passed to `fun`. |
| `eo` | list specifying environment options. See the section Environment Options below. |
| `DEBUG` | logical; should |
The `eachElem` function forms argument sets from objects passed in via `elementArgs` and `fixedArgs`.
The elements of `elementArgs` specify the arguments that change, or vary, from task to task, while the elements of `fixedArgs` specify the arguments that stay the same for every task. The number of tasks executed by a call to `eachElem` is generally equal to the length of the longest vector (or list, etc.) in `elementArgs`; when a matrix or data frame is iterated by row or by column, its number of rows or columns is what counts. If any elements of `elementArgs` are shorter, their values are recycled using the standard R rules.
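To build intuition without a running sleigh, the recycling and argument-ordering rules can be approximated sequentially with base R's `mapply`, which also recycles shorter arguments and passes `MoreArgs` after the varying ones. This is a minimal sketch, not the nws implementation: `seqEachElem` is a hypothetical helper, everything runs in one process, and unlike the default `by="row"` of `eachElem`, `mapply` treats a matrix as a plain vector of cells, so the sketch only mirrors the vector and list cases.

```r
# Hypothetical sequential stand-in for eachElem (no sleigh, no nws):
# elementArgs supply the varying arguments (recycled by mapply),
# fixedArgs are passed unchanged to every call, after the varying ones.
seqEachElem <- function(fun, elementArgs, fixedArgs = list()) {
  do.call(mapply, c(list(FUN = fun), elementArgs,
                    list(MoreArgs = fixedArgs, SIMPLIFY = FALSE)))
}

# Four tasks: 1 + 100, 2 + 100, 3 + 100, 4 + 100
r1 <- seqEachElem(`+`, list(1:4), list(100))

# The same four tasks, with 100 recycled from elementArgs instead
r2 <- seqEachElem(`+`, list(1:4, 100))

unlist(r1)   # 101 102 103 104
```

Both calls describe the same task set, which is the point made in the examples that follow.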
The elements of `elementArgs` may be vectors, lists, matrices, or data frames. Vectors and lists are always iterated over by element, or "cell", but matrices and data frames can also be iterated over by row or by column. This is controlled by the `by` option, specified via the `eo` argument. See below for more information.
For example:
eachElem(s, '+', elementArgs=list(1:4), fixedArgs=list(100))
This will submit four tasks, since the length of 1:4 is four. The four tasks will be to add the arguments 1 and 100, 2 and 100, 3 and 100, and 4 and 100. The result is a list containing the four values 101, 102, 103, and 104.
Another way to do the same thing is with:
eachElem(s, '+', elementArgs=list(1:4, 100))
Since the second element of `elementArgs` has length one, its value is recycled four times, specifying the same set of tasks as in the previous example. This method also makes it easy to put fixed values before varying values, without needing the `eo$argPermute` option, discussed later. For example:
eachElem(s, '-', elementArgs=list(100, 1:4))
is similar to the R statement:
100 - 1:4
Note that in simple examples like these, where the results are numeric values, the standard R `unlist` function is very useful for converting the resulting list into a vector.
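The subtraction example can be checked the same way with a sequential base-R stand-in (again, not nws itself): the recycled 100 comes first, each task yields one list element, and `unlist` flattens the result into the vector that `100 - 1:4` produces.

```r
# Fixed value placed first via recycling: tasks compute 100 - 1, ..., 100 - 4
res <- mapply(`-`, 100, 1:4, SIMPLIFY = FALSE)

is.list(res)   # TRUE: one list element per task, like the result of eachElem
unlist(res)    # 99 98 97 96, the same as 100 - 1:4
```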
The `eo` argument is a list that can be used to specify various options. The following options are recognized:
The `eo$elementFunc` option can be used to specify a callback function that provides the varying arguments for `fun` in place of `elementArgs` (that is, you can't specify both `eo$elementFunc` and `elementArgs`). `eachElem` calls the `eo$elementFunc` function to get a list of arguments for one invocation of `fun`, and keeps calling it until `eo$elementFunc` signals that there are no more tasks to execute by calling the `stop` function with no arguments.
`eachElem` appends any values specified by `fixedArgs` to the list returned by `eo$elementFunc`, just as if `elementArgs` had been specified.
`eachElem` passes the number of the desired task (starting from 1) as the first argument to `eo$elementFunc`, and the value of the `eo$by` option as the second argument. The `eo$elementFunc` function is an advanced feature, but it is very useful when executing a large number of tasks, or when the arguments come from a database query, for example. Because such workloads usually involve many tasks, the `eo$loadFactor` option should usually be used in conjunction with `eo$elementFunc` (see the description below).
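The callback protocol can be illustrated with a sequential driver loop. This is a sketch under stated assumptions: `runWithElementFunc` is a hypothetical name, it treats any error raised by `elementFunc` as the end-of-input signal (the real `eachElem` may distinguish these cases), and it applies the wrapping of non-list values and the optional second argument described later on this page.

```r
# Hypothetical sequential driver mimicking the eo$elementFunc protocol:
# ask for task i, append fixedArgs, stop when elementFunc calls stop().
runWithElementFunc <- function(fun, elementFunc, fixedArgs = list(), by = "row") {
  results <- list()
  i <- 1
  repeat {
    # stop() with no arguments raises an error; this sketch treats ANY
    # error from elementFunc as "no more tasks"
    args <- tryCatch(
      if (length(formals(elementFunc)) >= 2) elementFunc(i, by) else elementFunc(i),
      error = function(e) NULL)
    if (is.null(args)) break
    if (!is.list(args)) args <- list(args)   # wrap non-list values
    results[[i]] <- do.call(fun, c(args, fixedArgs))
    i <- i + 1
  }
  results
}

# Supplies five tasks, then signals the end with stop()
elementFunc <- function(i, by) if (i <= 5) i else stop()
squares <- runWithElementFunc(function(x) x^2, elementFunc)
unlist(squares)   # 1 4 9 16 25
```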
The `eo$accumulator` option can be used to specify a callback function that receives the results of task execution as soon as they are complete, rather than having all of the task results returned as a list when `eachElem` completes. In other words, `eachElem` calls the `eo$accumulator` function with task results as soon as it receives them from the sleigh workers, rather than saving them in memory until all the tasks are complete. If the tasks are chunked (using the `eo$chunkSize` option described below), the `eo$accumulator` function receives multiple task results at once, which is why the task results are always passed to it in a list.
The first argument to the `eo$accumulator` function is a list of results whose length is at most `eo$chunkSize` (the final chunk can be shorter when the number of tasks is not a multiple of the chunk size). The second argument is a vector of the corresponding task numbers, starting from 1, with the same length as the list of results.
The task numbers are important because the results are not guaranteed to be returned in order. `eo$accumulator` is another advanced feature and, like `eo$elementFunc`, is very useful when executing a large number of tasks. It allows you to process each result as it finishes, rather than forcing you to wait until all of the tasks are complete. In conjunction with `eo$elementFunc` and `eo$loadFactor`, you can set up a pipeline that processes an arbitrarily large number of tasks efficiently. Note that when `eo$accumulator` is specified, `eachElem` returns `NULL`, not the list of results, since it doesn't save any of the results after passing them to the `eo$accumulator` function.
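Because results can arrive out of order and in chunks, an accumulator usually has to use the task numbers to put each result in its place. The sketch below simulates that with hand-made calls; `collected` and the chunk contents are made-up illustration data, and in real use `eachElem` would make these calls as results come back from the workers.

```r
# Hypothetical accumulator that stores each result at its task number,
# so out-of-order and chunked deliveries still end up in order.
nTasks <- 6
collected <- vector("list", nTasks)

accumulator <- function(resultList, taskVector) {
  for (k in seq_along(taskVector)) {
    collected[[taskVector[k]]] <<- resultList[[k]]
  }
}

# Simulate chunks (chunkSize = 2) arriving out of order
accumulator(list(25, 36), c(5, 6))
accumulator(list(1, 4), c(1, 2))
accumulator(list(9, 16), c(3, 4))

unlist(collected)   # 1 4 9 16 25 36
```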
The `eo$by` option specifies the iteration scheme to use for matrix and data frame elements in `elementArgs`. The default value is `"row"`, but it can also be set to `"column"` or `"cell"`. Vectors and lists in `elementArgs` are not affected by this option.
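The three schemes can be pictured with plain R indexing. This base-R illustration assumes each row or column is handed to `fun` as a vector; exactly how nws presents a row or column to `fun` is not described on this page.

```r
# Base-R illustration (not nws code) of the three iteration schemes
# for a matrix element of elementArgs: one task per row, column, or cell.
m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns

byRow    <- lapply(seq_len(nrow(m)), function(i) m[i, ])   # 2 tasks
byColumn <- lapply(seq_len(ncol(m)), function(j) m[, j])   # 3 tasks
byCell   <- as.list(m)                                     # 6 tasks

sapply(byRow, sum)   # 9 12
```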
The `eo$chunkSize` option is a tuning parameter that specifies the number of tasks that sleigh workers should allocate at a time. The default value is 1, but if the tasks are small, performance can be improved by specifying a larger value, which decreases the overhead per task.
If the `fun` function executes very quickly, you may not be able to keep your workers busy, giving you poor performance. In that case, consider setting the `eo$chunkSize` option to a number large enough to increase the effective task execution time.
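One way to picture chunking is as grouping task numbers into batches of `eo$chunkSize`. The helper below is hypothetical and only shows the arithmetic (fewer, larger units of work, with a possibly shorter final chunk); it does not reproduce how nws actually hands chunks to workers.

```r
# Hypothetical illustration of chunking: group task numbers into
# chunks of chunkSize; a worker would take one chunk at a time.
chunkTasks <- function(nTasks, chunkSize) {
  ids <- seq_len(nTasks)
  unname(split(ids, ceiling(ids / chunkSize)))
}

chunks <- chunkTasks(10, 3)
length(chunks)    # 4 chunks instead of 10 separate tasks
lengths(chunks)   # 3 3 3 1: the last chunk can be shorter
```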
The `eo$loadFactor` option is a tuning parameter that specifies the maximum number of tasks per worker that are submitted to the sleigh at the same time. If set, no more than `loadFactor * workerCount` tasks are submitted at the same time. This helps to control the resource demands made on the NetWorkSpaces server, which is especially important when there are a large number of tasks. Note that this option is ignored if `eo$blocking` is set to `FALSE`, since the two options are incompatible with each other.
If in doubt, set the `eo$loadFactor` option to 10. That will almost certainly avoid putting a strain on the NetWorkSpaces server, and if that isn't enough to keep your workers busy, use the `eo$chunkSize` option to give each worker more to do.
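The `loadFactor * workerCount` cap can be made concrete with a small simulation. `simulateThrottle` is a hypothetical helper that submits tasks until the cap is reached and then submits one more each time a task finishes; it models only the counting, not the NetWorkSpaces server.

```r
# Hypothetical simulation of the loadFactor throttle: at most
# loadFactor * workerCount tasks are outstanding at any moment.
simulateThrottle <- function(nTasks, workerCount, loadFactor) {
  cap <- loadFactor * workerCount
  submitted <- 0
  completed <- 0
  maxOutstanding <- 0
  while (completed < nTasks) {
    # submit until the cap is reached or nothing is left to submit
    while (submitted - completed < cap && submitted < nTasks) {
      submitted <- submitted + 1
    }
    maxOutstanding <- max(maxOutstanding, submitted - completed)
    completed <- completed + 1   # pretend one task finishes
  }
  maxOutstanding
}

simulateThrottle(1000, workerCount = 4, loadFactor = 10)   # 40
simulateThrottle(25, workerCount = 4, loadFactor = 10)     # 25
```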
The `eo$blocking` option indicates whether to wait for the results or to return as soon as the tasks have been submitted. If set to `FALSE`, `eachElem` returns a `sleighPending` object that is used to monitor the status of the tasks and to eventually retrieve the results. You must wait for the results to be complete before executing any further tasks on the sleigh, or an exception will be raised. The default value is `TRUE`.
The `eo$argPermute` option is used to reorder the arguments passed to `fun`. It is generally only useful if the `fixedArgs` argument has been specified and some of those arguments need to precede the arguments specified via `elementArgs`. Note that by recycling elements in `elementArgs`, the use of `fixedArgs` and `argPermute` can often be avoided entirely.
If `elementArgs` or `fixedArgs` isn't a list, `eachElem` automatically wraps it in a list. This convenience only works for passing in a single vector or matrix, however.
If `elementArgs` or `fixedArgs` are named lists, the names are used to map the values to the appropriate arguments of `fun`. This can be used as another technique to avoid the use of `eo$argPermute`.
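Base R's `mapply` matches named arguments in the same spirit, which shows why naming removes the need to reorder anything. This is an analogy using `mapply`, not nws code: the varying `y` and the fixed `x` are matched by name, so the fixed value can be supplied separately and still be bound to the first formal argument.

```r
# Base-R analogy (not nws): named arguments are matched by name,
# so the fixed value does not have to come first positionally.
f <- function(x, y) x - y
res <- mapply(f, y = 1:4, MoreArgs = list(x = 100), SIMPLIFY = FALSE)
unlist(res)   # 99 98 97 96, i.e. 100 - 1:4
```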
The `elementArgs` argument can be specified as a data frame. This works just like a named list, and therefore the column names of the data frame must all correspond to arguments of `fun`. Note that if the data frame has many rows, performance may be poor due to the overhead of subsetting data frames in R.
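Treating a data frame as a named list of columns can likewise be sketched with `mapply`, where each row supplies one set of named arguments. This is a base-R analogy under the assumption that each column maps to the argument of the same name; it does not reflect how nws subsets the data frame internally.

```r
# Base-R analogy (not nws): a data frame is a named list of columns,
# so each row supplies one named argument set to fun.
df <- data.frame(a = 1:3, b = c(10, 20, 30))
fun <- function(a, b) a * b
res <- do.call(mapply, c(list(FUN = fun, SIMPLIFY = FALSE), as.list(df)))
unlist(res)   # 10 40 90
```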
If you have a huge number of tasks, consider using the `eo$elementFunc`, `eo$accumulator`, and `eo$loadFactor` options.
If `eo$elementFunc` returns a value that isn't a list, `eachElem` automatically wraps that value in a list.
The `eo$elementFunc` function doesn't have to define a second formal argument (the `by` argument) if it's not needed.

Likewise, the `eo$accumulator` function doesn't have to define a second formal argument (the `taskVector` argument) if it's not needed; just remember that the results are not guaranteed to come back in order.
## Not run:
# create a sleigh
s <- sleigh()
# compute the list mean for each list element
x <- list(a=1:10, beta=exp(-3:3), logic=c(TRUE,FALSE,FALSE,TRUE))
eachElem(s, mean, list(x))
# median and quartiles for each list element
eachElem(s, quantile, elementArgs=list(x), fixedArgs=list(probs=1:3/4))
# use eo$elementFunc to supply 100 random values and eo$accumulator to
# receive the results
elementFunc <- function(i, by) {
if (i <= 100) list(i=i, x=runif(1)) else stop()
}
accumulator <- function(resultList, taskVector) {
if (resultList[[1]][[1]] != taskVector[1]) stop('assertion failure')
cat(paste(resultList[[1]], collapse=' '), '\n')
}
eo <- list(elementFunc=elementFunc, accumulator=accumulator)
eachElem(s, function(i, x) list(i=i, x=x, xsq=x*x), eo=eo)
## End(Not run)