chunk | R Documentation |
Jobs can be partitioned into “chunks” to be executed sequentially on the computational nodes.
Chunks are defined by providing a data frame with columns “job.id” and “chunk” (integer) to submitJobs.
All jobs with the same chunk number will be grouped together on one node to form a single computational job.
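As a minimal illustration of the expected layout, the following base R sketch builds such a data frame by hand and shows which jobs would end up together. The job ids and chunk numbers are made up for illustration; passing the table to submitJobs additionally requires a registry, as in the Examples.

```r
# Hypothetical job ids 1..6, grouped manually into three chunks of two jobs
jobs <- data.frame(
  job.id = 1:6,
  chunk  = c(1L, 1L, 2L, 2L, 3L, 3L)
)

# Jobs sharing a chunk number form one computational job
groups <- split(jobs$job.id, jobs$chunk)
print(groups)
```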
The function chunk simply splits x into either a fixed number of groups (n.chunks), or
into a variable number of groups with a fixed maximum number of elements each (chunk.size).
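The two modes can be sketched in base R as follows. The helper chunk_sketch is hypothetical and is not the package's implementation; it assumes a round-robin assignment and omits the shuffle step, so it only illustrates the resulting group sizes.

```r
# Hypothetical helper, NOT the batchtools implementation: exactly one of
# n.chunks or chunk.size must be given.
chunk_sketch <- function(x, n.chunks = NULL, chunk.size = NULL) {
  n <- length(x)
  if (n == 0L) return(integer(0))
  stopifnot(xor(is.null(n.chunks), is.null(chunk.size)))
  # With a size limit, use the smallest number of chunks that respects it
  if (is.null(n.chunks)) n.chunks <- ceiling(n / chunk.size)
  n.chunks <- min(n.chunks, n)
  # Round-robin assignment, sorted so that chunks are contiguous
  sort(as.integer((seq_len(n) - 1L) %% n.chunks + 1L))
}

by_count <- chunk_sketch(1:10, n.chunks = 3)   # fixed number of groups
by_size  <- chunk_sketch(1:10, chunk.size = 4) # at most 4 elements per group
print(table(by_count))
print(table(by_size))
```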
The function lpt also groups x into a fixed number of chunks,
but uses the actual values of x in a greedy “Longest Processing Time” algorithm.
As a result, the maximum sum of elements per chunk is approximately minimized.
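The greedy idea can be sketched in a few lines of base R: process the values in decreasing order and always assign the next value to the chunk with the currently smallest sum. The helper lpt_sketch is hypothetical and only illustrates the heuristic; the package's own implementation may differ in details such as tie breaking.

```r
# Hypothetical helper illustrating the greedy LPT heuristic
lpt_sketch <- function(x, n.chunks = 1L) {
  load <- numeric(n.chunks)            # running sum of each chunk
  assignment <- integer(length(x))
  for (i in order(x, decreasing = TRUE)) {
    k <- which.min(load)               # least loaded chunk, first one on ties
    assignment[i] <- k
    load[k] <- load[k] + x[i]
  }
  assignment
}

x <- c(7, 5, 4, 3, 1)
ch <- lpt_sketch(x, n.chunks = 2)
sums <- sapply(split(x, ch), sum)
print(sums)
```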
binpack splits x into a variable number of groups whose sums of elements do
not exceed the upper limit provided by chunk.size.
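One common way to obtain such a grouping is the first-fit-decreasing heuristic, sketched below in base R. This is an assumption made for illustration: the helper binpack_sketch is hypothetical, and the strategy actually used by binpack is not stated here.

```r
# Hypothetical helper: first-fit decreasing bin packing (assumed strategy)
binpack_sketch <- function(x, chunk.size = max(x)) {
  stopifnot(all(x <= chunk.size))      # every element must fit into a chunk
  sums <- numeric(0)                   # running sum of each open chunk
  assignment <- integer(length(x))
  for (i in order(x, decreasing = TRUE)) {
    fits <- which(sums + x[i] <= chunk.size)
    if (length(fits) > 0L) {
      k <- fits[1L]                    # first chunk with enough room
    } else {
      sums <- c(sums, 0)               # open a new chunk
      k <- length(sums)
    }
    assignment[i] <- k
    sums[k] <- sums[k] + x[i]
  }
  assignment
}

x <- c(6, 5, 4, 3, 2)
ch <- binpack_sketch(x, chunk.size = 10)
sums <- sapply(split(x, ch), sum)
print(sums)
```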
See examples of estimateRuntimes for an application of binpack and lpt.
chunk(x, n.chunks = NULL, chunk.size = NULL, shuffle = TRUE)
lpt(x, n.chunks = 1L)
binpack(x, chunk.size = max(x))
x |
[ |
n.chunks |
[ |
chunk.size |
[ |
shuffle |
[ |
[integer] giving the chunk number for each element of x.
estimateRuntimes
ch = chunk(1:10, n.chunks = 2)
table(ch)
ch = chunk(rep(1, 10), chunk.size = 2)
table(ch)
set.seed(1)
x = runif(10)
ch = lpt(x, n.chunks = 2)
sapply(split(x, ch), sum)
set.seed(1)
x = runif(10)
ch = binpack(x, 1)
sapply(split(x, ch), sum)
# Job chunking
tmp = makeRegistry(file.dir = NA, make.default = FALSE)
ids = batchMap(identity, 1:25, reg = tmp)
### Group into chunks with 10 jobs each
library(data.table)
ids[, chunk := chunk(job.id, chunk.size = 10)]
print(ids[, .N, by = chunk])
### Group into 4 chunks
ids[, chunk := chunk(job.id, n.chunks = 4)]
print(ids[, .N, by = chunk])
### Submit to batch system
submitJobs(ids = ids, reg = tmp)
# Grouped chunking
tmp = makeExperimentRegistry(file.dir = NA, make.default = FALSE)
prob = addProblem(reg = tmp, "prob1", data = iris, fun = function(job, data) nrow(data))
prob = addProblem(reg = tmp, "prob2", data = Titanic, fun = function(job, data) nrow(data))
algo = addAlgorithm(reg = tmp, "algo", fun = function(job, data, instance, i, ...) instance)
prob.designs = list(prob1 = data.table(), prob2 = data.table(x = 1:2))
algo.designs = list(algo = data.table(i = 1:3))
addExperiments(prob.designs, algo.designs, repls = 3, reg = tmp)
### Group into chunks of 5 jobs, but do not put multiple problems into the same chunk
# -> only one problem has to be loaded per chunk, and only once because it is cached
ids = getJobTable(reg = tmp)[, .(job.id, problem, algorithm)]
ids[, chunk := chunk(job.id, chunk.size = 5), by = "problem"]
ids[, chunk := .GRP, by = c("problem", "chunk")]
dcast(ids, chunk ~ problem)