Tidy distributed grid search in R and Sun/Open Grid Engine
devtools::install_github("patr1ckm/distributr")
The basic function is grid_apply
, which applies a function over a grid of its arguments expand.grid(...)
, returning results in a list. Function applications can be executed repeatedly and in parallel.
```{r, eval=TRUE} do.one <- function(n, mu, sd){ mean(rnorm(n, mu, sd)) }
sim <- grid_apply(do.one, n = c(50, 100, 500), mu = c(1,5), sd = c(1, 5, 10), .reps=50, .mc.cores=5)
[[1]] [1] 1.053669
[[2]] [1] 1.244468
[[3]] [1] 1.267939
[[4]] [1] 1.157546
[[5]] [1] 0.786027
The arguments to grid over must be scalar. Other arguments (such as data) can be passed as a list to `.args`.
## tidy
A tidy method is provided that merges the list of results with the argument grid, putting the results in tidy form. This format is convenient for plotting and further data analysis. `tidy` works with lists of vectors, lists, and data frames.
The function `gapply` runs `grid_apply` followed by `tidy`.
```{r, eval=TRUE}
res <- sim(tidy)
res <- gapply(do.one, n = c(50, 100, 500), mu = c(1,5), sd = c(1, 5, 10),
.reps=50, .mc.cores=5)
n mu sd .rep value
1 50 1 1 1 0.9476228
2 50 1 1 2 0.7545730
3 50 1 1 3 0.9154810
4 50 1 1 4 1.0704074
5 50 1 1 5 0.9840148
6 50 1 1 6 1.1933439
If results are of varying length, it can be helpful to stack them into key, value
pairs
with tidy(., stack=TRUE)
.
gapply
captures both warnings and errors. These can be accessed very simply:
```{r, eval=TRUE} err(sim) warn(sim)
## Sun/Open Grid Engine
A compute plan can be setup and executed using the Sun/Open Grid Engine scheduler. Rows of the argument grid are submitted to nodes, and replications are carried out in parallel via `mclapply`.
```{r}
sim <- gapply(do.one, n = c(50, 100, 500), mu = c(1,5), sd = c(1, 5, 10), .eval=F)
sim <- setup(sim, .reps=500, .mc.cores = 5)
submit(sim)
res <- tidy(collect())
The setup
function asks for user confirmation if an existing argument grid would be overwritten.
Jobs can be added to the compute plan via add_jobs
. A set of jobs can be selected from the argument grid using filter_jobs
and the usual dplyr syntax to filter
.
jobs(sim) # access jobs grid (argument grid)
add_jobs(sim, n=1000, mu=10, sd=50) # add jobs to plan
filter_jobs(sim, n < 100, .mc.cores=5) # filter jobs as in dplyr
collect(filter="n < 100") # collect results from jobs matching filter
More information is available on the wiki, for example, illustrating how chunking, random number control, and caching can all be done transparently via do.one
. These slides give a general overview.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.