The vast majority of problems tackled by RMINC as of 2016 are considered ["embarassingly parallel"](https://en.wikipedia.org/wiki/Embarrassingly_parallel) this means that the problem can be broken into smaller pieces that can be solved completely independently. Take for example the standard massively-univariate approach to analyzing neuroimages. Each image is composed of voxels, and a separate model is computed at each voxel. These models are computed without any dependency on the models being computed at other voxels. Since there is no interdependency it is easy to split the voxels up between multiple cores on a computer, or even between different computers and combine the results after. ## A Basic Example Imagine for example there are some simple images that are 10 x 10 x 10 voxels. wzxhzdk:0 Now to see if the values at each voxel depend on some variable, body weight for example, we can fit a linear model. Since the models are independent we can divide the problem up. I'll use four cores on my computer, each one will fit 250 of the 1000 models, and then the results can be stuck back together. wzxhzdk:1 Because the job is divided between 4 cores the computation goes much faster. There is some overhead to splitting and recombining the job, so you won't get a 4x speedup, but it will be faster. The argument `parallel = c("local", 4)` says to run the jobs on 4 cores of the local machine. This system is even more powerful if you have a cluster at your disposal. You can divide the problem in to potentially 100s of peices that can be solved independently. ### Parallel Functions As of Oct. 20th, 2016, The following functions support the parallel option as used above: 1. mincLm 2. pMincApply 3. mincLmer 4. anatGetAll 5. anatLmer 6. vertexApply 7. vertexLmer Each of these can run sequentially, locally (multicore), or on a cluster. ## RMINC on a Cluster Using RMINC on a cluster is now relatively simple. Thanks to the wonderful ["batchtools"](https://github.com/mllg/batchtools) package RMINC can be configured to run on Torque, SGE, Slurm, ad-hoc SSH clusters and more. We and our collaborators typically use Torque and SGE but using these other systems is possible with a little configuration effort. ### Quick Start To setup RMINC to run on your cluster, you will typically set a default configuration file in a bash session before opening R. wzxhzdk:2 This file will likely look something like: wzxhzdk:3 In fact this is the default configuration that comes with RMINC, it assumes you are using a torque cluster and each job will take under an hour, using less than 8 gigabytes of memory and using a single node. Then to run a job you can do: wzxhzdk:4 As of RMINC 1.4.3 there is no configuration necessary within parallel function calls other than to specify the number of batches. The argument `parallel = c("pbs", 6)` is splitting the job into 6 peices. The `"pbs"` portion is checked to see if it is "local" or "snowfall", otherwise it is ignored, relying instead on batchtools configuration to control execution. This contrasts from earlier versions of RMINC. ### Site Configuration To make it so you do not need to configure batchtools each time you open an R session you can add a line that looks like: wzxhzdk:5 To your bashrc. The file pointed to by `RMINC_BATCH_CONF` is R code that sets the `cluster.functions` and `default.resources` for batchtools. The function `makeClusterFunctionsTORQUE` finds a template file that it fills in whenever a new job is created. The example PBS template looks like: wzxhzdk:6 Batchtools handles the job.name, log.file, and rscript portions of the template. Our default.resources configuration provides the nodes, walltime, and memory. If you do not use PBS or SGE, you can find example configurations on [batchtools' github](https://github.com/mllg/batchtools/tree/master/inst/templates) If in the middle of a session you need to change you configuration you can pass a `resources` argument explicitly to most parallel functions, or pass in an alternative configuration file with the `conf_file` argument. wzxhzdk:7 ## Internals For Intrepid Developers The basics of parallel system is to dispatch one of three commands, the basic sequential form, the local form, and the queued form. Within the parallel forms the first task is to divide the problem into pieces. We'll look at vertexApply as a simple case study wzxhzdk:8 First the filenames are converted into a matrix of values with each column representing a subject and each row representing a vertex. This is shipped off to `matrixApply` an internal alternative to `apply` that can parallelize both locally and on a queue. wzxhzdk:9 We'll break it down section by section: The first block handles numeric and file name masks. This is used to subset the rows to save computation time wzxhzdk:10 match.fun ensures that the function passed in is a properly bound function. wzxhzdk:11 The next block is the engine of `matrixApply`, its a function that applies the user supplied function to each row wzxhzdk:12 Now we handle dispatch wzxhzdk:13 If parallel is NULL, we just pass the whole matrix into `apply_fun`, otherwise we handle the two parallel cases. First the number of parallel jobs the user wants is retrieved. Then the row indices are split into groups using split and RMINC's internal `groupingVector` function. The split returns a list of indices, one per group. Then if the first portion of parallel is "local", a muffled version of `parallel::mclapply` is called, split the job up amongst cores. This creates a list of results which are then merged via `Reduce` and cbind. The BatchJobs case follows the same logic, although a few more infrastructure functions are required. A BatchJobs registry is created with `makeRegistry` and registered for removal when the function ends. `tenacious_remove_registry` works hard to ensure all evidence of a failed run is scrubbed from the system. The triplet of `batchMap`, `submitJobs`, and `waitForJobs` instructs the queue to start running the jobs and wait patiently for them to finish. The results are then loaded with `reduceResults` and reduced as above. After this there is a little result coercion wzxhzdk:14 The results are transposed to give results that resemble mincApply, if a mask was used, the results are inflated back to their original dimensions, and if the result has only a single column it is returned to a vector form. Then the results are returned. This bubbles back up to vertexApply which tacks on a likeFile attribute and returns. Implementing additional parallel functions will follow this general framework: 1. Create an engine function that does the work 2. Check if `parallel` is NULL, if so dispatch the engine function as is 3. Determine how to split the job up into peices that the engine can run on, create a list or vector with the appropriate information 4. If `parallel[1]` is "local" run mclapply or its muffled variant on the engine function and the grouping object 5. Otherwise run the batchtools commands above 6. Reduce the results to an appropriate format. Although it is possible `matrixApply` will work for your problem (potentially with a little transposition). If not `mincLm` is a little harder to read, but has some examples for how to use a mask to parallelize jobs.


Mouse-Imaging-Centre/RMINC documentation built on Nov. 12, 2022, 1:50 p.m.