gemini_parallelization: Parallelization in GEMINI

Description Implementation Parallelized processes Caveats Active Development

Description

Notes about parallelization in combinatorial CRISPR analysis

Implementation

To improve efficiency and scalability, GEMINI employs parallelization, enabling a rapid initialization and update routine. Parallelization was implemented using the R pbmclapply package, and specifically through the pbmclapply function. As pbmclapply (and it's parent function, mclapply) relies on forking, parallelization is currently limited to Unix-like (Linux flavors and macOS) machines, but may be extended to other OS in later versions through socket parallelization. In the event that GEMINI is used on a Windows machine, all computations are performed in serial.

Parallelized processes

GEMINI enables parallelization of both initialization and inference. For initialization, parallelization is used to quickly hash the data. For the inference procedure, parallelization is used to speed up the CAVI approach by performing updates independently across cores.

Caveats

To note, there is usually a trade-off in terms of number of cores and shared resources/memory transfer. See inst/figs/gemini-benchmarking.png for a visual depiction.

Also, while most functions in GEMINI have been parallelized, initialization of tau, update of x (group 3), and updates of y are performed in serial. As such, especially in large libraries, tau initialization and x (group 3) updates may take some time. Updating y is usually fast, as the number of genes tends to be the smallest in parameter space. However, these functions may be parallelized in the future as well.

Active Development

Noticed that in some cases, after using multicore processing through pbmclapply, warnings have been produced: "In selectChildren(pids[!fin], -1) : cannot wait for child ... as it does not exist" "In parallel::mccollect(...) : 1 parallel job did not deliver a result" R.version=3.6.0 These warnings, although they appear menacing, are in fact harmless. See: http://r.789695.n4.nabble.com/Strange-error-messages-from-parallel-mcparallel-family-under-3-6-0-td4756875.html#a4756939


gemini documentation built on Nov. 8, 2020, 8:22 p.m.