README.md

This helper script makes code portable between an MPI cluster computer and desktop machines, so the same code can be used for parallel functions on Linux, Mac OS, and Windows.

Linux and Mac OS are POSIX compliant and can therefore use 'fork()' to run parallel tasks, with significant speed benefits and ease of coding. This feature, however, does not exist on Windows. Instead, Windows machines can use a PSOCK cluster for parallel computing. The commands used for a PSOCK cluster are similar to those used for an MPI cluster, but differ in syntax and occasionally in arguments. Thus there are three ways to do parallel computing, each with its own syntax: FORK on a Linux or Mac OS machine, PSOCK on a Windows machine, and MPI on a cluster. This helper script unifies the syntax so that code is portable and suitably parallel across all platforms.
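To make the platform difference concrete, here is a minimal sketch that uses only the base parallel package. The helper name portable_lapply is hypothetical and is not part of ctools; it only illustrates the kind of dispatch described above, and the MPI case is left out.

```r
library(parallel)

# Hypothetical helper (not part of ctools): pick a backend by platform.
portable_lapply <- function(X, FUN, cores = 2L) {
  if (.Platform$OS.type == "windows") {
    # Windows has no fork(), so start a PSOCK cluster of worker processes.
    cl <- makeCluster(cores, type = "PSOCK")
    on.exit(stopCluster(cl), add = TRUE)
    parLapply(cl, X, FUN)
  } else {
    # POSIX systems can fork the current R process instead.
    mclapply(X, FUN, mc.cores = cores)
  }
}

squares <- portable_lapply(1:4, function(i) i^2)
```

The caller writes the same line on every platform; only the helper knows which backend is in use.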

Requires:

The Rhpc package is necessary to run the code with MPI. It is used in place of the standard Rmpi and snow packages because it supports long vectors. The version patched by me is required, but CRAN hosts the official Rhpc version, which contains documentation and links to the maintainer. The remaining functionality is provided by the parallel package, which is shipped with R by default.

Please take a look through the R high performance computing page for an overview of the possible ways to use high performance computing with R. Some of the functionality of these packages may be incorporated at a later date, if I find myself using them regularly.

Installing

To install the package, you'll need the devtools package developed by Hadley Wickham.

install.packages("devtools")

Next, install the package directly from GitHub:

library("devtools")
install_github("bamonroe/ctools")

And you should be good to go!

Load the library and configure as desired:

With the package installed, load it as you would any other package:

library("ctools")

Loading the package automatically creates a PSOCK cluster if run on Windows and an MPI cluster if MPI is detected. On a Unix-like machine (e.g. OS X or Linux), FORK-style parallelization is used. Under all methods, the default configuration uses the maximum number of CPUs available to R.

c.config(<integer>)

The c.config() function accepts a numerical argument and attempts to set the number of cores available to R equal to the argument. If the number of cores specified is less than 1, 1 core will be used. If the number of cores specified is greater than the number available, the maximum number available will be used.
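The clamping rule above can be sketched as follows. The function clamp_cores and its available argument are hypothetical names used for illustration; this is not the ctools source, only one way the described behaviour could be written with detectCores() from the parallel package.

```r
library(parallel)

# Hypothetical helper illustrating the clamping rule; not the ctools source.
clamp_cores <- function(requested, available = detectCores()) {
  # detectCores() can return NA on some platforms; fall back to 1 core.
  if (is.na(available)) available <- 1L
  as.integer(max(1L, min(requested, available)))
}

clamp_cores(0, available = 4)   # below 1, so 1 core is used
clamp_cores(16, available = 4)  # above the maximum, so 4 cores are used
clamp_cores(2, available = 4)   # within range, so 2 cores are used
```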

Note that this function does nothing for MPI clusters; instead, the user should specify the number of CPUs or "slots" available to R in the machine file parsed by mpiexec. Note also that changing the number of cores for a PSOCK-style cluster, i.e. when running code on Windows, will close the existing cluster and start a new one with the specified number of cores. Thus, any objects exported to the original cluster will have to be re-exported. When using FORK-style parallelization, i.e. running code on Unix-like systems, the number of cores can be changed at any point without any issues.
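The need to re-export after a PSOCK cluster is restarted can be seen with the base parallel package alone. This is a sketch of the underlying behaviour, not of what c.config() does internally; the variable name shared_value is made up for the example.

```r
library(parallel)

shared_value <- 42
cl <- makeCluster(2, type = "PSOCK")
clusterExport(cl, "shared_value", envir = environment())  # workers now hold a copy
before <- unlist(clusterEvalQ(cl, exists("shared_value")))
stopCluster(cl)

# A new cluster starts fresh worker processes with empty workspaces.
cl <- makeCluster(2, type = "PSOCK")
after <- unlist(clusterEvalQ(cl, exists("shared_value")))
clusterExport(cl, "shared_value", envir = environment())  # so the export is repeated
again <- unlist(clusterEvalQ(cl, exists("shared_value")))
stopCluster(cl)
```

Here before and again are TRUE on every worker, while after is FALSE, because the new workers never received the object.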

Utility Functions:

The script comes with 7 wrapper functions as of right now:

c.cores()
c.library("libname1","libname2",...)

Cluster Only Functions:

These functions are only useful when using cluster-style parallelization. However, they do absolutely nothing under FORK, which lets the script remain completely portable across platforms. So code as if you always need to export objects to workers.

c.export("obj1","obj2",..., push = TRUE, clear = FALSE )

Exports the named objects to all cluster nodes by default. If push = FALSE, the object names are added to a list of objects to be exported, and they are exported by a subsequent call to c.export with push = TRUE, along with any other objects named in that call. If clear = TRUE, the names are cleared from this stored export list before anything else is done.

var <- 1:10
c.export("var", push = FALSE) # Adds 'var' to the list of objects to be exported later
# NOTE: This process is unnecessary if you are not using PSOCK or MPI
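The push and clear bookkeeping described above can be sketched in plain R. Everything here is hypothetical: queue_export and .pending are illustrative names, the function returns the names it would send instead of contacting a cluster, and emptying the stored list after a push is an assumption that the description does not confirm.

```r
# Hypothetical sketch of the deferred-export bookkeeping; not the ctools source.
.pending <- new.env()
.pending$names <- character(0)

queue_export <- function(..., push = TRUE, clear = FALSE) {
  if (clear) .pending$names <- character(0)    # clear happens before anything else
  .pending$names <- unique(c(.pending$names, c(...)))
  if (!push) return(invisible(character(0)))   # only remember the names for later
  to_send <- .pending$names                    # a real version would export these
  .pending$names <- character(0)               # assumption: list is emptied after a push
  to_send
}

queue_export("a", push = FALSE)        # queue "a" without sending
sent <- queue_export("b")              # sends the queued "a" along with "b"

queue_export("x", push = FALSE)        # queue "x" ...
sent_after_clear <- queue_export("y", clear = TRUE)  # ... then drop it and send only "y"
```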

c.eval(expression)
c.call(FUN)

Apply family of functions

c.apply(X,MARGIN,FUN)
c.applyLB(X,FUN)
c.lapply(X,FUN)
c.lapplyLB(X,FUN)
c.sapply(X,FUN)
c.sapplyLB(X,FUN)
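The source does not explain the LB suffix. Assuming it mirrors the load-balancing variants in the base parallel package (an assumption, not something stated here), the difference can be sketched with parLapply and parLapplyLB: both return the same results, but the LB version hands out tasks as workers become free, which helps when run times are uneven. The function slow_id is a made-up stand-in for real work.

```r
library(parallel)

cl <- makeCluster(2, type = "PSOCK")
tasks <- c(0.2, 0.01, 0.01, 0.01)            # uneven run times, in seconds
slow_id <- function(s) { Sys.sleep(s); s }   # sleeps, then returns its input

static   <- parLapply(cl, tasks, slow_id)    # tasks split into chunks up front
balanced <- parLapplyLB(cl, tasks, slow_id)  # tasks handed out as workers free up
stopCluster(cl)
```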


bamonroe/ctools documentation built on May 11, 2019, 6:19 p.m.