00_pbdMPI-package: R Interface to MPI (Programming with Big Data in R Project)

pbdMPI-package

R Documentation

R Interface to MPI (Programming with Big Data in R Project)

Description

A simplified, efficient interface to MPI for HPC clusters. It is a derivation and rethinking of the Rmpi package that embraces the parallel programming style prevalent on HPC clusters. Beyond the interface, it includes a collection of functions for global work with distributed data. It is based on S4 classes and methods.

Details

This package requires an MPI library (OpenMPI, MPICH2, or LAM/MPI). Standard installation in an R session with
> install.packages("pbdMPI")
should work in most cases.

On HPC clusters, it is strongly recommended that you consult your cluster's documentation for specific requirements, such as module software environments. Some module examples relevant to R and MPI are
$ module load openmpi
$ module load openblas
$ module load flexiblas
$ module load r
possibly with specific version numbers and possibly with some uppercase letters in the module names. Although module software environments are widely used, module names and their dependency structure are not standardized across cluster installations. The command
$ module avail
usually lists the available software modules on your cluster.

To install on the Unix command line after downloading the source file, use R CMD INSTALL.

If the MPI library is not found even after you have checked that the correct module environments are loaded, the following configure arguments can be used to specify its non-standard location on your system:

Argument	Default
`--with-mpi-type`	`OPENMPI`
`--with-mpi-include`	`${MPI_ROOT}/include`
`--with-mpi-libpath`	`${MPI_ROOT}/lib`
`--with-mpi`	`${MPI_ROOT}`

where ${MPI_ROOT} is the path to the MPI root. See the package source file pbdMPI/configure for details.
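As a minimal sketch of how the arguments in the table above might be passed from an R session, the following assembles a configure string and shows, in comments, where it would be used. The path /opt/openmpi is a hypothetical MPI root chosen for illustration; substitute the location on your system. The install call itself is left commented out so that the block has no side effects.

```r
# Hypothetical MPI installation prefix (an assumption); replace with your own.
mpi_root <- "/opt/openmpi"

# Assemble the configure arguments listed in the table above.
configure_args <- paste(
  "--with-mpi-type=OPENMPI",
  paste0("--with-mpi-include=", file.path(mpi_root, "include")),
  paste0("--with-mpi-libpath=", file.path(mpi_root, "lib"))
)
print(configure_args)

# Not run here, because it would install the package:
# install.packages("pbdMPI", configure.args = configure_args)
#
# Shell equivalent for a downloaded source file:
# R CMD INSTALL <source file> --configure-args="<the string printed above>"
```

Passing the include and library paths separately is useful when they do not sit under a common root; when they do, the single `--with-mpi` argument may be enough.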

Loading the package with library(pbdMPI) sets a few global variables, including the environment .pbd_env, where many defaults are set, and initializes MPI. In most cases, the defaults should not be modified; instead, change the corresponding parameters of the functions that use them. All programs must end with finalize() to exit MPI cleanly.

Most functions are assumed to run as Single Program, Multiple Data (SPMD), i.e., in batch mode. SPMD is based on cooperation between parallel copies of a single program, which scales better than the manager-workers approach that is natural in interactive programming. Interactive use of an HPC cluster is handled more efficiently by a client-server approach, such as that enabled by the remoter package.
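To illustrate the SPMD style, here is a minimal sketch in which every rank runs the same program, works on its own slice of the data, and combines partial results with allreduce(). The helper block_indices() is a hypothetical function written for this example and is not part of pbdMPI. The MPI portion is guarded so that the block only uses pbdMPI when the package is installed.

```r
# Hypothetical helper (not part of pbdMPI): the contiguous block of indices
# 1..n owned by a given 0-based rank when n items are split over `size` ranks.
block_indices <- function(n, rank, size) {
  base  <- n %/% size              # items every rank receives
  extra <- n %% size               # the first `extra` ranks receive one more
  start <- rank * base + min(rank, extra) + 1
  len   <- base + (rank < extra)
  if (len == 0) integer(0) else seq.int(start, length.out = len)
}

if (requireNamespace("pbdMPI", quietly = TRUE)) {
  suppressMessages(library(pbdMPI))   # loading the package also initializes MPI

  x   <- 1:100                        # every rank builds the same input
  idx <- block_indices(length(x), comm.rank(), comm.size())

  local_sum <- sum(x[idx])                      # work on this rank's slice only
  total     <- allreduce(local_sum, op = "sum") # every rank receives the total

  comm.print(total)                   # printed by one rank by default
  finalize()                          # required to exit MPI cleanly
}
```

Note that the reduction operation is chosen through the op argument of the call rather than by editing a default in .pbd_env.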

On most clusters, programs are run with mpirun or mpiexec together with Rscript, for example
$ mpiexec -np 2 Rscript some_code.r
where some_code.r contains the entire SPMD program. The MPI Standard 4.0 recommends mpiexec over mpirun. Some MPI implementations may have minor differences between the two, but under OpenMPI 5.0 they are synonyms that produce the same behavior.
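The same launch can also be scripted from an R session. The sketch below uses only base R (system2() and Sys.which()); the helper mpiexec_args() is a hypothetical name introduced for this example. The launch is skipped unless an mpiexec binary is on the PATH and the script file exists in the working directory.

```r
# Hypothetical helper: argument vector for
#   mpiexec -np <nranks> Rscript <script>
mpiexec_args <- function(script, nranks = 2L) {
  c("-np", as.character(nranks), "Rscript", shQuote(script))
}

launch_args <- mpiexec_args("some_code.r", nranks = 2L)
cat("mpiexec", launch_args, "\n")     # show the command that would be run

# Launch only when mpiexec is available and the script is present.
if (nzchar(Sys.which("mpiexec")) && file.exists("some_code.r")) {
  status <- system2("mpiexec", launch_args)
  cat("exit status:", status, "\n")
}
```

On a cluster, a command like this normally belongs inside a batch job script submitted to the scheduler rather than being run on a login node.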

The package source files provide several examples based on pbdMPI, such as

Directory	Examples
`pbdMPI/inst/examples/test_spmd/`	main SPMD functions
`pbdMPI/inst/examples/test_rmpi/`	analogues to Rmpi
`pbdMPI/inst/examples/test_parallel/`	analogues to parallel
`pbdMPI/inst/examples/test_performance/`	performance tests
`pbdMPI/inst/examples/test_s4/`	S4 extension
`pbdMPI/inst/examples/test_cs/`	client/server examples
`pbdMPI/inst/examples/test_long_vector/`	long vector examples

where test_long_vector requires recompiling the package with the setting

#define MPI_LONG_DEBUG 1

in pbdMPI/src/pkg_constant.h.

The current version is mainly written and tested under OpenMPI environments on Linux systems (CentOS 7, RHEL 8, Xubuntu). It is also tested on macOS with Homebrew-installed OpenMPI and under MPICH2 environments on Windows systems, although the primary target systems are HPC clusters running Linux.

Author(s)

Wei-Chen Chen wccsnow@gmail.com, George Ostrouchov, Drew Schmidt, Pragneshkumar Patel, and Hao Yu.

References

Programming with Big Data in R Website: https://pbdr.org/

Examples

## Not run: 
### On command line, run each demo with 2 processors by
### (Use Rscript.exe on Windows systems)
# mpiexec -np 2 Rscript -e "demo(allgather,'pbdMPI',ask=F,echo=F)"
# mpiexec -np 2 Rscript -e "demo(allreduce,'pbdMPI',ask=F,echo=F)"
# mpiexec -np 2 Rscript -e "demo(bcast,'pbdMPI',ask=F,echo=F)"
# mpiexec -np 2 Rscript -e "demo(gather,'pbdMPI',ask=F,echo=F)"
# mpiexec -np 2 Rscript -e "demo(reduce,'pbdMPI',ask=F,echo=F)"
# mpiexec -np 2 Rscript -e "demo(scatter,'pbdMPI',ask=F,echo=F)"
### Or
# execmpi("demo(allgather,'pbdMPI',ask=F,echo=F)", nranks = 2L)
# execmpi("demo(allreduce,'pbdMPI',ask=F,echo=F)", nranks = 2L)
# execmpi("demo(bcast,'pbdMPI',ask=F,echo=F)", nranks = 2L)
# execmpi("demo(gather,'pbdMPI',ask=F,echo=F)", nranks = 2L)
# execmpi("demo(reduce,'pbdMPI',ask=F,echo=F)", nranks = 2L)
# execmpi("demo(scatter,'pbdMPI',ask=F,echo=F)", nranks = 2L)

## End(Not run)

pbdMPI documentation built on April 13, 2025, 9:07 a.m.