yy_api_apply: Parallel Apply and Lapply Functions
In RBigData/pbdMPI: R Interface to MPI for HPC Clusters (Programming with Big Data Project)

apply and lapply

Description

These functions are parallel versions of apply(), lapply(), and sapply().

Usage

pbdApply(X, MARGIN, FUN, ..., pbd.mode = c("mw", "spmd", "dist"),
         rank.source = .pbd_env$SPMD.CT$rank.root,
         comm = .pbd_env$SPMD.CT$comm,
         barrier = TRUE)
pbdLapply(X, FUN, ..., pbd.mode = c("mw", "spmd", "dist"),
          rank.source = .pbd_env$SPMD.CT$rank.root,
          comm = .pbd_env$SPMD.CT$comm,
          bcast = FALSE, barrier = TRUE)
pbdSapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE,
          pbd.mode = c("mw", "spmd", "dist"),
          rank.source = .pbd_env$SPMD.CT$rank.root,
          comm = .pbd_env$SPMD.CT$comm,
          bcast = FALSE, barrier = TRUE)

Arguments

`X`	a matrix or array in `pbdApply()` or a list in `pbdLapply()` and `pbdSapply()`.
`MARGIN`	as in `apply()`.
`FUN`	as in `apply()`.
`...`	optional arguments to `FUN`.
`simplify`	as in `sapply()`.
`USE.NAMES`	as in `sapply()`.
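`pbd.mode`	how `X` is distributed across ranks: `"mw"`, `"spmd"`, or `"dist"` (see Details).
`rank.source`	the rank of the source from which `X` is distributed.
`comm`	a communicator number.
`bcast`	whether to broadcast the result to all ranks.
`barrier`	whether to call a barrier across all ranks.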

Details

All functions are mainly called in manager/workers mode (pbd.mode = "mw"), where they work the same as their serial versions.

If pbd.mode = "mw", the X in rank.source (the manager) will be distributed to the workers, then FUN will be applied to the new data, and the results gathered to rank.source. Note that in SPMD the manager is itself one of the workers. The arguments in ... are also distributed from rank.source via scatter().

If pbd.mode = "spmd", the same copy of X is expected on all ranks, and the original apply(), lapply(), or sapply() will operate on part of X. An explicit allgather() or gather() will be needed to aggregate the results.
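
The division of work in "spmd" mode can be pictured with a serial, base-R sketch. It does not call pbdMPI: the ranks are simulated with a loop, the number of ranks is assumed to be 2, and jobs_for_rank() is a hypothetical helper whose contiguous block split was chosen for illustration and is not necessarily the partition pbdMPI uses. The final do.call(c, ...) stands in for the explicit allgather() or gather() step.

```r
# Serial sketch of pbd.mode = "spmd": every simulated rank holds the same X
# and processes only its own share of the elements.
n.ranks <- 2L                                  # assumed number of ranks
X <- split(1:100, rep(1:10, each = 10))        # same list on every rank

# Hypothetical helper: contiguous block of element indices for one rank
# (ranks are numbered from 0, as in MPI).
jobs_for_rank <- function(n, rank, size) {
  which(ceiling(seq_len(n) * size / n) == rank + 1L)
}

# What each rank would compute locally on its part of X.
partial <- lapply(seq_len(n.ranks) - 1L, function(rank) {
  id <- jobs_for_rank(length(X), rank, n.ranks)
  lapply(X[id], sum)
})

# Stand-in for the explicit allgather()/gather(): combine per-rank results.
combined <- do.call(c, partial)

# The combined result matches the serial lapply() on the full list.
stopifnot(identical(combined, lapply(X, sum)))
print(unlist(combined))
```

Without the combining step, each rank would hold only its own entry of partial, which is why the aggregation has to be requested explicitly in this mode.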

If pbd.mode = "dist", a different X is expected on each rank, i.e. a ‘distinct or distributed’ X, and the original apply(), lapply(), or sapply() will operate on each rank's own X. An explicit allgather() or gather() will be needed to aggregate the results.

In SPMD, it is better to split data into pieces, so that X is a local piece of a global matrix. If the "apply" dimension is local, the base apply() function can be used.
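
The point about a local "apply" dimension can be illustrated with a small base-R sketch. It simulates the ranks serially rather than calling pbdMPI, and the row-block layout, the number of ranks (2), and the variable names are all assumptions made for the illustration.

```r
# Serial sketch: a global matrix split by rows across simulated ranks.
n.ranks <- 2L                              # assumed number of ranks
global <- matrix(1:40, nrow = 8)           # 8 x 5 global matrix

# Each rank owns a contiguous block of rows (assumed layout).
owner <- rep(seq_len(n.ranks), each = nrow(global) / n.ranks)
local.pieces <- lapply(seq_len(n.ranks), function(r) {
  global[owner == r, , drop = FALSE]
})

# MARGIN = 1 works on whole rows, and every row lives on exactly one rank,
# so base apply() on the local piece needs no communication.
local.sums <- lapply(local.pieces, function(x) apply(x, 1, sum))

# Concatenating the per-rank results reproduces the global answer.
stopifnot(identical(unlist(local.sums), apply(global, 1, sum)))
print(unlist(local.sums))
```

With this layout, MARGIN = 2 would not be local: each column is spread over all ranks, so the per-rank column sums would still have to be combined across ranks.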

Value

A list or a matrix is returned, as in the corresponding serial function.

Author(s)

Wei-Chen Chen wccsnow@gmail.com, George Ostrouchov, Drew Schmidt, Pragneshkumar Patel, and Hao Yu.

References

Programming with Big Data in R Website: https://pbdr.org/

Examples


### Save code in a file "demo.r" and run with 2 processors by
### SHELL> mpiexec -np 2 Rscript demo.r

spmd.code <- "
### Initialize
suppressMessages(library(pbdMPI, quietly = TRUE))

.comm.size <- comm.size()
.comm.rank <- comm.rank()

### Example for pbdApply.
N <- 100
x <- matrix((1:N) + N * .comm.rank, ncol = 10)
y <- pbdApply(x, 1, sum, pbd.mode = \"mw\")
comm.print(y)

y <- pbdApply(x, 1, sum, pbd.mode = \"spmd\")
comm.print(y)

y <- pbdApply(x, 1, sum, pbd.mode = \"dist\")
comm.print(y)


### Example for pbdApply for 3D array.
N <- 60
x <- array((1:N) + N * .comm.rank, c(3, 4, 5))
dimnames(x) <- list(lat = paste(\"lat\", 1:3, sep = \"\"),
                    lon = paste(\"lon\", 1:4, sep = \"\"),
                    time = paste(\"time\", 1:5, sep = \"\"))
comm.print(x[,, 1:2])

y <- pbdApply(x, c(1, 2), sum, pbd.mode = \"mw\")
comm.print(y)

y <- pbdApply(x, c(1, 2), sum, pbd.mode = \"spmd\")
comm.print(y)

y <- pbdApply(x, c(1, 2), sum, pbd.mode = \"dist\")
comm.print(y)


### Example for pbdLapply.
N <- 100
x <- split((1:N) + N * .comm.rank, rep(1:10, each = 10))
y <- pbdLapply(x, sum, pbd.mode = \"mw\")
comm.print(unlist(y))

y <- pbdLapply(x, sum, pbd.mode = \"spmd\")
comm.print(unlist(y))

y <- pbdLapply(x, sum, pbd.mode = \"dist\")
comm.print(unlist(y))

### Finish.
finalize()
"
pbdMPI::execmpi(spmd.code, nranks = 2L)

RBigData/pbdMPI documentation built on Jan. 31, 2024, 10:34 p.m.