# Load Balancing a Dataset

### Description

These functions will rearrange data for all processors such that the data amount of each processor is nearly equal.

### Usage

1 2 3 4 5 6 7 | ```
balance.info(X.gbd, comm = .pbd_env$SPMD.CT$comm,
gbd.major = .pbd_env$gbd.major, method = .pbd_env$divide.method[1])
load.balance(X.gbd, bal.info = NULL, comm = .pbd_env$SPMD.CT$comm,
gbd.major = .pbd_env$gbd.major)
unload.balance(new.X.gbd, bal.info, comm = .pbd_env$SPMD.CT$comm)
``` |

### Arguments

`X.gbd` |
a GBD data matrix (converted if not). |

`comm` |
a communicator number. |

`gbd.major` |
1 for row-major storage, 2 for column-major. |

`method` |
"block.cyclic" or "block0". |

`bal.info` |
a returned object from |

`new.X.gbd` |
a GBD data matrix or vector |

### Details

`X.gbd`

is the data matrix with dimension `N.gbd * p`

and exists
on all processors where `N.gbd`

may be vary across processors. If
`X.gbd`

is a vector, then it is converted to a `N.gbd * 1`

matrix.

`balance.info`

provides the information how to balance data set such
that all processors own similar amount of data. This information may be also
useful for tracking where the data go or from.

`load.balance`

does the job to transfer data from one processor with
more data to the other processors with less data based on the balance
information `balance.info`

.

`unload.balance`

is the inversed function of `load.balance`

, and
it takes the same information `bal.info`

to reverse the balanced result
back to the original order. `new.X.gbd`

is usually the output of
`load.balance{X.gbd}`

or other results of further computing of it.
Again, if `new.X.gbd`

is a vector, then it is converted to an one
column matrix.

### Value

`balance.info`

returns a list contains two data frames and two
vectors.

Two data frames are `send`

and `recv`

for sending and receiving
data. Each data frame has two columns `org`

and `belong`

for where
data original in and new belongs. Number of row of `send`

should equal
to the `N.gbd`

, and number of row of `recv`

should be nearly equal
to `n = N / COMM.SIZE`

where `N`

is the total observations of all
processors.

Two vectors are `N.allgbd`

and `new.N.allgbd`

which are all
numbers of rows of `X.gbd`

on all processes before and after load
balance, correspondingly. Both have length equals to `comm.size(comm)`

.

`load.balance`

returns a matrix for each processor and the matrix has
the dimension nearly equal to `n * p`

.

`unload.balance`

returns a matrix with the same length/rows as the
original number of row of `X.gbd`

.

### Warning(s)

These function only support total object length is less than 2^32 - 1 for machines using 32-bit integer.

### Examples

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | ```
## Not run:
# Save code in a file "demo.r" and run in 4 processors by
# > mpiexec -np 4 Rscript demo.r
### Setup environment.
library(pbdDEMO, quiet = TRUE)
### Generate an example data.
N.gbd <- 5 * (comm.rank() * 2)
X.gbd <- rnorm(N.gbd * 3)
dim(X.gbd) <- c(N.gbd, 3)
comm.cat("X.gbd[1:5,]\n", quiet = TRUE)
comm.print(X.gbd[1:5,], rank.print = 1, quiet = TRUE)
bal.info <- balance.info(X.gbd)
new.X.gbd <- load.balance(X.gbd, bal.info)
org.X.gbd <- unload.balance(new.X.gbd, bal.info)
comm.cat("org.X.gbd[1:5,]\n", quiet = TRUE)
comm.print(org.X.gbd[1:5,], rank.print = 1, quiet = TRUE)
if(any(org.X.gbd - X.gbd != 0)){
cat("Unbalance fails in the rank ", comm.rank(), "\n")
}
### Quit.
finalize()
## End(Not run)
``` |