ww_comm_read: Global Reading Functions
In pbdMPI: R Interface to MPI for HPC Clusters (Programming with Big Data Project)

Description

These functions read data globally from a specified file, across all ranks of a communicator.

Usage

comm.read.table(file, header = FALSE, sep = "", quote = "\"'",
                dec = ".",
                na.strings = "NA", colClasses = NA, nrows = -1, skip = 0,
                check.names = TRUE, fill = !blank.lines.skip,
                strip.white = FALSE,
                blank.lines.skip = TRUE, comment.char = "#",
                allowEscapes = FALSE,
                flush = FALSE,
                fileEncoding = "", encoding = "unknown",
                read.method = .pbd_env$SPMD.IO$read.method[1],
                balance.method = .pbd_env$SPMD.IO$balance.method[1],
                comm = .pbd_env$SPMD.CT$comm)

comm.read.csv(file, header = TRUE, sep = ",", quote = "\"",
              dec = ".", fill = TRUE, comment.char = "", ...,
              read.method = .pbd_env$SPMD.IO$read.method[1],
              balance.method = .pbd_env$SPMD.IO$balance.method[1],
              comm = .pbd_env$SPMD.CT$comm)
     
comm.read.csv2(file, header = TRUE, sep = ";", quote = "\"",
               dec = ",", fill = TRUE, comment.char = "", ...,
               read.method = .pbd_env$SPMD.IO$read.method[1],
               balance.method = .pbd_env$SPMD.IO$balance.method[1],
               comm = .pbd_env$SPMD.CT$comm)

Arguments

`file`	as in `read.table()`.
`header`	as in `read.table()`.
`sep`	as in `read.table()`.
`quote`	as in `read.table()`.
`dec`	as in `read.table()`.
`na.strings`	as in `read.table()`.
`colClasses`	as in `read.table()`.
`nrows`	as in `read.table()`.
`skip`	as in `read.table()`.
`check.names`	as in `read.table()`.
`fill`	as in `read.table()`.
`strip.white`	as in `read.table()`.
`blank.lines.skip`	as in `read.table()`.
`comment.char`	as in `read.table()`.
`allowEscapes`	as in `read.table()`.
`flush`	as in `read.table()`.
`fileEncoding`	as in `read.table()`.
`encoding`	as in `read.table()`.
`...`	as in `read.csv*()`.
`read.method`	either "gbd" or "common".
`balance.method`	load-balancing method used with `read.method = "gbd"` when `nrows = -1` and `skip = 0`.
`comm`	a communicator number.

Details

These functions apply read.table() locally on each rank, proceeding sequentially through ranks 0, 1, 2, ...

By default, with read.method = "gbd", only rank 0 reads the file for small datasets (as governed by .pbd_env$SPMD.IO$max.read.size) and then scatters the rows to the other ranks. With read.method = "common", rank 0 broadcasts the data to the other ranks instead.

As the dataset size increases, every rank performs the reading and reads only its portion of the rows, in the "gbd" format described in the pbdDEMO vignettes and used in pmclust.

comm.load.balance() is called for the "gbd" method when nrows = -1 and skip = 0. The default method, "block", generally performs better: it distributes rows as equally as possible and leaves the residuals evenly on the higher ranks. "block0" is the other way around, leaving the residuals on the lower ranks. "block.cyclic" is only useful for converting to a ddmatrix, as in pbdDMAT.
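
The difference between leaving residuals on higher or lower ranks can be illustrated in plain R, without MPI. This is only a sketch of the idea: rows_per_rank() is a hypothetical helper, not part of pbdMPI, and it makes no claim to reproduce the exact assignment comm.load.balance() performs.

```r
# Hypothetical helper (NOT a pbdMPI function): number of rows each of
# p ranks would hold when n rows are split as equally as possible.
rows_per_rank <- function(n, p, residuals = c("higher", "lower")) {
  residuals <- match.arg(residuals)
  counts <- rep(n %/% p, p)   # equal share for every rank
  extra <- n %% p             # leftover rows to hand out one by one
  if (extra > 0) {
    idx <- if (residuals == "higher") {
      seq(p - extra + 1, p)   # last 'extra' ranks get one more row
    } else {
      seq_len(extra)          # first 'extra' ranks get one more row
    }
    counts[idx] <- counts[idx] + 1
  }
  counts
}

# 150 rows (e.g. iris) on 4 ranks
block_like  <- rows_per_rank(150, 4, "higher")  # 37 37 38 38
block0_like <- rows_per_rank(150, 4, "lower")   # 38 38 37 37
print(block_like)
print(block0_like)
```

Either way every rank holds nearly the same number of rows; only the placement of the leftover rows differs.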

Value

A distributed data.frame is returned.

Factors are disabled: such columns are read as characters, or as whatever type the data should be.
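
A serial, base-R analogue of this behavior is sketched below. It assumes the statement above corresponds to calling read.table() with stringsAsFactors = FALSE; that mapping is an assumption made for illustration, not something this page states. If factors are needed after reading, they can be rebuilt explicitly.

```r
# Serial illustration with base R only (no MPI, no pbdMPI calls).
tmp <- tempfile(fileext = ".txt")
write.table(iris, file = tmp, quote = FALSE, sep = "\t", row.names = FALSE)

# Assumption: mimics "factors disabled" via stringsAsFactors = FALSE.
da <- read.table(tmp, header = TRUE, sep = "\t", quote = "",
                 stringsAsFactors = FALSE)

print(class(iris$Species))  # "factor" in the original data
print(class(da$Species))    # "character" after reading back

# Rebuild the factor by hand when it is needed.
species <- factor(da$Species)

unlink(tmp)
```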

Author(s)

Wei-Chen Chen wccsnow@gmail.com, George Ostrouchov, Drew Schmidt, Pragneshkumar Patel, and Hao Yu.

References

Programming with Big Data in R Website: https://pbdr.org/

Examples

## Not run: 
### Save code in a file "demo.r" and run with 2 processors by
### SHELL> mpiexec -np 2 Rscript demo.r

spmd.code <- "
### Initialize
suppressMessages(library(pbdMPI, quietly = TRUE))

### Check.
if(comm.size() != 2){
  comm.stop(\"2 processors are required.\")
}

### Manually distributed iris.
da <- iris[get.jid(nrow(iris)),]

### Dump data.
comm.write.table(da, file = \"iris.txt\", quote = FALSE, sep = \"\\t\",
                 row.names = FALSE)

### Read back in.
da.gbd <- comm.read.table(\"iris.txt\", header = TRUE, sep = \"\\t\",
                          quote = \"\")
comm.print(c(nrow(da), nrow(da.gbd)), all.rank = TRUE)

### Read in common.
da.common <- comm.read.table(\"iris.txt\", header = TRUE, sep = \"\\t\",
                             quote = \"\", read.method = \"common\")
comm.print(c(nrow(da.common), sum(da.common != iris)))

### Finish.
finalize()
"
# execmpi(spmd.code, nranks = 2L)

## End(Not run)

pbdMPI documentation built on April 13, 2025, 9:07 a.m.