comm.fread | R Documentation |
Given a directory, comm.fread()
reads all csv files in it in parallel,
distributing whole files across the available MPI ranks.
comm.fread(
dir,
pattern = "*.csv$",
shcom = NULL,
readers = comm.size(),
verbose = 0,
...
)
dir |
A directory containing the files to be read. The directory should be accessible to all readers. |
pattern |
The pattern that file names must match in order to be read. |
shcom |
Additional shell command passed to |
readers |
The number of MPI ranks that read files. Defaults to all ranks, comm.size(). |
verbose |
The verbosity level: 0, 1, 2, or 3, from least to most verbose. |
... |
Additional arguments to be passed to |
Each MPI rank reads a different subset of the files, and every
file is read in its entirety by a single rank. The best load balance is
achieved when the number of files is divisible by the number of ranks and
the files are approximately the same size. All files are assumed to
contain the same columns. If you are working with a Lustre parallel file
system, see the note for parameter shcom.
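To make the load-balance remark concrete, here is a minimal base-R sketch of one way to split files among readers: contiguous blocks whose sizes differ by at most one. The helpers assign_files() and sizes() are hypothetical and not part of pbdIO, and comm.fread() may use a different assignment scheme. The sketch only illustrates why a file count that is not divisible by the number of readers leaves some readers with one extra file.

```r
# Hypothetical helper (not part of pbdIO): split a vector of file names
# into contiguous, nearly equal blocks, one block per reader rank.
assign_files <- function(files, readers) {
  n <- length(files)
  # Every reader gets n %/% readers files; the first n %% readers
  # readers get one extra file.
  counts <- rep(n %/% readers, readers) + (seq_len(readers) <= n %% readers)
  # 0-based rank label for each file, in order.
  rank <- rep(seq_len(readers) - 1L, times = counts)
  # Keep empty groups so readers with no files still appear.
  split(files, factor(rank, levels = seq_len(readers) - 1L))
}

# Hypothetical helper: number of files assigned to each rank.
sizes <- function(files, readers) lengths(assign_files(files, readers))

files <- c("a.csv", "b.csv", "c.csv", "d.csv")
sizes(files, 2)       # 2 files on each of ranks 0 and 1: balanced
sizes(files[1:3], 2)  # 2 files on rank 0, 1 file on rank 1: unbalanced
```

With 3 equally sized files on 2 readers, rank 0 reads two files while rank 1 reads one, so rank 1 idles for roughly one file's worth of reading time.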
TODO
## Not run:
### Save code in a file "demo.r" and run with 2 processors by
### SHELL> mpiexec -np 2 Rscript demo.r
library(pbdMPI)
library(pbdIO)
path <- "/tmp/read"  # directory must be accessible to all ranks
comm.print(dir(path))
## [1] "a.csv" "b.csv"
X <- comm.fread(path)  # with 2 ranks, each rank reads one whole file
comm.print(X, all.rank=TRUE)
## COMM.RANK = 0
## a b c
## 1: 1 2 3
## COMM.RANK = 1
## a b c
## 1: 2 3 4
finalize()
## End(Not run)
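Because comm.fread() assumes all files share the same columns, a quick serial check before a parallel run can catch mismatched headers early. This is a minimal sketch, assuming simple comma-separated header lines with no quoted commas. same_columns() is a hypothetical helper, not part of pbdIO, and its `"\\.csv$"` pattern is an ordinary regular expression chosen for the sketch rather than comm.fread()'s default.

```r
# Hypothetical pre-flight check (not part of pbdIO): confirm that every
# file matching `pattern` in `dir` has the same column names.
# Assumes plain comma-separated headers without quoted commas.
same_columns <- function(dir, pattern = "\\.csv$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  if (length(files) == 0L) stop("no files match the pattern")
  # Read only the header line of each file and split it into names.
  headers <- lapply(files, function(f) {
    trimws(strsplit(readLines(f, n = 1L), ",", fixed = TRUE)[[1]])
  })
  # TRUE only if every header is identical to the first one.
  all(vapply(headers, identical, logical(1), headers[[1]]))
}

# Usage: build a throwaway directory with two compatible files.
tmp <- file.path(tempdir(), "read_check")
dir.create(tmp, showWarnings = FALSE)
writeLines(c("a,b,c", "1,2,3"), file.path(tmp, "a.csv"))
writeLines(c("a,b,c", "2,3,4"), file.path(tmp, "b.csv"))
same_columns(tmp)
```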