parallel.csv: Parallel CSV Converter


View source: R/parallel.csv.R

Description

Parallelizes the conversion of separate CSV files to fst format: the fst files are written in parallel, while the CSV files are still read sequentially. Also overwrites the fst::threads_fst setting. Requires the data.table and fst packages.

Usage

parallel.csv(file, compress = 35, progress_bar = TRUE, clean_mem = FALSE,
  cl = NULL, max_threads = max(ifelse(is.null(cl), parallel::detectCores(),
  ifelse(!is.list(cl), round(parallel::detectCores()/cl),
  round(parallel::detectCores()/length(cl)))), 1), wkdir = NULL, ...)

Arguments

file

Type: character vector. Paths to all the CSV files to read.

compress

Type: numeric. Compression rate to use. Defaults to 35.

progress_bar

Type: logical. Whether to print a progress bar. Defaults to TRUE.

clean_mem

Type: logical. Whether to force garbage collection at the end of each file read in order to reclaim RAM. Defaults to FALSE.

cl

Type: cluster or integer. A parallel cluster for parallelized calls. Used only when progress_bar = TRUE. Most of the variables (compress, max_threads, clean_mem, wkdir) are written to the cluster and removed from it at the end. When cl is a number, a cluster with that many workers is created and then destroyed when the call finishes. Defaults to NULL.

max_threads

Type: numeric. The maximum number of threads allowed, used to set fst::threads_fst. Make sure that the number of cl workers multiplied by max_threads is not larger than the number of threads available on your computer. Defaults to max(ifelse(is.null(cl), parallel::detectCores(), ifelse(!is.list(cl), round(parallel::detectCores() / cl), round(parallel::detectCores() / length(cl)))), 1), which means at least 1 thread, with the number of threads per worker adjusted automatically to the size of the cluster. Note that the value is rounded, which might over- or under-allocate threads.

wkdir

Type: character. The working directory, when using a cluster. Defaults to NULL.

...

Other arguments to pass to fst::write.fst.
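
The default for max_threads described above can be hard to read as a single expression. The sketch below restates it as a small standalone function so the rounding behaviour is easier to see. The name default_max_threads and its n_cores argument are hypothetical and not part of the package; n_cores stands in for parallel::detectCores() so the result does not depend on the machine.

```r
# Hypothetical helper mirroring the documented default of max_threads.
# n_cores replaces parallel::detectCores() to keep the example reproducible.
default_max_threads <- function(cl = NULL, n_cores = parallel::detectCores()) {
  # Number of workers: 1 without a cluster, cl itself when it is a number,
  # otherwise the length of the cluster object (a list of nodes).
  workers <- if (is.null(cl)) 1 else if (!is.list(cl)) cl else length(cl)
  # Threads per worker, rounded, and never less than 1.
  max(round(n_cores / workers), 1)
}

default_max_threads(NULL, n_cores = 8)  # 8
default_max_threads(3, n_cores = 8)     # 3, so 3 workers * 3 threads = 9 (over-allocates)
default_max_threads(16, n_cores = 8)    # 1, because the floor of 1 thread applies
```

With 3 workers on 8 threads the rounded default asks for 9 threads in total, which is the over-allocation mentioned under max_threads. Passing max_threads explicitly avoids it.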

Value

The element or the list of fst file names.

Examples

## Not run: 
# Cannot pass CRAN checks. Disabled.
# Do it on your own files!
library(fst) # devtools::install_github("fstPackage/fst@e060e62")
library(data.table)
library(parallel)

parallel.csv(c("file_1.csv", "file_2.csv"), max_threads = 1, progress_bar = TRUE)
parallel.csv(paste0("file_", 1:100, ".csv"), max_threads = 1, progress_bar = TRUE, cl = 8)

## End(Not run)

Laurae2/LauraeDS documentation built on May 29, 2019, 2:25 p.m.