Performs chunk processing or split-apply-combine on the data in a distributed data frame (ddf).
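For readers new to the pattern, split-apply-combine can be sketched in base R on an in-memory data frame. This only illustrates the idea and is not how `ddfply` is implemented; the helper name `split_apply_combine` is hypothetical, and its arguments merely mirror `groupby`, `fun` and `collect`.

```r
# Hypothetical helper: an in-memory analogue of split-apply-combine.
# It does not use or reproduce the package's internals.
split_apply_combine <- function(df, groupby, fun,
                                collect = c("list", "dataframe")) {
  collect <- match.arg(collect)
  # split: one subset per observed combination of the grouping columns
  pieces <- split(df, df[groupby], drop = TRUE)
  # apply: run the function on each subset
  results <- lapply(pieces, fun)
  # combine: keep the list, or row-bind the results into one data frame
  if (collect == "dataframe") do.call(rbind, results) else results
}

# Summarise mtcars by number of gears
summarise_cars <- function(d) data.frame(n = nrow(d), mpg = mean(d$mpg))
by_gear_df   <- split_apply_combine(mtcars, "gear", summarise_cars, "dataframe")
by_gear_list <- split_apply_combine(mtcars, "gear", identity, "list")
```

Here `by_gear_df` holds one summary row per value of `gear`, while `by_gear_list` keeps each subset as a separate list element.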
ddfdir | (string) Path of the ddf directory
groupby | (character vector) Column names used to split the data (if missing,
fun | (object of class function) Function to apply to each subset after the split
collect | (string) Collect the result as
temploc | (string) Path where intermediate files are kept
nbins | (positive integer) Number of directories into which the distributed data frame (ddf) or distributed data object (ddo) is distributed
chunk | (positive integer) Number of rows of the file to be read at a time
spill | (positive integer) Maximum number of rows of any subset resulting from the split
cores | (positive integer) Number of cores to be used in parallel
buffer | (positive integer) Size of the batches of key-value pairs passed to the map, OR size of the batches of key-value pairs flushed to intermediate storage from the map output, OR size of the batches of key-value pairs sent to the reduce
... | Arguments to be passed to
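The `chunk` argument limits how many rows are read at a time. The general idea of chunk-wise processing can be sketched in base R with a file connection. This is an illustration under stated assumptions (a comma-separated file with a header line), not the package's code, and `process_in_chunks` is a hypothetical helper.

```r
# Hypothetical helper: apply `fun` to successive blocks of at most `chunk`
# rows of a CSV file, so that only one block is parsed at a time.
process_in_chunks <- function(path, chunk, fun) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- readLines(con, n = 1)        # first line holds the column names
  results <- list()
  repeat {
    lines <- readLines(con, n = chunk)   # read up to `chunk` data rows
    if (length(lines) == 0) break        # end of file reached
    block <- read.csv(text = c(header, lines), header = TRUE)
    results[[length(results) + 1]] <- fun(block)
  }
  results
}

# Count the rows seen in each chunk of a temporary copy of mtcars
tmp <- tempfile(fileext = ".csv")
write.table(mtcars, tmp, row.names = FALSE, sep = ",")
rows_per_chunk <- unlist(process_in_chunks(tmp, chunk = 10, fun = nrow))
unlink(tmp)
```

A smaller `chunk` lowers peak memory use at the cost of more read calls.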
See fileply for details.
Returns a list, a data frame, or TRUE (when collect is 'none').
write.table(mtcars, "mtcars.csv", row.names = FALSE, sep = ",")
# create a ddf by keeping `keepddf = TRUE`
co <- capture.output(temp <- fileply("mtcars.csv"
, groupby = c("carb", "gear")
, fun = identity
, collect = "list"
, sep = ","
, header = TRUE
, keepddf = TRUE)
, file = NULL
, type = "message"
)
# use the ddf instead of reading the CSV again
temp2 <- ddfply(file.path(strsplit(co[6], ": ")[[1]][2], "data")
, groupby = c("gear")
, fun = identity
, collect = "list"
, sep = ","
, header = TRUE
)
temp2
unlink("mtcars.csv")
unlink(strsplit(co[6], ": ")[[1]][2], recursive = TRUE)