ddfply: ddfply
In talegari/fileplyr: Chunk Processing or Split-Apply-Combine on Delimited Files and Distributed Dataframes


Performs chunk processing or split-apply-combine on the data in a distributed data frame (ddf).

ddfply(ddfdir, groupby, fun = identity, collect = "none",
  temploc = getwd(), nbins = 10, chunk = 50000, spill = 1e+06,
  cores = 1, buffer = 1e+09, ...)

`ddfdir`	(string) path of ddf directory
`groupby`	(character vector) Names of the columns used to split the data (if missing, `fun` is applied to each chunk)
`fun`	(object of class function) function to apply on each subset after the split
`collect`	(string) Collect the result as `list` or `dataframe` or `none`. `none` keeps the resulting ddo on disk.
`temploc`	(string) Path where intermediary files are kept
`nbins`	(positive integer) Number of directories into which the distributed dataframe (ddf) or distributed data object (ddo) is distributed
`chunk`	(positive integer) Number of rows of the file to be read at a time
`spill`	(positive integer) Maximum number of rows of any subset resulting from split
`cores`	(positive integer) Number of cores to be used in parallel
`buffer`	(positive integer) Size of the batches of key-value pairs that are passed to the map, flushed from the map output to intermediate storage, or sent to the reduce
`...`	Arguments passed as is to the `data.table` function.
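
The effect of `groupby`, `fun` and `chunk` can be pictured with a small in-memory sketch in base R. This is only an analogue under stated assumptions, not how `ddfply` is implemented: `ddfply` operates on a ddf stored on disk, and the helper names `apply_by_group` and `apply_by_chunk` are hypothetical and not part of fileplyr.

```r
# In-memory analogue only; ddfply itself works on a ddf kept on disk.
# `apply_by_group` and `apply_by_chunk` are hypothetical helper names.

apply_by_group <- function(df, groupby, fun = identity) {
  # Split the rows by the combinations of the `groupby` columns that occur
  # in the data, then apply `fun` to each subset.
  subsets <- split(df, df[groupby], drop = TRUE)
  lapply(subsets, fun)
}

apply_by_chunk <- function(df, chunk, fun = identity) {
  # Without grouping columns, cut the rows into consecutive blocks of at
  # most `chunk` rows and apply `fun` to each block.
  block_id <- ceiling(seq_len(nrow(df)) / chunk)
  lapply(split(df, block_id), fun)
}

# Row counts per value of `gear`, returned as a named list
by_gear <- apply_by_group(mtcars, groupby = "gear", fun = nrow)

# Row counts per block of 10 rows, returned as a named list
by_chunk <- apply_by_chunk(mtcars, chunk = 10, fun = nrow)
```

Both helpers return a list with one element per subset, which loosely corresponds to `collect = "list"`; the on-disk storage, `nbins`, `spill` and `cores` have no counterpart in this sketch.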

See `fileply`.

A list, a dataframe, or `TRUE` (when `collect` is `none`).

# write a small CSV file to work with
write.table(mtcars, "mtcars.csv", row.names = FALSE, sep = ",")
# create a ddf and keep it on disk by setting `keepddf = TRUE`;
# the messages printed by fileply are captured in `co`
co <- capture.output(temp <- fileply("mtcars.csv"
                                     , groupby = c("carb", "gear")
                                     , fun     = identity
                                     , collect = "list"
                                     , sep     =  ","
                                     , header  = TRUE
                                     , keepddf = TRUE)
                     , file = NULL
                     , type = "message"
                     )
# the sixth captured message line holds the ddf location after ": "
ddfroot <- strsplit(co[6], ": ")[[1]][2]
# use the ddf instead of reading the CSV again
temp2 <- ddfply(file.path(ddfroot, "data")
                , groupby = c("gear")
                , fun     = identity
                , collect = "list"
                , sep     =  ","
                , header  = TRUE
                )
temp2
# clean up the CSV file and the ddf directory
unlink("mtcars.csv")
unlink(ddfroot, recursive = TRUE)

talegari/fileplyr documentation built on May 31, 2019, 2:51 a.m.
