datadr: Divide and Recombine for Large, Complex Data

Methods for dividing data into subsets, applying analytical methods to the subsets, and recombining the results. The package also provides a generic MapReduce interface. It works with key-value pairs stored in memory, on local disk, or on HDFS; the HDFS back end uses the R and Hadoop Integrated Programming Environment (RHIPE).
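To make the divide / apply / recombine pattern concrete, here is a minimal conceptual sketch in Python. It is not datadr's R API. The helpers divide, apply_to_subsets, recombine_collect and recombine_mean are hypothetical stand-ins, loosely modeled on a conditioning-variable division and on "collect" and "mean" style recombinations. The data are made up. As in the package, each subset is held as a key-value pair.

```python
from collections import defaultdict


def divide(rows, by):
    """Hypothetical conditioning-variable division (not datadr's divide()).

    Returns one key-value pair per unique value of `by`:
    the key names the subset, the value holds that subset's rows.
    """
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[by]].append(row)
    return dict(subsets)


def apply_to_subsets(kv, fn):
    """Apply an analytic method independently to every subset."""
    return {key: fn(value) for key, value in kv.items()}


def recombine_collect(kv):
    """'Collect'-style recombination: gather per-subset results as (key, value) pairs."""
    return sorted(kv.items())


def recombine_mean(kv):
    """'Mean'-style recombination: average the per-subset results."""
    return sum(kv.values()) / len(kv)


# Made-up example rows; each dict plays the role of one data-frame row.
rows = [
    {"group": "a", "x": 1.0},
    {"group": "a", "x": 3.0},
    {"group": "b", "x": 10.0},
    {"group": "b", "x": 20.0},
    {"group": "b", "x": 30.0},
]

subsets = divide(rows, by="group")
means = apply_to_subsets(subsets, lambda rs: sum(r["x"] for r in rs) / len(rs))
print(recombine_collect(means))  # [('a', 2.0), ('b', 20.0)]
print(recombine_mean(means))     # 11.0
```

The recombined mean of subset means (11.0) differs from the mean over all five rows (64 / 5 = 12.8) because the subsets have unequal sizes. Which recombination is appropriate depends on the analysis and on how the data were divided.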

Author
Ryan Hafen [aut, cre], Landon Sego [ctb]
Date of publication
2016-10-02 15:51:50
Maintainer
Ryan Hafen <rhafen@gmail.com>
License
BSD_3_clause + file LICENSE
Version
0.8.6

Man pages

addData
Add Key-Value Pairs to a Data Connection
addTransform
Add a Transformation Function to a Distributed Data Object
adult
"Census Income" Dataset
applyTransform
Apply transformation function(s)
as.data.frame.ddf
Turn 'ddf' Object into Data Frame
as.list.ddo
Turn 'ddo' / 'ddf' Object into a list
bsv
Construct Between Subset Variable (BSV)
charFileHash
Character File Hash Function
combCollect
"Collect" Recombination
combDdf
"DDF" Recombination
combDdo
"DDO" Recombination
combMean
Mean Recombination
combMeanCoef
Mean Coefficient Recombination
combRbind
"rbind" Recombination
condDiv
Conditioning Variable Division
convert
Convert 'ddo' / 'ddf' Objects
datadr-package
datadr
ddf
Instantiate a Distributed Data Frame ('ddf')
ddf-accessors
Accessor methods for 'ddf' objects
ddo
Instantiate a Distributed Data Object ('ddo')
ddo-ddf-accessors
Accessor Functions
ddo-ddf-attributes
Managing attributes of 'ddo' or 'ddf' objects
digestFileHash
Digest File Hash Function
divide
Divide a Distributed Data Object
divide-internals
Functions used in divide()
drAggregate
Division-Agnostic Aggregation
drBLB
Bag of Little Bootstraps Transformation Method
drFilter
Filter a 'ddo' or 'ddf' Object
drGetGlobals
Get Global Variables and Package Dependencies
drGLM
GLM Transformation Method
drHexbin
HexBin Aggregation for Distributed Data Frames
drJoin
Join Data Sources by Key
drLapply
Apply a function to all key-value pairs of a ddo/ddf object
drLM
LM Transformation Method
drPersist
Persist a Transformed 'ddo' or 'ddf' Object
drQuantile
Sample Quantiles for 'ddf' Objects
drRead.table
Data Input
drSample
Take a Sample of Key-Value Pairs
drSubset
Subsetting Distributed Data Frames
flatten
"Flatten" a ddf Subset
getCondCuts
Get names of the conditioning variable cuts
hdfsConn
Connect to Data Source on HDFS
kvApply
Apply Function to Key-Value Pair
kvPair
Specify a Key-Value Pair
kvPairs
Specify a Collection of Key-Value Pairs
localDiskConn
Connect to Data Source on Local Disk
localDiskControl
Specify Control Parameters for MapReduce on a Local Disk...
makeExtractable
Take a ddo/ddf HDFS data object and turn it into a mapfile
mrExec
Execute a MapReduce Job
mr-summary-stats
Functions to Compute Summary Statistics in MapReduce
pipe
Pipe data
print.ddo
Print a "ddo" or "ddf" Object
print.kvPair
Print a key-value pair
print.kvValue
Print value of a key-value pair
readHDFStextFile
Experimental HDFS text reader helper function
readTextFileByChunk
Experimental sequential text reader helper function
recombine
Recombine
removeData
Remove Key-Value Pairs from a Data Connection
rhipeControl
Specify Control Parameters for RHIPE Job
rrDiv
Random Replicate Division
setupTransformEnv
Set up transformation environment
splitvars
Extract "Split" Variable(s)
to_ddf
Convert dplyr grouped_df to ddf
updateAttributes
Update Attributes of a 'ddo' or 'ddf' Object
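Several of the entries above (kvPair, kvPairs, mrExec, mr-summary-stats) concern MapReduce over key-value pairs. The sketch below shows the general map / shuffle / reduce pattern in plain Python. The names map_reduce, map_fn and reduce_fn are hypothetical and do not reproduce mrExec's signature or behavior. Because the reducer combines per-subset sums and counts rather than per-subset means, the overall mean it returns is exact regardless of subset sizes.

```python
from collections import defaultdict


def map_reduce(pairs, map_fn, reduce_fn):
    """Hypothetical in-memory MapReduce over key-value pairs (not datadr's mrExec).

    map_fn(key, value) yields zero or more (key, value) pairs.
    Emitted values sharing a key are grouped (the "shuffle") and passed
    to reduce_fn(key, values), which returns one result per key.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        for out_key, out_value in map_fn(key, value):
            grouped[out_key].append(out_value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


# Input key-value pairs: each value is one subset's numbers.
pairs = [
    ("subset1", [1, 2, 3]),
    ("subset2", [4, 5]),
    ("subset3", [6]),
]


def map_fn(key, values):
    # Emit every subset's (sum, count) under one key so a single reducer sees them all.
    yield "total", (sum(values), len(values))


def reduce_fn(key, partials):
    # Combine partial sums and counts into an exact overall mean.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


print(map_reduce(pairs, map_fn, reduce_fn))  # {'total': 3.5}
```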

Files in this package

datadr
datadr/tests
datadr/tests/testthat.R
datadr/tests/testthat
datadr/tests/testthat/test-hexbin.R
datadr/tests/testthat/test-summary.R
datadr/tests/testthat/test-join.R
datadr/tests/testthat/test-quantile.R
datadr/tests/testthat/test-globals.R
datadr/tests/testthat/test-kvMemory.R
datadr/tests/testthat/test-dataops.R
datadr/tests/testthat/test-spark.R
datadr/tests/testthat/test-readtext.R
datadr/tests/testthat/test-kvHDFS.R
datadr/tests/testthat/test-kvLocalDisk.R
datadr/NAMESPACE
datadr/NEWS.md
datadr/data
datadr/data/adult.rda
datadr/R
datadr/R/ddo_ddf_kvMemory.R
datadr/R/zzz_constants.R
datadr/R/bsv.R
datadr/R/dataops_join.R
datadr/R/mapreduce_kvLocalDisk.R
datadr/R/mapreduce_kvHDFS.R
datadr/R/agnostic_summary.R
datadr/R/conn_spark.R
datadr/R/agnostic_hexbin.R
datadr/R/recombine_transforms.R
datadr/R/divSpec.R
datadr/R/ddo_ddf_updateAttrs.R
datadr/R/ddo_ddf_kvHDFS.R
datadr/R/dataops_subset.R
datadr/R/dataops_persist.R
datadr/R/dataops_filter.R
datadr/R/ddo_ddf_methods.R
datadr/R/agnostic_aggregate.R
datadr/R/conn_HDFS.R
datadr/R/dataops_readTable.R
datadr/R/globals.R
datadr/R/ddo_addTransform.R
datadr/R/ddf_summary_print.R
datadr/R/agnostic_quantile.R
datadr/R/divide_df.R
datadr/R/recombine_combine.R
datadr/R/mapreduce_spark.R
datadr/R/dplyr.R
datadr/R/dataset_census.R
datadr/R/divide.R
datadr/R/ddo_ddf_kvSpark.R
datadr/R/mapreduce_kvMemory.R
datadr/R/divSpec_rrDiv.R
datadr/R/dataops_lapply.R
datadr/R/ddo_ddf_kvLocalDisk.R
datadr/R/kvPairs.R
datadr/R/recombine.R
datadr/R/ddo_ddf_print.R
datadr/R/misc.R
datadr/R/conn_localDisk.R
datadr/R/mapreduce.R
datadr/R/divSpec_condDiv.R
datadr/R/dataops_sample.R
datadr/R/ddo_ddf.R
datadr/R/dataops_read.R
datadr/R/datadr-package.R
datadr/R/conn_memory.R
datadr/README.md
datadr/MD5
datadr/DESCRIPTION
datadr/man
datadr/man/applyTransform.Rd
datadr/man/readTextFileByChunk.Rd
datadr/man/localDiskControl.Rd
datadr/man/drLM.Rd
datadr/man/pipe.Rd
datadr/man/ddo-ddf-attributes.Rd
datadr/man/drFilter.Rd
datadr/man/drHexbin.Rd
datadr/man/getCondCuts.Rd
datadr/man/drGetGlobals.Rd
datadr/man/combRbind.Rd
datadr/man/combMeanCoef.Rd
datadr/man/divide.Rd
datadr/man/drBLB.Rd
datadr/man/digestFileHash.Rd
datadr/man/ddf-accessors.Rd
datadr/man/readHDFStextFile.Rd
datadr/man/rhipeControl.Rd
datadr/man/kvPair.Rd
datadr/man/ddo-ddf-accessors.Rd
datadr/man/removeData.Rd
datadr/man/combDdf.Rd
datadr/man/setupTransformEnv.Rd
datadr/man/combDdo.Rd
datadr/man/convert.Rd
datadr/man/mr-summary-stats.Rd
datadr/man/divide-internals.Rd
datadr/man/condDiv.Rd
datadr/man/flatten.Rd
datadr/man/drGLM.Rd
datadr/man/to_ddf.Rd
datadr/man/mrExec.Rd
datadr/man/drSubset.Rd
datadr/man/ddo.Rd
datadr/man/bsv.Rd
datadr/man/as.data.frame.ddf.Rd
datadr/man/combMean.Rd
datadr/man/print.kvPair.Rd
datadr/man/kvPairs.Rd
datadr/man/ddf.Rd
datadr/man/recombine.Rd
datadr/man/drPersist.Rd
datadr/man/print.ddo.Rd
datadr/man/adult.Rd
datadr/man/updateAttributes.Rd
datadr/man/drLapply.Rd
datadr/man/drAggregate.Rd
datadr/man/combCollect.Rd
datadr/man/splitvars.Rd
datadr/man/rrDiv.Rd
datadr/man/makeExtractable.Rd
datadr/man/drSample.Rd
datadr/man/localDiskConn.Rd
datadr/man/addData.Rd
datadr/man/datadr-package.Rd
datadr/man/drJoin.Rd
datadr/man/drQuantile.Rd
datadr/man/as.list.ddo.Rd
datadr/man/charFileHash.Rd
datadr/man/drRead.table.Rd
datadr/man/kvApply.Rd
datadr/man/hdfsConn.Rd
datadr/man/addTransform.Rd
datadr/man/print.kvValue.Rd
datadr/LICENSE