datadr: Divide and Recombine for Large, Complex Data

Methods for dividing data into subsets, applying analytical methods to the subsets, and recombining the results. Also provides a generic MapReduce interface. Works with key-value pairs stored in memory, on local disk, or on HDFS; HDFS storage is accessed through the R and Hadoop Integrated Programming Environment (RHIPE).
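
The divide, apply, and recombine steps described above can be illustrated with a small language-neutral sketch. This is not datadr's API: the function names `divide`, `apply_to_subsets`, and `recombine` below, the sample records, and the choice of a per-subset mean are all assumptions made for illustration, and everything runs in memory on plain Python dictionaries.

```python
from collections import defaultdict
from statistics import mean


def divide(rows, by):
    """Split a list of dict records into subsets keyed by the value of `by`."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[by]].append(row)
    return dict(subsets)


def apply_to_subsets(subsets, fn):
    """Apply an analytic function to each subset independently."""
    return {key: fn(subset) for key, subset in subsets.items()}


def recombine(results, combine):
    """Merge the per-subset results into a single output."""
    return combine(results)


# Hypothetical sample data: each record is one observation.
rows = [
    {"species": "setosa", "petal_length": 1.0},
    {"species": "setosa", "petal_length": 2.0},
    {"species": "virginica", "petal_length": 5.0},
    {"species": "virginica", "petal_length": 6.0},
]

# Divide by a conditioning variable, compute a statistic per subset,
# then recombine the (key, value) results into one sorted table.
subsets = divide(rows, by="species")
per_subset = apply_to_subsets(
    subsets, lambda s: mean(r["petal_length"] for r in s)
)
table = recombine(per_subset, lambda res: sorted(res.items()))
print(table)
```

Each subset is processed without reference to the others, which is what lets the same pattern be distributed across local disk or a cluster; only the recombination step needs to see all of the per-subset results.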

Author: Ryan Hafen [aut, cre], Landon Sego [ctb]
Date of publication: 2016-10-02 15:51:50
Maintainer: Ryan Hafen <rhafen@gmail.com>
License: BSD_3_clause + file LICENSE
Version: 0.8.6
http://deltarho.org/docs-datadr

Man pages

addData: Add Key-Value Pairs to a Data Connection

addTransform: Add a Transformation Function to a Distributed Data Object

adult: "Census Income" Dataset

applyTransform: Apply transformation function(s)

as.data.frame.ddf: Turn 'ddf' Object into Data Frame

as.list.ddo: Turn 'ddo' / 'ddf' Object into a list

bsv: Construct Between Subset Variable (BSV)

charFileHash: Character File Hash Function

combCollect: "Collect" Recombination

combDdf: "DDF" Recombination

combDdo: "DDO" Recombination

combMean: Mean Recombination

combMeanCoef: Mean Coefficient Recombination

combRbind: "rbind" Recombination

condDiv: Conditioning Variable Division

convert: Convert 'ddo' / 'ddf' Objects

datadr-package: datadr

ddf: Instantiate a Distributed Data Frame ('ddf')

ddf-accessors: Accessor methods for 'ddf' objects

ddo: Instantiate a Distributed Data Object ('ddo')

ddo-ddf-accessors: Accessor Functions

ddo-ddf-attributes: Managing attributes of 'ddo' or 'ddf' objects

digestFileHash: Digest File Hash Function

divide: Divide a Distributed Data Object

divide-internals: Functions used in divide()

drAggregate: Division-Agnostic Aggregation

drBLB: Bag of Little Bootstraps Transformation Method

drFilter: Filter a 'ddo' or 'ddf' Object

drGetGlobals: Get Global Variables and Package Dependencies

drGLM: GLM Transformation Method

drHexbin: HexBin Aggregation for Distributed Data Frames

drJoin: Join Data Sources by Key

drLapply: Apply a function to all key-value pairs of a ddo/ddf object

drLM: LM Transformation Method

drPersist: Persist a Transformed 'ddo' or 'ddf' Object

drQuantile: Sample Quantiles for 'ddf' Objects

drRead.table: Data Input

drSample: Take a Sample of Key-Value Pairs

drSubset: Subsetting Distributed Data Frames

flatten: "Flatten" a ddf Subset

getCondCuts: Get names of the conditioning variable cuts

hdfsConn: Connect to Data Source on HDFS

kvApply: Apply Function to Key-Value Pair

kvPair: Specify a Key-Value Pair

kvPairs: Specify a Collection of Key-Value Pairs

localDiskConn: Connect to Data Source on Local Disk

localDiskControl: Specify Control Parameters for MapReduce on a Local Disk...

makeExtractable: Take a ddo/ddf HDFS data object and turn it into a mapfile

mrExec: Execute a MapReduce Job

mr-summary-stats: Functions to Compute Summary Statistics in MapReduce

pipe: Pipe data

print.ddo: Print a "ddo" or "ddf" Object

print.kvPair: Print a key-value pair

print.kvValue: Print value of a key-value pair

readHDFStextFile: Experimental HDFS text reader helper function

readTextFileByChunk: Experimental sequential text reader helper function

recombine: Recombine

removeData: Remove Key-Value Pairs from a Data Connection

rhipeControl: Specify Control Parameters for RHIPE Job

rrDiv: Random Replicate Division

setupTransformEnv: Set up transformation environment

splitvars: Extract "Split" Variable(s)

to_ddf: Convert dplyr grouped_df to ddf

updateAttributes: Update Attributes of a 'ddo' or 'ddf' Object

Files in this package

datadr
datadr/tests
datadr/tests/testthat.R
datadr/tests/testthat
datadr/tests/testthat/test-hexbin.R
datadr/tests/testthat/test-summary.R
datadr/tests/testthat/test-join.R
datadr/tests/testthat/test-quantile.R
datadr/tests/testthat/test-globals.R
datadr/tests/testthat/test-kvMemory.R
datadr/tests/testthat/test-dataops.R
datadr/tests/testthat/test-spark.R
datadr/tests/testthat/test-readtext.R
datadr/tests/testthat/test-kvHDFS.R
datadr/tests/testthat/test-kvLocalDisk.R
datadr/NAMESPACE
datadr/NEWS.md
datadr/data
datadr/data/adult.rda
datadr/R
datadr/R/ddo_ddf_kvMemory.R
datadr/R/zzz_constants.R
datadr/R/bsv.R
datadr/R/dataops_join.R
datadr/R/mapreduce_kvLocalDisk.R
datadr/R/mapreduce_kvHDFS.R
datadr/R/agnostic_summary.R
datadr/R/conn_spark.R
datadr/R/agnostic_hexbin.R
datadr/R/recombine_transforms.R
datadr/R/divSpec.R
datadr/R/ddo_ddf_updateAttrs.R
datadr/R/ddo_ddf_kvHDFS.R
datadr/R/dataops_subset.R
datadr/R/dataops_persist.R
datadr/R/dataops_filter.R
datadr/R/ddo_ddf_methods.R
datadr/R/agnostic_aggregate.R
datadr/R/conn_HDFS.R
datadr/R/dataops_readTable.R
datadr/R/globals.R
datadr/R/ddo_addTransform.R
datadr/R/ddf_summary_print.R
datadr/R/agnostic_quantile.R
datadr/R/divide_df.R
datadr/R/recombine_combine.R
datadr/R/mapreduce_spark.R
datadr/R/dplyr.R
datadr/R/dataset_census.R
datadr/R/divide.R
datadr/R/ddo_ddf_kvSpark.R
datadr/R/mapreduce_kvMemory.R
datadr/R/divSpec_rrDiv.R
datadr/R/dataops_lapply.R
datadr/R/ddo_ddf_kvLocalDisk.R
datadr/R/kvPairs.R
datadr/R/recombine.R
datadr/R/ddo_ddf_print.R
datadr/R/misc.R
datadr/R/conn_localDisk.R
datadr/R/mapreduce.R
datadr/R/divSpec_condDiv.R
datadr/R/dataops_sample.R
datadr/R/ddo_ddf.R
datadr/R/dataops_read.R
datadr/R/datadr-package.R
datadr/R/conn_memory.R
datadr/README.md
datadr/MD5
datadr/DESCRIPTION
datadr/man
datadr/man/applyTransform.Rd
datadr/man/readTextFileByChunk.Rd
datadr/man/localDiskControl.Rd
datadr/man/drLM.Rd
datadr/man/pipe.Rd
datadr/man/ddo-ddf-attributes.Rd
datadr/man/drFilter.Rd
datadr/man/drHexbin.Rd
datadr/man/getCondCuts.Rd
datadr/man/drGetGlobals.Rd
datadr/man/combRbind.Rd
datadr/man/combMeanCoef.Rd
datadr/man/divide.Rd
datadr/man/drBLB.Rd
datadr/man/digestFileHash.Rd
datadr/man/ddf-accessors.Rd
datadr/man/readHDFStextFile.Rd
datadr/man/rhipeControl.Rd
datadr/man/kvPair.Rd
datadr/man/ddo-ddf-accessors.Rd
datadr/man/removeData.Rd
datadr/man/combDdf.Rd
datadr/man/setupTransformEnv.Rd
datadr/man/combDdo.Rd
datadr/man/convert.Rd
datadr/man/mr-summary-stats.Rd
datadr/man/divide-internals.Rd
datadr/man/condDiv.Rd
datadr/man/flatten.Rd
datadr/man/drGLM.Rd
datadr/man/to_ddf.Rd
datadr/man/mrExec.Rd
datadr/man/drSubset.Rd
datadr/man/ddo.Rd
datadr/man/bsv.Rd
datadr/man/as.data.frame.ddf.Rd
datadr/man/combMean.Rd
datadr/man/print.kvPair.Rd
datadr/man/kvPairs.Rd
datadr/man/ddf.Rd
datadr/man/recombine.Rd
datadr/man/drPersist.Rd
datadr/man/print.ddo.Rd
datadr/man/adult.Rd
datadr/man/updateAttributes.Rd
datadr/man/drLapply.Rd
datadr/man/drAggregate.Rd
datadr/man/combCollect.Rd
datadr/man/splitvars.Rd
datadr/man/rrDiv.Rd
datadr/man/makeExtractable.Rd
datadr/man/drSample.Rd
datadr/man/localDiskConn.Rd
datadr/man/addData.Rd
datadr/man/datadr-package.Rd
datadr/man/drJoin.Rd
datadr/man/drQuantile.Rd
datadr/man/as.list.ddo.Rd
datadr/man/charFileHash.Rd
datadr/man/drRead.table.Rd
datadr/man/kvApply.Rd
datadr/man/hdfsConn.Rd
datadr/man/addTransform.Rd
datadr/man/print.kvValue.Rd
datadr/LICENSE
