datadr: Divide and Recombine for Large, Complex Data

Methods for dividing data into subsets, applying analytical methods to the subsets, and recombining the results. A generic MapReduce interface is also provided. Works with key-value pairs stored in memory, on local disk, or on HDFS; the HDFS back end uses the R and Hadoop Integrated Programming Environment (RHIPE).
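
The divide, apply, and recombine steps described above can be illustrated with a minimal base-R sketch. This is not the datadr API: it uses only `split`, `lapply`, and `rbind` on the built-in `iris` data, and the variable names (`subsets`, `per_subset`, `result`) are assumptions chosen for illustration. In datadr, the man pages listed below (divide, addTransform, recombine, combRbind) cover the corresponding operations on distributed data.

```r
# Base-R sketch of the divide / apply / recombine pattern.
# This is an illustration only, not datadr's implementation.

# Divide: split the data frame into subsets keyed by Species.
subsets <- split(iris, iris$Species)

# Apply: compute a per-subset summary (mean sepal length).
per_subset <- lapply(subsets, function(x) {
  data.frame(meanSepalLength = mean(x$Sepal.Length))
})

# Recombine: row-bind the per-subset results, keeping the key as a column.
result <- do.call(rbind, Map(function(key, value) {
  data.frame(Species = key, value)
}, names(per_subset), per_subset))
rownames(result) <- NULL

print(result)
```

Each subset is processed independently of the others, which is what allows the same pattern to be run over key-value pairs on disk or on a cluster rather than in memory.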

Author: Ryan Hafen [aut, cre], Landon Sego [ctb]
Date of publication: 2016-10-02 15:51:50
Maintainer: Ryan Hafen <rhafen@gmail.com>
License: BSD_3_clause + file LICENSE
Version: 0.8.6
http://deltarho.org/docs-datadr

Man pages

addData: Add Key-Value Pairs to a Data Connection

addTransform: Add a Transformation Function to a Distributed Data Object

adult: "Census Income" Dataset

applyTransform: Apply transformation function(s)

as.data.frame.ddf: Turn 'ddf' Object into Data Frame

as.list.ddo: Turn 'ddo' / 'ddf' Object into a list

bsv: Construct Between Subset Variable (BSV)

charFileHash: Character File Hash Function

combCollect: "Collect" Recombination

combDdf: "DDF" Recombination

combDdo: "DDO" Recombination

combMean: Mean Recombination

combMeanCoef: Mean Coefficient Recombination

combRbind: "rbind" Recombination

condDiv: Conditioning Variable Division

convert: Convert 'ddo' / 'ddf' Objects

datadr-package: datadr

ddf: Instantiate a Distributed Data Frame ('ddf')

ddf-accessors: Accessor methods for 'ddf' objects

ddo: Instantiate a Distributed Data Object ('ddo')

ddo-ddf-accessors: Accessor Functions

ddo-ddf-attributes: Managing attributes of 'ddo' or 'ddf' objects

digestFileHash: Digest File Hash Function

divide: Divide a Distributed Data Object

divide-internals: Functions used in divide()

drAggregate: Division-Agnostic Aggregation

drBLB: Bag of Little Bootstraps Transformation Method

drFilter: Filter a 'ddo' or 'ddf' Object

drGetGlobals: Get Global Variables and Package Dependencies

drGLM: GLM Transformation Method

drHexbin: HexBin Aggregation for Distributed Data Frames

drJoin: Join Data Sources by Key

drLapply: Apply a function to all key-value pairs of a ddo/ddf object

drLM: LM Transformation Method

drPersist: Persist a Transformed 'ddo' or 'ddf' Object

drQuantile: Sample Quantiles for 'ddf' Objects

drRead.table: Data Input

drSample: Take a Sample of Key-Value Pairs

drSubset: Subsetting Distributed Data Frames

flatten: "Flatten" a ddf Subset

getCondCuts: Get names of the conditioning variable cuts

hdfsConn: Connect to Data Source on HDFS

kvApply: Apply Function to Key-Value Pair

kvPair: Specify a Key-Value Pair

kvPairs: Specify a Collection of Key-Value Pairs

localDiskConn: Connect to Data Source on Local Disk

localDiskControl: Specify Control Parameters for MapReduce on a Local Disk...

makeExtractable: Take a ddo/ddf HDFS data object and turn it into a mapfile

mrExec: Execute a MapReduce Job

mr-summary-stats: Functions to Compute Summary Statistics in MapReduce

pipe: Pipe data

print.ddo: Print a "ddo" or "ddf" Object

print.kvPair: Print a key-value pair

print.kvValue: Print value of a key-value pair

readHDFStextFile: Experimental HDFS text reader helper function

readTextFileByChunk: Experimental sequential text reader helper function

recombine: Recombine

removeData: Remove Key-Value Pairs from a Data Connection

rhipeControl: Specify Control Parameters for RHIPE Job

rrDiv: Random Replicate Division

setupTransformEnv: Set up transformation environment

splitvars: Extract "Split" Variable(s)

to_ddf: Convert dplyr grouped_df to ddf

updateAttributes: Update Attributes of a 'ddo' or 'ddf' Object

Functions

%>%
addData
addSplitAttrs
addTransform
adult
applyTransform
as.data.frame.ddf
as.list.ddo
bsv
bsvInfo
calculateMoments
charFileHash
combCollect
combDdf
combDdo
combineMoments
combineMultipleMoments
combMean
combMeanCoef
combRbind
condDiv
convert
counters
datadr
datadr-package
ddf
ddf-accessors
ddo
ddo-ddf-accessors
ddo-ddf-attributes
dfSplit
digestFileHash
divide
divide-internals
drAggregate
drBLB
drFilter
drGetGlobals
drGLM
drHexbin
drJoin
drLapply
drLM
drPersist
drQuantile
drRead.csv
drRead.csv2
drRead.delim
drRead.delim2
drRead.table
drSample
drSubset
flatten
getAttribute
getAttributes
getAttributes.ddf
getAttributes.ddo
getBsv
getBsvs
getCondCuts
getKeys
getSplitVar
getSplitVars
hasAttributes
hasAttributes.ddf
hasExtractableKV
hdfsConn
kvApply
kvExample
kvPair
kvPairs
length.ddo
localDiskConn
localDiskControl
makeExtractable
moments2statistics
mrExec
mr-summary-stats
names.ddf
ncol
NCOL
ncol,ddf-method
NCOL,ddf-method
nrow
NROW
nrow,ddf-method
NROW,ddf-method
print.ddo
print.kvPair
print.kvValue
readHDFStextFile
readTextFileByChunk
recombine
removeData
rhipeControl
rrDiv
setAttributes
setAttributes.ddf
setAttributes.ddo
setupTransformEnv
splitRowDistn
splitSizeDistn
summary.ddf
summary.ddo
tabulateMap
tabulateReduce
to_ddf
updateAttributes

Files

datadr
datadr/tests
datadr/tests/testthat.R
datadr/tests/testthat
datadr/tests/testthat/test-hexbin.R
datadr/tests/testthat/test-summary.R
datadr/tests/testthat/test-join.R
datadr/tests/testthat/test-quantile.R
datadr/tests/testthat/test-globals.R
datadr/tests/testthat/test-kvMemory.R
datadr/tests/testthat/test-dataops.R
datadr/tests/testthat/test-spark.R
datadr/tests/testthat/test-readtext.R
datadr/tests/testthat/test-kvHDFS.R
datadr/tests/testthat/test-kvLocalDisk.R
datadr/NAMESPACE
datadr/NEWS.md
datadr/data
datadr/data/adult.rda
datadr/R
datadr/R/ddo_ddf_kvMemory.R
datadr/R/zzz_constants.R
datadr/R/bsv.R
datadr/R/dataops_join.R
datadr/R/mapreduce_kvLocalDisk.R
datadr/R/mapreduce_kvHDFS.R
datadr/R/agnostic_summary.R
datadr/R/conn_spark.R
datadr/R/agnostic_hexbin.R
datadr/R/recombine_transforms.R
datadr/R/divSpec.R
datadr/R/ddo_ddf_updateAttrs.R
datadr/R/ddo_ddf_kvHDFS.R
datadr/R/dataops_subset.R
datadr/R/dataops_persist.R
datadr/R/dataops_filter.R
datadr/R/ddo_ddf_methods.R
datadr/R/agnostic_aggregate.R
datadr/R/conn_HDFS.R
datadr/R/dataops_readTable.R
datadr/R/globals.R
datadr/R/ddo_addTransform.R
datadr/R/ddf_summary_print.R
datadr/R/agnostic_quantile.R
datadr/R/divide_df.R
datadr/R/recombine_combine.R
datadr/R/mapreduce_spark.R
datadr/R/dplyr.R
datadr/R/dataset_census.R
datadr/R/divide.R
datadr/R/ddo_ddf_kvSpark.R
datadr/R/mapreduce_kvMemory.R
datadr/R/divSpec_rrDiv.R
datadr/R/dataops_lapply.R
datadr/R/ddo_ddf_kvLocalDisk.R
datadr/R/kvPairs.R
datadr/R/recombine.R
datadr/R/ddo_ddf_print.R
datadr/R/misc.R
datadr/R/conn_localDisk.R
datadr/R/mapreduce.R
datadr/R/divSpec_condDiv.R
datadr/R/dataops_sample.R
datadr/R/ddo_ddf.R
datadr/R/dataops_read.R
datadr/R/datadr-package.R
datadr/R/conn_memory.R
datadr/README.md
datadr/MD5
datadr/DESCRIPTION
datadr/man
datadr/man/applyTransform.Rd
datadr/man/readTextFileByChunk.Rd
datadr/man/localDiskControl.Rd
datadr/man/drLM.Rd
datadr/man/pipe.Rd
datadr/man/ddo-ddf-attributes.Rd
datadr/man/drFilter.Rd
datadr/man/drHexbin.Rd
datadr/man/getCondCuts.Rd
datadr/man/drGetGlobals.Rd
datadr/man/combRbind.Rd
datadr/man/combMeanCoef.Rd
datadr/man/divide.Rd
datadr/man/drBLB.Rd
datadr/man/digestFileHash.Rd
datadr/man/ddf-accessors.Rd
datadr/man/readHDFStextFile.Rd
datadr/man/rhipeControl.Rd
datadr/man/kvPair.Rd
datadr/man/ddo-ddf-accessors.Rd
datadr/man/removeData.Rd
datadr/man/combDdf.Rd
datadr/man/setupTransformEnv.Rd
datadr/man/combDdo.Rd
datadr/man/convert.Rd
datadr/man/mr-summary-stats.Rd
datadr/man/divide-internals.Rd
datadr/man/condDiv.Rd
datadr/man/flatten.Rd
datadr/man/drGLM.Rd
datadr/man/to_ddf.Rd
datadr/man/mrExec.Rd
datadr/man/drSubset.Rd
datadr/man/ddo.Rd
datadr/man/bsv.Rd
datadr/man/as.data.frame.ddf.Rd
datadr/man/combMean.Rd
datadr/man/print.kvPair.Rd
datadr/man/kvPairs.Rd
datadr/man/ddf.Rd
datadr/man/recombine.Rd
datadr/man/drPersist.Rd
datadr/man/print.ddo.Rd
datadr/man/adult.Rd
datadr/man/updateAttributes.Rd
datadr/man/drLapply.Rd
datadr/man/drAggregate.Rd
datadr/man/combCollect.Rd
datadr/man/splitvars.Rd
datadr/man/rrDiv.Rd
datadr/man/makeExtractable.Rd
datadr/man/drSample.Rd
datadr/man/localDiskConn.Rd
datadr/man/addData.Rd
datadr/man/datadr-package.Rd
datadr/man/drJoin.Rd
datadr/man/drQuantile.Rd
datadr/man/as.list.ddo.Rd
datadr/man/charFileHash.Rd
datadr/man/drRead.table.Rd
datadr/man/kvApply.Rd
datadr/man/hdfsConn.Rd
datadr/man/addTransform.Rd
datadr/man/print.kvValue.Rd
datadr/LICENSE
