readHDFStextFile: Experimental HDFS text reader helper function

Description Usage Arguments Examples

Description

Experimental helper function for reading text data on HDFS into an HDFS connection

Usage

readHDFStextFile(input, output = NULL, overwrite = FALSE, fn = NULL,
  keyFn = NULL, linesPerBlock = 10000, control = NULL, update = FALSE)

Arguments

input

a ddo / ddf connection to a text input directory on HDFS, created with hdfsConn; ensure the text files are inside a directory and that type = "text" is specified

output

an output connection, such as one created with localDiskConn or hdfsConn

overwrite

logical; should an existing output location be overwritten? (can also specify overwrite = "backup" to move the existing output to _bak)

fn

function to be applied to each chunk of lines (the input to the function is a character vector of lines)

keyFn

optional function to determine the value of the key for each block
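For illustration, a key function might derive each block's key from the first line of the chunk. The sketch below is hypothetical (the name makeKey and the keying scheme are assumptions, not part of datadr, and it assumes keyFn is handed the block's vector of lines):

```r
# Hypothetical key function: take the first comma-separated field
# of the first line in the chunk as the block key.
makeKey <- function(x) {
  # x is assumed to be the character vector of lines in the block
  strsplit(x[1], ",", fixed = TRUE)[[1]][1]
}

# Would then be supplied as, e.g., keyFn = makeKey
```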

linesPerBlock

how many lines at a time to read

control

parameters specifying how the backend should handle things (most likely parameters passed to rhwatch in RHIPE); see rhipeControl and localDiskControl

update

should a MapReduce job be run to compute additional attributes for the resulting data prior to returning?

Examples

## Not run: 
res <- readHDFStextFile(
  input = Rhipe::rhfmt("/path/to/input/text", type = "text"),
  output = hdfsConn("/path/to/output"),
  fn = function(x) {
    read.csv(textConnection(paste(x, collapse = "\n")), header = FALSE)
  }
)

## End(Not run)
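Because fn simply receives a character vector of lines, its behavior can be checked locally before launching a job against HDFS. A minimal sketch of that check (the sample lines are made up for illustration):

```r
# The chunk-processing function from the example above:
# parse a vector of raw CSV lines into a data frame.
fn <- function(x) {
  read.csv(textConnection(paste(x, collapse = "\n")), header = FALSE)
}

# Simulate a chunk of text lines as readHDFStextFile would pass them in
lines <- c("1,apple", "2,banana", "3,cherry")
d <- fn(lines)
# d is a 3-row, 2-column data frame (columns V1, V2)
```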

datadr documentation built on May 1, 2019, 8:06 p.m.