rhipeControl: Specify Control Parameters for RHIPE Job

Description

Specify control parameters for a RHIPE job. See rhwatch for details about each of the parameters.

Usage

rhipeControl(mapred = NULL, setup = NULL, combiner = FALSE,
  cleanup = NULL, orderby = "bytes", shared = NULL, jarfiles = NULL,
  zips = NULL, jobname = "")

Arguments

mapred, setup, combiner, cleanup, orderby, shared, jarfiles, zips, jobname

Arguments to rhwatch in the RHIPE package.

Examples

## Not run: 
# input data on HDFS
d <- ddf(hdfsConn("/path/to/big/data/on/hdfs"))

# set RHIPE / Hadoop parameters
# buffer sizes control how many k/v pairs are sent to map / reduce tasks at a time
# mapred.reduce.tasks is a Hadoop config parameter that controls # of reduce tasks
rhctl <- rhipeControl(mapred = list(
  rhipe_map_buff_size = 10000,
  mapred.reduce.tasks = 72,
  rhipe_reduce_buff_size = 1))

# divide input data using these control parameters
divide(d, by = "var", output = hdfsConn("/path/to/output"), control = rhctl)

## End(Not run)

datadr documentation built on May 1, 2019, 8:06 p.m.