mantaReduce: Constructor for the R-format description of a Manta job Reduce (Unix) task.


View source: R/mantaReduce.R

Description

Helper function that constructs the R structure describing a Reduce task. Use it to supply one of the ... arguments of mantaJob.setup. It specifies the Unix command line task, any initialization tasks, an array of Manta filesystem asset files, the number of reducers, and the memory and disk sizes of the compute instance on Manta.

Usage

mantaReduce(exec, init, assets, reducers, memory, disk)

Arguments

reducers

integer. Number of reducers to use, from 1 to 1024. Use with caution.

exec

character required. The Unix shell command executed in the Reduce task. It operates on the input piped to the task: the output of the previous phase, or the input Manta objects specified when the job is launched. exec may be any valid Unix shell command that can run on the Manta compute node at execution time; use the Node.js command mlogin to try out commands interactively. Pipelines, shell escaping, and substitution are all supported. You can also run your own programs stored as Manta objects by listing them in assets and referencing them under the /assets directory in the exec command. See http://apidocs.joyent.com/manta/jobs-reference.html for more details.

init

character optional. A Unix shell command executed before the exec command, used to run initialization steps on the Manta compute node prior to task execution. init can also run programs stored as Manta objects, which are mounted as read-only POSIX files under /assets. For example, init can unpack a tar asset before exec runs.

assets

array of character, optional. Manta objects to be made available to the compute node at job runtime, such as shell scripts, installation or configuration scripts, custom executables compiled for SmartOS, or tar files. At runtime, each compute node exposes the listed objects as read-only POSIX files under the /assets directory, where your exec or init shell commands can read them. For example, a Manta object listed as an asset at ~~/stor/data.tgz is found by your script on the compute node as the read-only file /assets/~~/stor/data.tgz, where ~~ is your Manta username.

memory

integer optional. Amount of memory, in MB, requested for the Manta compute node instance. Valid values are 128, 256, 512, 1024, 2048, 8192, and 16384. Default is 1024 MB.

disk

integer optional. Amount of temporary working disk (not Manta storage space), in GB, available to the compute node while executing the task. Valid values are 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024. Default is 8 GB. Writable disk on each compute node is found at /var/tmp during init or exec at job runtime. To keep data written there, use the Node.js command mput in your exec script to upload the files from /var/tmp to permanent Manta storage.
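The assets path convention can be sketched as a small shell function. This is a hypothetical helper, not part of mantaRSDK or the Manta CLI: it assumes that ~~ stands for /$MANTA_USER and that an absolute Manta path is mounted the same way under /assets; the account name jdoe is made up.

```shell
#!/bin/sh
# Hypothetical helper: map a Manta object path to the read-only POSIX path
# where a compute node exposes it when the object is listed as an asset.
# Assumption: "~~" is shorthand for /$MANTA_USER, so ~~/stor/data.tgz
# becomes /assets/$MANTA_USER/stor/data.tgz.
asset_path() {
  case "$1" in
    '~~'/*) printf '/assets/%s%s\n' "$MANTA_USER" "${1#??}" ;;  # drop the leading "~~"
    /*)     printf '/assets%s\n' "$1" ;;                        # already absolute
    *)      echo "not an absolute Manta path: $1" >&2; return 1 ;;
  esac
}

MANTA_USER=jdoe                   # hypothetical account name
asset_path '~~/stor/data.tgz'     # prints /assets/jdoe/stor/data.tgz
asset_path /jdoe/stor/data.tgz    # prints /assets/jdoe/stor/data.tgz
```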

Details

On Manta, a Reduce task phase executes a generic UNIX command on piped input: either the input objects specified in mantaJob.launch or the output of a previous phase. Use mantaReduce to run a job that has no Manta object input data.

The exec argument must be a valid generic UNIX command line, not an R function. It may call executables or runtime language scripts hosted on Manta and specified as assets. The init command runs before exec and is not passed any input. For example, init may be used to extract scripts from a tar object on Manta that is listed as an asset.
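The init/exec split described in this paragraph can be rehearsed locally before submitting a job. The sketch below is a local stand-in, not the Manta runtime: two temporary directories play the roles of the read-only /assets directory and the writable /var/tmp scratch space, and the asset scripts.tgz and its script sum.sh are hypothetical names. init unpacks the tarball into scratch space without reading any input; exec then runs the unpacked script on piped input.

```shell
#!/bin/sh
set -e

# Local stand-ins (assumption) for the compute node's read-only /assets
# directory and its writable /var/tmp scratch space.
ASSETS=$(mktemp -d)
SCRATCH=$(mktemp -d)
trap 'rm -rf "$ASSETS" "$SCRATCH"' EXIT

# Build a hypothetical asset tarball containing one reducer script, sum.sh,
# which sums the first field of every input line.
mkdir "$ASSETS/src"
printf '%s\n' '#!/bin/sh' "awk '{ s += \$1 } END { print s }'" > "$ASSETS/src/sum.sh"
tar -czf "$ASSETS/scripts.tgz" -C "$ASSETS/src" sum.sh

# init step: unpack the asset into writable scratch space (no input is read).
init_step() { tar -xzf "$ASSETS/scripts.tgz" -C "$SCRATCH"; }

# exec step: run the unpacked script on the task's piped input.
exec_step() { sh "$SCRATCH/sum.sh"; }

init_step
printf '1\n2\n3\n' | exec_step   # prints 6
```

The unpacked files go to scratch space rather than next to the asset because, as described above, /assets is read-only on the compute node.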

Note that you do not have to specify input for a Reduce task in mantaJob.launch: the service pipes the output of the previous Map phase into the Reduce task. Note also that this piped input may arrive in any order; the service does no sorting between the Map and Reduce tasks.
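Because Reduce input arrives unsorted, an order-sensitive reducer has to sort inside its own exec command. The local sketch below uses hypothetical Map output (one word per line) and shows why an exec string such as "sort | uniq -c" is safer than "uniq -c" alone: uniq merges only adjacent duplicate lines.

```shell
#!/bin/sh
# Hypothetical Map-phase output: one word per line, in arbitrary order.
map_output() { printf 'b\na\nb\na\n'; }

# Order-sensitive reducer: uniq -c only merges adjacent duplicates,
# so this unsorted input yields four separate counts of 1.
map_output | uniq -c

# Sorting first makes the result independent of arrival order:
# a count of 2 for "a" and a count of 2 for "b".
map_output | sort | uniq -c
```

Reducers built only from order-independent operations, such as the summing awk command in the Examples section, need no sort.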

Value

Returns an R list for consumption by mantaJob.setup.

See Also

Other mantaJobs: mantaJob.cancel; mantaJob.done; mantaJob.errors.stderr; mantaJob.errors; mantaJob.failures; mantaJob.inputs; mantaJob.launch; mantaJob.outputs.cat; mantaJob.outputs; mantaJob.setup; mantaJob.status; mantaJobs.running; mantaJobs.tail; mantaJobs; mantaMap

Examples

## Not run: 
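# Example - Map/Reduce Unix Word Count
# Map: wc runs on each input object and prints its line, word, and byte counts.
# Reduce: awk sums those per-object counts into job-wide totals.
job <- mantaJob.setup("word count",
         mantaMap("wc"),
         mantaReduce("awk '{ l += $1; w += $2; c += $3 } END { print l, w, c }'"))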

## End(Not run)
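The Reduce command in the word count example can be checked locally before the job is launched. The sketch below is a local stand-in, not the Manta runtime: three small temporary files play the role of hypothetical input objects, running wc on each file's standard input imitates the Map phase, and the awk command from the example sums the per-object counts as the Reduce phase would.

```shell
#!/bin/sh
set -e

# Hypothetical stand-ins for three input Manta objects.
DIR=$(mktemp -d)
trap 'rm -rf "$DIR"' EXIT
printf 'one two\nthree\n' > "$DIR/a.txt"
printf 'four five six\n' > "$DIR/b.txt"
printf 'seven\n' > "$DIR/c.txt"

# Map phase stand-in: one wc per object, each printing "lines words bytes".
map_phase() { for f in "$DIR"/*.txt; do wc < "$f"; done; }

# Reduce phase: the exec string from the example, summing the three columns.
reduce_phase() { awk '{ l += $1; w += $2; c += $3 } END { print l, w, c }'; }

map_phase | reduce_phase   # prints 4 7 34
```

The totals match running wc once over all three files concatenated, which is the result the Map/Reduce job is meant to reproduce.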

joyent/mantaRSDK documentation built on May 19, 2019, 10:43 p.m.