mantaMap: Constructor for R format Manta Job for Map Unix task.


View source: R/mantaMap.R

Description

Helper function that constructs the R structure describing a Map task. Pass its result in the ... argument of mantaJob.setup to specify the Unix command-line task, any initialization task, an array of Manta asset objects, and the memory and disk size of the compute instance on Manta.

Usage

mantaMap(exec, init, assets, memory, disk)

Arguments

exec

character, required. The Unix shell command executed in the Map task. It operates on the input Manta objects specified when the job is launched. exec may be any valid Unix shell command that can run on the Manta compute node at execution time. Use the Node.js command mlogin to test commands. Pipelines, shell escaping, and substitution are all supported. You can also run your own programs stored as Manta objects by listing them in the assets parameter and referencing them in the exec command under the /assets directory.
See http://apidocs.joyent.com/manta/jobs-reference.html for more details.

init

character, optional. A Unix shell command executed before the exec command. Use it to run initialization steps on the Manta compute node prior to task execution. init can also execute programs stored as Manta objects, which are mounted as read-only POSIX files under /assets. For example, it can unpack a tar asset before exec runs.

assets

array of character, optional. The Manta objects that the compute node needs to access at job runtime. Include shell scripts, installation steps, configuration steps, custom executables compiled for SmartOS, or tar files as required. At job runtime, each node provides the specified Manta objects as POSIX files in the /assets directory, for read-only access from your exec or init shell commands. For example, a Manta object listed as an asset that lives at ~~/stor/data.tgz will be found by your script on the Manta compute node as a mounted read-only POSIX file at /assets/~~/stor/data.tgz, where ~~ is your Manta username.
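
The path mapping described above can be sketched in base R. This is only an illustration, not part of mantaRSDK: the helper name asset_mount_path and the user name "jdoe" are hypothetical, and it assumes that ~~ in the mounted path stands for the Manta username, as the text states.

```r
# Sketch: where an asset is expected to appear on the compute node.
# `asset_mount_path` and the user name "jdoe" are hypothetical.
asset_mount_path <- function(manta_path, user) {
  stopifnot(is.character(manta_path), is.character(user), length(user) == 1)
  # Replace the leading "~~" shorthand with the user name, then prefix /assets/.
  paste0("/assets/", sub("^~~", user, manta_path))
}

# Path to reference from an exec or init command for the asset ~~/stor/data.tgz
asset_mount_path("~~/stor/data.tgz", user = "jdoe")
```

A helper like this can make it easier to build exec or init command strings that reference assets without hard-coding the full path.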

memory

integer, optional. Amount of memory requested for the Manta compute node instance, in MB. Valid values are 128, 256, 512, 1024, 2048, 8192, or 16384. Default is 1024 MB.

disk

integer, optional. Amount of temporary working disk (not Manta storage space) available to the compute node while it executes the task. Valid values are 2, 4, 8, 16, 32, 64, 128, 256, 512, or 1024 GB. Default is 8 GB. Writable disk on each compute node is found at the /var/tmp directory while init or exec runs. To save data from this space to permanent Manta storage, use the Node.js command mput in your exec script to upload the files from /var/tmp to Manta storage.
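
As a rough sketch, the memory and disk values listed above could be checked in plain R before a job is set up. The helper check_map_resources is hypothetical and not part of mantaRSDK, and mantaMap may perform its own validation. The value lists are copied from the argument descriptions above.

```r
# Valid sizes as listed in the memory and disk argument descriptions.
valid_memory_mb <- c(128, 256, 512, 1024, 2048, 8192, 16384)
valid_disk_gb   <- c(2, 4, 8, 16, 32, 64, 128, 256, 512, 1024)

# Hypothetical pre-check; defaults mirror the documented defaults.
check_map_resources <- function(memory = 1024, disk = 8) {
  if (!(memory %in% valid_memory_mb)) {
    stop("memory must be one of: ", paste(valid_memory_mb, collapse = ", "), " (MB)")
  }
  if (!(disk %in% valid_disk_gb)) {
    stop("disk must be one of: ", paste(valid_disk_gb, collapse = ", "), " (GB)")
  }
  list(memory = as.integer(memory), disk = as.integer(disk))
}

# Returns the validated sizes as integers, ready to pass on to mantaMap.
res <- check_map_resources(memory = 2048, disk = 16)
```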

Details

On Manta, a Map task phase executes a generic Unix command on a list of input Manta objects. The input list is specified in mantaJob.launch, which distributes the tasks to compute instances local to each Manta object's location. The exec argument must be a valid generic Unix command line, not an R function. It may call executables or scripting-language scripts that are hosted on Manta and specified as assets. The init argument is run before the exec argument and is not passed input. It may be used, for example, to extract scripts from an asset saved on Manta as a tar file.
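
The init-then-exec pattern described above might look like the following sketch. The user name, the asset scripts.tgz, and the script count.sh are hypothetical, and the sketch assumes that ~~ resolves to the username under /assets and that mantaMap accepts the named arguments shown in Usage. The call to mantaMap is wrapped in a function so that nothing is sent to Manta until you call it yourself.

```r
# Hypothetical user name and asset; replace with your own.
user  <- "jdoe"
asset <- "~~/stor/scripts.tgz"

# init: unpack the read-only asset into the writable /var/tmp scratch space.
init_cmd <- sprintf("cd /var/tmp && tar -xzf /assets/%s/stor/scripts.tgz", user)

# exec: run a script from the unpacked archive (hypothetical file name).
exec_cmd <- "sh /var/tmp/scripts/count.sh"

# Builds the Map phase; requires the mantaRSDK package when called.
build_map_phase <- function() {
  mantaRSDK::mantaMap(exec = exec_cmd, init = init_cmd, assets = asset,
                      memory = 1024, disk = 8)
}
```

The list returned by build_map_phase() would then be passed to mantaJob.setup in the same way as the mantaMap("wc") call in the Examples section.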

Value

Returns an R list describing a Map task phase, for consumption by mantaJob.setup.

See Also

Other mantaJobs: mantaJob.cancel; mantaJob.done; mantaJob.errors.stderr; mantaJob.errors; mantaJob.failures; mantaJob.inputs; mantaJob.launch; mantaJob.outputs.cat; mantaJob.outputs; mantaJob.setup; mantaJob.status; mantaJobs.running; mantaJobs.tail; mantaJobs; mantaReduce

Examples

## Not run: 
# Example - Map/Reduce Unix Word Count
job <- mantaJob.setup("word count",
         mantaMap("wc"),
         mantaReduce("awk '{ l += $1; w += $2; c += $3 } END { print l, w, c }'"))

## End(Not run)

joyent/mantaRSDK documentation built on May 19, 2019, 10:43 p.m.