Executing analytical applications

Description

Functions for executing and managing analytical applications deployed in the iPlant infrastructure.

Usage

SubmitJob(application, file.path="", file.list=NULL, input.list, 
          args.list=NULL, job.name, nprocs=1, private.APP=FALSE, 
          suppress.Warnings=FALSE,  shared.username=NULL,
          print.curl=FALSE)
Wait(job.id, minWaitsec, maxWaitsec, print=FALSE)
CheckJobStatus(job.id, history = FALSE, print.curl = FALSE)
KillJob(job.id, print.curl=FALSE)
ListJobOutput(job.id, print.curl=FALSE, print.total=TRUE)
RetrieveJob(job.id, file.vec=NULL, print.curl=FALSE, verbose=FALSE)
GetJobHistory(return.json=FALSE, print.curl=FALSE)
DeleteJob(job.id, print.curl=FALSE, ALL=FALSE) 

Arguments

application

Name of the DE application. Use the ListApps() function for a list of eligible applications. To run your own private application, set private.APP=TRUE and suppress.Warnings=TRUE.

file.path

Optional path to a user's subdirectory on the DE; the default path is empty, which leads to the home directory.

file.list

A list of input files. Many applications take only one input file, but some take several. The elements of file.list and input.list should correspond one to one. See Details for more information.

job.name

The name to give the job being submitted.

nprocs

The number of processors to be allocated to the job, default = 1.

private.APP

Optional argument for submitting a job to your own private application; the default is FALSE.

job.id

The unique ID number given to a submitted job.

input.list

A list of the input types specific to the application. See Details for more information.

args.list

A list of input options available for the application. These are usually the flagging options in command line invocations. See details for more information.

return.json

Optional; if TRUE, displays all of the results returned by the API on screen. The default is FALSE.

file.vec

Names of the output files to download; can be one or many. If left NULL, all files in the job output are downloaded.

minWaitsec

A range of times (in seconds) must be entered for the Wait function. This entry is the minimum time (in seconds) of that range.

maxWaitsec

A range of times (in seconds) must be entered for the Wait function. This entry is the maximum time (in seconds) of that range.

print.curl

Prints the curl statement that can be used in the terminal, if curl is installed on your computer.

print.total

Only for the ListJobOutput function. If TRUE, prints the total number of files in the folder.

print

Only for the Wait function. When print=TRUE, the status is printed once the job is complete.

verbose

Only for the RetrieveJob function. If TRUE, prints the name of each file as it is downloaded.

shared.username

iPlant lets you share folders with other users. If someone has shared a folder with you and you want to run a job on the files in that folder, enter their username here.

suppress.Warnings

Turns off the warnings, which speeds up run time. Use with caution: if the inputs are incorrect, the errors will not be caught. If the application you are running is a private application, set suppress.Warnings=TRUE.

ALL

Only for the DeleteJob function. If ALL=TRUE, all jobs in the job history are deleted.

history

Only for the CheckJobStatus function. If TRUE, the entire history of the job is shown.

Details

The SubmitJob function takes inputs and arguments and submits a job through the Agave API. It runs the application on the input files in file.list, which are located in the directory file.path. The files in file.list need to match the file types the application expects (defined in the input.list argument). The appropriate options for the application need to be given in input.list and, where needed, args.list. SubmitJob returns the job.id and the job name. With the job.id you can run CheckJobStatus(job.id) to check the status of your job, and the job name can be used in workflows. The statuses reported by CheckJobStatus are:

PENDING
STAGING_INPUTS
CLEANING_UP
ARCHIVING
STAGING_JOB
FINISHED
KILLED
FAILED
STOPPED
RUNNING
PAUSED
QUEUED
SUBMITTING
STAGED
PROCESSING_INPUTS
ARCHIVING_FINISHED
ARCHIVING_FAILED

A finished job reads either ARCHIVING_FINISHED or FINISHED, unless it failed. Use the KillJob function to terminate a running job. Use the Wait function to wait until the job is finished, but be cautious: Wait locks up the workspace until the job is finished. Once the job is finished, use the ListJobOutput function to see all of the files in your job; the number of output files varies by application. The RetrieveJob function takes the job.id and file.vec as input and downloads the files named in file.vec to your current working directory (getwd()). The file.vec vector is a subset of the output from ListJobOutput. The DeleteJob function deletes the job and the corresponding output folder that was generated by running the job. DeleteJob(ALL=TRUE) deletes all jobs in a user's job history. The GetJobHistory function displays all jobs in your history that have not been deleted.
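The status list above can be turned into a small polling helper. The sketch below is not how Wait is implemented; it only illustrates one way to poll between a minimum and maximum interval. It assumes the status is available as a character string, and the choice of which statuses count as terminal (beyond FINISHED, ARCHIVING_FINISHED and FAILED) is also an assumption. The names terminal_states, is_done, succeeded, poll_until_done and make_stub are hypothetical, and a stub replaces CheckJobStatus so the code runs without a network connection.

```r
# Statuses after which the job is assumed not to change any further.
# Treating exactly these as terminal is an assumption, not rPlant's own logic.
terminal_states <- c("FINISHED", "ARCHIVING_FINISHED", "ARCHIVING_FAILED",
                     "FAILED", "KILLED", "STOPPED")

is_done   <- function(status) status %in% terminal_states
succeeded <- function(status) status %in% c("FINISHED", "ARCHIVING_FINISHED")

# Call get_status() until it returns a terminal status. The pause between
# calls starts at min_wait seconds and doubles up to max_wait seconds.
poll_until_done <- function(get_status, min_wait, max_wait, sleep = Sys.sleep) {
  wait <- min_wait
  repeat {
    status <- get_status()
    if (is_done(status)) return(status)
    sleep(wait)
    wait <- min(wait * 2, max_wait)
  }
}

# Stub that replays a fixed sequence of statuses instead of querying the API.
make_stub <- function(states) {
  i <- 0
  function() {
    i <<- min(i + 1, length(states))
    states[i]
  }
}

final <- poll_until_done(make_stub(c("PENDING", "RUNNING", "FINISHED")),
                         min_wait = 0, max_wait = 0)
print(final)
print(succeeded(final))
```

In real use, get_status would wrap a call to CheckJobStatus(job.id) and extract the status string from whatever that call returns.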

For the SubmitJob function, the application must match an application name in the output of the ListApps function. To build input.list, use the GetAppInfo function: the 'kind' column indicates whether a row is an "input" or an "output". Only the names in the 'id' column of rows whose 'kind' is "input" go in input.list. For example, when the application is "muscle-lonestar-3.8.31u2", we can use GetAppInfo("muscle-lonestar-3.8.31u2")$Information to determine that the application is expecting "stdin" as its first input file (input.list=list("stdin")). For the application "velveth-1.2.07u1", GetAppInfo("velveth-1.2.07u1")$Information tells us that the application expects six input files, which should be in the order: input.list=list("reads1", "reads2", "reads3", "reads4", "reads5", "reads6").
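As an illustration of reading the 'kind' and 'id' columns, the sketch below filters a mock table. The data frame app_info is a hypothetical stand-in for GetAppInfo(...)$Information: only the column names 'kind' and 'id' come from the description above, and the rows are made up for the example.

```r
# Hypothetical stand-in for GetAppInfo("velveth-1.2.07u1")$Information.
# The real table may have more columns and rows; these are illustrative.
app_info <- data.frame(
  kind = c("input", "input", "parameters", "parameters", "parameters"),
  id   = c("reads1", "reads2", "format1", "kmer", "Output"),
  stringsAsFactors = FALSE
)

# ids of rows whose kind is "input" become the elements of input.list
input.list <- as.list(app_info$id[app_info$kind == "input"])

# ids of rows whose kind is "parameters" are the option names for args.list
param_ids <- app_info$id[app_info$kind == "parameters"]

print(input.list)
print(param_ids)
```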

A few things to note: 1) depending on the application, input.list can be shorter than the number of inputs; for example, with the "velveth-1.2.07u1" application, the input list could be input.list=list("reads1", "reads2", "reads3"); 2) file.list should always be the same length as input.list; 3) for args.list, use the GetAppInfo function: rows whose 'kind' column is 'parameters' are the options for args.list. For velveth-1.2.07u1 the args.list is as follows: list(c("format1", value), c("kmer", value), c("Output", value)). The list can be as long as the number of options.
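Because suppress.Warnings=TRUE skips input checking, it can help to check the three lists locally before submitting. The following is a minimal sketch: check_job_lists is a hypothetical helper (not part of the package), and the file names and option values are placeholders rather than recommended settings.

```r
# Hypothetical file names, assumed to already exist in file.path on the DE.
file.list  <- list("sample_1.fastq", "sample_2.fastq", "sample_3.fastq")
input.list <- list("reads1", "reads2", "reads3")

# Each args.list element is a character vector of the form c(option id, value).
# The values here are placeholders only.
args.list <- list(c("format1", "fastq"), c("kmer", "31"), c("Output", "velvet_out"))

# Stop with an error if the lists do not have the shape described above.
check_job_lists <- function(file.list, input.list, args.list = NULL) {
  if (length(file.list) != length(input.list)) {
    stop("file.list and input.list must have the same length")
  }
  if (!is.null(args.list)) {
    ok <- vapply(args.list,
                 function(a) is.character(a) && length(a) == 2,
                 logical(1))
    if (!all(ok)) stop("each args.list element must be c(id, value)")
  }
  invisible(TRUE)
}

check_job_lists(file.list, input.list, args.list)
```

If the check passes, the same three objects can be passed to SubmitJob as file.list, input.list and args.list.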

Value

For submitted jobs, a list containing the job id and the job name is returned. If an error occurs, a message stating the error should also be reported.

See Also

ListApps, Validate, UploadFile

Examples

## Not run: data(DNA.fasta)
## Not run: write.fasta(sequences = DNA.fasta, names = names(DNA.fasta), file.out = "DNA.fasta")
## Not run: Validate("username","password")
## Not run: UploadFile("DNA.fasta", filetype="FASTA-0")

# Submit a MUSCLE job using the provided data in the package.  The job will return
# a job id and job name
## Not run: myJob <- SubmitJob(application="Muscle-3.8.32u4", file.list=list("DNA.fasta"),
                            input.list=list("stdin"), args.list=list(c("arguments", 
                            "-phyiout")), job.name="muscleDNA")
## End(Not run)

# Check the status of any job
## Not run: CheckJobStatus(myJob$id)
             
# List the output files a job has created
## Not run: ListJobOutput(myJob$id)

# Kill the job if it is running incorrectly
## Not run: KillJob(myJob$id)
# Wait for the job to finish
## Not run: Wait(myJob$id, 5, 1800, print=TRUE)
 
# Download output files
## Not run: RetrieveJob(myJob$id, ListJobOutput(myJob$id, print.total=FALSE))
     
# View job history
## Not run: GetJobHistory()

# Delete Job
## Not run: DeleteJob(myJob$id)
## Not run: DeleteJob(ALL=TRUE)