avworkflow: Workflow submissions and file outputs

View source: R/avworkflow_configuration.R

avworkflows    R Documentation

Description

avworkflows() returns a tibble summarizing available workflows.

avworkflow_jobs() returns a tibble summarizing submitted workflow jobs for a namespace and name.

avworkflow_files() returns a tibble containing information and file paths to workflow outputs.

avworkflow_localize() creates or synchronizes a local copy of the files produced by the workflow and stored in the workspace bucket.

avworkflow_run() runs the workflow described by the configuration.

avworkflow_stop() stops the most recently submitted workflow job from running.

Usage

avworkflows(namespace = avworkspace_namespace(), name = avworkspace_name())

avworkflow_jobs(namespace = avworkspace_namespace(), name = avworkspace_name())

avworkflow_files(submissionId = NULL, bucket = avbucket())

avworkflow_localize(
  submissionId = NULL,
  destination = NULL,
  type = c("control", "output", "all"),
  bucket = avbucket(),
  dry = TRUE
)

avworkflow_run(
  config,
  entityName,
  entityType = config$rootEntityType,
  deleteIntermediateOutputFiles = FALSE,
  useCallCache = TRUE,
  namespace = avworkspace_namespace(),
  name = avworkspace_name(),
  dry = TRUE
)

avworkflow_stop(
  submissionId = NULL,
  namespace = avworkspace_namespace(),
  name = avworkspace_name(),
  dry = TRUE
)

Arguments

namespace

character(1) AnVIL workspace namespace as returned by, e.g., avworkspace_namespace().

name

character(1) AnVIL workspace name as returned by, e.g., avworkspace_name().

submissionId

a character() of workflow submission ids, or a tibble with column submissionId, or NULL / missing. See 'Details'.

bucket

character(1) name of the Google bucket in which the workflow products are available, as gs://.... Usually the bucket of the active workspace, returned by avbucket().

destination

character(1) file path to the location where files will be synchronized. For directories in the current working directory, be sure to prefix the path with "./". When NULL, the submissionId is used as the destination. destination may also be a Google bucket, in which case the workflow files are synchronized from the workspace to a second bucket.

type

character(1) copy "control" (default), "output", or "all" files produced by a workflow.

dry

logical(1) when TRUE (default), report the consequences but do not perform the action requested. When FALSE, perform the action.

config

an avworkflow_configuration object of the workflow that will be run.

entityName

character(1) name of the set of samples to be used when running the workflow.

entityType

character(1) type of root entity used for the workflow.

deleteIntermediateOutputFiles

logical(1) whether or not to delete intermediate output files when the workflow completes.

useCallCache

logical(1) whether or not to read from cache for this submission.

Details

For avworkflow_files(), the submissionId is the identifier associated with the workflow job, and is present in the return value of avworkflow_jobs(); the example illustrates how the first row of avworkflow_jobs() (i.e., the most recently submitted workflow job) can be used as input to avworkflow_files(). When submissionId is not provided, the return value is for the most recently submitted workflow of the namespace and name of avworkspace().
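The first row of avworkflow_jobs() is not necessarily a job that finished successfully. The sketch below shows one way to pick the submissionId of the most recent job with status 'Done' before calling avworkflow_files(). The helper latest_done_submission() and the toy jobs table are hypothetical and not part of AnVIL; only the column names and the AnVIL functions are taken from this page.

```r
## Hypothetical helper (not part of AnVIL): submissionId of the most recently
## submitted job whose status is "Done", or NULL when there is no such job.
latest_done_submission <- function(jobs) {
    done <- jobs[jobs$status %in% "Done", , drop = FALSE]
    if (nrow(done) == 0L)
        return(NULL)
    ## order explicitly rather than relying on the row order of `jobs`
    newest_first <- order(done$submissionDate, decreasing = TRUE)
    done$submissionId[[newest_first[1]]]
}

## Toy stand-in for the tibble returned by avworkflow_jobs(); values are invented.
jobs <- data.frame(
    submissionId = c("id-3", "id-2", "id-1"),
    status = c("Aborted", "Done", "Done"),
    submissionDate = as.POSIXct(
        c("2022-06-03", "2022-06-02", "2022-06-01"), tz = "UTC"
    ),
    stringsAsFactors = FALSE
)
latest_done_submission(jobs)

## Within AnVIL, the same helper could feed avworkflow_files().
if (requireNamespace("AnVIL", quietly = TRUE) &&
    AnVIL::gcloud_exists() && nzchar(AnVIL::avworkspace_name())) {
    id <- latest_done_submission(AnVIL::avworkflow_jobs())
    if (!is.null(id))
        print(AnVIL::avworkflow_files(id))
}
```

Because avworkflow_files() accepts a character() of submission ids, the value returned by the helper can be passed directly.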

For avworkflow_localize(), type = "control" files summarize workflow progress; they can be numerous but are frequently small and quickly synchronized. type = "output" files are the output products of the workflow stored in the workspace bucket. Depending on the workflow, outputs may be large, e.g., aligned reads in bam files. See gsutil_cp() to copy individual files from the bucket to the local drive.

avworkflow_localize() treats submissionId= in the same way as avworkflow_files(): when missing, files from the most recent workflow job are candidates for localization.
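A typical pattern is to preview a localization with dry = TRUE and only then repeat the call with dry = FALSE. The sketch below illustrates this for type = "output" files. The helper as_local_destination() is hypothetical (not part of AnVIL) and assumes Unix-style paths; it only applies the advice to prefix directories in the current working directory with "./". The directory name "workflow_output" is likewise an assumption.

```r
## Hypothetical helper (not part of AnVIL): prefix a bare relative path with
## "./", leaving bucket URIs and already-anchored paths unchanged.
as_local_destination <- function(path) {
    stopifnot(is.character(path), length(path) == 1L, nzchar(path))
    is_bucket <- startsWith(path, "gs://")
    is_anchored <- grepl("^(\\./|\\.\\./|/|~)", path)
    if (is_bucket || is_anchored) path else paste0("./", path)
}

destination <- as_local_destination("workflow_output")

if (requireNamespace("AnVIL", quietly = TRUE) &&
    AnVIL::gcloud_exists() && nzchar(AnVIL::avworkspace_name())) {
    ## dry = TRUE (the default) only reports which files would be copied;
    ## submissionId is left missing, so the most recent job is used
    AnVIL::avworkflow_localize(
        type = "output", destination = destination, dry = TRUE
    )
    ## after reviewing the report, repeat the call with dry = FALSE
}
```

Keeping the dry run as a separate first step is useful when outputs are large, since the report shows how many files are candidates before anything is transferred.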

Value

avworkflows() returns a tibble. Each workflow is in a 'namespace' and has a 'name', as illustrated in the example. Columns are

  • name: workflow name.

  • namespace: workflow namespace (often the same as the workspace namespace).

  • rootEntityType: name of the avtable() used to retrieve inputs.

  • methodRepoMethod.methodUri: source of the method, e.g., a dockstore URI.

  • methodRepoMethod.sourceRepo: source repository, e.g., dockstore.

  • methodRepoMethod.methodPath: path to method, e.g., a dockstore method might reference a github repository.

  • methodRepoMethod.methodVersion: the version of the method, e.g., 'main' branch of a github repository.

avworkflow_jobs() returns a tibble, sorted by submissionDate, with columns

  • submissionId: character() job identifier from the workflow runner.

  • submitter: character() AnVIL user id of individual submitting the job.

  • submissionDate: POSIXct() date (in local time zone) of job submission.

  • status: character() job status, with values 'Accepted', 'Evaluating', 'Submitting', 'Submitted', 'Aborting', 'Aborted', 'Done'.

  • succeeded: integer() number of workflows succeeding.

  • failed: integer() number of workflows failing.

avworkflow_files() returns a tibble with columns

  • file: character() 'base name' of the file in the bucket.

  • workflow: character() name of the workflow the file is associated with.

  • task: character() name of the task in the workflow that generated the file.

  • path: character() full path to the file in the Google bucket.
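The file and path columns make it straightforward to select individual outputs. The sketch below is illustrative only: the helper paths_matching() is hypothetical (not part of AnVIL), and the toy files table uses invented values that mimic the columns listed above.

```r
## Hypothetical helper (not part of AnVIL): bucket paths of the files whose
## base name matches a regular expression.
paths_matching <- function(files, pattern) {
    files$path[grepl(pattern, files$file)]
}

## Toy stand-in for the tibble returned by avworkflow_files(); values are invented.
files <- data.frame(
    file = c("stdout", "aligned.bam", "aligned.bam.bai"),
    workflow = "align",
    task = c("log", "bwa", "index"),
    path = c(
        "gs://example-bucket/stdout",
        "gs://example-bucket/aligned.bam",
        "gs://example-bucket/aligned.bam.bai"
    ),
    stringsAsFactors = FALSE
)
bam_paths <- paths_matching(files, "\\.bam$")
bam_paths

## Within AnVIL, the selected paths could then be passed to gsutil_cp()
## to copy individual files rather than localizing everything.
if (requireNamespace("AnVIL", quietly = TRUE) &&
    AnVIL::gcloud_exists() && nzchar(AnVIL::avworkspace_name())) {
    print(paths_matching(AnVIL::avworkflow_files(), "\\.bam$"))
}
```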

avworkflow_localize() prints a message indicating the number of files that are (if dry = FALSE) or would be localized. If no files require localization (i.e., local files are not older than the bucket files), then no files are localized. avworkflow_localize() returns a tibble of file name and bucket path of files to be synchronized.

avworkflow_run() returns config, invisibly.

avworkflow_stop() returns NULL, invisibly.

Examples

library(dplyr)  ## provides %>%, select() and pull() used below

if (gcloud_exists() && nzchar(avworkspace_name()))
    ## from within AnVIL
    avworkflows() %>% select(namespace, name)

if (gcloud_exists() && nzchar(avworkspace_name()))
    ## from within AnVIL
    avworkflow_jobs()

if (gcloud_exists() && nzchar(avworkspace_name())) {
    ## e.g., from within AnVIL
    avworkflow_jobs() %>%
    ## select most recent workflow
    head(1) %>%
    ## find paths to output and log files on the bucket
    avworkflow_files()
}

if (gcloud_exists() && nzchar(avworkspace_name())) {
    avworkflow_localize(dry = TRUE)
}

## Not run: 
## new_config is assumed to be an existing avworkflow_configuration object
entityName <- avtable("participant_set") |>
    pull(participant_set_id) |>
    head(1)
avworkflow_run(new_config, entityName)

## End(Not run)

## Not run: 
avworkflow_stop()

## End(Not run)


Bioconductor/AnVIL documentation built on June 25, 2022, 9:42 p.m.