avworkflow: Workflow submissions and file outputs

View source: R/avworkflow_configuration.R

Description

avworkflows() returns a tibble summarizing available workflows.

avworkflow_jobs() returns a tibble summarizing submitted workflow jobs for a namespace and name.

avworkflow_files() returns a tibble containing information and file paths to workflow outputs.

avworkflow_localize() creates or synchronizes a local copy of the files that the workflow produced in the workspace bucket.

avworkflow_run() runs the workflow of the configuration.

avworkflow_stop() stops the most recently submitted workflow job from running.

Usage

avworkflows(namespace = avworkspace_namespace(), name = avworkspace_name())

avworkflow_jobs(namespace = avworkspace_namespace(), name = avworkspace_name())

avworkflow_files(
  submissionId = NULL,
  bucket = avbucket(),
  namespace = avworkspace_namespace(),
  name = avworkspace_name()
)

avworkflow_localize(
  submissionId = NULL,
  destination = NULL,
  type = c("control", "output", "all"),
  bucket = avbucket(),
  dry = TRUE
)

avworkflow_run(
  config,
  entityName,
  entityType = config$rootEntityType,
  deleteIntermediateOutputFiles = FALSE,
  useCallCache = TRUE,
  useReferenceDisks = FALSE,
  namespace = avworkspace_namespace(),
  name = avworkspace_name(),
  dry = TRUE
)

avworkflow_stop(
  submissionId = NULL,
  namespace = avworkspace_namespace(),
  name = avworkspace_name(),
  dry = TRUE
)

Arguments

namespace

character(1) AnVIL workspace namespace as returned by, e.g., avworkspace_namespace().

name

character(1) AnVIL workspace name as returned by, e.g., avworkspace_name().

submissionId

a character() of workflow submission ids, or a tibble with column submissionId, or NULL / missing. See 'Details'.

bucket

character(1) DEPRECATED (ignored in the current release) name of the Google bucket in which the workflow products are available, as gs://.... Usually the bucket of the active workspace, returned by avbucket().

destination

character(1) file path to the location where files will be synchronized. For directories in the current working directory, be sure to prepend with "./". When NULL, the submissionId is used as the destination. destination may also be a Google bucket, in which case the workflow files are synchronized from the workspace bucket to a second bucket.

type

character(1) copy "control" (default), "output", or "all" files produced by a workflow.

dry

logical(1) when TRUE (default), report the consequences but do not perform the action requested. When FALSE, perform the action.

config

an avworkflow_configuration object of the workflow that will be run. Only entityType and the method configuration name and namespace are used from config; other configuration values must be communicated to AnVIL using avworkflow_configuration_set().

entityName

character(1) or NULL name of the set of samples to be used when running the workflow. NULL indicates that no sample set will be used.

entityType

character(1) or NULL type of root entity used for the workflow. NULL means that no root entity will be used.

deleteIntermediateOutputFiles

logical(1) whether or not to delete intermediate output files when the workflow completes.

useCallCache

logical(1) whether or not to read from cache for this submission.

useReferenceDisks

logical(1) whether or not to use pre-built disks for common genome references. Default: FALSE.

Details

For avworkflow_files(), the submissionId is the identifier associated with the submission of one (or more) workflows, and is present in the return value of avworkflow_jobs(); the example illustrates how the first row of avworkflow_jobs() (i.e., the most recently submitted workflow job) can be used as input to avworkflow_files(). When submissionId is not provided, the return value is for the most recently submitted workflow job in the workspace identified by namespace and name.
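Because submissionId also accepts a character vector, submissions can be chosen by filtering the avworkflow_jobs() tibble first. This is a minimal sketch: done_submissions() is a hypothetical helper, the toy jobs data are made up, and it assumes avworkflow_jobs() lists the most recent submission first (as the example's use of head(1) suggests). The AnVIL calls run only when a workspace is available.

```r
## Hypothetical helper: up to `n` submission ids whose status is "Done",
## from a data frame shaped like avworkflow_jobs() output
## (rows assumed to be ordered most recent first).
done_submissions <- function(jobs, n = 1L) {
    done <- jobs[jobs$status == "Done", , drop = FALSE]
    head(done$submissionId, n)
}

## Toy input with made-up ids, using a subset of the documented columns.
jobs <- data.frame(
    submissionId = c("sub-3", "sub-2", "sub-1"),
    status = c("Submitted", "Done", "Done"),
    stringsAsFactors = FALSE
)
done_submissions(jobs)

## On AnVIL, pass the selected ids to avworkflow_files().
on_anvil <- requireNamespace("AnVIL", quietly = TRUE) &&
    AnVIL::gcloud_exists() && nzchar(AnVIL::avworkspace_name())
if (on_anvil) {
    ids <- done_submissions(AnVIL::avworkflow_jobs(), n = 2L)
    if (length(ids) > 0L)
        AnVIL::avworkflow_files(submissionId = ids)
}
```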

For avworkflow_localize(), type = "control" files summarize workflow progress; they can be numerous but are frequently small and quickly synchronized. type = "output" files are the output products of the workflow stored in the workspace bucket. Depending on the workflow, outputs may be large, e.g., aligned reads in BAM files. See gsutil_cp() to copy individual files from the bucket to the local drive.

avworkflow_localize() treats submissionId= in the same way as avworkflow_files(): when missing, files from the most recent workflow job are candidates for localization.
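A sketch of localizing one submission's outputs to an explicit directory under the working directory, following the "./" convention described for destination. local_destination() and the directory name "workflow_outputs" are hypothetical; with dry = TRUE, avworkflow_localize() only reports what would be copied.

```r
## Hypothetical helper: a destination under the current working directory,
## written with the leading "./" recommended for `destination`.
local_destination <- function(submissionId, dir = "workflow_outputs") {
    file.path(".", dir, submissionId)
}

local_destination("sub-1")

on_anvil <- requireNamespace("AnVIL", quietly = TRUE) &&
    AnVIL::gcloud_exists() && nzchar(AnVIL::avworkspace_name())
if (on_anvil) {
    ## first row = most recent submission, as in the examples
    id <- head(AnVIL::avworkflow_jobs()$submissionId, 1L)
    if (length(id) > 0L) {
        ## dry = TRUE: report what would be copied, without copying
        AnVIL::avworkflow_localize(
            submissionId = id,
            destination = local_destination(id),
            type = "output",
            dry = TRUE
        )
    }
}
```

Rerunning with dry = FALSE performs the copy; files whose local copies are not older than the bucket copies are skipped.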

Value

avworkflows() returns a tibble. Each workflow is in a 'namespace' and has a 'name', as illustrated in the example. Columns are

  • name: workflow name.

  • namespace: workflow namespace (often the same as the workspace namespace).

  • rootEntityType: name of the avtable() used to retrieve inputs.

  • methodRepoMethod.methodUri: source of the method, e.g., a Dockstore URI.

  • methodRepoMethod.sourceRepo: source repository, e.g., 'dockstore'.

  • methodRepoMethod.methodPath: path to the method, e.g., a Dockstore method might reference a GitHub repository.

  • methodRepoMethod.methodVersion: the version of the method, e.g., the 'main' branch of a GitHub repository.
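To pick out, e.g., workflows whose methods come from Dockstore, the avworkflows() tibble can be filtered on methodRepoMethod.sourceRepo. workflows_from() and the toy data are hypothetical, and the sketch assumes the repository is recorded as the lowercase string "dockstore".

```r
## Hypothetical helper: namespace and name of workflows whose method comes
## from the source repository `repo` (column names as documented above).
workflows_from <- function(workflows, repo = "dockstore") {
    keep <- workflows$methodRepoMethod.sourceRepo == repo
    workflows[keep, c("namespace", "name"), drop = FALSE]
}

## Toy input with made-up values, using a subset of the documented columns.
wf <- data.frame(
    name = c("wf-a", "wf-b"),
    namespace = c("my-namespace", "my-namespace"),
    methodRepoMethod.sourceRepo = c("dockstore", "other-repo"),
    stringsAsFactors = FALSE
)
workflows_from(wf)

## On AnVIL, apply the same filter to the workspace's workflows.
on_anvil <- requireNamespace("AnVIL", quietly = TRUE) &&
    AnVIL::gcloud_exists() && nzchar(AnVIL::avworkspace_name())
if (on_anvil) {
    workflows_from(AnVIL::avworkflows())
}
```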

avworkflow_jobs() returns a tibble, sorted by submissionDate (most recent first), with columns

  • submissionId: character() job identifier from the workflow runner.

  • submitter: character() AnVIL user id of the individual submitting the job.

  • submissionDate: POSIXct() date (in local time zone) of job submission.

  • status: character() job status, with values 'Accepted', 'Evaluating', 'Submitting', 'Submitted', 'Aborting', 'Aborted', or 'Done'.

  • succeeded: integer() number of workflows succeeding.

  • failed: integer() number of workflows failing.
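The succeeded and failed counts make it easy to find submissions that need attention. In this minimal sketch, failed_submissions() is a hypothetical helper and the toy jobs data are made up; only the documented column names are assumed.

```r
## Hypothetical helper: submissions in which at least one workflow failed,
## from a data frame with the documented avworkflow_jobs() columns.
failed_submissions <- function(jobs) {
    cols <- c("submissionId", "status", "succeeded", "failed")
    jobs[jobs$failed > 0L, cols, drop = FALSE]
}

## Toy input with made-up values.
jobs <- data.frame(
    submissionId = c("sub-3", "sub-2", "sub-1"),
    status = c("Done", "Done", "Aborted"),
    succeeded = c(4L, 2L, 0L),
    failed = c(0L, 2L, 1L),
    stringsAsFactors = FALSE
)
failed_submissions(jobs)

on_anvil <- requireNamespace("AnVIL", quietly = TRUE) &&
    AnVIL::gcloud_exists() && nzchar(AnVIL::avworkspace_name())
if (on_anvil) {
    failed_submissions(AnVIL::avworkflow_jobs())
}
```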

avworkflow_files() returns a tibble with columns

  • file: character() 'base name' of the file in the bucket.

  • workflow: character() name of the workflow the file is associated with.

  • task: character() name of the task in the workflow that generated the file.

  • path: character() full path to the file in the Google bucket.

  • submissionId: character() internal identifier associated with the submission the files belong to.

  • workflowId: character() internal identifier associated with each workflow (e.g., row of an avtable() used as input) in the submission.

  • submissionRoot: character() path in the workspace bucket to the root of files created by this submission.

  • namespace: character() AnVIL workspace namespace (billing account) associated with the submissionId.

  • name: character(1) AnVIL workspace name associated with the submissionId.
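Since each row records the task that produced a file, the bucket paths for one task can be selected before copying them individually. task_paths() and the toy rows (including the gs://my-bucket/... placeholder paths) are hypothetical; the copy step is shown only as a comment, assuming gsutil_cp() takes a source and a destination.

```r
## Hypothetical helper: bucket paths of files produced by one task, from a
## data frame with the documented avworkflow_files() columns.
task_paths <- function(files, task) {
    files$path[files$task == task]
}

## Toy input; the paths are placeholders, not real bucket locations.
toy_files <- data.frame(
    file = c("a.bam", "b.bam", "stdout"),
    task = c("align", "align", "qc"),
    path = c("gs://my-bucket/x/a.bam", "gs://my-bucket/x/b.bam",
             "gs://my-bucket/x/stdout"),
    stringsAsFactors = FALSE
)
task_paths(toy_files, "align")

on_anvil <- requireNamespace("AnVIL", quietly = TRUE) &&
    AnVIL::gcloud_exists() && nzchar(AnVIL::avworkspace_name())
if (on_anvil) {
    anvil_files <- AnVIL::avworkflow_files()
    ## task names are workflow-specific; list them before choosing one
    unique(anvil_files$task)
    ## then, e.g., AnVIL::gsutil_cp(task_paths(anvil_files, "<task>"), "./")
}
```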

avworkflow_localize() prints a message reporting the number of files that would be localized (dry = TRUE) or that were localized (dry = FALSE). If no files require localization (i.e., local files are not older than the bucket files), then no files are localized. avworkflow_localize() returns a tibble of the file name and bucket path of each file to be synchronized.

avworkflow_run() returns config, invisibly.

avworkflow_stop() returns (invisibly) TRUE on successfully requesting that the workflow stop, FALSE if the workflow is already aborting, aborted, or done.
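Because avworkflow_stop() returns FALSE when the job is already aborting, aborted, or done, the status column of avworkflow_jobs() can be checked first. is_stoppable() is a hypothetical helper keyed on the documented status values, and the AnVIL call is a dry run.

```r
## Statuses for which avworkflow_stop() is documented to return FALSE.
not_stoppable <- c("Aborting", "Aborted", "Done")

## Hypothetical helper: TRUE when a stop request could still take effect.
is_stoppable <- function(status) {
    !(status %in% not_stoppable)
}

is_stoppable(c("Submitted", "Aborting", "Done"))

on_anvil <- requireNamespace("AnVIL", quietly = TRUE) &&
    AnVIL::gcloud_exists() && nzchar(AnVIL::avworkspace_name())
if (on_anvil) {
    anvil_jobs <- AnVIL::avworkflow_jobs()
    ## first row = most recent submission; dry = TRUE only reports
    if (nrow(anvil_jobs) > 0L && is_stoppable(anvil_jobs$status[1]))
        AnVIL::avworkflow_stop(anvil_jobs$submissionId[1], dry = TRUE)
}
```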

Examples

if (gcloud_exists() && nzchar(avworkspace_name())) {
    ## from within AnVIL
    avworkflows() |> dplyr::select(namespace, name)
}

if (gcloud_exists() && nzchar(avworkspace_name()))
    ## from within AnVIL
    avworkflow_jobs()

if (gcloud_exists() && nzchar(avworkspace_name())) {
    ## e.g., from within AnVIL
    avworkflow_jobs() |>
    ## select most recent workflow
    head(1) |>
    ## find paths to output and log files on the bucket
    avworkflow_files()
}

if (gcloud_exists() && nzchar(avworkspace_name())) {
    avworkflow_localize(dry = TRUE)
}

## Not run: 
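## `new_config` is assumed to be an avworkflow_configuration object,
## previously communicated to AnVIL with avworkflow_configuration_set()
entityName <- avtable("participant_set") |>
    dplyr::pull(participant_set_id) |>
    head(1)
avworkflow_run(new_config, entityName)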

## End(Not run)

## Not run: 
avworkflow_stop()

## End(Not run)


Bioconductor/AnVIL documentation built on Sept. 15, 2023, 5:47 a.m.