avworkflows-defunct: DEFUNCT - Workflow submissions and file outputs
In Bioconductor/AnVIL: Bioconductor on the AnVIL compute environment

avworkflows-defunct

R Documentation

DEFUNCT - Workflow submissions and file outputs

Description

avworkflows() returns a tibble summarizing available workflows.

avworkflow_files() returns a tibble containing information and file paths to workflow outputs.

avworkflow_localize() creates or synchronizes a local copy of files with files stored in the workspace bucket and produced by the workflow.

avworkflow_run() submits and runs the workflow of the configuration.

avworkflow_stop() stops the most recently submitted workflow job from running.

avworkflow_info() returns a tibble containing workflow information, including workflowName, status, start and end time, inputs and outputs.

Usage

avworkflows(namespace = avworkspace_namespace(), name = avworkspace_name())

avworkflow_files(
  submissionId = NULL,
  workflowId = NULL,
  bucket = avbucket(),
  namespace = avworkspace_namespace(),
  name = avworkspace_name()
)

avworkflow_localize(
  submissionId = NULL,
  workflowId = NULL,
  destination = NULL,
  type = c("control", "output", "all"),
  bucket = avbucket(),
  dry = TRUE
)

avworkflow_run(
  config,
  entityName,
  entityType = config$rootEntityType,
  deleteIntermediateOutputFiles = FALSE,
  useCallCache = TRUE,
  useReferenceDisks = FALSE,
  namespace = avworkspace_namespace(),
  name = avworkspace_name(),
  dry = TRUE
)

avworkflow_stop(
  submissionId = NULL,
  namespace = avworkspace_namespace(),
  name = avworkspace_name(),
  dry = TRUE
)

avworkflow_info(
  submissionId = NULL,
  namespace = avworkspace_namespace(),
  name = avworkspace_name()
)

Arguments

`namespace`	character(1) AnVIL workspace namespace as returned by, e.g., `avworkspace_namespace()`.
`name`	character(1) AnVIL workspace name as returned by, e.g., `avworkspace_name()`.
`submissionId`	a character() of workflow submission ids, or a tibble with column `submissionId`, or NULL / missing. See 'Details'.
`workflowId`	a character(1) of internal identifier associated with one workflow in the submission, or NULL / missing.
`bucket`	character(1) DEPRECATED (ignored in the current release) name of the google bucket in which the workflow products are available, as `gs://...`. Usually the bucket of the active workspace, returned by `avbucket()`.
`destination`	character(1) file path to the location where files will be synchronized. For directories in the current working directory, be sure to prepend `"./"`. When `NULL`, the `submissionId` is used as the destination. `destination` may also be a google bucket, in which case the workflow files are synchronized from the workspace to a second bucket.
`type`	character(1) copy `"control"` (default), `"output"`, or `"all"` files produced by a workflow.
`dry`	logical(1) when `TRUE` (default), report the consequences but do not perform the action requested. When `FALSE`, perform the action.
`config`	an `avworkflow_configuration` object of the workflow that will be run. Only `entityType` and method configuration name and namespace are used from `config`; other configuration values must be communicated to AnVIL using `avworkflow_configuration_set()`.
`entityName`	character(1) or NULL name of the set of samples to be used when running the workflow. NULL indicates that no sample set will be used.
`entityType`	character(1) or NULL type of root entity used for the workflow. NULL means that no root entity will be used.
`deleteIntermediateOutputFiles`	logical(1) whether or not to delete intermediate output files when the workflow completes.
`useCallCache`	logical(1) whether or not to read from cache for this submission.
`useReferenceDisks`	logical(1) whether or not to use pre-built disks for common genome references. Default: `FALSE`.
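
The `dry` convention described above (report by default, act only on request) can be sketched in plain R. This is a minimal illustration under stated assumptions: `sync_files()` is a hypothetical function that copies local files, not part of AnVIL, and it says nothing about how the package implements `dry` internally.

```r
# Hypothetical function (not part of AnVIL) showing the dry-run convention:
# report what would happen by default, act only when dry = FALSE.
sync_files <- function(paths, destination, dry = TRUE) {
    stopifnot(
        is.character(paths),
        is.character(destination), length(destination) == 1L,
        is.logical(dry), length(dry) == 1L, !is.na(dry)
    )
    if (dry) {
        message(length(paths), " file(s) would be copied to '", destination, "'")
    } else {
        dir.create(destination, recursive = TRUE, showWarnings = FALSE)
        copied <- file.copy(paths, destination, overwrite = TRUE)
        message(sum(copied), " file(s) copied to '", destination, "'")
    }
    # Return the candidate files invisibly in both cases.
    invisible(paths)
}

# Create two small temporary files to act as the 'source' files.
src <- file.path(tempdir(), c("a.txt", "b.txt"))
for (path in src) writeLines("example", path)
dest <- file.path(tempdir(), "sync_demo")
unlink(dest, recursive = TRUE)  # start from a clean destination

sync_files(src, dest)               # dry run: nothing is copied
dry_created <- dir.exists(dest)     # FALSE, the dry run did not touch 'dest'
sync_files(src, dest, dry = FALSE)  # performs the copy
copied_files <- list.files(dest)
```

Defaulting to `dry = TRUE` means that an accidental call reports its consequences instead of starting a job or moving data; the caller must opt in to the action.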

Details

For avworkflow_files(), the submissionId is the identifier associated with the submission of one (or more) workflows, and is present in the return value of avworkflow_jobs(); the example illustrates how the first row of avworkflow_jobs() (i.e., the most recently completed workflow) can be used as input to avworkflow_files(). When submissionId is not provided, the return value is for the most recently submitted workflow of the namespace and name of avworkspace().
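
The three documented forms of `submissionId` (a character vector, a tibble with a `submissionId` column, or NULL) can be illustrated with a small normalizing helper. This is only a sketch: `resolve_submission_id()` is hypothetical and is not AnVIL's implementation, and the `jobs` table is made-up data standing in for the return value of avworkflow_jobs().

```r
# Hypothetical helper, not part of AnVIL: reduce the three documented forms
# of 'submissionId' to a character vector, or NULL when nothing was given.
resolve_submission_id <- function(submissionId = NULL) {
    if (is.null(submissionId))
        return(NULL)  # the caller would fall back to the most recent submission
    if (is.data.frame(submissionId)) {
        stopifnot("submissionId" %in% names(submissionId))
        submissionId <- submissionId$submissionId
    }
    stopifnot(is.character(submissionId), !anyNA(submissionId))
    submissionId
}

# Made-up stand-in for avworkflow_jobs() output; only the 'submissionId'
# column is used, and the 'status' column is an assumption.
jobs <- data.frame(
    submissionId = c("sub-0002", "sub-0001"),
    status = c("Done", "Done"),
    stringsAsFactors = FALSE
)

from_row <- resolve_submission_id(jobs[1, , drop = FALSE])  # first row of the table
from_character <- resolve_submission_id("sub-0001")         # plain character id
from_null <- resolve_submission_id()                        # NULL / missing
```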

For avworkflow_localize(), type = "control" files summarize workflow progress; they can be numerous but are frequently small and quickly synchronized. type = "output" files are the output products of the workflow stored in the workspace bucket. Depending on the workflow, outputs may be large, e.g., aligned reads in bam files. See gsutil_cp() to copy individual files from the bucket to the local drive.

avworkflow_localize() treats submissionId= in the same way as avworkflow_files(): when missing, files from the most recent workflow job are candidates for localization.
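
A signature such as `type = c("control", "output", "all")` is the usual R idiom for an argument with a fixed set of choices, where the first element is the default. The sketch below shows that idiom with base R's `match.arg()`; `choose_type()` is a hypothetical stand-in, and whether avworkflow_localize() resolves `type` in exactly this way is an assumption.

```r
# Hypothetical stand-in for a function with a 'type' argument declared like
# that of avworkflow_localize(); not the package's own code.
choose_type <- function(type = c("control", "output", "all")) {
    # match.arg() returns the first choice when 'type' is not supplied and
    # signals an error for values outside the declared set.
    match.arg(type)
}

default_type <- choose_type()          # first choice, "control"
output_type <- choose_type("output")   # explicit choice
bad_type <- tryCatch(
    choose_type("logs"),               # not one of the declared choices
    error = function(e) "error"
)
```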

avworkflow_run() invisibly returns a slightly modified config object. The new config object has an added LastSubmissionId value that identifies the submitted job.

Value

avworkflows() returns a tibble. Each workflow is in a 'namespace' and has a 'name', as illustrated in the example. Columns are

name: workflow name.
namespace: workflow namespace (often the same as the workspace namespace).
rootEntityType: name of the avtable() used to retrieve inputs.
methodRepoMethod.methodUri: source of the method, e.g., a dockstore URI.
methodRepoMethod.sourceRepo: source repository, e.g., dockstore.
methodRepoMethod.methodPath: path to method, e.g., a dockstore method might reference a github repository.
methodRepoMethod.methodVersion: the version of the method, e.g., 'main' branch of a github repository.

avworkflow_files() returns a tibble with columns

file: character() 'base name' of the file in the bucket.
workflow: character() name of the workflow the file is associated with.
task: character() name of the task in the workflow that generated the file.
path: character() full path to the file in the google bucket.
submissionId: character() internal identifier associated with the submission the files belong to.
workflowId: character() internal identifier associated with each workflow (e.g., row of an avtable() used as input) in the submission.
submissionRoot: character() path in the workspace bucket to the root of files created by this submission.
namespace: character() AnVIL workspace namespace (billing account) associated with the submissionId.
name: character(1) AnVIL workspace name associated with the submissionId.
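
Because the result is an ordinary table with one row per file, selecting files of interest is a matter of subsetting on the documented columns. The sketch below uses a hand-built data.frame in place of a real avworkflow_files() result; every value in it (bucket, paths, identifiers, task names) is made up for illustration, and only a subset of the documented columns is included.

```r
# Made-up table with some of the columns documented for avworkflow_files().
file_names <- c("stdout", "sample1.bam", "sample1.bam.bai", "sample2.bam")
files <- data.frame(
    file = file_names,
    workflow = "align",
    task = c("bwa_mem", "bwa_mem", "index", "bwa_mem"),
    path = paste0("gs://example-bucket/example-prefix/", file_names),
    submissionId = "sub-0002",
    workflowId = c("wf-01", "wf-01", "wf-01", "wf-02"),
    stringsAsFactors = FALSE
)

# Bucket paths of the BAM files only (the index file 'sample1.bam.bai' is
# excluded because the pattern is anchored at the end of the name).
is_bam <- grepl("\\.bam$", files$file)
bam_paths <- files$path[is_bam]

# Number of files produced by each workflow in the submission.
files_per_workflow <- table(files$workflowId)
```

Paths selected in this way are the kind of input that the Details section suggests copying individually with gsutil_cp(), rather than localizing every output of a submission.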

avworkflow_localize() prints a message indicating the number of files that are (if dry = FALSE) or would be localized. If no files require localization (i.e., local files are not older than the bucket files), then no files are localized. avworkflow_localize() returns a tibble of file name and bucket path of files to be synchronized.

avworkflow_run() returns config, invisibly. Note that config has an added LastSubmissionId value for the submission ID of the last run workflow.

avworkflow_stop() returns (invisibly) TRUE on successfully requesting that the workflow stop, FALSE if the workflow is already aborting, aborted, or done.

avworkflow_info() returns a tibble with columns: submissionId, workflowId, workflowName, status, start, end, inputs, and outputs.

Bioconductor/AnVIL documentation built on Feb. 24, 2025, 9:50 a.m.