avworkflow: Workflow submissions and file outputs


Description

avworkflows() returns a tibble summarizing available workflows.

avworkflow_jobs() returns a tibble summarizing submitted workflow jobs for a namespace and name.

avworkflow_files() returns a tibble containing information and file paths to workflow outputs.

avworkflow_localize() creates or synchronizes a local copy of files with files stored in the workspace bucket and produced by the workflow.

avworkflow_configuration_template() returns a template for defining workflow configurations. This template can be used as a starting point for providing a custom configuration.

avworkflow_configuration() returns a list structure describing an existing workflow configuration.

avworkflow_import_configuration() updates an existing configuration, e.g., changing inputs to the workflow.

Usage

avworkflows(namespace = avworkspace_namespace(), name = avworkspace_name())

avworkflow_jobs(namespace = avworkspace_namespace(), name = avworkspace_name())

avworkflow_files(submissionId = NULL, bucket = avbucket())

avworkflow_localize(
  submissionId = NULL,
  destination = NULL,
  type = c("control", "output", "all"),
  bucket = avbucket(),
  dry = TRUE
)

avworkflow_configuration_template()

avworkflow_configuration(
  configuration_namespace,
  configuration_name,
  namespace = avworkspace_namespace(),
  name = avworkspace_name()
)

avworkflow_import_configuration(
  config,
  namespace = avworkspace_namespace(),
  name = avworkspace_name()
)

Arguments

namespace

character(1) AnVIL workspace namespace as returned by, e.g., avworkspace_namespace().

name

character(1) AnVIL workspace name as returned by, e.g., avworkspace_name().

submissionId

a character() of workflow submission ids, or a tibble with column submissionId, or NULL / missing. See 'Details'.

bucket

character(1) name of the Google bucket in which the workflow products are available, as gs://.... Usually the bucket of the active workspace, returned by avbucket().

destination

character(1) file path to the location where files will be synchronized. For directories in the current working directory, be sure to prepend "./" to the path. When NULL, the submissionId is used as the destination. destination may also be a Google bucket, in which case the workflow files are synchronized from the workspace to a second bucket.

type

character(1) copy "control" (default), "output", or "all" files produced by a workflow.

dry

logical(1) when TRUE (default), report the consequences but do not perform the action requested. When FALSE, perform the action.

configuration_namespace

character(1) namespace of the workflow. Often the same as the namespace of the workspace. Discover configuration namespace and name information from avworkflows().

configuration_name

character(1) name of the workflow, from avworkflows().

config

a named list describing the full configuration, e.g., created from editing the return value of avworkflow_configuration() or avworkflow_configuration_template().

Details

For avworkflow_files(), the submissionId is the identifier associated with the workflow job, and is present in the return value of avworkflow_jobs(); the example illustrates how the first row of avworkflow_jobs() (i.e., the most recently completed workflow) can be used as input to avworkflow_files(). When submissionId is not provided, the return value is for the most recently submitted workflow of the namespace and name of avworkspace().

For avworkflow_localize(), type = "control" files summarize workflow progress; they can be numerous but are frequently small and quickly synchronized. type = "output" files are the output products of the workflow stored in the workspace bucket. Depending on the workflow, outputs may be large, e.g., aligned reads in bam files. See gsutil_cp() to copy individual files from the bucket to the local drive.

`avworkflow_localize()` treats `submissionId=` in the same way
as `avworkflow_files()`: when missing, files from the most
recent workflow job are candidates for localization.
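
The preview-then-copy pattern described above might look like the following sketch. It uses only functions and arguments documented on this page; the destination "./workflow_files" is a hypothetical directory name, and the requireNamespace() guard is an addition so that the code does nothing outside an AnVIL session.

```r
## A sketch only: assumes the AnVIL package, gcloud, and an active workspace.
localized <- NULL

if (requireNamespace("AnVIL", quietly = TRUE) &&
    AnVIL::gcloud_exists() && nzchar(AnVIL::avworkspace_name())) {

    jobs <- AnVIL::avworkflow_jobs()

    if (NROW(jobs) > 0L) {
        ## dry = TRUE (the default) only reports what would be copied
        localized <- AnVIL::avworkflow_localize(
            jobs$submissionId[1],
            destination = "./workflow_files",  # hypothetical local directory
            type = "control",
            dry = TRUE
        )
        ## after inspecting the report, repeat the call with dry = FALSE
        ## to actually synchronize the files
    }
}
```

Starting with type = "control" keeps the first synchronization small; type = "output" can then be requested once the size of the workflow products is known.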

Value

avworkflows() returns a tibble. Each workflow is in a 'namespace' and has a 'name', as illustrated in the example. Columns are

avworkflow_jobs() returns a tibble, sorted by submissionDate, with columns

avworkflow_files() returns a tibble with columns

avworkflow_localize() prints a message indicating the number of files that are (if dry = FALSE) or would be localized. If no files require localization (i.e., local files are not older than the bucket files), then no files are localized. avworkflow_localize() returns a tibble of file name and bucket path of files to be synchronized.

avworkflow_configuration_template() returns a list providing a template for configuration lists, with the following structure:

The exact format of the configuration is important.

One common problem is that a scalar character vector `"bar"` is
interpreted as a json 'array' `["bar"]` rather than a json
string `"bar"`. Enclose the string with
`jsonlite::unbox("bar")` in the configuration list if the
length 1 character vector in R is to be interpreted as a json
string.

A second problem is that AnVIL requires string values to be
quoted, so an unquoted unboxed character string `unbox("foo")`
is not accepted. This is reported as a warning() about invalid
inputs or outputs, and the solution is to provide a quoted
string `unbox('"foo"')`.
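
The difference between the three forms can be checked locally with jsonlite, without contacting AnVIL. This is a minimal sketch; the element names `as_array`, `as_string`, and `quoted` are hypothetical and are not part of any real workflow configuration.

```r
library(jsonlite)

## hypothetical inputs; the element names are illustrative only
inputs <- list(
    as_array  = "bar",             # length-1 vector: serialized as an array
    as_string = unbox("bar"),      # unboxed: serialized as a json string
    quoted    = unbox('"foo"')     # unboxed, with quotes embedded in the value
)

toJSON(inputs["as_array"])     # {"as_array":["bar"]}
toJSON(inputs["as_string"])    # {"as_string":"bar"}

## the embedded quotes survive a round trip through json
round_trip <- fromJSON(toJSON(inputs["quoted"]))
round_trip$quoted == '"foo"'   # TRUE
```

Inspecting the serialized form in this way before calling avworkflow_import_configuration() may help catch the array-versus-string problem early.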

avworkflow_configuration() returns a list structure describing the configuration. See avworkflow_configuration_template() for the structure of a typical workflow.

avworkflow_import_configuration() returns an object describing the updated configuration. The return value includes invalid or unused elements of the config input. Invalid or unused elements of config are also reported as a warning.

Examples

library(AnVIL)
library(dplyr)   # provides %>% and select() used below

if (gcloud_exists() && nzchar(avworkspace_name()))
    ## from within AnVIL
    avworkflows() %>% select(namespace, name)

if (gcloud_exists() && nzchar(avworkspace_name()))
    ## from within AnVIL
    avworkflow_jobs()

if (gcloud_exists() && nzchar(avworkspace_name())) {
    ## e.g., from within AnVIL
    avworkflow_jobs() %>%
        ## select most recent workflow
        head(1) %>%
        ## find paths to output and log files on the bucket
        avworkflow_files()
}

if (gcloud_exists() && nzchar(avworkspace_name())) {
    avworkflow_localize(dry = TRUE)
}

avworkflow_configuration_template()

## Not run: 
config <-
    avworkflow_configuration("bioconductor-anvil-rpci", "AnVILBulkRNASeq")
str(config)

## End(Not run)

## Not run: 
avworkflow_import_configuration(config)

## End(Not run)

Bioconductor/AnVIL documentation built on May 4, 2021, 9:39 a.m.