drs: DRS (Data Repository Service) URL management

drs_stat    R Documentation

Description

drs_stat() resolves zero or more DRS URLs to their Google bucket location.

drs_access_url() returns a vector of 'signed' URLs that allow access to restricted resources via standard https protocols.

drs_cp() copies zero or more DRS URIs to a Google bucket or local folder.

Usage

drs_stat(source = character(), region = "US")

drs_access_url(source = character(), region = "US")

drs_cp(source, destination, ..., overwrite = FALSE)

Arguments

source

character() DRS URLs (beginning with 'drs://') to resources managed by the 'martha' DRS resolution server.
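As a minimal sketch of checking inputs before resolution, the base R snippet below separates entries that begin with 'drs://' from those that do not. The helper is_drs_uri() is hypothetical and not part of the AnVIL package, and the second entry is a deliberately malformed illustration.

```r
## Hypothetical helper (not part of AnVIL): TRUE for strings starting
## with 'drs://', FALSE otherwise (including NA)
is_drs_uri <- function(x) {
    stopifnot(is.character(x))
    !is.na(x) & startsWith(x, "drs://")
}

candidates <- c(
    vcf = "drs://dg.ANV0/6f633518-f2de-4460-aaa4-a27ee6138ab5",
    bad = "https://example.org/not-a-drs-uri"  # invented, for illustration
)

ok <- is_drs_uri(candidates)
valid <- candidates[ok]     # entries that could be passed as 'source'
invalid <- candidates[!ok]  # entries to inspect or correct first
```

Filtering up front may help avoid some of the less informative resolver errors, although it only checks the prefix and not whether the URI can actually be resolved.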

region

character(1) Google cloud 'region' in which the DRS resource is located. Most data is located in "US" (the default); in principle "auto" allows for discovery of the region, but sometimes fails. Regions are enumerated at https://cloud.google.com/storage/docs/locations#available-locations.

destination

character(1), Google Cloud bucket or local file system destination path.

...

additional arguments, passed to gsutil_cp() for file copying.

overwrite

logical(1) indicating that source fileNames already present in destination should be downloaded again.

Details

drs_stat() sends requests in parallel to the DRS server, using 8 forked processes (by default) to speed up queries. Use options(mc.cores = 16L), for instance, to set the number of processes to use.
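The option can be set for a block of work and then restored. The sketch below uses only base R; the comment marks where calls to drs_stat() would go, and the value 16L is simply the one used in the text above.

```r
## options() returns the previous value(s), so they can be restored later;
## an unset option is recorded as NULL
old <- options(mc.cores = 16L)

n_cores <- getOption("mc.cores")  # value seen by code that consults the option

## ... calls to drs_stat() would go here ...

## Restore the previous setting (or unset the option if it was unset)
options(old)
```

Restoring the previous value keeps the change from affecting other code in the session that also consults mc.cores.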

drs_stat() uses the AnVIL 'pet' account associated with a runtime. The pet account is discovered by default when evaluated on an AnVIL runtime (e.g., in RStudio or a Jupyter notebook in the AnVIL), or can be found in the return value of avruntimes().

Errors reported by the DRS service are communicated to the user, but can be cryptic. The DRS service itself is called 'martha'. Errors mentioning martha commonly involve a malformed DRS URI. Martha uses a service called 'bond' to establish credentials with registered third party entities such as Kids First. Errors mentioning bond might involve absence of credentials, within Terra, to access the resource; check that, in the Terra / AnVIL graphical user interface, the user profile's 'External Entities' includes the organization to which the DRS URI is being resolved.
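The guidance above can be sketched as a small triage step around a failing call. The helper drs_error_hint() is hypothetical and not part of the AnVIL package, and the error is simulated with stop() using an invented message; the wording of real service errors may differ.

```r
## Hypothetical helper: map an error message to a likely cause,
## following the martha / bond guidance in this section
drs_error_hint <- function(msg) {
    msg <- tolower(msg)
    if (grepl("bond", msg, fixed = TRUE)) {
        "check 'External Entities' in the Terra / AnVIL user profile"
    } else if (grepl("martha", msg, fixed = TRUE)) {
        "check that the DRS URI is well formed"
    } else {
        "no specific hint available"
    }
}

## Simulated failure with an invented message; on AnVIL the first
## argument would be a real call such as drs_stat(drs)
hint <- tryCatch(
    stop("martha: could not resolve the supplied URI"),
    error = function(e) drs_error_hint(conditionMessage(e))
)
```

Messages mentioning bond are checked first, on the assumption that a credential failure may be reported through martha and so mention both services.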

Value

drs_stat() returns a tbl with the following columns:

  • fileName: character() (resolver sometimes returns null).

  • size: integer() (resolver sometimes returns null).

  • contentType: character() (resolver sometimes returns null).

  • gsUri: character() (resolver sometimes returns null).

  • timeCreated: character() (the time created formatted using ISO 8601; resolver sometimes returns null).

  • timeUpdated: character() (the time updated formatted using ISO 8601; resolver sometimes returns null).

  • bucket: character() (resolver sometimes returns null).

  • name: character() (resolver sometimes returns null).

  • googleServiceAccount: list() (null unless the DOS url belongs to a Bond supported host).

  • hashes: list() (contains the hash types and their checksum values; null if unknown).
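Because several columns may be null, it can be useful to separate resolved from unresolved rows before further work. The sketch below uses a mock data.frame with invented values in place of a real drs_stat() result, and assumes that a null from the resolver appears as NA in the returned table; that representation is an assumption, not something stated here.

```r
## Mock stand-in for a drs_stat() result; values are invented and only
## two of the documented columns are shown
tbl <- data.frame(
    fileName = c("a.vcf.gz", NA),
    gsUri = c("gs://example-bucket/a.vcf.gz", NA),
    stringsAsFactors = FALSE
)

## Assumption: a null from the resolver shows up as NA
has_uri <- !is.na(tbl$gsUri)
resolved <- tbl[has_uri, , drop = FALSE]     # rows with a bucket location
unresolved <- tbl[!has_uri, , drop = FALSE]  # rows needing a closer look
```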

drs_access_url() returns a vector of https URLs corresponding to the vector of DRS URIs provided as inputs to the function.

drs_cp() returns a tibble like drs_stat(), but with additional columns:

  • simple: logical() value indicating whether resolution used a simple signed URL (TRUE) or an auxiliary service account (FALSE).

  • destination: character() full path to retrieved object(s)

Examples

drs <- c(
    vcf = "drs://dg.ANV0/6f633518-f2de-4460-aaa4-a27ee6138ab5",
    tbi = "drs://dg.ANV0/4fb9e77f-c92a-4deb-ac90-db007dc633aa"
)

if (gcloud_exists() && startsWith(gcloud_account(), "pet-")) {
    ## from within AnVIL
    tbl <- drs_stat(drs)
    urls <- drs_access_url(drs)
    ## library(VariantAnnotation)
    ## vcffile <- VcfFile(urls[["vcf"]], urls[["tbi"]])
    ##
    ## header <- scanVcfHeader(vcffile)
    ## meta(header)[["contig"]]
}


Bioconductor/AnVIL documentation built on April 12, 2024, 6:41 p.m.