files: HCA File Querying

View source: R/files.R

files    R Documentation

HCA File Querying

Description

files() takes user-provided filters (for example, on project titles) and queries the HCA API for information about available files.

files_download() takes a tibble of files and a directory location, and downloads the files listed in the tibble into that directory.

files_detail() takes a unique file_id and the catalog for the file, and returns details about the specified file as a list-of-lists.

files_cache() is the default location of the cache of downloaded files.

Usage

files(
  filters = NULL,
  size = 1000L,
  sort = "projectTitle",
  order = c("asc", "desc"),
  catalog = NULL,
  as = c("tibble", "lol", "list", "tibble_expanded"),
  columns = files_default_columns("character")
)

files_default_columns(as = c("tibble", "character"))

files_download(tbl, destination = NULL)

files_facets(facet = character(), catalog = NULL)

files_detail(uuid, catalog = NULL)

files_cache(create = FALSE)

Arguments

filters

filter object created by filters(), or NULL (default; all projects).

size

integer(1) maximum number of results to return. The default (1000L, as shown in Usage) is meant to be large enough to return all results matching the filter.

sort

character(1) project facet (see facet_options()) by which to sort results; default: "projectTitle".

order

character(1) sort order. One of "asc" (ascending) or "desc" (descending).

catalog

character(1) source of data. Use catalogs() for possible values.

as

character(1) return format. One of "tibble" (default), "lol", "list", or "tibble_expanded", as described in the Details and Value sections of ?projects.

columns

named character() indicating the paths to be used for parsing the 'lol' returned from the HCA to a tibble. The names of columns are used as column names in the returned tibble. If the columns are unnamed, a name is derived from the elements of path by removing hits[*] and all [*], e.g., a path hits[*].donorOrganisms[*].biologicalSex[*] is given the name donorOrganisms.biologicalSex.

tbl

tibble of files (the result of files()).

destination

character() vector giving the name of a temporary directory to use for file downloads, or NULL (default).

facet

character() of valid facet names. Summary results (see 'Value', below) are returned when missing or length greater than 1; details are returned when a single facet is specified.

uuid

character() unique identifier (e.g., projectId) of the object.

create

logical(1) create the default cache location, if it does not yet exist.
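
The naming rule described for unnamed columns (drop hits[*] and every [*]) can be illustrated with a small sketch in base R. The helper derive_column_name() is hypothetical and is not part of hca; the package's own implementation may differ in detail.

```r
# Hypothetical helper (NOT part of hca): derive a column name from a path
# by removing the leading 'hits[*]' and every remaining '[*]'.
derive_column_name <- function(path) {
    # drop the leading 'hits[*]' together with the '.' that follows it
    name <- sub("^hits\\[\\*\\]\\.?", "", path)
    # drop all remaining '[*]' markers
    gsub("\\[\\*\\]", "", name)
}

derive_column_name("hits[*].donorOrganisms[*].biologicalSex[*]")
```

Under these assumptions, the path used as the example for columns yields the name donorOrganisms.biologicalSex.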

Details

files_cache() can be useful when it is necessary to 'clean up' the cache, e.g., with BiocFileCache::cleanbfc() or, more dramatically, with unlink(files_cache(), recursive = TRUE).
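
A minimal sketch of such a clean-up follows, assuming that both hca and BiocFileCache are installed; when either is missing the code does nothing. The 30-day threshold is an arbitrary, illustrative value, not a recommendation from the package.

```r
# Assumption: hca and BiocFileCache are installed; otherwise skip the clean-up.
have_pkgs <- requireNamespace("hca", quietly = TRUE) &&
    requireNamespace("BiocFileCache", quietly = TRUE)

if (have_pkgs) {
    # Path to the default cache; do not create it if it is absent
    cache_path <- hca::files_cache(create = FALSE)
    if (dir.exists(cache_path)) {
        bfc <- BiocFileCache::BiocFileCache(cache_path, ask = FALSE)
        # Remove cached resources not accessed in the last 30 days
        # (30 is an illustrative choice)
        BiocFileCache::cleanbfc(bfc, days = 30, ask = FALSE)
        # More drastic alternative: delete the entire cache directory
        # unlink(cache_path, recursive = TRUE)
    }
}
```

Passing ask = FALSE suppresses the interactive confirmation prompts, which suits scripts; in an interactive session it may be safer to leave the prompts enabled.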

Value

files_download() returns a character() vector of file destinations.

files_detail() returns a list-of-lists containing relevant details about the file.

files_cache() returns the path to the default cache. Use this as the cache= argument to BiocFileCache().

Examples

title <- paste(
    "Tabula Muris: Transcriptomic characterization of 20 organs and",
    "tissues from Mus musculus at single cell resolution"
)
filters <- filters(projectTitle = list(is = title))
files(filters = filters)

files_filter <- filters(
    projectId = list(is = "cddab57b-6868-4be4-806f-395ed9dd635a"),
    fileFormat = list(is = "loom")
)
files_tbl <- files(filters = files_filter)
## Not run: files_download(files_tbl, destination = tempdir())
files_facets()
files_facets("fileFormat")

file <- files(size = 1, as = "list")
file_uuid <- file[["hits"]][[1]][["entryId"]]
files_detail(uuid = file_uuid)

files_cache(create = FALSE)

Bioconductor/hca documentation built on March 27, 2024, 3:15 a.m.