files: HCA File Querying

files R Documentation


files() takes a filter object created by filters() (for example, one selecting user-provided project titles) and queries the HCA API for information about available files.

files_download() takes a tibble of files and a directory location, and downloads the files listed in the tibble into that directory.

files_detail() takes a unique file_id and the catalog for the file, and returns details about the specified file as a list-of-lists.

files_cache() returns the default location of the cache of downloaded files.


files(
  filters = NULL,
  size = 1000L,
  sort = "projectTitle",
  order = c("asc", "desc"),
  catalog = NULL,
  as = c("tibble", "lol", "list", "tibble_expanded"),
  columns = files_default_columns("character")
)

files_default_columns(as = c("tibble", "character"))

files_download(tbl, destination = NULL)

files_facets(facet = character(), catalog = NULL)

files_detail(uuid, catalog = NULL)

files_cache(create = FALSE)



filters: filter object created by filters(), or NULL (default; all projects).

size: integer(1) maximum number of results to return. The default (1000) is meant to be large enough to return all results matching the filter.

sort: character(1) project facet (see facet_options()) used to sort the result; default: "projectTitle".

order: character(1) sort order. One of "asc" (ascending) or "desc" (descending).

catalog: character(1) source of data. Use catalogs() for possible values.

as: character(1) return format. One of "tibble" (default), "lol", "list", or "tibble_expanded", as described in the Details and Value sections of ?projects.

columns: named character() indicating the paths to be used for parsing the 'lol' returned from the HCA to a tibble. The names of columns are used as column names in the returned tibble. If the columns are unnamed, a name is derived from the elements of path by removing hits[*] and all [*], e.g., a path hits[*].donorOrganisms[*].biologicalSex[*] is given the name donorOrganisms.biologicalSex.

tbl: tibble of files (result of files()).

destination: character() vector naming the temporary directory to use for file downloads, or NULL.

facet: character() of valid facet names. Summary results (see 'Value', below) are returned when missing or length greater than 1; details are returned when a single facet is specified.

uuid: character() unique identifier (e.g., projectId) of the object.

create: logical(1) create the default cache location, if it does not yet exist.
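
The naming rule for unnamed columns can be illustrated in base R. The helper below, derive_column_name(), is hypothetical and not part of hca; it is a minimal sketch that assumes the rule amounts to dropping a leading hits[*]. (including the separating dot) and then every remaining [*], which reproduces the documented example. The package's own implementation may differ.

```r
## Hypothetical helper (not part of hca): mimic the documented naming rule
## by dropping the leading 'hits[*].' and every remaining '[*]'.
derive_column_name <- function(path) {
    name <- sub("^hits\\[\\*\\]\\.", "", path)   # remove leading 'hits[*].'
    gsub("\\[\\*\\]", "", name)                  # remove all remaining '[*]'
}

paths <- c(
    "hits[*].donorOrganisms[*].biologicalSex[*]",
    "hits[*].entryId"
)
derive_column_name(paths)
```

Supplying names explicitly, e.g., columns = c(sex = "hits[*].donorOrganisms[*].biologicalSex[*]"), avoids relying on derived names.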


files_cache() can be useful when it is necessary to 'clean up' the cache, e.g., BiocFileCache::cleanbfc() or more dramatically unlink(files_cache(), recursive = TRUE).
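
Before cleaning up, it may help to look at what the cache contains. The following is a minimal sketch, assuming the BiocFileCache package is installed; cache_summary() is a hypothetical helper, not part of hca, and the 30-day threshold in the commented-out cleanup call is an arbitrary illustration.

```r
## Hypothetical helper (not part of hca): summarise a BiocFileCache directory,
## returning NULL when the directory or the BiocFileCache package is missing.
cache_summary <- function(path) {
    if (!dir.exists(path) || !requireNamespace("BiocFileCache", quietly = TRUE))
        return(NULL)
    bfc <- BiocFileCache::BiocFileCache(cache = path, ask = FALSE)
    BiocFileCache::bfcinfo(bfc)   # one row per cached resource
}

if (requireNamespace("hca", quietly = TRUE)) {
    ## create = FALSE only reports the path; nothing is created on disk
    cache_summary(hca::files_cache(create = FALSE))
    ## To drop resources not accessed in the last 30 days (assumed threshold):
    ## bfc <- BiocFileCache::BiocFileCache(hca::files_cache(), ask = FALSE)
    ## BiocFileCache::cleanbfc(bfc, days = 30, ask = FALSE)
}
```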


files_download() returns a character() vector of file destinations.

files_detail() returns a list-of-lists containing relevant details about the file.

files_cache() returns the path to the default cache. Use this as the cache= argument to BiocFileCache().


title <- paste(
    "Tabula Muris: Transcriptomic characterization of 20 organs and",
    "tissues from Mus musculus at single cell resolution"
)
filters <- filters(projectTitle = list(is = title))
files(filters = filters)

files_filter <- filters(
    projectId = list(is = "cddab57b-6868-4be4-806f-395ed9dd635a"),
    fileFormat = list(is = "loom")
)
files_tbl <- files(filters = files_filter)
## Not run: files_download(files_tbl, destination = tempdir())

file <- files(size = 1, as = "list")
file_uuid <- file[["hits"]][[1]][["entryId"]]
files_detail(uuid = file_uuid)

files_cache(create = FALSE)
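
files_facets() is listed in the usage above but has no example. A minimal sketch follows, assuming the hca package is installed and the HCA API is reachable; "fileFormat" is taken from the filter example above, and the variable names are illustrative only. Both calls query the API, so errors (for instance, no network connection) are caught and leave the results as NULL.

```r
facet_summary <- NULL   # summary across all facets
format_detail <- NULL   # detail for a single facet

if (requireNamespace("hca", quietly = TRUE)) {
    ## No facet given: summary results are returned
    facet_summary <- tryCatch(
        hca::files_facets(),
        error = function(e) NULL
    )
    ## A single facet given: details for that facet are returned
    format_detail <- tryCatch(
        hca::files_facets("fileFormat"),
        error = function(e) NULL
    )
}

facet_summary
format_detail
```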

Bioconductor/hca documentation built on Oct. 28, 2023, 4:55 p.m.