files: HCA File Querying
In Bioconductor/hca: Exploring the Human Cell Atlas Data Coordinating Platform

files

R Documentation

Description

files() queries the HCA API for information about available files, optionally restricted by a filter object created with filters() (for example, a filter on project title).

files_download() takes a tibble of files and a directory location, and downloads the files listed in the tibble into that directory.

files_detail() takes a unique file_id and the catalog for the file, and returns details about the specified file as a list-of-lists.

files_cache() is the default location of the cache of downloaded files.

Usage

files(
  filters = NULL,
  size = 1000L,
  sort = "projectTitle",
  order = c("asc", "desc"),
  catalog = NULL,
  as = c("tibble", "lol", "list", "tibble_expanded"),
  columns = files_default_columns("character")
)

files_default_columns(as = c("tibble", "character"))

files_download(tbl, destination = NULL)

files_facets(facet = character(), catalog = NULL)

files_detail(uuid, catalog = NULL)

files_cache(create = FALSE)

Arguments

`filters`	filter object created by `filters()`, or `NULL` (default; all projects).
`size`	integer(1) maximum number of results to return. The default (1000) is meant to be large enough to return all results matching `filters`.
`sort`	character(1) project facet (see `facet_options()`) to sort result; default: `"projectTitle"`.
`order`	character(1) sort order. One of `"asc"` (ascending) or `"desc"` (descending).
`catalog`	character(1) source of data. Use `catalogs()` for possible values.
`as`	character(1) return format. One of `"tibble"` (default), `"lol"`, `"list"`, or `"tibble_expanded"`, as described in the Details and Value sections of `?projects`.
`columns`	named character() indicating the paths to be used for parsing the 'lol' returned from the HCA to a tibble. The names of `columns` are used as column names in the returned tibble. If the columns are unnamed, a name is derived from the elements of `path` by removing `hits[]` and all `[]`, e.g., a path `hits[].donorOrganisms[].biologicalSex[*]` is given the name `donorOrganisms.biologicalSex`.
`tbl`	tibble of files (result of `files()`)
`destination`	character(1) path to the directory to use for file downloads, or `NULL` (default).
`facet`	character() of valid facet names. Summary results (see 'Value', below) are returned when missing or length greater than 1; details are returned when a single facet is specified.
`uuid`	character() unique identifier (e.g., `projectId`) of the object.
`create`	logical(1) create the default cache location, if it does not yet exist.
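
The naming rule described for unnamed `columns` can be illustrated with a short base R sketch. This is not the package's internal code: the helper name `derive_column_name()` is hypothetical, and treating a trailing `[*]` like `[]` is an assumption based on the documented example.

```r
# Hypothetical helper (not part of hca): derive a column name from a path
# by dropping the array markers and the leading "hits" element.
derive_column_name <- function(path) {
    # Assumption: "[*]" is removed in the same way as "[]".
    without_brackets <- gsub("\\[\\*?\\]", "", path)
    # Drop the leading "hits." left over from "hits[]."
    sub("^hits\\.", "", without_brackets)
}

derive_column_name("hits[].donorOrganisms[].biologicalSex[*]")
# "donorOrganisms.biologicalSex"
```

Supplying names explicitly in `columns` avoids relying on derived names.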

Details

files_cache() is useful when the cache needs to be cleaned up, e.g., with BiocFileCache::cleanbfc() or, more drastically, with unlink(files_cache(), recursive = TRUE).
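
Because `unlink(..., recursive = TRUE)` deletes everything under the directory, it can help to inspect the contents first. The sketch below uses only base R and a stand-in directory under `tempdir()` (the directory and file names are hypothetical) so that it runs without touching a real cache; in practice `cache_dir` would be the path returned by `files_cache()`.

```r
# Stand-in for the cache location; with hca installed this would be
# cache_dir <- hca::files_cache()
cache_dir <- file.path(tempdir(), "hca_cache_demo")
dir.create(cache_dir, showWarnings = FALSE)
file.create(file.path(cache_dir, "example.loom"))  # placeholder cached file

# Inspect what would be removed before deleting anything.
cached <- list.files(cache_dir, full.names = TRUE)
total_bytes <- sum(file.size(cached))
message(length(cached), " cached file(s), ", total_bytes, " bytes")

# Remove the whole directory; unlink() returns 0 on success.
status <- unlink(cache_dir, recursive = TRUE)
removed <- !dir.exists(cache_dir)
```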

Value

files_download() returns a character() vector of file destinations.

files_detail() returns a list-of-lists containing relevant details about the file.

files_cache() returns the path to the default cache. Use this as the cache= argument to BiocFileCache().

Examples

title <- paste(
    "Tabula Muris: Transcriptomic characterization of 20 organs and",
    "tissues from Mus musculus at single cell resolution"
)
filters <- filters( projectTitle = list(is = title) )
files(filters = filters)

files_filter <- filters(
    projectId = list(is = "cddab57b-6868-4be4-806f-395ed9dd635a"),
    fileFormat = list(is = "loom")
)
files_tbl <- files(filters = files_filter)
## Not run: files_download(files_tbl, destination = tempdir())
files_facets()
files_facets("fileFormat")

file <- files(size = 1, as = "list")
file_uuid <- file[["hits"]][[1]][["entryId"]]
files_detail(uuid = file_uuid)

files_cache(create = FALSE)

Bioconductor/hca documentation built on June 11, 2025, 11:47 a.m.