projects: HCA Project Querying

View source: R/projects.R

projects R Documentation

HCA Project Querying


Description

projects() queries the HCA API for information about available projects, optionally restricted by user-supplied filters.

projects_facets() summarizes facets and terms used by all records in the projects index.

*_columns() returns a tibble or named character vector describing the content of the tibble returned by projects(), files(), samples(), or bundles().

projects_detail() takes a unique project identifier (uuid) and the catalog for the project, and returns details about the specified project as a list-of-lists.


Usage

projects(
  filters = NULL,
  size = 1000L,
  sort = "projectTitle",
  order = c("asc", "desc"),
  catalog = NULL,
  as = c("tibble", "lol", "list", "tibble_expanded"),
  columns = projects_default_columns("character")
)

projects_facets(facet = character(), catalog = NULL)

projects_default_columns(as = c("tibble", "character"))

projects_detail(uuid, catalog = NULL)
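
Arguments such as order and as list several choices as their default. In the common R idiom this means the first choice is used when the argument is not supplied. The following is a minimal sketch of that idiom with base R's match.arg(); resolve_order() is a hypothetical stand-in, and it is an assumption that projects() resolves its arguments this way internally.

```r
# Hypothetical stand-in for illustration; not an hca function.
resolve_order <- function(order = c("asc", "desc")) {
    # match.arg() returns the first choice when 'order' is not supplied,
    # and signals an error for values outside the listed choices.
    match.arg(order)
}

resolve_order()         # the default resolves to "asc"
resolve_order("desc")   # an explicit choice is kept
```

Under this idiom, a call such as projects(order = "desc") selects descending order, while omitting order gives ascending order.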



Arguments

filters: filter object created by filters(), or NULL (default; all projects).

size: integer(1) maximum number of results to return. The default (1000) is meant to be large enough to return all projects matching filters.

sort: character(1) project facet (see facet_options()) used to sort the result; default: "projectTitle".

order: character(1) sort order. One of "asc" (ascending) or "desc" (descending).

catalog: character(1) source of data. Use catalogs() for possible values.

as: character(1) return format. One of "tibble" (default), "lol", "list", or "tibble_expanded", as described in the Details and Value sections of ?projects.

columns: named character() indicating the paths to be used for parsing the 'lol' returned from the HCA to a tibble. The names of columns are used as column names in the returned tibble. If the columns are unnamed, a name is derived from the elements of path by removing hits[*] and all [*], e.g., a path hits[*].donorOrganisms[*].biologicalSex[*] is given the name donorOrganisms.biologicalSex.

facet: character() of valid facet names. Summary results (see 'Value', below) are returned when facet is missing or has length greater than 1; details are returned when a single facet is specified.

uuid: character() unique identifier (e.g., projectId) of the object.
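
The naming rule for unnamed columns can be sketched in base R. This is only an illustration of the rule as described for the columns argument: derive_column_name() is a hypothetical helper, not an hca function, and the package's own implementation may differ.

```r
# Hypothetical helper for illustration; not part of hca.
derive_column_name <- function(path) {
    # Drop the leading 'hits[*].' component of the path.
    name <- sub("^hits\\[\\*\\]\\.", "", path)
    # Drop every remaining literal '[*]'.
    gsub("[*]", "", name, fixed = TRUE)
}

derive_column_name("hits[*].donorOrganisms[*].biologicalSex[*]")
# gives "donorOrganisms.biologicalSex", matching the example above
```

Naming the elements of columns explicitly avoids relying on derived names.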


Details

The as argument determines the object returned by the function. Possible values are:

  • "tibble" (default) A tibble (data.frame) summarizing essential elements of projects, samples, bundles, or files.

  • "lol" A 'list-of-lists' representation of the JSON returned by the query, indexed and presented to enable convenient filtering, selection, and extraction. See ?lol.

  • "list" An R list (typically, highly recursive) containing detailed project information, constructed from the JSON response to the original query.

  • "tibble_expanded" A tibble (data.frame) containing (almost) all information for each project, sample, bundle, or file. The exception is user-contributed matrices present in projects() records; these must be accessed using the "lol" format to extract specific paths as a standard "tibble".


Value

When as = "tibble" or as = "tibble_expanded", a tibble with each row representing an HCA object (project, sample, bundle, or file, depending on the function invoked), and columns summarizing the object. "tibble_expanded" columns contain almost all information about the object, except as noted in the Details section.

When as = "lol", a list-of-lists data structure representing detailed information on each object.

When as = "list", projects() returns an R list, typically containing other lists or atomic vectors, representing detailed information on each project.
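
Working with the "list" format usually means walking nested lists. The sketch below uses a made-up stand-in for the return value, keeping only the hits and entryId elements that appear in the Examples; the identifiers are placeholders, and a real response contains many more fields.

```r
# Made-up stand-in for projects(size = 2, as = "list"); the identifiers
# are placeholders, not real HCA project ids.
mock_response <- list(
    hits = list(
        list(entryId = "placeholder-uuid-1"),
        list(entryId = "placeholder-uuid-2")
    )
)

# Collect one entryId per hit into a character vector.
entry_ids <- vapply(
    mock_response[["hits"]],
    function(hit) hit[["entryId"]],
    character(1)
)
entry_ids
```

Each element of entry_ids could then be passed as the uuid argument of projects_detail().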

projects_facets() invoked with no facet= argument returns a tibble summarizing the terms available as projects() return values and for use in filters. The tibble contains the columns:

  • facet: the name of the facet.

  • n_terms: the number of distinct values the facet can take.

  • n_values: the number of occurrences of the facet term in the entire catalog.

projects_facets() invoked with a scalar value for facet= returns a tibble summarizing terms used in the facet, and the number of occurrences of the term in the entire catalog.
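
The relationship between the single-facet result and the summary row can be illustrated with toy data. Everything in this sketch is invented: the facet name, the terms, the counts, and the column names term and count. It also assumes that n_values is the total of the per-term counts, which is one reading of the description above rather than a confirmed detail of the package.

```r
# Toy per-term counts for a single facet; all values are invented.
# The column names 'term' and 'count' are assumptions for this sketch.
sex_terms <- data.frame(
    term = c("female", "male", "unknown"),
    count = c(12L, 15L, 3L)
)

# One summary row in the shape described above.
facet_summary <- data.frame(
    facet = "biologicalSex",
    n_terms = nrow(sex_terms),        # number of distinct terms
    n_values = sum(sex_terms$count)   # total occurrences across terms
)
facet_summary
```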

*_columns() returns a tibble with two columns: name, the column name used in the tibble returned by projects(), files(), samples(), or bundles(); and path, the path (see lol_hits()) to the data in the list-of-lists returned by the same functions when as = "lol". When as = "character", the return value is a named character vector with paths as elements and abbreviations as names.

projects_detail() returns a list-of-lists containing relevant details about the project.

See Also

lol() and other lol_*() functions for working with the list-of-list data structure returned when as = "lol".


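Examples

library(hca)

## up to 100 projects, with an empty filter
projects(filters(), size = 100)

## details for a single project, identified by its entryId
project <- projects(size = 1, as = "list")
project_uuid <- project[["hits"]][[1]][["entryId"]]
projects_detail(uuid = project_uuid)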

Bioconductor/hca documentation built on July 28, 2022, 6:04 p.m.