projects: HCA Project Querying

View source: R/projects.R

projectsR Documentation

HCA Project Querying

Description

projects() takes user input to be used to query the HCA API for information about available projects.

projects_facets() summarizes facets and terms used by all records in the projects index.

⁠*_columns()⁠ returns a tibble or named character vector describing the content of the tibble returned by projects(), files(), samples(), or bundles().

projects_detail() takes a unique project_id and catalog for the project, and returns details about the specified project as a list-of-lists

See project_information() and project_title() to easily summarize a project from its project id.

Usage

projects(
  filters = NULL,
  size = 1000L,
  sort = "projectTitle",
  order = c("asc", "desc"),
  catalog = NULL,
  as = c("tibble", "lol", "list", "tibble_expanded"),
  columns = projects_default_columns("character")
)

projects_facets(facet = character(), catalog = NULL)

projects_default_columns(as = c("tibble", "character"))

projects_detail(uuid, catalog = NULL)

Arguments

filters

filter object created by filters(), or NULL (default; all projects).

size

integer(1) maximum number of results to return; default: all projects matching filter. The default (10000) is meant to be large enough to return all results.

sort

character(1) project facet (see facet_options()) to sort result; default: "projectTitle".

order

character(1) sort order. One of "asc" (ascending) or "desc" (descending).

catalog

character(1) source of data. Use catalogs() for possible values.

as

character(1) return format. One of "tibble" (default), "lol", "list", or "tibble_expanded", as described in the Details and Value sections of ?projects.

columns

named character() indicating the paths to be used for parsing the 'lol' returned from the HCA to a tibble. The names of columns are used as column names in the returned tibble. If the columns are unnamed, a name is derived from the elements of path by removing ⁠hits[*]⁠ and all ⁠[*]⁠, e.g., a path ⁠hits[*].donorOrganisms[*].biologicalSex[*]⁠ is given the name donorOrganisms.biologicalSex.

facet

character() of valid facet names. Summary results (see 'Value', below) are returned when missing or length greater than 1; details are returned when a single facet is specified.

uuid

character() unique identifier (e.g., projectId) of the object.

Details

The as argument determines the object returned by the function. Possible values are:

  • "tibble" (default) A tibble (data.frame) summarizing essential elements of projects, samples, bundles, or files.

  • "lol" A 'list-of-lists' representation of the JSON returned by the query as a 'list-of-lists' data structure, indexed and presented to enable convenient filtering, selection, and extraction. See ?lol.

  • "list" An R list (typically, highly recursive) containing detailed project information, constructed from the JSON response to the original query.

  • "tibble_expanded" A tibble (data.frame) containing (almost) all information for each project, sample, bundle, or file. The exception is user-contributed matrices present in projects() records; these must be accessed using the "lol" format to extract specific paths as a standard "tibble".

Value

When as = "tibble" or as = "tibble_expanded", a tibble with each row representing an HCA object (project, sample, bundle, or file, depending on the function invoked), and columns summarizing the object. "tibble_expanded" columns contains almost all information about the object, except as noted in the Details section.

When as = "lol", a list-of-lists data structure representing detailed information on each object.

When as = "list", projects() returns an R list, typically containing other lists or atomic vectors, representing detailed information on each project.

projects_facets() invoked with no ⁠facet=⁠ argument returns a tibble summarizing terms available as projects() return values, and for use in filters. The tibble contains columns

  • facet: the name of the facet.

  • n_terms: the number of distinct values the facet can take.

  • n_values: the number of occurrences of the facet term in the entire catalog.

projects_facets() invoked with a scalar value for ⁠facet=⁠ returns a tibble summarizing terms used in the facet, and the number of occurrences of the term in the entire catalog.

⁠*_columns()⁠ returns a tibble with column name containing the column name used in the tibble returned by projects(), files(), samples(), or bundles(), and path the path (see lol_hits()) to the data in the list-of-lists by the same functions when as = "lol". When as = "character", the return value is a named list with paths as elements and abbreviations as names.

list-of-lists containing relevant details about the project.

See Also

project_information() and project_title() to easily summarize a project from its project id.

lol() and other ⁠lol_*()⁠ functions for working with the list-of-list data structure returned when as = "lol".

Examples

projects(filters(), size = 100)

projects_facets()
projects_facets("genusSpecies")

projects_default_columns()

project <- projects(size = 1, as = "list")
project_uuid <- project[["hits"]][[1]][["entryId"]]
projects_detail(uuid = project_uuid)


Bioconductor/hca documentation built on March 27, 2024, 3:15 a.m.