get_datasets: Retrieve all datasets

View source: R/allEndpoints.R

get_datasets R Documentation

Retrieve all datasets

Description

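Retrieve datasets from Gemma, optionally restricted by a search query, a filter expression, taxa, or ontology term URIs. Results are returned one page at a time (see offset and limit).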

Usage

get_datasets(
  query = NA_character_,
  filter = NA_character_,
  taxa = NA_character_,
  uris = NA_character_,
  offset = 0L,
  limit = 20L,
  sort = "+id",
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

query

The search query. Queries can include plain text or ontology terms. They also support conjunctions ("alpha AND beta"), disjunctions ("alpha OR beta"), grouping ("(alpha OR beta) AND gamma"), prefixing ("alpha*"), wildcard characters ("BRCA?") and fuzzy matches ("alpha~").
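As a minimal sketch of the operators described above, a grouped query can be assembled from character vectors in base R. The helpers any_of and all_of are hypothetical names introduced here for illustration; they are not part of the package.

```r
# Hypothetical helpers (not part of the package): combine search terms
# with the OR / AND operators and parentheses for grouping.
any_of <- function(terms) paste0("(", paste(terms, collapse = " OR "), ")")
all_of <- function(terms) paste(terms, collapse = " AND ")

# Builds the grouped example from the argument description.
query <- all_of(c(any_of(c("alpha", "beta")), "gamma"))
print(query)
```

The resulting string would then be passed as get_datasets(query = query).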

filter

Filter results by matching expression. Use the filter_properties function to get a list of all available parameters. These properties can be combined using "and" and "or" clauses and may contain common operators such as "=", "<" or "in" (e.g. "taxon.commonName = human", "taxon.commonName in (human,mouse)", "id < 1000").
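A filter expression is a plain string, so it can be built programmatically. The sketch below assumes only the syntax shown in the examples above; in_clause is a hypothetical helper, not a package function.

```r
# Hypothetical helper (not part of the package): build a
# "property in (a,b,...)" clause from a vector of values.
in_clause <- function(property, values) {
  sprintf("%s in (%s)", property, paste(values, collapse = ","))
}

# Join two clauses with "and", reusing the examples from the description.
filter <- paste(
  in_clause("taxon.commonName", c("human", "mouse")),
  "id < 1000",
  sep = " and "
)
print(filter)
```

The resulting string would then be passed as get_datasets(filter = filter).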

taxa

A vector of taxon common names (e.g. human, mouse, rat). Providing multiple species will return results for all species. These are appended to the filter and are equivalent to filtering on the taxon.commonName property.

uris

A vector of ontology term URIs. Providing multiple terms will return results containing any of the terms and their children. These are appended to the filter and are equivalent to filtering on the allCharacteristics.valueUri property.

offset

The offset of the first retrieved result.

limit

Defaults to 20. Limits the result to the specified number of objects. Has a maximum value of 100. Use together with offset and the totalElements attribute of the output to compile all data if needed.
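One way to compile all pages is to read totalElements from the first page and then step offset forward by limit. The sketch below is an illustration under assumptions: collect_all and fake_fetch are hypothetical names, and fake_fetch stands in for a real call so the example runs without network access.

```r
# Hypothetical helper: fetch every page through a user-supplied function
# that accepts offset and limit and returns a table carrying a
# "totalElements" attribute.
collect_all <- function(fetch, limit = 100L) {
  first <- fetch(offset = 0L, limit = limit)
  total <- attr(first, "totalElements", exact = TRUE)
  pages <- list(first)
  if (!is.null(total) && total > limit) {
    # Offsets of the remaining pages: limit, 2 * limit, ...
    offsets <- seq(limit, total - 1L, by = limit)
    pages <- c(pages, lapply(offsets, function(o) fetch(offset = o, limit = limit)))
  }
  do.call(rbind, pages)
}

# Stand-in for the API: pretends there are 250 datasets in total.
fake_fetch <- function(offset, limit) {
  ids <- seq_len(250L)
  out <- data.frame(experiment.ID = ids[ids > offset & ids <= offset + limit])
  attr(out, "totalElements") <- 250L
  out
}

all_rows <- collect_all(fake_fetch, limit = 100L)
print(nrow(all_rows))
```

With the real function, the fetcher would be something like function(offset, limit) get_datasets(offset = offset, limit = limit).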

sort

Order results by the given property and direction. The '+' sign indicates ascending order whereas the '-' sign indicates descending order.

raw

TRUE to receive results as-is from Gemma, or FALSE to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.

memoised

Whether to cache the result for future calls with the same inputs, and to return the cached result when one already exists. Setting options(gemma.memoised = TRUE) ensures that the cache is always used. Use forget_gemma_memoised to clear the cache.

file

The name of a file to save the results to, or NULL to not write results to a file. If raw == TRUE, the file will contain the raw response from the API endpoint, likely a JSON or a gzip file. Otherwise, it will be an RDS file.

overwrite

Whether or not to overwrite if a file exists at the specified filename.
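The defaults for raw, memoised, file and overwrite are read with getOption, as shown in the Usage section, so they can be changed for a whole session. A minimal base R sketch of how those defaults resolve (the variable names are illustrative only):

```r
# Set session-wide defaults; options() returns the previous values.
old <- options(gemma.memoised = TRUE, gemma.overwrite = FALSE)

# This mirrors how the default for the memoised argument is resolved.
use_cache <- getOption("gemma.memoised", FALSE)
print(use_cache)

# An option that has not been set falls back to the supplied default.
raw_default <- getOption("gemma.raw", FALSE)

# Restore the previous state of the options.
options(old)
```

An argument passed explicitly in a call takes precedence over these option-based defaults.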

Value

A data.table with information about the queried dataset(s), or a list if raw = TRUE. Returns an empty list if no datasets matched.

The fields of the output data.table are:

  • experiment.shortName: Shortname given to the dataset within Gemma. Often corresponds to accession ID

  • experiment.name: Full title of the dataset

  • experiment.ID: Internal ID of the dataset.

  • experiment.description: Description of the dataset

  • experiment.troubled: Whether an automatic process within Gemma or a curator has marked the dataset as "troubled"

  • experiment.accession: Accession ID of the dataset in the external database it was taken from

  • experiment.database: The name of the database where the dataset was taken from

  • experiment.URI: URI of the original database

  • experiment.sampleCount: Number of samples in the dataset

  • experiment.batchEffectText: A text field describing whether the dataset has batch effects

  • experiment.batchCorrected: Whether batch correction has been performed on the dataset.

  • experiment.batchConfound: 0 if batch information isn't available, -1 if a batch confound is detected, 1 if batch information is available and no batch confound is found

  • experiment.batchEffect: -1 if batch p value < 0.0001, 1 if batch p value > 0.1, 0 otherwise, including when no batch information is available or when the data is confounded with batches.

  • experiment.rawData: -1 if no raw data is available, 1 if raw data is available. When available, Gemma reprocesses raw data to get expression values and batches

  • geeq.qScore: Data quality score given to the dataset by Gemma.

  • geeq.sScore: Suitability score given to the dataset by Gemma. Refers to factors like batches, platforms and other aspects of experimental design

  • taxon.name: Name of the species

  • taxon.scientific: Scientific name for the taxon

  • taxon.ID: Internal identifier given to the species by Gemma

  • taxon.NCBI: NCBI ID of the taxon

  • taxon.database.name: Underlying database used in Gemma for the taxon

  • taxon.database.ID: ID of the underlying database used in Gemma for the taxon
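The coded fields above can be used to screen datasets after retrieval. The sketch below uses a small made-up table with the same column names; the rows are invented for illustration and do not describe real Gemma datasets.

```r
# Toy stand-in for the parsed output; every value here is made up.
datasets <- data.frame(
  experiment.shortName = c("toyA", "toyB", "toyC"),
  experiment.troubled = c(FALSE, TRUE, FALSE),
  experiment.batchConfound = c(1, 1, -1),
  experiment.batchEffect = c(1, 0, 0),
  stringsAsFactors = FALSE
)

# Keep datasets that are not troubled, have batch information with no
# confound (batchConfound == 1), and a batch p value > 0.1 (batchEffect == 1).
keep <- subset(
  datasets,
  !experiment.troubled & experiment.batchConfound == 1 & experiment.batchEffect == 1
)
print(keep$experiment.shortName)
```

The same subset call should apply to the table returned with raw = FALSE, assuming the column names listed above.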

Examples

get_datasets()
get_datasets(taxa = c("mouse", "human"), uris = "http://purl.obolibrary.org/obo/UBERON_0002048")
# filter below is equivalent to the call above
get_datasets(filter = "taxon.commonName in (mouse,human) and allCharacteristics.valueUri = http://purl.obolibrary.org/obo/UBERON_0002048")
get_datasets(query = "lung")

PavlidisLab/Gemma-API documentation built on Dec. 15, 2024, 12:45 a.m.