datasetInfo: datasetInfo

View source: R/ep_dataset.R

datasetInfo R Documentation

datasetInfo

Description

Retrieves information about a single dataset based on the given dataset identifier. Combines several API calls, selected through the request argument.

Usage

datasetInfo(
  dataset,
  request = NULL,
  ...,
  file = NULL,
  return = TRUE,
  overwrite = FALSE,
  memoised = FALSE
)

Arguments

dataset

Character. Either the dataset ID or its short name (e.g. GSE1234). If a vector of length > 1 is provided, all matching dataset objects are returned, similar to allDatasets but without access to its additional parameters. The request parameter cannot be specified for inputs of length > 1 unless stated otherwise in the request description.

request

Character. If NULL, retrieves the dataset object. Otherwise, one of:

  • platforms: Retrieves platforms for the given dataset

  • samples: Retrieves samples for the given dataset

  • annotations: Retrieves the annotations for the given dataset

  • design: Retrieves the design for the given dataset

  • data: Retrieves the data for the given dataset. Parameters:

    • filter: Optional, defaults to FALSE. If TRUE, call returns filtered expression data.

    • IdColnames: Optional, defaults to FALSE. If TRUE, shortens data column names to include only the bioAssayId, which is unique to samples in Gemma. This makes it easier to match columns to samples acquired from the "samples" request.

  • differential: Retrieves available differential expression tests for the given dataset. Parameters:

    • offset: Optional, defaults to 0. Skips the specified number of objects when retrieving them from the database.

    • limit: Optional, defaults to 20. Limits the result to the specified number of objects. Use 0 for no limit.

  • degs: Retrieves the differential expression results for the given dataset. Parameters:

    • differential: Differential id of the differential expression. Can be acquired from the differential endpoint.

  • diffExExpr: Retrieves expression values for differential expression subsets for the given datasets. Parameters:

    • diffExSet: Result set id of the differential expression. Can be acquired from the differential endpoint:

      datasetInfo('GSE43364',request = 'differential')$resultSets[[1]]$id

    • keepNonSpecific: Optional, defaults to FALSE.

      If set to FALSE, the response will only include elements that map exclusively to each gene.

      If set to TRUE, the response will include all elements that map to each gene, even if they also map to other genes.

    • threshold: Optional, defaults to 100. The threshold that the differential expression has to meet to be included in the response.

    • limit: Optional, defaults to 100. Maximum number of gene-probe expression level pairs to include in the response.

    • consolidate: Optional, defaults to NULL.

      Whether the information for genes with multiple elements should be consolidated. If the 'keepNonSpecific' parameter is set to TRUE, then all gene non-specific vectors are excluded from the chosen procedure.

      The options are:

      • NULL: list all vectors separately.

      • "pickmax": only return the vector that has the highest expression (mean over all its bioAssays).

      • "pickvar": only return the vector with the highest variance of expression across its bioAssays.

      • "average": create a new vector that averages the bioAssay values from all vectors.

  • geneExpression: Retrieves the expression levels of given genes for given datasets. Can be used with multiple datasets. Parameters:

    • genes: Required. A list of identifiers, separated by commas (e.g. 1859, 5728).

      Can either be the NCBI ID (1859), Ensembl ID (ENSG00000157540) or official symbol (DYRK1A) of the gene.

      NCBI ID is the most efficient (and guaranteed to be unique) identifier.

      Official symbol represents a gene homologue for a random taxon, unless used in a specific taxon (see the Taxa Endpoints).

      If the gene taxon does not match the taxon of the given datasets, expression levels for that gene will be missing from the response.

      You can combine various identifiers in one query, but any invalid identifier will cause the call to yield an error. Duplicate identifiers of the same gene will result in duplicates in the response.

    • keepNonSpecific: Optional, defaults to FALSE.

      If set to FALSE, the response will only include elements that map exclusively to each queried gene.

      If set to TRUE, the response will include all elements that map to each queried gene, even if they also map to other genes.

    • consolidate: Optional, defaults to NULL.

      Whether the information for genes with multiple elements should be consolidated. If the 'keepNonSpecific' parameter is set to TRUE, then all gene non-specific vectors are excluded from the chosen procedure.

      The options are:

      • NULL: list all vectors separately.

      • "pickmax": only return the vector that has the highest expression (mean over all its bioAssays).

      • "pickvar": only return the vector with the highest variance of expression across its bioAssays.

      • "average": create a new vector that averages the bioAssay values from all vectors.
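To make the consolidate options concrete, the following is a minimal sketch in plain R on a toy matrix. Rows stand for hypothetical elements (probes) mapping to one gene and columns for bioAssays. The helper consolidate_vectors and the data are illustrative assumptions that mirror the descriptions above; they are not Gemma's server-side implementation.

```r
# Toy data: three hypothetical elements (rows) for one gene across four bioAssays.
vectors <- rbind(
  probe_a = c(2, 4, 6, 8),
  probe_b = c(9, 9, 9, 9),
  probe_c = c(1, 2, 9, 10)
)
colnames(vectors) <- paste0("assay_", 1:4)

# Illustrative helper (not part of gemmaAPI): applies one consolidation option.
consolidate_vectors <- function(x, method = NULL) {
  # NULL: keep all vectors separately.
  if (is.null(method)) return(x)
  switch(method,
    # Keep the vector with the highest mean expression over its bioAssays.
    pickmax = x[which.max(rowMeans(x)), , drop = FALSE],
    # Keep the vector with the highest variance across its bioAssays.
    pickvar = x[which.max(apply(x, 1, var)), , drop = FALSE],
    # Build one new vector averaging each bioAssay over all vectors.
    average = matrix(colMeans(x), nrow = 1,
                     dimnames = list("average", colnames(x))),
    stop("unknown method: ", method)
  )
}

consolidate_vectors(vectors, "pickmax")  # selects probe_b (mean 9)
consolidate_vectors(vectors, "pickvar")  # selects probe_c (largest spread)
consolidate_vectors(vectors, "average")  # per-assay means: 4, 5, 8, 9
```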

...

Use if the specified request has additional parameters.

file

Character. File path. If provided, the response will be saved to this file.

return

Logical. Whether the response should be returned. Set to FALSE when you only want to save a file.

overwrite

Logical. If TRUE, existing files will be overwritten. If FALSE, a warning is issued and no action is taken.

memoised

Logical. If TRUE, a memoised version of the function is used, which is faster for repeated requests. Use forgetGemmaMemoised to clear the memoised results.
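The file, return, overwrite and memoised arguments can be combined as in the following sketch. It assumes the gemmaAPI package is installed and the Gemma server is reachable. The output path is a hypothetical placeholder, and calling forgetGemmaMemoised() without arguments is an assumption.

```r
# Hypothetical output path; the on-disk format is whatever datasetInfo writes.
out_file <- file.path(tempdir(), "GSE81454_platforms")

# Guarded so the sketch is skipped when gemmaAPI is not installed.
if (requireNamespace("gemmaAPI", quietly = TRUE)) {
  tryCatch({
    # Save the response to disk without returning it.
    gemmaAPI::datasetInfo("GSE81454", request = "platforms",
                          file = out_file, return = FALSE)

    # Same call again: with overwrite = FALSE the existing file is documented
    # to be left untouched, with a warning.
    gemmaAPI::datasetInfo("GSE81454", request = "platforms",
                          file = out_file, return = FALSE, overwrite = FALSE)

    # Memoised call, intended for requests that are repeated.
    platforms <- gemmaAPI::datasetInfo("GSE81454", request = "platforms",
                                       memoised = TRUE)

    # Clear the memoised results (assumed to take no arguments).
    gemmaAPI::forgetGemmaMemoised()
  }, error = function(e) message("Gemma request failed: ", conditionMessage(e)))
}
```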

Value

A data.frame or a list, depending on the request.

Examples


datasetInfo('GSE81454')
datasetInfo('GSE81454', request = 'platforms')
datasetInfo('GSE81454', request = 'data', filter = FALSE)
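Requests can also be chained. The sketch below first looks up a result set id with the differential request and then passes it to diffExExpr, following the expression given in the diffExSet description. It assumes the gemmaAPI package is installed and the Gemma server is reachable; the choice of consolidate = "pickmax" is only for illustration.

```r
dataset <- "GSE43364"  # dataset used in the diffExSet description
diff_expr <- NULL      # stays NULL if the package or the server is unavailable

if (requireNamespace("gemmaAPI", quietly = TRUE)) {
  diff_expr <- tryCatch({
    # Step 1: list the available differential expression tests.
    tests <- gemmaAPI::datasetInfo(dataset, request = "differential")

    # Step 2: take the id of the first result set.
    set_id <- tests$resultSets[[1]]$id

    # Step 3: fetch expression values for that differential expression subset.
    gemmaAPI::datasetInfo(dataset, request = "diffExExpr",
                          diffExSet = set_id, consolidate = "pickmax")
  }, error = function(e) {
    message("Gemma request failed: ", conditionMessage(e))
    NULL
  })
}
```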


PavlidisLab/gemmaAPI.R documentation built on July 26, 2022, 7:13 a.m.