fetch: Fetch Data from UCSC Xena Hosts

View source: R/fetch.R

fetch {UCSCXenaTools}    R Documentation

Description

When you only need data for a few genes or samples from a UCSC Xena dataset, the fetch_ functions are more efficient than downloading the whole dataset. See the sections below for details on each function.

Usage

fetch(host, dataset)

fetch_dense_values(
  host,
  dataset,
  identifiers = NULL,
  samples = NULL,
  check = TRUE,
  use_probeMap = FALSE,
  time_limit = 30
)

fetch_sparse_values(host, dataset, genes, samples = NULL, time_limit = 30)

fetch_dataset_samples(host, dataset, limit = NULL)

fetch_dataset_identifiers(host, dataset)

has_probeMap(host, dataset, return_url = FALSE)

Arguments

host

a UCSC Xena host, such as "https://toil.xenahubs.net". All available hosts can be listed with xena_default_hosts().

dataset

a UCSC Xena dataset, such as "tcga_RSEM_gene_tpm". All available datasets can be listed with XenaData$XenaDatasets or found on the UCSC Xena data pages.

identifiers

identifiers such as probes (e.g. "ENSG00000000419.12") or genes (e.g. "TP53"). If NULL, all identifiers in the dataset are used.

samples

sample IDs, such as "TCGA-02-0047-01". If NULL, all samples in the dataset are used. If you need many samples and genes, downloading the whole dataset is usually the better choice.

check

if TRUE, check whether the specified identifiers and samples exist in the dataset; any that do not are filtered out. Setting it to FALSE skips this check and is much faster.

use_probeMap

if TRUE, first check whether the dataset has a probeMap. When the dataset has an identifier-to-gene mapping, identifiers can be given as gene symbols even if the dataset's own identifiers are probes or something else.

time_limit

time limit, in seconds, for getting a response.

genes

gene names.

limit

number of samples to return; if NULL, all samples are returned.

return_url

if TRUE and the dataset has a probeMap, return information about the probeMap instead of a logical value.
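The check argument filters out identifiers and samples that are not in the dataset. A minimal sketch of doing the same filtering by hand follows; it may help when you want to validate IDs once and then call fetch_dense_values() repeatedly with check = FALSE. The helper keep_existing() is hypothetical (not part of UCSCXenaTools), and the commented network lines assume that fetch_dataset_samples() returns a character vector of sample IDs.

```r
# Hypothetical helper, not part of UCSCXenaTools: keep only the requested IDs
# that appear in the dataset, warning about any that are dropped.
keep_existing <- function(requested, available) {
  missing <- setdiff(requested, available)
  if (length(missing) > 0) {
    warning("Dropping IDs not found in the dataset: ",
            paste(missing, collapse = ", "))
  }
  # intersect() keeps the order of `requested` and removes duplicates
  intersect(requested, available)
}

# Offline demonstration; "NOT-A-SAMPLE" is a deliberately invalid ID
available <- c("TCGA-02-0047-01", "TCGA-02-0055-01")
requested <- c("TCGA-02-0047-01", "NOT-A-SAMPLE")
kept <- suppressWarnings(keep_existing(requested, available))
print(kept)

# With network access (assumes fetch_dataset_samples() returns IDs as characters):
# samples_ok <- keep_existing(samples, fetch_dataset_samples(host, dataset))
# fetch_dense_values(host, dataset, probes, samples_ok, check = FALSE)
```

Validating once and reusing the filtered IDs trades one extra request for skipping the existence check on every later call.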

Details

There are three primary data types: dense matrix (samples by probes, or more generally identifiers), sparse (sample, position, variant), and segmented (sample, position, value).
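To make the dense/sparse distinction concrete, the sketch below builds a toy sparse table in long form, one row per (sample, position, variant), and collapses it into a sample-by-gene matrix of variant counts, which is a dense layout. The column names and values are placeholders chosen for illustration; they are not the exact format that fetch_sparse_values() returns.

```r
# Toy sparse data: one row per (sample, position, variant).
# Sample IDs, coordinates, and variant labels are placeholders, not real calls.
sparse <- data.frame(
  sample  = c("S1", "S1", "S2", "S3"),
  chrom   = c("chr17", "chr3", "chr17", "chr13"),
  start   = c(100L, 200L, 150L, 300L),   # placeholder coordinates
  gene    = c("TP53", "PIK3CA", "TP53", "RB1"),
  variant = c("Missense", "Missense", "Nonsense", "Frame_Shift"),
  stringsAsFactors = FALSE
)

# Collapse to a dense sample-by-gene matrix of variant counts;
# sample/gene pairs with no variant become 0
dense_counts <- table(sparse$sample, sparse$gene)
print(dense_counts)
```

The sparse form stores only the observed variants, while the dense form has a cell for every sample and gene, including the zeros.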

Dense matrices can be genotypic or phenotypic; either way, they are sample-by-identifier matrices. Phenotypic matrices have associated field metadata (descriptive names, codes, etc.). Genotypic matrices may have an associated probeMap, which maps probes to genomic locations. If a matrix has a hugo probeMap, the probes themselves are gene names. Otherwise, the probeMap is used to map a gene location to a set of probes.
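The probeMap idea can be illustrated offline with a toy lookup table. This is a sketch of the concept only, not how UCSCXenaTools implements use_probeMap = TRUE. The probe IDs, values, the two-column probe_map layout, and the helper genes_to_probes() are all hypothetical, and the toy matrix puts identifiers in rows and samples in columns purely for illustration.

```r
# Toy probeMap: each probe ID mapped to a gene symbol (illustrative only)
probe_map <- data.frame(
  id   = c("probe_1", "probe_2", "probe_3", "probe_4"),
  gene = c("TP53", "TP53", "RB1", "PIK3CA"),
  stringsAsFactors = FALSE
)

# Toy dense matrix with probes as rows and samples as columns
dense <- matrix(
  c(1.2, 0.8, 3.1, 2.2,
    0.5, 0.4, 1.9, 2.0,
    4.0, 3.5, 0.1, 0.3),
  nrow = 4,
  dimnames = list(probe_map$id, c("S1", "S2", "S3"))
)

# Hypothetical helper: translate gene symbols into the probes mapped to them
genes_to_probes <- function(genes, probe_map) {
  probe_map$id[probe_map$gene %in% genes]
}

probes <- genes_to_probes(c("TP53", "RB1"), probe_map)
sub <- dense[probes, , drop = FALSE]
print(sub)
```

One gene can map to several probes (TP53 here maps to two), so a gene-symbol query may return more rows than genes requested. With a hugo probeMap, id and gene would be identical and the lookup would change nothing.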

Value

a matrix, a character vector, or a list.

Functions

  • fetch_dense_values: fetches values from a dense matrix.

  • fetch_sparse_values: fetches values from a sparse data.frame.

  • fetch_dataset_samples: fetches samples from a dataset.

  • fetch_dataset_identifiers: fetches identifiers from a dataset.

  • has_probeMap: checks whether a dataset has a probeMap.

Examples

library(UCSCXenaTools)

host <- "https://toil.xenahubs.net"
dataset <- "tcga_RSEM_gene_tpm"
samples <- c("TCGA-02-0047-01", "TCGA-02-0055-01", "TCGA-02-2483-01", "TCGA-02-2485-01")
probes <- c("ENSG00000282740.1", "ENSG00000000005.5", "ENSG00000000419.12")
genes <- c("TP53", "RB1", "PIK3CA")


# Fetch the first 2 samples
fetch_dataset_samples(host, dataset, limit = 2)
# Fetch identifiers
fetch_dataset_identifiers(host, dataset)
# Fetch expression values by probe ID
fetch_dense_values(host, dataset, probes, samples, check = FALSE)
# Fetch expression values by gene symbol (requires the dataset to have a probeMap)
has_probeMap(host, dataset)
fetch_dense_values(host, dataset, genes, samples, check = FALSE, use_probeMap = TRUE)


UCSCXenaTools documentation built on June 20, 2022, 9:05 a.m.