UCSCXenaTools: an R package for Accessing Genomics Data from UCSC Xena platform, from Cancer Multi-omics to Single-cell RNA-seq


UCSCXenaTools is an R package for accessing genomics data from the UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq. Public omics data from UCSC Xena are served through multiple turn-key Xena Hubs, a collection of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx, CCLE, and others. The databases are normalized so they can be combined, linked, filtered, explored and downloaded.

Who is the target audience and what are scientific applications of this package?


Install the stable release from CRAN with:

install.packages("UCSCXenaTools")


You can also install the development version of UCSCXenaTools from GitHub with:

# install.packages("remotes")
remotes::install_github("ropensci/UCSCXenaTools")

If you want to build the vignette locally, add two options:

remotes::install_github("ropensci/UCSCXenaTools", build_vignettes = TRUE, dependencies = TRUE)

The minimum version required to run the vignette is 1.2.4. GitHub Issues is the place to discuss any problem.

Data Hub List

All datasets are available at https://xenabrowser.net/datapages/.

Currently, UCSCXenaTools supports the following data hubs of UCSC Xena.

Users can manually update the dataset list to match the newest version of UCSC Xena with the XenaDataUpdate() function, then restart R and run library(UCSCXenaTools).

If the URL of any data hub changes or a new data hub comes online, please let me know by emailing w_shixiang@163.com or opening an issue on GitHub.


Downloading UCSC Xena datasets and loading them into R with UCSCXenaTools is a five-step workflow: generate, filter, query, download and prepare. These steps are implemented as the XenaGenerate, XenaFilter, XenaQuery, XenaDownload and XenaPrepare functions, respectively. The functions are straightforward to use and combine well with other packages such as dplyr.

To show the basic usage of UCSCXenaTools, we will download clinical data of LUNG, LUAD and LUSC from the TCGA (hg19 version) data hub.

XenaData data.frame

UCSCXenaTools uses XenaData, a data.frame object built into the package that records information about all datasets of the UCSC Xena Data Hubs, to generate an instance of the XenaHub class.

You can load XenaData after loading UCSCXenaTools into R.
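As a minimal sketch of what loading and inspecting XenaData can look like, the snippet below prints the first rows and counts datasets per data hub. It relies only on the XenaHostNames column, which the filtering examples in this document also use; the variable name hub_counts is introduced here for illustration.

```r
library(UCSCXenaTools)

# XenaData ships with the package; data() makes the load explicit
data(XenaData)

# One row per dataset: look at the first rows and the available columns
head(XenaData)
colnames(XenaData)

# Count datasets per data hub using the XenaHostNames column
# (hub_counts is an illustrative name, not part of the package)
hub_counts <- table(XenaData$XenaHostNames)
hub_counts
```

Browsing these counts is a quick way to find the hub name to pass to XenaGenerate(subset = XenaHostNames == ...).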




Select datasets.

# The options in XenaFilter function support Regular Expression
XenaGenerate(subset = XenaHostNames=="tcgaHub") %>% 
  XenaFilter(filterDatasets = "clinical") %>% 
  XenaFilter(filterDatasets = "LUAD|LUSC|LUNG") -> df_todo


Sometimes we only know some keywords. In that case, XenaScan() can be used to scan all rows of XenaData for those keywords.

x1 = XenaScan(pattern = 'Blood')
x2 = XenaScan(pattern = 'LUNG', ignore.case = FALSE)

x1 %>%
  XenaGenerate()
x2 %>%
  XenaGenerate()

Query and download.

XenaQuery(df_todo) %>%
  XenaDownload() -> xe_download

For researchers in China, the Hiplot team has deployed several Xena mirror sites (https://xena.hiplot.com.cn/) in Shanghai. Set options(use_hiplot = TRUE) before the query step to speed up both querying and downloading.

options(use_hiplot = TRUE)

XenaQuery(df_todo) %>%
  XenaDownload() -> xe_download

Prepare data into R for analysis.

cli = XenaPrepare(xe_download)
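Before analysis it can help to check what was actually loaded. The sketch below assumes XenaPrepare() returns either a single data frame or a named list of data frames (one per downloaded file). The helper summarise_prepared() and the stand-in object mock_cli are hypothetical and not part of UCSCXenaTools; the mock values are made up so the example runs without a download.

```r
# Hypothetical helper (not part of UCSCXenaTools): report the dimensions of
# a prepared result, whether it is one table or a named list of tables
summarise_prepared <- function(x) {
  if (is.data.frame(x)) x <- list(data = x)
  data.frame(
    name = names(x),
    rows = unname(vapply(x, nrow, integer(1))),
    cols = unname(vapply(x, ncol, integer(1))),
    stringsAsFactors = FALSE
  )
}

# Made-up stand-in for `cli`, so the sketch runs offline
mock_cli <- list(
  LUAD_clinicalMatrix = data.frame(sampleID = c("s1", "s2"), age = c(60, 71)),
  LUSC_clinicalMatrix = data.frame(sampleID = "s3", age = 65)
)

prepared_summary <- summarise_prepared(mock_cli)
prepared_summary
```

With real data, summarise_prepared(cli) would be called on the object returned by XenaPrepare().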

Browse datasets

Create two XenaHub objects:

XenaGenerate(subset = XenaHostNames=="tcgaHub") %>%
    XenaFilter(filterDatasets = "clinical") %>%
    XenaFilter(filterDatasets = "LUAD") -> to_browse


XenaGenerate(subset = XenaHostNames=="tcgaHub") %>%
    XenaFilter(filterDatasets = "clinical") %>%
    XenaFilter(filterDatasets = "LUAD|LUSC") -> to_browse2


The XenaBrowse() function can be used to browse dataset/cohort links in your default web browser. By default, this function allows only one dataset/cohort, to prevent users from opening too many links at once.

# This will open your web browser
XenaBrowse(to_browse, type = "cohort")

# This will throw an error
XenaBrowse(to_browse2, type = "cohort")

When you are sure you want to open multiple links, set the multiple option to TRUE.

XenaBrowse(to_browse2, multiple = TRUE)
XenaBrowse(to_browse2, type = "cohort", multiple = TRUE)

More usages

The core functionality has been described above. More usage examples are published on my website rather than here, because package checks sometimes fail due to network problems.

Read "Obtain RNAseq Values for a Specific Gene in Xena Database" to see how to get values for a single gene. A use case for survival analysis based on single-gene expression has been published on rOpenSci; see "UCSCXenaTools: Retrieve Gene Expression and Clinical Information from UCSC Xena for Survival Analysis".


How to resume a download from a breakpoint

Thanks to the UCSC Xena team, downloads can now be resumed from a breakpoint by calling XenaDownload() with the method and extra arguments specified.

Note that the corresponding wget or curl command must be installed on your system and be discoverable by R.
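One way to check this requirement before downloading is base R's Sys.which(), which reports where R finds a command on the PATH. This is a minimal sketch; the variable names tools_found and available are illustrative.

```r
# Look up the external download tools on the PATH that R sees;
# Sys.which() returns "" for a command it cannot find
tools_found <- Sys.which(c("curl", "wget"))
print(tools_found)

# Keep the names of the tools that were actually found
available <- names(tools_found)[nzchar(tools_found)]

if (length(available) == 0) {
  message("Neither curl nor wget was found; install one to resume downloads.")
} else {
  message("Usable download method(s): ", paste(available, collapse = ", "))
}
```

Any name listed in available can then be passed as the method argument of XenaDownload().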

The following code gives a test example; the data can be viewed on its web page.

xe = XenaGenerate(subset = XenaDatasets == "TcgaTargetGtex_expected_count")
xq = XenaQuery(xe)
# You cannot resume from breakpoint in default mode
XenaDownload(xq, destdir = "~/test/", force = TRUE)
# You can do it with 'curl' command
XenaDownload(xq, destdir = "~/test/", method = "curl", extra = "-C -", force = TRUE)
# You can do it with 'wget' command
XenaDownload(xq, destdir = "~/test/", method = "wget", extra = "-c", force = TRUE)


Please cite the following paper.

Wang et al., (2019). The UCSCXenaTools R package: a toolkit for accessing genomics data
  from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq. 
  Journal of Open Source Software, 4(40), 1627, https://doi.org/10.21105/joss.01627

# For BibTeX

@article{Wang2019,
    journal = {Journal of Open Source Software},
    doi = {10.21105/joss.01627},
    issn = {2475-9066},
    number = {40},
    publisher = {The Open Journal},
    title = {The UCSCXenaTools R package: a toolkit for accessing genomics data from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq},
    url = {http://dx.doi.org/10.21105/joss.01627},
    volume = {4},
    author = {Wang, Shixiang and Liu, Xuesong},
    pages = {1627},
    date = {2019-08-05},
    year = {2019},
    month = {8},
    day = {5},
}

Cite UCSC Xena by the following paper.

Goldman, Mary, et al. "The UCSC Xena Platform for cancer genomics data 
    visualization and interpretation." BioRxiv (2019): 326470.


This package is based on XenaR; thanks to Martin Morgan for his work.


UCSCXenaTools documentation built on Sept. 15, 2021, 5:07 p.m.