
curatedTCGAData


Overview

curatedTCGAData is an experiment data package available in both the release and development versions of Bioconductor. It uses ExperimentHub to provide pre-processed and curated data from The Cancer Genome Atlas (TCGA) as MultiAssayExperiment objects.

Clinical Curation

The clinical datasets taken from TCGA contain many variables, including demographic and pathology variables. Curation merged additional level one data and subtype information into these datasets. Empty variables were removed, and their names were saved in the metadata of the colData. Ongoing efforts include merging the different levels of variables in the colData to reduce the repetition of some clinical variables.

Subtype Curation

Among the 33 TCGA cohorts, the primary publications detail various molecular subtypes (methylation, mRNA, etc.). Currently, no publicly available datasets contain clinical subtype information. As such, we have integrated both clinical and molecular subtype information by curating the clinical variables as detailed above and incorporating subtype information from the supplements of the primary publications. All subtype curation was done by hand; where supplemental information was not available in a publication, the corresponding author was emailed and asked to provide it. With the addition of the molecular subtype information, it becomes possible to examine subtype characterization across cohorts, which will hopefully provide deeper insight into oncogenesis.
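
To make the clinical and subtype curation concrete, the sketch below downloads one cohort and looks at where the curated information lives. It is illustrative rather than definitive: it assumes the `curatedTCGAData()` arguments `diseaseCode`, `assays`, `version`, and `dry.run`, and it assumes the companion TCGAutils package is installed for its `getSubtypeMap()` helper. The cohort (`"ACC"`), the assay (`"Mutation"`), and the data version (`"2.0.1"`) are arbitrary example choices, and the metadata entries are listed rather than assumed by name.

```r
library(curatedTCGAData)
library(MultiAssayExperiment)
library(TCGAutils)  # assumed to be installed; provides getSubtypeMap()

# Download one small cohort; "ACC", "Mutation", and "2.0.1" are example choices.
acc <- curatedTCGAData(
    diseaseCode = "ACC",
    assays = "Mutation",
    version = "2.0.1",
    dry.run = FALSE
)

# The curated clinical variables are stored in the colData.
clin <- colData(acc)
dim(clin)

# List what is stored in the colData metadata without assuming entry names.
names(metadata(clin))

# Show how the curated subtype annotations map to colData columns.
getSubtypeMap(acc)
```

Because the subtype columns differ between cohorts, inspecting the map first is a reasonable way to find the relevant column names before subsetting the colData.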

Genome versions

See the NCI wiki and summary on FireHose for information on genome builds for all aligned data types.

Getting Started

Install curatedTCGAData from Bioconductor using BiocManager:

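# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# Switch to the development version of Bioconductor
# (skip this line to stay on the release version)
BiocManager::install(version = "devel")

# Install the data package
BiocManager::install("curatedTCGAData")

# Open the package vignettes in a browser
browseVignettes("curatedTCGAData")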
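
Once the package is installed, a common first step is to list what is available before downloading anything. The following is a minimal sketch, assuming the `curatedTCGAData()` function with its `diseaseCode`, `assays`, `version`, and `dry.run` arguments. The adrenocortical carcinoma cohort (`"ACC"`), the `"Mutation"` assay, and data version `"2.0.1"` are illustrative choices only; consult the package documentation for the codes and versions available in your installation.

```r
library(curatedTCGAData)
library(MultiAssayExperiment)

# List the matching resources without downloading them (dry.run = TRUE).
# "ACC", "Mutation", and "2.0.1" are illustrative choices.
available <- curatedTCGAData(
    diseaseCode = "ACC",
    assays = "Mutation",
    version = "2.0.1",
    dry.run = TRUE
)
available

# Download the selected data from ExperimentHub as a MultiAssayExperiment.
acc <- curatedTCGAData(
    diseaseCode = "ACC",
    assays = "Mutation",
    version = "2.0.1",
    dry.run = FALSE
)

# Inspect the downloaded assays and the size of the clinical data.
experiments(acc)
dim(colData(acc))
```

Running with `dry.run = TRUE` first avoids downloading data that turns out not to be needed.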

Reporting Bugs

We appreciate all feedback on our experiment data package. Please file an issue on GitHub and we will respond as soon as possible.



waldronlab/curatedTCGAData documentation built on Feb. 7, 2024, 1:12 p.m.