if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("curatedTCGAData")
Load packages:
library(curatedTCGAData)
library(MultiAssayExperiment)
library(TCGAutils)
curatedTCGAData
Your citations are important to the project and help us secure funding. Please
refer to the References section to see the @Ramos2020-ya and @Ramos2017-og
citations. For the BibTeX entries, run the citation function (after
installation):
citation("curatedTCGAData")
citation("MultiAssayExperiment")
curatedTCGAData uses MultiAssayExperiment to coordinate and represent the
data. Please cite the MultiAssayExperiment Cancer Research publication.
You can see the PDF of our public data publication on
JCO Clinical Cancer Informatics.
curatedTCGAData now has a version 2.0.1 set of data with a number of
improvements and bug fixes. To access the previous data release, please use
version 1.1.38. The version can be passed to the curatedTCGAData function as:
head(
    curatedTCGAData(
        diseaseCode = "COAD", assays = "*", version = "1.1.38"
    )
)
Here is a list of changes to the data provided in version 2.0.1:
- RNASeq2Gene assays with RSEM gene expression values
- RaggedExperiment objects as GRCh37 rather than 37
- ... (OV and GBM)
- mRNAArray data now returns matrix data instead of DataFrame
These data were processed by Broad Firehose pipelines and accessed using
RTCGAToolbox. For details on the preprocessing methods used, see the
Broad GDAC documentation.
To get a neat table of cancer data available in curatedTCGAData, see
the diseaseCodes dataset from TCGAutils. Availability is indicated by the
Available column in the dataset.
data('diseaseCodes', package = "TCGAutils")
head(diseaseCodes)
Alternatively, you can get the full table of available data by passing the
wildcard '*' to the diseaseCode argument of the main function:
curatedTCGAData(
    diseaseCode = "*", assays = "*", version = "2.0.1"
)
To see what assays are available for a particular TCGA disease code, leave the
assays argument as a wildcard ('*'):
head(
    curatedTCGAData(
        diseaseCode = "COAD", assays = "*", version = "2.0.1"
    )
)
Not all TCGA samples are cancer; each of the 33 cancer types contains a mix of
sample types. Use sampleTables on the MultiAssayExperiment object along with
data(sampleTypes, package = "TCGAutils") to see what samples are present in
the data. Some tumors were used to create multiple contributions, leading to
technical replicates. These should be resolved using the appropriate helper
functions such as mergeReplicates. Primary tumors should be selected using
TCGAutils::TCGAsampleSelect and used as input to the subsetting mechanisms.
See the "Samples in Assays" section of this vignette.
(accmae <- curatedTCGAData(
    "ACC", c("CN*", "Mutation"), version = "2.0.1", dry.run = FALSE
))
Note. For more on how to use a MultiAssayExperiment, please see the
MultiAssayExperiment vignette.
Some cancer datasets contain associated subtype information within the
clinical datasets provided. This subtype information is included in the
metadata of colData of the MultiAssayExperiment object. To obtain these
variable names, use the getSubtypeMap function from TCGAutils:
head(getSubtypeMap(accmae))
Another helper function provided by TCGAutils allows users to obtain a set
of consistent clinical variable names across several cancer types.
Use the getClinicalNames function to obtain a character vector of common
clinical variables such as vital status, years to birth, days to death, etc.
head(getClinicalNames("ACC"))
colData(accmae)[, getClinicalNames("ACC")][1:5, 1:5]
The sampleTables function gives an overview of sample types / codes
present in the data:
sampleTables(accmae)
You can use the reference dataset (sampleTypes) from the TCGAutils package
to interpret the TCGA sample codes above. The dataset provides clinically
meaningful descriptions:
data(sampleTypes, package = "TCGAutils")
head(sampleTypes)
Often, an analysis compares two groups of samples to each other.
To facilitate the separation of samples, the TCGAsplitAssays function from
TCGAutils identifies all sample types in the assays and moves each into its
own assay. By default, every discoverable sample type is placed in its own
experiment. In this case, we request only solid tumors and blood
derived normal samples, as seen in the sampleTypes reference dataset:
sampleTypes[sampleTypes[["Code"]] %in% c("01", "10"), ]
TCGAsplitAssays(accmae, c("01", "10"))
To obtain a logical vector that could be used for subsetting a
MultiAssayExperiment, refer to TCGAsampleSelect. To select only primary
tumors, use the function on the colnames of the MultiAssayExperiment:
tums <- TCGAsampleSelect(colnames(accmae), "01")
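The sample codes used above ("01" for solid tumors, "10" for blood derived normal samples) are embedded in the TCGA barcodes themselves. As a rough illustration of what is being selected on, the following base R sketch extracts the two-digit code from the fourth dash-separated field of each barcode. The helper name sample_code is hypothetical and is not part of TCGAutils, the barcodes are illustrative, and this is not how TCGAsampleSelect is implemented:

```r
# Hypothetical helper (not part of TCGAutils): extract the two-digit sample
# type code, i.e. the first two characters of the fourth dash-separated
# field of a TCGA barcode.
sample_code <- function(barcodes) {
    fields <- strsplit(barcodes, "-", fixed = TRUE)
    vapply(fields, function(x) substr(x[4], 1, 2), character(1))
}

# Illustrative barcodes (assumed here for demonstration only)
barcodes <- c(
    "TCGA-OR-A5J1-01A-11D-A29H-01",
    "TCGA-OR-A5J1-10A-01D-A29K-01"
)

codes <- sample_code(barcodes)

# Logical vector flagging samples whose code is "01"
is_primary <- codes == "01"
```

In practice, prefer TCGAsampleSelect, which works directly on the colnames of the MultiAssayExperiment.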
If you are interested in only the primary tumor samples, TCGAutils provides a
convenient operation to extract primary tumors from the MultiAssayExperiment
representation. The TCGAprimaryTumors function returns only primary tumor
samples (either solid tissue or blood), using the above operations in the
background:
(primaryTumors <- TCGAprimaryTumors(accmae))
To view the results, run sampleTables again on the output:
sampleTables(primaryTumors)
When extracting a single assay from the MultiAssayExperiment, the user
can conveniently choose to keep the colData from the MultiAssayExperiment
in the extracted assay, provided that the class of the extracted assay
supports colData storage and operations. SummarizedExperiment and its
derived data representations support this operation. In this example, we
extract the mutation data as represented by a RaggedExperiment, which was
designed to have colData functionality. The default replacement method
is to 'append' the MultiAssayExperiment colData to the RaggedExperiment
assay. The mode argument can also completely replace the colData when set
to 'replace'.
(accmut <- getWithColData(accmae, "ACC_Mutation-20160128", mode = "append"))
head(colData(accmut)[, 1:4])
The RaggedExperiment representation provides a matrix view of a GRangesList
internal representation. Typical use of a RaggedExperiment involves a number of
functions to reshape 'ragged' measurements into a matrix-like format. These
include sparseAssay, compactAssay, disjoinAssay, and qreduceAssay.
See the RaggedExperiment vignette for details. In this example, we convert
Entrez gene identifiers to numeric in order to show how we can create a sparse
matrix representation of any numeric metadata column in the RaggedExperiment.
ragex <- accmae[["ACC_Mutation-20160128"]]
## convert Entrez gene identifiers to numeric
mcols(ragex)$Entrez_Gene_Id <- as.numeric(mcols(ragex)[["Entrez_Gene_Id"]])
sparseAssay(ragex, i = "Entrez_Gene_Id", sparse = TRUE)[1:6, 1:3]
Users who would like to use the internal GRangesList representation can
invoke the coercion method:
as(ragex, "GRangesList")
MultiAssayExperiment provides users with an integrative representation of multi-omic TCGA data. For users who wish to work in alternative environments, we provide an export function that extracts all the data from a MultiAssayExperiment instance and writes them to a series of files:
td <- tempdir()
tempd <- file.path(td, "ACCMAE")
if (!dir.exists(tempd))
    dir.create(tempd)

exportClass(accmae, dir = tempd, fmt = "csv", ext = ".csv")
This works for all data classes stored in the MultiAssayExperiment (e.g.,
RaggedExperiment, HDF5Matrix, SummarizedExperiment) via the assays method,
which converts classes to matrix format (using individual assay methods).
sessionInfo()