getTCGA: Get TCGA Data.


View source: R/TCGA2STAT.R


Obtain TCGA data from the Broad GDAC Firehose and process the data into a format ready for statistical analysis.


getTCGA(disease = "GBM", data.type = "RNASeq2", type = "", filter = "Y",  
		p = getOption("mc.cores", 2L), clinical = FALSE, cvars = "OS")



disease: acronym for cancer type; defaults to "GBM" for glioblastoma multiforme.


data.type: genomic data profiling platform; defaults to "RNASeq2" for gene-level RNA-Seq data from the second pipeline (RNASeqV2).


type: specific type of measurement produced by certain platforms.


filter: chromosome to be filtered out during data import; only applicable to CNA or CNV data.


p: maximum number of processing cores used in parallel processing; defaults to the value set in the "mc.cores" global option, or 2 if that option is unset.


clinical: logical value indicating whether clinical data are to be imported; defaults to FALSE.


cvars: clinical covariates to be merged with genomic data; defaults to "OS" for overall survival.
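
The default for p is written as getOption("mc.cores", 2L) in the usage above, so it can be changed for a whole session through options(). The following base-R sketch only illustrates how that lookup resolves; it makes no TCGA2STAT call, and the variable names are illustrative.

```r
# Unset the option, remembering the previous value so it can be restored.
old <- options(mc.cores = NULL)

# With the option unset, the fallback value 2L is returned.
p_default <- getOption("mc.cores", 2L)

# Once the option is set, the same expression picks up the new value.
options(mc.cores = 4L)
p_custom <- getOption("mc.cores", 2L)

# Restore whatever was configured before.
options(old)
```

Passing p explicitly in the call overrides the option for that call only.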


Values for disease include "ACC", "BLCA", "BRCA", "CESC", "CHOL", "COAD", "COADREAD", "DLBC", "ESCA", "FPPP", "GBM", "GBMLGG", "HNSC", "KICH", "KIPAN", "KIRC", "KIRP", "LAML", "LGG", "LIHC", "LUAD", "LUSC", "MESO", "OV", "PAAD", "PCPG", "PRAD", "READ", "SARC", "SKCM", "STAD", "TGCT", "THCA", "THYM", "UCEC", "UCS", and "UVM". Values for data.type include "RNASeq2", "RNASeq", "miRNASeq", "CNA_SNP", "CNV_SNP", "CNA_CGH", "Methylation", "Mutation", "mRNA_Array", and "miRNA_Array". Note that not all combinations are permitted; Appendix A of the package vignette outlines all values of disease and data.type accommodated by TCGA2STAT.
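
Because a mistyped acronym is only discovered once a download is attempted, it can help to check the arguments first. The sketch below copies the two value lists from the paragraph above; check_tcga_args is a hypothetical helper, not part of TCGA2STAT, and passing it does not guarantee that the particular disease and data.type combination is available.

```r
# Values copied from the lists above.
diseases <- c("ACC", "BLCA", "BRCA", "CESC", "CHOL", "COAD", "COADREAD",
              "DLBC", "ESCA", "FPPP", "GBM", "GBMLGG", "HNSC", "KICH",
              "KIPAN", "KIRC", "KIRP", "LAML", "LGG", "LIHC", "LUAD",
              "LUSC", "MESO", "OV", "PAAD", "PCPG", "PRAD", "READ", "SARC",
              "SKCM", "STAD", "TGCT", "THCA", "THYM", "UCEC", "UCS", "UVM")
data_types <- c("RNASeq2", "RNASeq", "miRNASeq", "CNA_SNP", "CNV_SNP",
                "CNA_CGH", "Methylation", "Mutation", "mRNA_Array",
                "miRNA_Array")

# Hypothetical helper (not part of TCGA2STAT): stop early on an unknown value.
check_tcga_args <- function(disease, data.type) {
  if (!disease %in% diseases) stop("unknown disease: ", disease)
  if (!data.type %in% data_types) stop("unknown data.type: ", data.type)
  invisible(TRUE)
}

ok <- check_tcga_args("OV", "RNASeq2")
bad <- tryCatch(check_tcga_args("XYZ", "RNASeq2"),
                error = function(e) conditionMessage(e))
```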

The type parameter should only be used along with these data.type parameters:

The Level III RNA-Seq, miRNA-Seq, mRNA-array, and miRNA-array data imported are at gene level, but the mutation, copy number alteration/variation (CNA/CNV), and methylation data are not. Our package processes and aggregates the mutation and CNA/CNV data at the gene level. The mutation data imported are in MAF files, where each file contains the mutations found for a particular patient, and the number of mutations differs across patients. We filter the mutation data based on status and variant classification and then aggregate the filtered data at the gene level. The Level III CNA/CNV data imported are in segments; therefore we employ the CNTools package to merge the segmented data into gene-level data. The methylation data imported are at probe level, where each probe represents a CpG site. As methylation profiles at different CpG sites within the same gene can vary considerably, it would not be biologically meaningful to aggregate the probe-level methylation data into gene-level data. We therefore return the methylation data at probe level.
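
To make the mutation step concrete, here is a minimal base-R sketch of filtering MAF-like records and aggregating them into a gene x sample matrix. It is not the package's implementation: the toy records are made up, the column names follow the usual MAF convention, and the filter (somatic and non-silent) is an assumption, since the exact status and variant-classification criteria are not stated here.

```r
# Toy MAF-like table; the records are invented for illustration.
maf <- data.frame(
  Hugo_Symbol            = c("TP53", "TP53", "BRCA1", "TTN", "BRCA1"),
  Tumor_Sample_Barcode   = c("P1", "P2", "P1", "P2", "P3"),
  Variant_Classification = c("Missense_Mutation", "Silent",
                             "Nonsense_Mutation", "Missense_Mutation",
                             "Frame_Shift_Del"),
  Mutation_Status        = c("Somatic", "Somatic", "Somatic",
                             "Germline", "Somatic"),
  stringsAsFactors = FALSE
)

# Assumed filter: keep somatic, non-silent mutations.
keep <- maf$Mutation_Status == "Somatic" &
  maf$Variant_Classification != "Silent"
filtered <- maf[keep, ]

# Gene x sample indicator matrix: 1 if the gene carries at least one
# retained mutation in that patient, 0 otherwise.
genes   <- sort(unique(filtered$Hugo_Symbol))
samples <- sort(unique(maf$Tumor_Sample_Barcode))
mut <- matrix(0L, nrow = length(genes), ncol = length(samples),
              dimnames = list(genes, samples))
mut[cbind(filtered$Hugo_Symbol, filtered$Tumor_Sample_Barcode)] <- 1L
```

Patients with no retained mutation (P2 in this toy table) keep an all-zero column, so every sample remains represented even though the number of mutations differs across patients.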


A list containing:


dat: a matrix of dimension gene x sample.


clinical: a matrix of dimension sample x clinical covariates; NULL if clinical=FALSE.


merged.dat: a matrix that merges dat with the clinical data specified by cvars, of dimension sample x (cvars + gene); NULL if clinical=FALSE or if cvars is not a valid clinical covariate name.

and for methylation data, an additional element:


cpgs: a matrix of dimension CpG sites x 3. The three columns are the gene symbol, chromosome, and genomic coordinate of each CpG site. The order of CpG sites in this matrix is the same as the order in dat.
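
The shapes of the first three elements can be illustrated with a small mock list built in base R. This is only a sketch of the documented dimensions: the values, sample IDs, and gene names are made up, a single covariate "OS" stands in for cvars, the element name merged.dat is assumed, and the merge shown is one plausible way to obtain a sample x (cvars + gene) matrix rather than the package's own code.

```r
# Toy gene x sample matrix (values are made up).
dat <- matrix(c(5.1, 0.3, 2.2, 4.8, 0.9, 1.7), nrow = 2,
              dimnames = list(c("GENE_A", "GENE_B"), c("S1", "S2", "S3")))

# Toy sample x clinical covariate matrix; "OS" stands in for overall survival.
clinical <- matrix(c(120, 340, 95), ncol = 1,
                   dimnames = list(c("S1", "S2", "S3"), "OS"))

# Merge: transpose dat to sample x gene, align rows by sample ID,
# and place the clinical covariate columns first.
shared <- intersect(rownames(clinical), colnames(dat))
merged <- cbind(clinical[shared, "OS", drop = FALSE],
                t(dat)[shared, , drop = FALSE])

# Mock of the returned list (element name merged.dat is assumed).
mock_result <- list(dat = dat, clinical = clinical, merged.dat = merged)
```

With samples in rows, the merged matrix can be handed to modelling functions that expect one observation per row.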


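# Gene-level RNASeqV2 data for ovarian cancer (OV)
rsem.ov <- getTCGA(disease="OV", data.type="RNASeq2")
# RPKM values from the first RNA-Seq pipeline
rnaseq.ov <- getTCGA(disease="OV", data.type="RNASeq", type="RPKM")
# Same data, also importing clinical data and merging the default cvars ("OS")
rnaseq_os.ov <- getTCGA(disease="OV", data.type="RNASeq", type="RPKM", clinical=TRUE)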

TCGA2STAT documentation built on May 29, 2017, 8:30 p.m.