getTCGA: Get TCGA Data.
In TCGA2STAT: Simple TCGA Data Access for Integrated Statistical Analysis in R


Obtain TCGA data from the Broad GDAC Firehose and process the data into a format ready for statistical analysis.

getTCGA(disease = "GBM", data.type = "RNASeq2", type = "", filter = "Y", p = getOption("mc.cores", 2L), clinical = FALSE, cvars = "OS")

`disease`	acronym for cancer type; default to "`GBM`" for glioblastoma multiforme.
`data.type`	genomic data profiling platform; default to "`RNASeq2`" for gene level RNA-Seq data from the second pipeline (RNASeqV2).
`type`	specific type of measurement produced by certain platforms.
`filter`	chromosome to be filtered out during data import; only applicable to CNA or CNV data.
`p`	maximum number of processing cores used in parallel processing; default to the value set in "mc.cores" global option or 2.
`clinical`	logical value to indicate if clinical data is to be imported; default to `FALSE`.
`cvars`	clinical covariates to be merged with genomic data; default to "`OS`" for overall survival.

Values for disease include "ACC", "BLCA", "BRCA", "CESC", "CHOL", "COAD", "COADREAD", "DLBC", "ESCA", "FPPP", "GBM", "GBMLGG", "HNSC", "KICH", "KIPAN", "KIRC", "KIRP", "LAML", "LGG", "LIHC", "LUAD", "LUSC", "MESO", "OV", "PAAD", "PCPG", "PRAD", "READ", "SARC", "SKCM", "STAD", "TGCT", "THCA", "THYM", "UCEC", "UCS", and "UVM". Values for data.type include "RNASeq2", "RNASeq", "miRNASeq", "CNA_SNP", "CNV_SNP", "CNA_CGH", "Methylation", "Mutation", "mRNA_Array", and "miRNA_Array". Note that not all combinations are permitted; Appendix A of the package vignette outlines all values of disease and data.type accommodated by TCGA2STAT.

The type parameter should only be used with the following values of data.type:

RNASeq - "count" for raw read counts (default); "RPKM" for normalized read counts (reads per kilobase per million mapped reads).
miRNASeq - "count" for raw read counts (default); "rpmmm" for normalized read counts.
Mutation - "somatic" for non-silent somatic mutations (default); "all" for all mutations.
Methylation - "27K" platform (default); "450K" platform.
CNA_CGH - "415K" for CGH Custom Microarray 2x415K (default); "244A" for CGH Microarray.
mRNA_Array - "G450" for Agilent 244K Custom Gene Expression G4502A (default); "U133" for Affymetrix Human Genome U133A 2.0 Array; "Huex" for Affymetrix Human Exon 1.0 ST Array.
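
The pairing of data.type and type listed above can be checked before a download is started. The following is a minimal sketch in base R; `valid_types` and `resolve_type` are hypothetical names that are not part of TCGA2STAT, and the fallback to the first listed value only mirrors the defaults documented above rather than the package's internal logic.

```r
# Hypothetical lookup table (not part of TCGA2STAT), transcribed from the list above.
# The first entry for each data.type is the documented default.
valid_types <- list(
  RNASeq      = c("count", "RPKM"),
  miRNASeq    = c("count", "rpmmm"),
  Mutation    = c("somatic", "all"),
  Methylation = c("27K", "450K"),
  CNA_CGH     = c("415K", "244A"),
  mRNA_Array  = c("G450", "U133", "Huex")
)

# Hypothetical helper: return the type value for a given data.type,
# or stop with an error if the combination is not listed above.
resolve_type <- function(data.type, type = "") {
  allowed <- valid_types[[data.type]]  # [[ ]] matches names exactly
  if (is.null(allowed)) {
    # This data.type does not take a type; only the empty default is acceptable.
    if (nzchar(type)) {
      stop("'type' is not used with data.type = '", data.type, "'")
    }
    return("")
  }
  if (!nzchar(type)) {
    return(allowed[1])  # fall back to the documented default
  }
  if (!type %in% allowed) {
    stop("type must be one of: ", paste(allowed, collapse = ", "))
  }
  type
}
```

For example, `resolve_type("RNASeq", "RPKM")` passes the value through unchanged, while `resolve_type("RNASeq2", "RPKM")` stops with an error because RNASeq2 takes no type.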

The Level III RNA-Seq, miRNA-Seq, mRNA-array, and miRNA-array data imported are already at gene level, whereas the mutation, copy number alteration/variation (CNA/CNV), and methylation data are not. Our package processes and aggregates the mutation and CNA/CNV data at the gene level. The mutation data imported are in MAF files, where each file contains the mutations found for a particular patient, and the number of mutations differs across patients. We filter the mutation data based on status and variant classification and then aggregate the filtered data at the gene level. The Level III CNA/CNV data imported are in segments; we therefore employ the CNTools package to merge the segmented data into gene-level data.

The methylation data imported are at probe level, where each probe represents a CpG site. As methylation profiles at different CpG sites within the same gene can vary considerably, it would not be biologically meaningful to aggregate the probe-level methylation data into gene-level data. We therefore return the methylation data at probe level.
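
The filter-then-aggregate step for mutation data can be illustrated on a toy table. This is only a sketch under stated assumptions, not the package's implementation: the column names follow the MAF format, the records are made up, "non-silent" is taken to mean dropping rows classified as `Silent`, and the gene-level summary is assumed to be a 0/1 indicator per gene and sample.

```r
# Toy MAF-like records; column names follow the MAF format, values are made up.
maf <- data.frame(
  Hugo_Symbol            = c("TP53", "TP53", "BRCA1", "TTN", "TTN"),
  Tumor_Sample_Barcode   = c("S1", "S2", "S1", "S2", "S3"),
  Variant_Classification = c("Missense_Mutation", "Silent", "Nonsense_Mutation",
                             "Missense_Mutation", "Silent"),
  stringsAsFactors = FALSE
)

# Step 1: filter. Assumption: keep everything except silent mutations.
kept <- maf[maf$Variant_Classification != "Silent", ]

# Step 2: aggregate to a gene x sample matrix.
# All samples stay as columns, even those left with no mutation after filtering.
samples <- sort(unique(maf$Tumor_Sample_Barcode))
genes   <- sort(unique(kept$Hugo_Symbol))
mut <- matrix(0L, nrow = length(genes), ncol = length(samples),
              dimnames = list(genes, samples))

# Index by (gene, sample) name pairs and mark each mutated pair with 1.
mut[cbind(kept$Hugo_Symbol, kept$Tumor_Sample_Barcode)] <- 1L
```

Because patients carry different numbers of mutations, the per-patient records have different lengths; collapsing them to one row per gene gives a rectangular matrix with the same gene x sample shape as the other data types.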

A list containing:

`dat`	a matrix of dimension gene x sample.
`clinical`	a matrix of dimension sample x clinical covariates; `NULL` if `clinical=FALSE`.
`merged.dat`	a matrix that merges `dat` with the clinical covariates specified by `cvars`; it is therefore of dimension sample x (cvars + gene). `NULL` if `clinical=FALSE` or if `cvars` is not a valid clinical covariate name.

and for methylation data, an additional element:

`cpgs`	a matrix of dimension CpG sites x 3. The three columns are the gene symbol, chromosome, and genomic coordinate of each CpG site. The order of CpG sites in this matrix is the same as the order in `dat`.
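
The orientation of the returned elements is easy to mix up: `dat` is gene x sample, while `clinical` and `merged.dat` are sample x variable. The sketch below rebuilds a merged table from toy stand-ins to show the shapes involved. The sample IDs, the gene names, the `age` covariate, and the rule of keeping only samples present in both inputs are all assumptions made for illustration; the sketch also produces a data frame rather than the matrix that getTCGA returns.

```r
# Toy stand-ins for elements of the returned list (all values are made up).
dat <- matrix(c(5.1, 3.2, 0.0, 7.4, 2.2, 1.9), nrow = 2,
              dimnames = list(c("GENE_A", "GENE_B"),
                              c("S1", "S2", "S3")))        # gene x sample
clinical <- matrix(c("61", "48", "female", "female"), nrow = 2,
                   dimnames = list(c("S1", "S3"),
                                   c("age", "gender")))    # sample x covariates

cvars <- "age"  # hypothetical covariate name

# Assumption: keep only samples that have both genomic and clinical data.
shared <- intersect(rownames(clinical), colnames(dat))

# Transpose dat so that samples become rows, then put the covariates first.
merged <- data.frame(clinical[shared, cvars, drop = FALSE],
                     t(dat[, shared, drop = FALSE]),
                     stringsAsFactors = FALSE)
```

Here `merged` has one row per shared sample and one column per covariate followed by one column per gene, matching the sample x (cvars + gene) layout described for `merged.dat`.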

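library(TCGA2STAT)
# Gene-level RNA-Seq data from the second pipeline (RNASeqV2) for ovarian cancer
rsem.ov <- getTCGA(disease="OV", data.type="RNASeq2")
# RNA-Seq data as normalized read counts (RPKM)
rnaseq.ov <- getTCGA(disease="OV", data.type="RNASeq", type="RPKM")
# Same as above, also importing clinical data and merging it via the default cvars="OS"
rnaseq_os.ov <- getTCGA(disease="OV", data.type="RNASeq", type="RPKM", clinical=TRUE)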