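library(knitr)
# The console width is an R session option, not a knitr chunk option,
# so it is set outside opts_chunk$set()
options(width = 120)
opts_chunk$set(
  comment = "",
  message = FALSE,
  warning = FALSE,
  tidy.opts = 
    list(keep.blank.line = TRUE,
         width.cutoff = 150),
  eval = TRUE)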
library(RTCGA)

Available Datasets

Before reading this section, please note that we have prepared several datasets for you in the packages below. The table lists each package together with its installation command, its help page, and the argument to pass to the browseVignettes() function.

pckg.names <- c("RTCGA.rnaseq", "RTCGA.clinical", "RTCGA.mutations",
                                "RTCGA.mRNA", "RTCGA.miRNASeq", "RTCGA.PANCAN12",
                                "RTCGA.RPPA", "RTCGA.CNV", "RTCGA.methylation",
                                "RTCGA.rnaseq.20160128", "RTCGA.clinical.20160128", 
                                "RTCGA.mutations.20160128",
                                "RTCGA.mRNA.20160128", "RTCGA.miRNASeq.20160128",
                                "RTCGA.RPPA.20160128",
                                "RTCGA.CNV.20160128", "RTCGA.methylation.20160128")

pckg.install <- c("`biocLite(\'RTCGA.rnaseq\')`",
                  "`biocLite(\'RTCGA.clinical\')`",
                  "`biocLite(\'RTCGA.mutations\')`",
                  "`biocLite(\'RTCGA.mRNA\')`",
                  "`biocLite(\'RTCGA.miRNASeq\')`",
                  "`biocLite(\'RTCGA.PANCAN12\')`",
                  "`biocLite(\'RTCGA.RPPA\')`",
                  "`biocLite(\'RTCGA.CNV\')`",
                  "`biocLite(\'RTCGA.methylation\')`",
                  "`biocLite(\'RTCGA.rnaseq.20160128\')`",
                  "`biocLite(\'RTCGA.clinical.20160128\')`",
                  "`biocLite(\'RTCGA.mutations.20160128\')`",
                  "`biocLite(\'RTCGA.mRNA.20160128\')`",
                  "`biocLite(\'RTCGA.miRNASeq.20160128\')`",
                  "`biocLite(\'RTCGA.RPPA.20160128\')`",
                  "`biocLite(\'RTCGA.CNV.20160128\')`",
                  "`biocLite(\'RTCGA.methylation.20160128\')`")
pckg.help <- c("`?rnaseq`", "`?clinical`", "`?mutations`",
                             "`?mRNA`", "`?miRNASeq`", "`?pancan12`",
                             "`?RPPA`", "`?CNV`", "`?methylation`",
                             "`?rnaseq.20160128`", "`?clinical.20160128`", "`?mutations.20160128`",
                             "`?mRNA.20160128`", "`?miRNASeq.20160128`", 
                             "`?RPPA.20160128`", "`?CNV.20160128`", "`?methylation.20160128`")
pckg.vigg <- c("`\'RTCGA.rnaseq\'`",
                                 "`\'RTCGA.clinical\'`",
                                 "`\'RTCGA.mutations\'`",
                                 "`\'RTCGA.mRNA\'`",
                                 "`\'RTCGA.miRNASeq\'`",
                                 "`\'RTCGA.PANCAN12\'`",
                                 "`\'RTCGA.RPPA\'`",
                                 "`\'RTCGA.CNV\'`",
                                 "`\'RTCGA.methylation\'`",
                                 "`\'RTCGA.rnaseq.20160128\'`",
                                 "`\'RTCGA.clinical.20160128\'`",
                                 "`\'RTCGA.mutations.20160128\'`",
                                 "`\'RTCGA.mRNA.20160128\'`",
                                 "`\'RTCGA.miRNASeq.20160128\'`",
                                 "`\'RTCGA.RPPA.20160128\'`",
                                 "`\'RTCGA.CNV.20160128\'`",
                                 "`\'RTCGA.methylation.20160128\'`")

info_frame <- data.frame(package = pckg.names,
                                                 installation = pckg.install,
                                                 help = pckg.help, 
                                                 `browseVignettes` = pckg.vigg) 
knitr::kable(info_frame)
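
Because the installation and browseVignettes columns of the table follow a fixed pattern, they could also be derived from the package names instead of being typed by hand. The following is a minimal sketch under that assumption. build_info_frame() is a hypothetical helper introduced here for illustration and is not part of RTCGA. The help column is left out because one help topic (?pancan12) does not follow the pattern of the package name.

```r
# Hypothetical helper (not part of RTCGA): derive the display strings
# of the table from the package names alone.
build_info_frame <- function(pkgs) {
  data.frame(
    package = pkgs,
    # sprintf() is vectorised, so each package name yields one string
    installation = sprintf("`biocLite('%s')`", pkgs),
    browseVignettes = sprintf("`'%s'`", pkgs),
    stringsAsFactors = FALSE
  )
}

info <- build_info_frame(c("RTCGA.rnaseq", "RTCGA.clinical"))
print(info)
```

The resulting data frame can be passed to knitr::kable() in the same way as info_frame.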

Cohorts Names and Number of Cases

The Cancer Genome Atlas provides data via Broad GDAC Firehose. The number of cases in the most popular datasets of each cohort can be checked with the following code, which is based on Broad GDAC Firehose.

library(magrittr)
infoTCGA() %>%
 # select fewer variables so that the table fits the webpage
 dplyr::select(Cohort, BCR, Clinical, Methylation, mRNASeq) %>%
 head() %>% # without this line you would see all cohorts
 knitr::kable()

Furthermore, infoTCGA() makes it possible to extract the available cohort names from the TCGA study.

Cohort names are abbreviations of the full names of cancer types.

(cohorts <- infoTCGA() %>% 
   rownames() %>% 
   sub('-counts', '', x = .))
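
The sub() step above strips a suffix from each row name, which can be tried offline on a small vector. This is only a sketch: the row names below are hypothetical values assumed to mimic the shape of those returned by infoTCGA(), and no call to RTCGA is made.

```r
# Hypothetical row names, assumed to mimic those returned by infoTCGA()
row_names <- c("ACC-counts", "BLCA-counts", "BRCA-counts")

# sub() replaces the first match of the pattern in each element
cohort_names <- sub("-counts", "", x = row_names)
print(cohort_names)
```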

Dataset Release Dates

The Cancer Genome Atlas provides datasets under many release dates. You can list them with the following command.

checkTCGA('Dates')

Dataset Names for a Specific Cohort Type

The Cancer Genome Atlas provides various datasets for different cohort types. For example, you can check the names of all datasets provided for BRCA as follows (the second dimension stands for dataset size).

checkTCGA('DataSets',
          cancerType = 'BRCA',
          date = '2016-01-28') %>% dim

This lists only .zip files.

Data Download

If you know which cohort type, dataset name, and release date you are interested in, you can download a dataset provided by the TCGA study with the following command.

dir.create("download_folder")
downloadTCGA(
    cancerTypes = "BRCA",
    dataSet = "Merge_Clinical.Level_1",
    destDir = "download_folder"
)

You can pass cancerTypes as a character vector if you would like to download the same dataset type for many cohorts. Moreover, you can specify just an abbreviation or a fragment of the dataset name. You can also set the date argument if you would like to download datasets from previous (not the newest) releases. By default, all downloaded datasets are untarred and their .tar files are deleted after untarring; you can change this behaviour with the untarFile and removeTar arguments. Sometimes more than one dataset matches the string provided in the dataSet argument; in that case the first one without the FPPP string is downloaded if possible. If you are interested in all datasets, change the allDataSets parameter (FALSE by default) to TRUE.
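
The selection rule described above (match a fragment, prefer the first name without FPPP, or keep every match) can be sketched in a few lines of base R. This is an illustration of the rule as stated in the text, not the code used inside downloadTCGA(). pick_dataset(), its arguments, and the file names are all hypothetical.

```r
# Hypothetical helper (not RTCGA's implementation): choose among the
# dataset names that contain `fragment`.
pick_dataset <- function(available, fragment, all_data_sets = FALSE) {
  # Keep only the names containing the fragment as a literal string
  matches <- grep(fragment, available, value = TRUE, fixed = TRUE)
  if (all_data_sets || length(matches) == 0) {
    return(matches)
  }
  # Prefer the first match that does not contain "FPPP"
  without_fppp <- matches[!grepl("FPPP", matches, fixed = TRUE)]
  if (length(without_fppp) > 0) without_fppp[1] else matches[1]
}

# Made-up file names, for illustration only
available <- c(
  "BRCA-FPPP.Merge_Clinical.Level_1.tar.gz",
  "BRCA.Merge_Clinical.Level_1.tar.gz",
  "BRCA.mRNAseq_Preprocess.Level_3.tar.gz"
)
chosen <- pick_dataset(available, "Merge_Clinical")
print(chosen)
```

Setting all_data_sets = TRUE in this sketch returns every matching name, mirroring what the text says about allDataSets.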

Read Specific Datasets

For specific datasets, it is possible to read the downloaded file into a tidy format. For more information, check

?readTCGA
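
Before a file can be read, its path inside the download folder has to be found. A minimal sketch of that lookup with base R follows. The folder layout and the file names are made up for this example and created in a temporary directory; the resulting path is the kind of value one would then hand to readTCGA (see ?readTCGA for its actual arguments).

```r
# Simulate a download folder with a made-up directory and file names
download_dir <- file.path(tempdir(), "download_folder")
dataset_dir <- file.path(download_dir, "example_Merge_Clinical_dir")
dir.create(dataset_dir, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(dataset_dir, "BRCA.clin.merged.txt"))
file.create(file.path(dataset_dir, "MANIFEST.txt"))

# Search recursively for files whose name contains "clin.merged"
clinical_path <- list.files(
  download_dir,
  pattern = "clin\\.merged",
  recursive = TRUE,
  full.names = TRUE
)
print(basename(clinical_path))
```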


RTCGA/RTCGA documentation built on Nov. 1, 2022, 8:15 p.m.