In RTCGA/RTCGA: The Cancer Genome Atlas Data Integration

library(knitr)
options(width = 150)
opts_chunk$set(
    comment = "",
    message = FALSE,
    warning = FALSE,
    tidy.opts = list(
        keep.blank.line = TRUE,
        width.cutoff = 150),
    eval = TRUE
)

RTCGA package

The Cancer Genome Atlas (TCGA) Data Portal provides a platform for researchers to search, download, and analyze data sets generated by TCGA. It contains clinical information, genomic characterization data, and high level sequence analysis of the tumor genomes. The key is to understand genomics to improve cancer care.

The RTCGA package downloads and integrates the variety and volume of TCGA data using the patient barcode as a key, which makes the data easier to obtain. This may have a beneficial influence on the development of science and the improvement of patients' treatment. RTCGA is an open-source R package, available to download from Bioconductor:

# source("http://bioconductor.org/biocLite.R")
# biocLite("RTCGA")

or from GitHub:

# if (!require(devtools)) {
#    install.packages("devtools")
#    require(devtools)
# }
# biocLite("RTCGA/RTCGA")

Furthermore, the RTCGA package transforms TCGA data into a form that is convenient to use in the R statistical environment. These data transformations can be part of a statistical analysis pipeline, which RTCGA can make more reproducible.

Use cases and examples are shown in the RTCGA package vignettes:

# browseVignettes("RTCGA")

How to download `r name` data to gain the same datasets as in RTCGA.`r name`.`r gsub('-', '', releaseDate)` package?

TCGA data have been released on many dates. To list them all, type:

library(RTCGA)
library(magrittr)
# checkTCGA('Dates')

Version `r gsub('-', '', releaseDate)` of the RTCGA.`r name`.`r gsub('-', '', releaseDate)` package contains `r name` datasets that were released on `r releaseDate`. They were downloaded in the following way (mainly copied from http://rtcga.github.io/RTCGA/):
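The inline expression `gsub('-', '', releaseDate)` only strips the dashes from the release date to form the package version suffix. A minimal sketch, assuming a hypothetical release date and data set name (both values below are made up for illustration and are not taken from this template):

```r
# Hypothetical values; in the template these come from the build script.
releaseDate <- "2016-01-28"
name <- "example"

# Remove the dashes to obtain the version suffix.
version_suffix <- gsub("-", "", releaseDate)

# Assemble the package name in the RTCGA.<name>.<date> pattern.
package_name <- paste("RTCGA", name, version_suffix, sep = ".")
print(package_name)
```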

Available cohorts

All cohort names can be checked using:

(cohorts <- infoTCGA() %>% 
   rownames() %>% 
   sub("-counts", "", x=.))
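To illustrate the last step of the pipe above, the sketch below applies the same `sub()` call to a hypothetical vector of row names. The values in `example_rownames` are made up and assume that the row names returned by `infoTCGA()` end with `-counts`, as the code above implies.

```r
# Hypothetical row names, standing in for rownames(infoTCGA()).
example_rownames <- c("ACC-counts", "BRCA-counts", "OV-counts")

# Strip the "-counts" suffix to obtain bare cohort codes.
example_cohorts <- sub("-counts", "", x = example_rownames)
print(example_cohorts)
```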

For all cohorts, the following code downloads the `r name` data.

Downloading `r name` files

dir.create("data2") # name of a directory in which data will be stored
sapply(cohorts, function(element){
   tryCatch({
      downloadTCGA(cancerTypes = element,
                   dataSet = dataSet,
                   destDir = "data2",
                   date = releaseDate)
   }, error = function(cond){
      cat("Error: Maybe there weren't", name, " data for ", element, " cancer.\n")
   })
})
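The download loop relies on `tryCatch` so that one failing cohort does not abort the rest. The sketch below shows the same pattern without network access. `fake_download` is a hypothetical stand-in for `downloadTCGA`, and, unlike the code above, the handler returns `NA` so that the failures are visible in the result.

```r
# Hypothetical stand-in for downloadTCGA(): fails for one cohort.
fake_download <- function(cohort) {
  if (cohort == "XYZ") stop("no data for this cohort")
  paste0("downloaded_", cohort)
}

results <- sapply(c("ACC", "XYZ", "OV"), function(element) {
  tryCatch(
    fake_download(element),
    error = function(cond) {
      # Report the problem and return NA so the loop continues.
      cat("Error: no data for", element, "\n")
      NA_character_
    }
  )
})
print(results)
```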

Reading downloaded `r name` dataset

Shortening paths and directories

list.files( "data2") %>% 
   file.path( "data2", .) %>%
   file.rename( to = substr(.,start=1,stop=50))
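The renaming step can be tried safely in a temporary directory. This is a sketch under assumptions: the file name is invented, and it truncates the base name to 20 characters rather than the whole path to 50 (as the code above does), because the length of a temporary path varies between machines.

```r
# Work in a fresh throwaway directory so nothing real is touched.
demo_dir <- tempfile("shorten_demo_")
dir.create(demo_dir)

# A hypothetical, overly long file name.
long_name <- paste0("gdac.broadinstitute.org_ACC.", strrep("x", 80), ".tar.gz")
file.create(file.path(demo_dir, long_name))

# Keep only the first 20 characters of each file name.
old_paths <- list.files(demo_dir, full.names = TRUE)
new_paths <- file.path(demo_dir, substr(basename(old_paths), 1, 20))
file.rename(from = old_paths, to = new_paths)

print(list.files(demo_dir))
```

Truncation can make two different names collide, so it is worth checking that the shortened names are still unique before renaming real data.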

Removing `NA` files from data2 folder

If there were no `r name` data for some cohorts, the corresponding `NA` files should be removed.

list.files( "data2") %>%
   file.path( "data2", .) %>%
   sapply(function(x){
      if (x == "data2/NA")
         file.remove(x)      
   })
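The same clean-up can be written without iterating over every file. The sketch below is one possible simplification, not the vignette's method, and runs in a temporary directory with made-up file names.

```r
# Fresh throwaway directory containing one stray "NA" file.
demo_dir <- tempfile("na_demo_")
dir.create(demo_dir)
file.create(file.path(demo_dir, c("NA", "ACC_data.txt")))

# Remove the "NA" file only if it is actually present.
na_path <- file.path(demo_dir, "NA")
if (file.exists(na_path)) file.remove(na_path)

print(list.files(demo_dir))
```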

Paths to `r name` data

The code below automatically assigns the paths of the `r name` files for every cohort downloaded to the data2 folder.

cohorts %>%
    sapply(function(z){
        list.files("data2") %>%
            file.path("data2", .) %>%
            grep(paste0("_",z,"\\."), x = ., value = TRUE) %>%
            file.path(., list.files(.)) %>%
            grep("dataSetFile", x = ., value = TRUE) %>%
            assign(value = .,
                         x = paste0(z, ".",name,".path"),
                         envir = .GlobalEnv)
    })
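The lookup above matches each cohort's directory by name and then picks one file inside it. The sketch below reproduces that logic on an invented directory layout. As an alternative design, it collects the paths in a named list instead of calling `assign` on the global environment. The directory names, file names, and the `.demo.path` suffix are all hypothetical.

```r
# Build a fake download folder with one sub-directory per cohort.
root <- tempfile("paths_demo_")
dir.create(root)
cohorts_demo <- c("ACC", "OV")
for (z in cohorts_demo) {
  d <- file.path(root, paste0("gdac_", z, ".Clinical"))
  dir.create(d)
  file.create(file.path(d, c(paste0(z, ".clin.merged.txt"), "MANIFEST.txt")))
}

# For each cohort, find its directory, then the data file inside it.
paths <- lapply(setNames(cohorts_demo, cohorts_demo), function(z) {
  cohort_dir <- grep(paste0("_", z, "\\."), list.files(root), value = TRUE)
  files <- list.files(file.path(root, cohort_dir), full.names = TRUE)
  grep("clin.merged", files, value = TRUE, fixed = TRUE)
})
names(paths) <- paste0(names(paths), ".demo.path")

print(basename(unlist(paths)))
```

A named list keeps the global environment clean and can be iterated over directly with `lapply` in the reading step.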

Reading `r name` data using `readTCGA`

Because `r name` data are transposed in the downloaded files, the function `readTCGA` was prepared to read them (with `data.table::fread`) and transpose them automatically. The code is below:

ls() %>%
   grep(paste0(name, "\\.path"), x = ., value = TRUE) %>%
   sapply(function(element){
      tryCatch({
         readTCGA(get(element, envir = .GlobalEnv),
                  dataType = "dataType") -> read_file

         ## remove non-ASCII strings:
         for (i in seq_len(ncol(read_file))) {
            read_file[, i] <- iconv(read_file[, i],
                                    "UTF-8", "ASCII", sub = "")
         }

         assign(value = read_file,
                x = sub("\\.path", "", x = element),
                envir = .GlobalEnv)
      }, error = function(cond){
         cat(element)
      })
      invisible(NULL)
   })
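The non-ASCII clean-up inside the loop can be checked on a small data frame. This is a minimal sketch with made-up values. It shows that `iconv` with `sub = ""` drops characters that cannot be represented in ASCII rather than transliterating them.

```r
# Small data frame with accented characters (hypothetical values).
demo <- data.frame(
  id = c("p1", "p2"),
  note = c("caf\u00e9", "na\u00efve"),
  stringsAsFactors = FALSE
)

# Convert every column, dropping bytes that have no ASCII equivalent.
for (i in seq_len(ncol(demo))) {
  demo[, i] <- iconv(demo[, i], from = "UTF-8", to = "ASCII", sub = "")
}

print(demo)
```

Note that `iconv` returns character vectors, so numeric or factor columns would be converted to text by the same loop.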

Saving `r name` data to RTCGA.`r name`.`r gsub('-', '', releaseDate)` package

grep( name, ls(), value = TRUE) %>%
   grep("path", x=., value = TRUE, invert = TRUE) %>%
   paste0( collapse="," ) -> use_data_input
   # ...    Unquoted names of existing objects to save
eval(parse(text=
   paste0("devtools::use_data(",use_data_input,",pkg='../',compress=\"xz\")")
))
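The saving step builds a function call as text because `use_data` expects unquoted object names. The sketch below shows how that text is assembled, using hypothetical object names in place of `ls()`. It only parses the call and does not evaluate it, so nothing is written to disk and devtools does not need to be installed.

```r
# Hypothetical object names, standing in for the result of ls().
object_names <- c("ACC.demo", "ACC.demo.path", "OV.demo", "OV.demo.path")

# Drop the path helpers and join the remaining names with commas.
to_save <- grep("path", object_names, value = TRUE, invert = TRUE)
use_data_input <- paste0(to_save, collapse = ",")

# Build the call as text and parse it; nothing is evaluated here.
call_text <- paste0("devtools::use_data(", use_data_input,
                    ",pkg='../',compress=\"xz\")")
parsed <- parse(text = call_text)
print(parsed[[1]])
```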

RTCGA/RTCGA documentation built on Nov. 1, 2022, 8:15 p.m.
