title: "Creating Neuroblastoma Input Matrices from public GDC (TARGET) data"
author: "James Dalgleish"
date: "August 1, 2018"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{Creating the Input matrix from public data}
  %\VignetteEncoding{UTF-8}


knitr::opts_chunk$set(echo = TRUE)

knitr::opts_knit$set(root.dir = '.')
library(HiCNV)

We begin by loading the library:

library(HiCNV)

Next, we obtain TARGET low-pass neuroblastoma (NBL) data from the GDC archive. Please note: TARGET_NBL_WGS_CNVLOH.tsv is a clinical metadata file and is therefore not compatible with the functions used below to extract segment data. We have also chosen to use only a single comparison type (NormalVsPrimary) to ensure comparability and compatibility with the data. Users can download the tar.gz file and move the tsv files into a single folder; we have already done that here. The source for these files is located here. To download them, add all the files to the cart, then click the black cart button in the top right-hand corner. On the cart page, click Download, then Cart. The files will be downloaded as a tar.gz archive.

You can untar the archive with R, but the files will be spread across a nested set of directories. It is best to list the files recursively, using a pattern that matches only the segment files in tsv format for the single comparison of interest.

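# Create the extraction directory if it does not already exist.
if (!dir.exists("extracted_nbl_data")) {
  dir.create("extracted_nbl_data")
}
untar("gdc_download_20180801_160142.tar.gz", exdir = "extracted_nbl_data")
# Recursively collect only the NormalVsPrimary segment files (full paths).
tcga_files_nbl <- list.files(path = "extracted_nbl_data", pattern = glob2rx("*NormalVsPrimary.tsv"), recursive = TRUE, full.names = TRUE)
print(tcga_files_nbl)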
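
To see how the glob pattern narrows the listing, the following minimal sketch builds a throwaway directory tree and applies the same `list.files()` call to it. The directory and file names (including the `NormalVsRelapse` comparison) are invented for illustration and are not taken from a real GDC download; only the filtering behaviour is the point.

```r
# Build a temporary tree that loosely imitates a nested download.
# Every path below is hypothetical and exists only for this demonstration.
demo_root <- file.path(tempdir(), "gdc_layout_demo")
unlink(demo_root, recursive = TRUE)
demo_files <- file.path(
  demo_root,
  c("dir-1/sampleA_NormalVsPrimary.tsv",
    "dir-2/sampleB_NormalVsPrimary.tsv",
    "dir-3/sampleC_NormalVsRelapse.tsv",
    "dir-4/TARGET_NBL_WGS_CNVLOH.tsv",
    "dir-5/MANIFEST.txt")
)
for (f in demo_files) {
  dir.create(dirname(f), recursive = TRUE, showWarnings = FALSE)
  file.create(f)
}

# glob2rx() converts the shell-style wildcard into an anchored regular
# expression, which list.files() then matches against each file name.
pattern <- utils::glob2rx("*NormalVsPrimary.tsv")
print(pattern)

matched <- list.files(demo_root, pattern = pattern,
                      recursive = TRUE, full.names = TRUE)
print(basename(matched))
```

In this toy tree, only the two `NormalVsPrimary` files are returned. The clinical metadata file and the other comparison type are skipped because their names do not end in `NormalVsPrimary.tsv`.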

With the full list of input files from the GDC, we can pass them to a function that reads all of them, matches them by sample, and aggregates the data into a bin-sample matrix. This matrix can then be saved in the fast, space-efficient RDS format.

sample_aggregated_segvals_output_full <- formSampleMatrixFromRawGDCData(tcga_files = tcga_files_nbl, format = "TARGET")
saveRDS(sample_aggregated_segvals_output_full, "NBL_sample_matched_input_matrix.rds")
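
A saved RDS file can be read back with `readRDS()` in a later session. The sketch below shows the round trip on a small toy matrix. Its bin labels, sample names, and values are invented and are not meant to describe the actual structure returned by `formSampleMatrixFromRawGDCData()`.

```r
# Toy stand-in for a bin-by-sample matrix. The labels and values are
# hypothetical and do not reflect real HiCNV output.
toy_matrix <- matrix(
  c(0.10, -0.25, 0.00, 0.40, 0.05, -0.10),
  nrow = 3,
  dimnames = list(
    c("bin_1", "bin_2", "bin_3"),
    c("sample_A", "sample_B")
  )
)

# Write to a temporary file so the example leaves nothing behind in the
# working directory.
rds_path <- tempfile(fileext = ".rds")
saveRDS(toy_matrix, rds_path)

# readRDS() restores the object, including its dimnames.
restored <- readRDS(rds_path)
stopifnot(identical(restored, toy_matrix))
print(dim(restored))
```

Unlike `save()` and `load()`, `saveRDS()` stores a single object without its name, so the result of `readRDS()` can be assigned to any variable.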

