Introduction


This vignette introduces you to some core functionality in the oncosnputils R package.

The example data used in this package comes from an Affymetrix SNP 6.0 CEL file of the breast cancer cell line HCC1395. The raw CEL file was pre-processed with the PennCNV-Affy protocol and then run through OncoSNP (1.3) using only the SNP probes.

Loading Data


The oncosnputils R package provides several load functions for the different input and output files of OncoSNP. These functions rely on data.table::fread for fast reading and then rename several columns so that they are easier to work with in R. For instance, we can load the quality control file:

library("oncosnputils")
library("dplyr")
library("knitr")  # provides kable() for rendering tables

qc.file <- system.file("extdata", "HCC1395.qc", package = "oncosnputils")
qc.df <- load_oncosnp_qc_file(qc.file)
kable(qc.df)

Notice that some of the columns have been renamed. This facilitates downstream analyses, as the default output column names are difficult to work with in R. We also load the OncoSNP CNV file and the PennCNV probe file.

# Loading the OncoSNP CNV file
cnv.file <- system.file("extdata", "HCC1395.cnvs", package = "oncosnputils")
cnv.df <- load_oncosnp_cnv_file(cnv.file)

# Show only first 10 rows and 10 columns for vignette purposes
kable(cnv.df[1:10, 1:10])

# Loading the PennCNV probe file
probe.file <- system.file("extdata", "logR_BAF.snp_probes.txt", package = "oncosnputils")
probe.df <- load_penncnv_probe_data(probe.file)

# Show only first 10 rows for vignette purposes
kable(probe.df[1:10, ])

All of these loading functions return data.table objects, which are enhanced data.frames well suited to analysing large data sets.
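The read-then-rename pattern described in this section can be sketched on toy data. This is only an illustration and not the package's actual implementation: the raw column names and the new names below are hypothetical, and the only calls used are data.table's own fread() and setnames().

```r
library("data.table")

# Toy tab-delimited text standing in for an OncoSNP-style output file.
# The column names are hypothetical, chosen only to be awkward to use in R.
raw.text <- "Log R Ratio Shift\tPloidy No.\n-0.12\t1\n0.05\t2\n"

# fread() parses the text and returns a data.table
toy.dt <- fread(text = raw.text, sep = "\t")

# Rename the columns by reference to short, syntactic names
setnames(toy.dt,
         old = c("Log R Ratio Shift", "Ploidy No."),
         new = c("logRShift", "ploidyNum"))

print(names(toy.dt))
print(is.data.table(toy.dt))
```

After renaming, columns can be accessed as toy.dt$logRShift rather than with backtick-quoted names.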

Post-Processing OncoSNP Results


The oncosnputils R package provides several post-processing functions that enhance the OncoSNP inputs/outputs. For instance, the standard output (.cnvs) from OncoSNP does not contain the LRR or BAF values of the segments.

head(cnv.df)

We can add this information to these files by using the add_LRR_BAF_to_oncosnp_cnv function:

# only add for the first 10 segments for vignette purposes
cnv.df.LRR.BAF <- add_LRR_BAF_to_oncosnp_cnv(cnv.df[1:10, ], qc.df, probe.df)

Now the LRR and BAF, along with the number of probes in each segment, have been added as additional columns in the returned cnv.df.LRR.BAF. The qc.df and probe.df need to be passed in as input because the function needs to determine the LRR shift and the probes that overlap each CNV segment.
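To make the idea concrete, here is a minimal base R sketch of annotating segments with per-segment probe summaries. It is not how add_LRR_BAF_to_oncosnp_cnv is implemented. The column names, the toy values, the use of the mean as the summary statistic, and the sign convention for applying the LRR shift are all assumptions made for illustration.

```r
# Toy segments and probes; all values and column names are made up
segs <- data.frame(chr = c(1, 1), start = c(100, 500), end = c(300, 900))
probes <- data.frame(
  chr  = c(1, 1, 1, 1, 1),
  pos  = c(150, 250, 400, 600, 800),
  logR = c(0.2, 0.4, 0.0, -0.5, -0.3),
  baf  = c(0.45, 0.55, 0.50, 0.10, 0.20)
)
lrr.shift <- 0.1  # assumed to be taken from the QC table

# Assumption: shifted log R = raw log R + shift
probes$logRShifted <- probes$logR + lrr.shift

# For each segment, summarise the probes whose position falls inside it
seg.stats <- lapply(seq_len(nrow(segs)), function(i) {
  in.seg <- probes$chr == segs$chr[i] &
    probes$pos >= segs$start[i] &
    probes$pos <= segs$end[i]
  data.frame(numProbes = sum(in.seg),
             meanLRR   = mean(probes$logRShifted[in.seg]),
             meanBAF   = mean(probes$baf[in.seg]))
})

# Bind the per-segment summaries onto the segment table
segs.annot <- cbind(segs, do.call(rbind, seg.stats))
print(segs.annot)
```

The probe at position 400 falls between the two segments and therefore contributes to neither summary.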

Similarly, the add_oncosnp_to_penncnv_probe function adds the OncoSNP segment state information to the raw PennCNV probe data that was used as input to OncoSNP.

# only add for the first 5000 probes for vignette purposes
probe.df.oncosnp <- add_oncosnp_to_penncnv_probe(cnv.df, qc.df, probe.df[1:5000, ])
head(probe.df.oncosnp)
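The reverse annotation, tagging each probe with the state of the segment that contains it, can be sketched in base R as follows. Again this is an illustration under assumptions rather than the package's method: the column names and state values are hypothetical, and leaving probes outside every segment as NA is a choice made here, which may differ from what add_oncosnp_to_penncnv_probe does.

```r
# Toy segments with a hypothetical copy-number state, and toy probes
segs <- data.frame(chr = c(1, 1), start = c(100, 500),
                   end = c(300, 900), state = c(3, 5))
probes <- data.frame(chr = c(1, 1, 1, 1), pos = c(150, 400, 600, 950))

# Start with no state assigned; probes outside all segments stay NA
probes$state <- NA_real_

# Copy each segment's state onto the probes that fall inside it
for (i in seq_len(nrow(segs))) {
  in.seg <- probes$chr == segs$chr[i] &
    probes$pos >= segs$start[i] &
    probes$pos <= segs$end[i]
  probes$state[in.seg] <- segs$state[i]
}

print(probes)
```

A per-probe state column like this is what allows points to be coloured by segment state when plotting.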

Plotting OncoSNP Results

library("reshape2")
library("ggplot2")

# Reshape to long format: one row per probe per measure (logRShifted or baf)
rawProbeDt.melt <- 
    probe.df.oncosnp %>% melt(id.vars = c("probeID", "chr", "pos", "finalState.modified"), 
             measure.vars = c("logRShifted", "baf"), 
             variable.name = "cnMeasure")


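# Plot log R and BAF for a region of chromosome 1, coloured by OncoSNP state.
# facet_labeller and states.col are not defined in the code shown here; they
# must be available in the session for this plot to run.
rawProbeDt.melt %>%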
    filter(chr == 1, pos > 50000000, pos < 100000000) %>% 
    ggplot(aes(pos, value, color = factor(finalState.modified))) + 
    geom_point(shape = 1) + 
    facet_grid(cnMeasure ~ ., scales = "free", labeller = facet_labeller) + 
    scale_color_manual(name = "Tumour State", values = states.col) +
    xlab("Position") + 
    ylab("")


tinyheero/oncosnputils documentation built on May 31, 2019, 3:36 p.m.