This vignette introduces you to some core functionality in the oncosnputils R package.
The example data used in this package come from an Affymetrix SNP 6.0 CEL file of the breast cancer cell line HCC1395. The raw CEL file was pre-processed through the PennCNV-Affy protocol and then run through OncoSNP (1.3) using only the SNP probes.
The oncosnputils R package provides several load functions for the different input and output files of OncoSNP. These functions rely on data.table::fread for fast reading and then rename several columns so that they are easier to work with in R. For instance, we can load the quality control file:
library("oncosnputils")
library("dplyr")
library("knitr")  # provides kable()
qc.file <- system.file("extdata", "HCC1395.qc", package = "oncosnputils")
qc.df <- load_oncosnp_qc_file(qc.file)
kable(qc.df)
Notice that some of the columns have been renamed. This simply facilitates downstream analyses, as the default output column names are difficult to work with in R. We also load the OncoSNP CNV file and the PennCNV probe file.
# Loading the OncoSNP CNV file
cnv.file <- system.file("extdata", "HCC1395.cnvs", package = "oncosnputils")
cnv.df <- load_oncosnp_cnv_file(cnv.file)
# Show only first 10 rows and 10 columns for vignette purposes
kable(cnv.df[1:10, 1:10])
# Loading the PennCNV probe file
probe.file <- system.file("extdata", "logR_BAF.snp_probes.txt", package = "oncosnputils")
probe.df <- load_penncnv_probe_data(probe.file)
# Show only first 10 rows for vignette purposes
kable(probe.df[1:10, ])
All these loading functions will return data.table objects. These are enhanced data.frames which are suitable for large data analysis.
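As a rough illustration of the read-then-rename pattern described above, the following sketch reads a small tab-delimited table with data.table::fread and renames its columns with data.table::setnames. The table content and both the original and the new column names are hypothetical; the columns actually handled by the package's load functions may differ.

```r
library(data.table)

# Hypothetical tab-delimited content standing in for an output file;
# the column names and values are invented for illustration only.
raw.text <- "Sample ID\tLog R Ratio Shift\tPloidy No\nsampleA\t-0.12\t3\n"

# fread() returns a data.table; `text` lets it parse a literal string
dt <- fread(text = raw.text, sep = "\t", header = TRUE)

# Rename the awkward column names in place (by reference)
setnames(dt,
         old = c("Sample ID", "Log R Ratio Shift", "Ploidy No"),
         new = c("sampleID", "logRShift", "ploidyNo"))

print(dt)
```

Because data.table objects inherit from data.frame, the result can still be passed to functions that expect a plain data.frame.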
The oncosnputils R package provides several post-processing functions that enhance the OncoSNP inputs/outputs. For instance, the standard output (.cnvs) from OncoSNP does not contain the LRR or BAF values of the segments.
head(cnv.df)
We can add this information to these files by using the add_LRR_BAF_to_oncosnp_cnv function:
# only add for the first 10 segments for vignette purposes
cnv.df.LRR.BAF <- add_LRR_BAF_to_oncosnp_cnv(cnv.df[1:10, ], qc.df, probe.df)
Now the LRR and BAF, along with the number of probes in each segment, have been added as additional columns in the returned cnv.df.LRR.BAF data.frame. The qc.df and probe.df need to be passed in as input because the function needs to determine the LRR shift and also the probes that overlap with each CNV segment.
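To make the idea of a per-segment summary more concrete, here is a minimal sketch on toy data. It is not the package's implementation: the column names, the values, the direction in which the LRR shift is applied (added here), and the use of a simple mean are all assumptions made for illustration.

```r
library(data.table)

# Toy segments and probes; all names and values are hypothetical
segs <- data.table(chr = c("1", "1"),
                   start = c(100L, 500L),
                   end = c(300L, 900L))
probes <- data.table(
  chr = "1",
  pos = c(150L, 250L, 600L, 700L, 800L, 1000L),
  logR = c(0.2, 0.4, -0.5, -0.3, -0.4, 0.0),
  baf = c(0.5, 0.5, 0.1, 0.9, 0.1, 0.5)
)
lrr.shift <- 0.1  # assumed to be taken from the QC table

# Apply the shift to the probe-level LRR (adding it is an assumption)
probes[, logRShifted := logR + lrr.shift]

# For each segment, summarise the probes whose position falls inside it
seg.stats <- rbindlist(lapply(seq_len(nrow(segs)), function(i) {
  s <- segs[i]
  hit <- probes[chr == s$chr & pos >= s$start & pos <= s$end]
  data.table(numProbes = nrow(hit),
             meanLRR = mean(hit$logRShifted),
             meanBAF = mean(hit$baf))
}))
seg.summary <- cbind(segs, seg.stats)
print(seg.summary)
```

The explicit loop keeps the logic easy to follow; for genome-wide data an interval join would likely be faster.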
Similarly, the add_oncosnp_to_penncnv_probe function can add the OncoSNP segment state information to the raw PennCNV probe data that was used as input to OncoSNP.
# only add for the first 5000 probes for vignette purposes
probe.df.oncosnp <- add_oncosnp_to_penncnv_probe(cnv.df, qc.df, probe.df[1:5000, ])
head(probe.df.oncosnp)
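One way such probe-level annotation could be done is with a data.table non-equi update join, sketched below on toy data. This is only an illustration of the general technique, not necessarily how add_oncosnp_to_penncnv_probe works; the column names and values are hypothetical.

```r
library(data.table)

# Toy segments with a state label, and toy probe positions (hypothetical)
segs <- data.table(chr = c("1", "1"),
                   start = c(100L, 500L),
                   end = c(300L, 900L),
                   state = c(3L, 1L))
probes <- data.table(chr = "1",
                     pos = c(150L, 250L, 600L, 700L, 800L, 1000L))

# Non-equi update join: each probe receives the state of the segment
# that contains its position; probes outside every segment stay NA
probes[segs, on = .(chr, pos >= start, pos <= end), state := i.state]
print(probes)
```

The join updates the probe table by reference, so no copy of the probe data is made.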
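library("reshape2")
library("ggplot2")
# facet_labeller and states.col are assumed to be defined elsewhere; they are not shown in this vignette
# Reshape to long format so that LRR and BAF can be plotted in separate facets
rawProbeDt.melt <- probe.df.oncosnp %>%
  melt(id.vars = c("probeID", "chr", "pos", "finalState.modified"),
       measure.vars = c("logRShifted", "baf"),
       variable.name = "cnMeasure")
# Plot a 50 Mb window of chromosome 1, coloured by the OncoSNP state
rawProbeDt.melt %>%
  filter(chr == 1, pos > 50000000, pos < 100000000) %>%
  ggplot(aes(pos, value, color = factor(finalState.modified))) +
  geom_point(shape = 1) +
  facet_grid(cnMeasure ~ ., scales = "free", labeller = facet_labeller) +
  scale_color_manual(name = "Tumour State", values = states.col) +
  xlab("Position") +
  ylab("")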