Introduction

This vignette introduces you to some core functionality in the oncosnputils R package.

The example data we are using in this package is from a Affymetrix SNP 6.0 CEL file of the breast cancer cell-line HCC1395. The raw CEL file was pre-processed through the PennCNV-Affy protocol and then run through OncoSNP (1.3) with only the SNP probes.

Loading Data

The oncosnputils R package provides several load functions for different input and output files of oncosnp. This functions rely on the data.table fread() function which relies for fast reading and then renames several columns so that they easier to work with in R. For instance, we can load the quality control file:

library("oncosnputils")
qcFile <- system.file("extdata", "HCC1395.qc", package = "oncosnputils")
qcDf <- LoadOncosnpQcFile(qcFile)
knitr::kable(qcDf)

Notice the renaming of some of the columns. This just faciliates downstream analyses as the default output column names are difficult to work with in R. We also load the OncoSNP CNV file and the PennCNV probe file.

# Loading the OncoSNP CNV file
cnvFile <- system.file("extdata", "HCC1395.cnvs", package = "oncosnputils")
cnvDf <- LoadOncosnpCnvFile(cnvFile)

# Show only first 10 rows and 10 columns for vignette purposes
knitr::kable(cnvDf[1:10, 1:10])
# Loading the PennCNV probe file
probeFile <- system.file("extdata", "logR_BAF.snp_probes.txt", package = "oncosnputils")
probeDf <- LoadPennCNVProbeData(probeFile)

# Show only first 10 rows for vignette purposes
knitr::kable(probeDf[1:10, ])

All these loading functions will return data.table objects. These are enhanced data.frames which are suitable for large data analysis.

Post-Processing OncoSNP Results

The oncosnputils R package provides several post-processing functions that will enhance the OncoSNP inputs/outputs. For instance, the standard output (.cnvs) from OncoSNP do not contain the LRR or BAF values of the segments.

head(cnvDf)

We can add this information to these files by using the AddLRRBAF2OncosnpCNV function:

# only add for the first 10 segments for vignette purposes
cnvDf.LRR.BAF <- AddLRRBAF2OncosnpCNV(cnvDf[1:10, ], qcDf, probeDf)

Now the LRR, BAF along with the number of probes in each segment have been added as additional columns to the cnvDf data.frame. The qcDf and probeDf need to be passed in as input as the functions needs to determine the LRR shift and also the overlapping probes wth each CNV segment.

Also, the AddOncosnp2PennCNVProbe can add the OncoSNP segment state information to the PennCNV raw probe input into OncoSNP.

# only add for the first 5000 probes for vignette purposes
probeDf.oncosnp <- AddOncosnp2PennCNVProbe(cnvDf, qcDf, probeDf[1:5000, ])
head(probeDf.oncosnp)


tinyheero/oncosnputils documentation built on May 31, 2019, 3:36 p.m.