knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "README-", cache = TRUE )
suppressPackageStartupMessages({ if("absCopyNumber" %in% installed.packages()[,1]) { library(absCopyNumber) } else { devtools::load_all("~/Documents/GitHub/AbsoluteCopyNumber/") } })
The goal of absCopyNumber is to Estimate tumor purity, ploidy and absolute copy numbers from NGS (WGS, WES, Target Sequencing) and Microarray (SNP, aCGH etc.) data, based on statistical method from paper — "AbsCN-seq: a statistical method to estimate tumor purity, ploidy and absolute copy numbers from next-generation sequencing data".
Currently, most of code come from absCNseq package (not maintained), which used for NGS data. I modified some code and extend it for microarray data, like SNP array. The statistical method come from AbsCN-seq
is very robust and easy to use, I think it can definitely handle microarray data.
You can install absCopyNumber from github with:
# install.packages("devtools") devtools::install_github("ShixiangWang/absCopyNumber")
The following I will show a basic example which shows you how to use absCopyNumber.
library(absCopyNumber)
This step should identify local path of copy ratio and snv (optional) files. Here we use example files within package.
example_cn = system.file("extdata", "example.cn.txt.gz", package = "absCopyNumber") example_snv = system.file("extdata", "example.snv.txt.gz", package = "absCopyNumber")
Bacause R can handle zipped files, so gzipped file is okay here. Of note, the input copy number file and somatic mutation file should following some rules:
1. Input copy ratio file ("example.cn.txt") The input copy ratio file contains segmented copy ratio data, which can be generated by popular segmentation algorithms like the "DNAcopy" R package. It should be a tab-delimited text file with five columns. Please use the EXACT header names as below. 1) "chrom" The chromosome number of a segment. Must be a integer number from 1 to 22. 2) "loc.start" The start position of a segment. 3) "loc.end" The end position of a segment. 4) "eff.seg.len" For exome sequencing, due to the nature of highly uneven coverage (zero coverage for introns), this column gives the number of base pairs with actual observed coverage. It can be derived by concatenate all the VARSCAN bins within a segment. Note that this length is usually much smaller than the length of the segment which is loc.end-loc.start+1 5) "normalized.ratio" The mean copy ratio (tumor DNA vs. germline DNA) of a segment. Note that the copy ratio should be normalized to eliminate any sequencing throughput difference between tumor and germline DNA. For example, samtools can be used to count total reads that were properly paired/aligned for tumor and germline DNA. The difference then need to be adjusted accordingly. 2. Input SNV file ("example.snv.txt") The input SNV file contains allele frequency data for somatic mutations. It should be a tab-delimited text file with three columns. Please use the EXACT header names as below. 1) "chrom" The chromosome number of a somatic SNV. Must be a integer number from 1 to 22. 2) "position" The genomic position of a somatic SNV. 3) "tumor_var_freq" The proportion of reads supporting the somatic SNV allele. Must be a fraction number.
The example data come from WES platform, next we specify parameter to run absCopyNumber. We use run_fromLocal()
function to process data specified as file path.
my.res.list <- run_fromLocal(seg.fn = example_cn, snv.fn = example_snv, platform="WES", min.seg.len=200, verbose = TRUE)
The result is a list
contains all solutions and other orignal and model information. Solution result is a data.frame named searchRes
in list.
knitr::kable(my.res.list$searchRes, align = 'c', caption = "Solution Data.Frame")
Generally we use top 1 solution, so the corresponding result has been stored at absCN
data.frame of my.res.list
. We can also mannually select best solution using our own knowledge. After selecting solution, we calculate absolute copy number by get_absCopyNumber
function based on purity $\alpha$ and ploidy $\tau$.
# select i-th solution i = 1 seg.CN <- get_absCopyNumber(my.res.list$seg.dat, my.res.list$searchRes[i,"alpha"], my.res.list$searchRes[i,"tau"])
Last, we plot orignal copy number (Raw) and absolute copy number (Absolute) on a same figure.
plot_absCopyNumber(seg.CN, chromnum=1)
Sometimes, we process data in R console, so absCopyNumber provide function run_fromDF
to substitute run_fromLocal
. Similar arguments and procedure as above, thus we do not show details here.
cn_df = read.table(example_cn, sep = "\t", header = TRUE, stringsAsFactors = FALSE) snv_df = read.table(example_snv, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
Run.
my.res.list2 = run_fromDF(seg.df = cn_df, snv.df = snv_df, platform = "WES", min.seg.len=200) identical(my.res.list, my.res.list2)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.