readSnpMatrix: Read in the SNP read count matrix file
In mskcc/facets: Cellular Fraction and Copy Numbers from Tumor Sequencing


Reads a SNP read count matrix generated by the snp-pileup code included with the package and prepares the read counts data frame needed by preProcSample.

readSnpMatrix(filename, skip=0L, err.thresh=Inf, del.thresh=Inf, perl.pileup=FALSE)

`filename`	absolute or relative path of the data file.
`skip`	number of lines to skip. Default is 0 (no lines skipped).
`err.thresh`	threshold for the number of reads with errors at a position. Loci where the count exceeds the threshold are discarded. The default, Inf, keeps all loci.
`del.thresh`	threshold for the number of reads with deletions at a position. Loci where the count exceeds the threshold are discarded. The default, Inf, keeps all loci.
`perl.pileup`	logical indicating whether the data file was created using the earlier perl version of the counter (the package on the Google site).
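The effect of the two thresholds can be illustrated on a toy table. This is a minimal sketch, not the package's code: the column names (`File1E`, `File1D`, `File2E`, `File2D` for error and deletion counts in the normal and tumor files) and the helper `filter_loci` are assumptions made for illustration.

```r
# Toy pileup table; the column names are assumed for illustration only.
pileup <- data.frame(
  Chromosome = c("1", "1", "2", "2"),
  Position   = c(1000, 2000, 1500, 3000),
  File1E     = c(0, 3, 0, 1),  # reads with errors, normal sample
  File1D     = c(0, 0, 5, 0),  # reads with deletions, normal sample
  File2E     = c(1, 0, 0, 0),  # reads with errors, tumor sample
  File2D     = c(0, 0, 0, 0),  # reads with deletions, tumor sample
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep loci whose error and deletion counts do not
# exceed the thresholds in either sample.
filter_loci <- function(x, err.thresh = Inf, del.thresh = Inf) {
  keep <- x$File1E <= err.thresh & x$File2E <= err.thresh &
          x$File1D <= del.thresh & x$File2D <= del.thresh
  x[keep, , drop = FALSE]
}

# With the default thresholds (Inf) no locus is discarded.
all_loci <- filter_loci(pileup)

# Stricter thresholds drop the loci with 3 error reads and 5 deletion reads.
strict_loci <- filter_loci(pileup, err.thresh = 1, del.thresh = 0)
```

Because the comparison is "count exceeds threshold", a locus whose count equals the threshold is retained.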

The SNPs used for generating the data file are the set of polymorphic loci with a single nucleotide change. To cover regions that are sparse in polymorphic loci, a set of non-polymorphic loci (pseudo-SNPs) is also used.

For copy number analysis the DNA fragment is the independent unit of analysis. Thus, loci with overlapping paired-end reads should not be counted twice (older versions of samtools mpileup will do this).

This function is written for the counter written by Venkat Seshan (in perl) and re-implemented in C++ by Alex Studer. The file format of the C++ version differs from that of the perl version. Alternate counters can be accommodated by writing a similar function.

This function expects the read counts to be in normal-tumor order, so run snp-pileup with the bam files given in normal-tumor order.
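A typical call might look like the sketch below. It assumes the facets package is installed and that it ships the example pileup file `extdata/stomach.csv.gz`; if that assumption does not hold, substitute the path to your own snp-pileup output. The guard skips the call when the file cannot be located.

```r
# Path to an example pileup file assumed to ship with facets;
# system.file() returns "" when the package or file is not found.
datafile <- system.file("extdata", "stomach.csv.gz", package = "facets")

if (nzchar(datafile)) {
  # Read the counts with the default arguments (no loci discarded).
  rcmat <- facets::readSnpMatrix(datafile)
  # Inspect the columns of the returned data frame.
  str(rcmat)
}
```

The resulting data frame is the input that preProcSample expects.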

For WGS data this function will be slow and memory intensive. As an alternative you can use the function readSnpMatrixDT (written by Dario Beraldi), which uses the data.table package. It is available from the extRfns directory of the package and can be loaded using

source(system.file("extRfns", "readSnpMatrixDT.R", package="facets"))

A data frame consisting of 6 variables for each SNP (or pseudo-SNP).

`Chrom`	chromosome that SNP is on
`Pos`	genomic position. This value depends on the genome build.
`NOR.DP`	number of reads covering the snp in the normal sample.
`NOR.RD`	number of reads with ref allele in the normal sample.
`TUM.DP`	number of reads covering the snp in the tumor sample.
`TUM.RD`	number of reads with ref allele in the tumor sample.
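The relation between the depth and reference-count columns can be sketched on a hand-made data frame with the same six columns. The values are invented for illustration, the sketch assumes that depth counts only reference and alternate reads, and the 0.25 to 0.75 window used to flag heterozygous loci is an arbitrary choice, not a facets default.

```r
# Toy data frame mimicking the structure of the returned value.
rcmat <- data.frame(
  Chrom  = c("1", "1", "2"),
  Pos    = c(1000, 2000, 1500),
  NOR.DP = c(40, 50, 60),
  NOR.RD = c(20, 50, 30),
  TUM.DP = c(80, 100, 90),
  TUM.RD = c(20, 100, 45),
  stringsAsFactors = FALSE
)

# Reads not carrying the ref allele (assumed to be alt allele reads).
nor_alt <- rcmat$NOR.DP - rcmat$NOR.RD
tum_alt <- rcmat$TUM.DP - rcmat$TUM.RD

# Variant allele frequency in each sample.
nor_vaf <- nor_alt / rcmat$NOR.DP
tum_vaf <- tum_alt / rcmat$TUM.DP

# Illustrative heterozygosity flag based on the normal sample.
het <- nor_vaf > 0.25 & nor_vaf < 0.75
```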