readSnpMatrix: Read in the SNP read count matrix file

View source: R/facets-wrapper.R

Description

Reads a SNP read count matrix generated by the included snp-pileup code and prepares the read count data frame needed by preProcSample.

Usage

readSnpMatrix(filename, skip=0L, err.thresh=Inf, del.thresh=Inf,
                     perl.pileup=FALSE)

Arguments

filename

absolute or relative path of the data file.

skip

number of lines to skip. Default is 0 (no lines skipped).

err.thresh

threshold for the number of reads with an error at that position. Loci where the count exceeds the threshold are discarded. The default (Inf) discards nothing.

del.thresh

threshold for the number of reads with a deletion at that position. Loci where the count exceeds the threshold are discarded. The default (Inf) discards nothing.

perl.pileup

logical indicating whether the data file was created using the earlier Perl version of the counter (the package on the Google site).
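
The effect of err.thresh and del.thresh can be illustrated with a minimal base R sketch on a toy table. It assumes the snp-pileup output carries per-file columns File1R, File1A, File1E, File1D (normal) and File2R, File2A, File2E, File2D (tumor) for reference, alternate, error and deletion counts, and that depth is the sum of reference and alternate counts. Those column names, the toy values and the helper filter_loci are assumptions for illustration, not the package's own code.

```r
# Toy pileup-like table; column names and values are assumptions for illustration.
pileup <- data.frame(
  Chromosome = c("1", "1", "2"),
  Position   = c(1000L, 2000L, 3000L),
  File1R = c(30, 25, 40), File1A = c(28, 0, 1),
  File1E = c(0, 3, 0),    File1D = c(0, 0, 5),
  File2R = c(50, 45, 60), File2A = c(20, 1, 0),
  File2E = c(0, 0, 0),    File2D = c(0, 0, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep loci whose error and deletion counts do not
# exceed the thresholds in either sample, then build the six output columns.
filter_loci <- function(x, err.thresh = Inf, del.thresh = Inf) {
  keep <- x$File1E <= err.thresh & x$File2E <= err.thresh &
          x$File1D <= del.thresh & x$File2D <= del.thresh
  x <- x[keep, , drop = FALSE]
  data.frame(
    Chrom  = x$Chromosome,
    Pos    = x$Position,
    NOR.DP = x$File1R + x$File1A,  # assumed: depth = ref + alt reads
    NOR.RD = x$File1R,
    TUM.DP = x$File2R + x$File2A,
    TUM.RD = x$File2R,
    stringsAsFactors = FALSE
  )
}

all_loci   <- filter_loci(pileup)                              # Inf thresholds keep every locus
clean_loci <- filter_loci(pileup, err.thresh = 2, del.thresh = 2)
```

With the default thresholds of Inf every locus is kept. With both thresholds set to 2, the second locus (3 error reads in the normal) and the third (5 deletion reads in the normal) are dropped.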

Details

The SNPs used for generating the data file are the set of polymorphic loci with a single-nucleotide change. To cover regions that are sparse in polymorphic loci, a set of non-polymorphic loci (pseudo-SNPs) is also used.

For copy number analysis the DNA fragment is the independent unit of analysis. Thus, loci with overlapping paired-end reads should not be counted twice (older versions of samtools mpileup will do this).

This function is written for the counter written by Venkat Seshan (in Perl) and re-implemented in C++ by Alex Studer. The file format of the C++ version differs from that of the Perl version. Alternative counters can be accommodated by writing a similar function.

This function expects the read counts to be in normal-tumor order, so run snp-pileup with the BAM files given in normal-tumor order.

For WGS data this function will be slow and memory intensive. As an alternative, you can use the function readSnpMatrixDT (written by Dario Beraldi), which uses the data.table package and is available from the extRfns directory. It can be loaded using

source(system.file("extRfns", "readSnpMatrixDT.R", package="facets"))

Value

A data frame consisting of 6 variables for each SNP (or pseudo-SNP).

Chrom

chromosome that the SNP is on.

Pos

genomic position. This value depends on the genome build.

NOR.DP

number of reads covering the SNP in the normal sample.

NOR.RD

number of reads with ref allele in the normal sample.

TUM.DP

number of reads covering the SNP in the tumor sample.

TUM.RD

number of reads with ref allele in the tumor sample.
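
As a rough illustration of how the six columns relate to each other, the base R sketch below builds a toy data frame with the documented column names and derives the non-reference allele fraction in each sample. The values are made up, and the derived columns NOR.VAF, TUM.VAF and het, along with the 0.25 to 0.75 heterozygosity window, are assumptions for illustration rather than anything this function returns.

```r
# Toy data frame with the six documented columns; values are made up.
rcmat <- data.frame(
  Chrom  = c("1", "1", "X"),
  Pos    = c(1000L, 2000L, 5000L),
  NOR.DP = c(60, 40, 50),
  NOR.RD = c(30, 40, 24),
  TUM.DP = c(80, 50, 100),
  TUM.RD = c(20, 49, 75),
  stringsAsFactors = FALSE
)

# Non-reference (variant) allele fraction: reads without the ref allele / depth.
rcmat$NOR.VAF <- 1 - rcmat$NOR.RD / rcmat$NOR.DP
rcmat$TUM.VAF <- 1 - rcmat$TUM.RD / rcmat$TUM.DP

# Illustrative rule of thumb (an assumption, not the package's criterion):
# treat a locus as heterozygous in the normal if its VAF lies in [0.25, 0.75].
rcmat$het <- rcmat$NOR.VAF >= 0.25 & rcmat$NOR.VAF <= 0.75
```

Because the reference read count can never exceed the depth, both fractions fall between 0 and 1.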


mskcc/facets documentation built on Oct. 15, 2021, 3:12 p.m.