applyTallies: Preparing the results of tallyBAM for writing to an HDF5...

View source: R/applyTallies.R

Description

This function tallies a set of BAM files and prepares the data for writing to an HDF5 tally file.

Usage

applyTallies( bamfiles, chrom, start, stop, q=25, ncycles = 0, max.depth=1000000, prepForHDF5 = TRUE, reference = NULL)

Arguments

bamfiles

A character vector of filenames of the BAM files that should be tallied. Note that for writing to an HDF5 file, the order of this vector must match the order of the Column field in the sampledata object that corresponds to the dataset; see setSampleData for details.

prepForHDF5

Boolean flag to specify whether the data shall be structured for compatibility with the HDF5 tally file format. See the details section of this manual page.

reference

A DNAString object containing the reference sequence corresponding to the region that is described in the counts array. If this is NULL, a consensus vote is used to estimate the reference at each position, which means that variants with an allele frequency (AF) >= 0.5 can no longer be detected.

chrom

Chromosome in which to tally

start

First position of the tally

stop

Last position of the tally

q

quality cut-off for considering a base call

ncycles

number of sequencing cycles from the front and back of the read that should be considered unreliable - used for stratifying the nucleotide counts

max.depth

only tally a position if fewer than this many reads overlap it - can prevent long runtimes in unreliable regions
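The effect of leaving reference as NULL can be illustrated with a small base-R sketch. It assumes that the consensus vote simply picks the most frequent base at each position; the counts, position names and the helper non_ref_fraction are made up for illustration and are not part of h5vc.

```r
# Toy per-base read counts (rows: positions, columns: bases); hypothetical data.
counts <- rbind(
  pos1 = c(A = 90, C = 0, G = 10, T = 0),  # true reference A, variant G at AF 0.1
  pos2 = c(A = 30, C = 0, G = 70, T = 0)   # true reference A, variant G at AF 0.7
)
true_ref <- c(pos1 = "A", pos2 = "A")

# Assumed consensus vote: the most frequent base becomes the reference estimate.
consensus_ref <- colnames(counts)[max.col(counts, ties.method = "first")]
names(consensus_ref) <- rownames(counts)

# Fraction of reads at each position that do not match the given reference base.
non_ref_fraction <- function(ref) {
  ref_counts <- counts[cbind(rownames(counts), ref)]
  1 - ref_counts / rowSums(counts)
}

consensus_ref                    # estimated reference per position
non_ref_fraction(true_ref)       # non-reference fraction against the true reference
non_ref_fraction(consensus_ref)  # the same quantity against the consensus estimate
```

At pos2 the variant base wins the vote and is treated as the reference, so the non-reference fraction computed against the consensus stays below 0.5 even though the true variant frequency is above it. Supplying the real reference sequence avoids this.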

Details

This is a wrapper function for applying tallyBAM to a set of BAM files specified in the bamfiles argument. If prepForHDF5 is not true, the result is equivalent to calling tallyBAM with lapply on the file names; otherwise the resulting data structure has the same layout as the return value of h5readBlock and can be written to an HDF5 tally file directly. The order of samples along the sample dimension is the same as the order of the file names (i.e. the order of the bamfiles argument).
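The ordering guarantee can be pictured with a toy base-R sketch. It does not call h5vc and does not reproduce the actual HDF5 tally layout: the file names, the helper make_toy_tally and the choice of the last dimension as the sample dimension are assumptions made only for illustration.

```r
# Hypothetical file names standing in for the bamfiles argument.
bamfiles <- c("sampleA.bam", "sampleB.bam")

# Toy stand-in for one per-sample tally: a 3D array
# (Nucleotide x Strand x Position) filled with a constant value.
make_toy_tally <- function(fill) {
  array(fill, dim = c(4, 2, 3),
        dimnames = list(c("A", "C", "G", "T"), c("+", "-"), NULL))
}

# One array per file, in the same order as bamfiles
# (conceptually what lapply over the file names returns).
per_sample <- lapply(seq_along(bamfiles), make_toy_tally)
names(per_sample) <- bamfiles

# Stack the arrays along a new, last dimension that indexes the samples.
stacked <- array(unlist(per_sample, use.names = FALSE),
                 dim = c(dim(per_sample[[1]]), length(per_sample)))

dim(stacked)        # Nucleotide x Strand x Position x Sample
stacked[, , 1, 2]   # first position of the second file's tally
```

Because the arrays are stacked in list order, index i along the sample dimension always belongs to bamfiles[i], which is why the order of bamfiles has to agree with the sample data when writing to a tally file.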

Value

A list with slots containing the Counts, Coverages, Deletions and Reference datasets for the given sample if prepForHDF5 is true; a list of 3D arrays (Nucleotide x Strand x Position) otherwise.

Author(s)

Paul Pyl

Examples

library(h5vc)
library(BSgenome.Hsapiens.UCSC.hg19)
# Example BAM files shipped with the h5vcData package
files <- c("NRAS.AML.bam", "NRAS.Control.bam")
bamFiles <- file.path(system.file("extdata", package = "h5vcData"), files)
chrom <- "1"
startpos <- 115247090
endpos <- 115259515
# Tally both files over the region, passing the matching stretch of hg19 chr1 as reference
theData <- applyTallies(bamFiles, reference = Hsapiens[["chr1"]][startpos:endpos], chrom = chrom, start = startpos, stop = endpos, ncycles = 10)
str(theData)

h5vc documentation built on Nov. 8, 2020, 4:56 p.m.