binReadCounts: Calculate binned read counts from a set of BAM files

binReadCounts    R Documentation

Description

Calculate binned read counts from a set of BAM files.

Usage

binReadCounts(bins, bamfiles=NULL, path=NULL, ext="bam", bamnames=NULL, phenofile=NULL,
  chunkSize=NULL, cache=getOption("QDNAseq::cache", FALSE), force=!cache, isPaired=NA,
  isProperPair=NA, isUnmappedQuery=FALSE, hasUnmappedMate=NA, isMinusStrand=NA,
  isMateMinusStrand=NA, isFirstMateRead=NA, isSecondMateRead=NA, isSecondaryAlignment=NA,
  isNotPassingQualityControls=FALSE, isDuplicate=FALSE, minMapq=37, pairedEnds=NULL,
  verbose=getOption("QDNAseq::verbose", TRUE))

Arguments

bins

A data.frame or an AnnotatedDataFrame object containing bin annotations.

bamfiles

A character vector of (BAM) file names. If NULL (default), all files with extension ext are read from directory path.

path

If bamfiles is NULL, directory path to read input files from. Defaults to the current working directory.

ext

File name extension of input files to read, default is "bam".

bamnames

An optional character vector of sample names. Defaults to file names with extension ext removed.
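The defaults for bamfiles, path, ext and bamnames can be sketched in base R. This is only an illustration of the documented behaviour, not the package's actual code: the helpers findBamFiles() and defaultBamNames() are hypothetical names, and the placeholder files are empty rather than real BAM files.

```r
# Hypothetical helpers: not part of QDNAseq, they only mirror the documented defaults.
findBamFiles <- function(path = ".", ext = "bam") {
  # Match a literal dot followed by the extension at the end of the file name
  # (ext is assumed to contain no regular expression metacharacters).
  list.files(path, pattern = sprintf("\\.%s$", ext), full.names = TRUE)
}

defaultBamNames <- function(bamfiles, ext = "bam") {
  # Drop the directory part and the trailing ".<ext>".
  sub(sprintf("\\.%s$", ext), "", basename(bamfiles))
}

# Demonstration on empty placeholder files in a temporary directory.
demoDir <- tempfile("bamdemo")
dir.create(demoDir)
invisible(file.create(file.path(demoDir, c("tumor1.bam", "tumor2.bam", "notes.txt"))))

demoFiles <- findBamFiles(demoDir)      # only the two .bam files
demoNames <- defaultBamNames(demoFiles) # "tumor1" "tumor2"
```

Passing bamnames explicitly is the way to get sample names that differ from the file names.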

phenofile

An optional character(1) specifying a file name for phenotype data.

chunkSize

An optional integer specifying the chunk size, in nucleotides (nt), by which to process each BAM file.

cache

Whether to read and write intermediate cache files, which speeds up subsequent analyses of the same files. Requires packages R.cache and digest (both available on CRAN) to be installed. Defaults to getOption("QDNAseq::cache", FALSE).

force

When using the cache, whether to force reading input data from the BAM files even when an intermediate cache file is present.
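As a minimal sketch of how the cache and force defaults interact, the following uses only base R's options() and getOption(). The option name "QDNAseq::cache" is taken from the Usage section; the local variables cache and force merely mimic the argument defaults and are not the package's internals.

```r
# Turn caching on for this session, remembering the previous setting.
old <- options("QDNAseq::cache" = TRUE)

cache <- getOption("QDNAseq::cache", FALSE)  # documented default for `cache`
force <- !cache                              # documented default for `force`

# With caching on, `force` defaults to FALSE, so an existing cache file
# would be used instead of re-reading the BAM file.

# Restore the previous setting.
options(old)
```

Setting force=TRUE in the call itself would re-read the BAM files even when a cache file is present.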

isPaired

A logical(1) indicating whether unpaired (FALSE), paired (TRUE), or any (NA, default) read should be returned.

isProperPair

A logical(1) indicating whether improperly paired (FALSE), properly paired (TRUE), or any (NA, default) read should be returned. A properly paired read is defined by the alignment algorithm and might, e.g., represent reads aligning to identical reference sequences and with a specified distance.

isUnmappedQuery

A logical(1) indicating whether unmapped (TRUE), mapped (FALSE, default), or any (NA) read should be returned.

hasUnmappedMate

A logical(1) indicating whether reads with mapped (FALSE), unmapped (TRUE), or any (NA, default) mate should be returned.

isMinusStrand

A logical(1) indicating whether reads aligned to the plus (FALSE), minus (TRUE), or any (NA, default) strand should be returned.

isMateMinusStrand

A logical(1) indicating whether mate reads aligned to the plus (FALSE), minus (TRUE), or any (NA, default) strand should be returned.

isFirstMateRead

A logical(1) indicating whether the first mate read should be returned (TRUE) or not (FALSE), or whether mate read number should be ignored (NA, default).

isSecondMateRead

A logical(1) indicating whether the second mate read should be returned (TRUE) or not (FALSE), or whether mate read number should be ignored (NA, default).

isSecondaryAlignment

A logical(1) indicating whether alignments that are primary (FALSE), are not primary (TRUE) or whose primary status does not matter (NA, default) should be returned. A non-primary alignment ("secondary alignment" in the SAM specification) might result when a read aligns to multiple locations. One alignment is designated as primary and has this flag set to FALSE; the remainder, for which this flag is TRUE, are designated by the aligner as secondary.

isNotPassingQualityControls

A logical(1) indicating whether reads passing quality controls (FALSE, default), reads not passing quality controls (TRUE), or any (NA) read should be returned.

isDuplicate

A logical(1) indicating whether non-duplicated (FALSE, default), duplicated (TRUE), or any (NA) reads should be returned. 'Duplicated' reads may represent PCR or optical duplicates.
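The flag arguments above share one three-state convention: TRUE requires the corresponding SAM FLAG bit to be set, FALSE requires it to be unset, and NA ignores the bit. The sketch below shows that logic in base R. The bit values come from the SAM specification; the helper keepByFlag() is hypothetical and is not how the package reads BAM files.

```r
# SAM FLAG bits from the SAM specification, keyed by the argument names above.
samBits <- c(
  isPaired = 0x1L, isProperPair = 0x2L, isUnmappedQuery = 0x4L,
  hasUnmappedMate = 0x8L, isMinusStrand = 0x10L, isMateMinusStrand = 0x20L,
  isFirstMateRead = 0x40L, isSecondMateRead = 0x80L,
  isSecondaryAlignment = 0x100L, isNotPassingQualityControls = 0x200L,
  isDuplicate = 0x400L
)

# Hypothetical helper, for illustration only.
keepByFlag <- function(flag, ...) {
  filters <- list(...)
  keep <- rep(TRUE, length(flag))
  for (name in names(filters)) {
    want <- filters[[name]]
    if (is.na(want)) next  # NA: this bit is not used for filtering
    isSet <- bitwAnd(flag, samBits[[name]]) != 0L
    keep <- keep & (isSet == want)
  }
  keep
}

# Six made-up FLAG values: plain, minus strand, duplicate, unmapped,
# QC failure, and a properly paired first mate on the minus strand (83).
flags <- c(0L, 16L, 1024L, 4L, 512L, 83L)

# The non-NA defaults from the Usage section.
kept <- keepByFlag(flags,
                   isUnmappedQuery = FALSE,
                   isNotPassingQualityControls = FALSE,
                   isDuplicate = FALSE,
                   isPaired = NA)
```

With these defaults, the duplicate, unmapped and QC-failing reads are dropped and the other three are kept.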

minMapq

If mapping quality scores exist, the minimum score required to keep a read; otherwise all reads are kept. Defaults to 37.
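The minMapq rule can be illustrated with a small base R sketch. The helper filterByMapq() is hypothetical, and the treatment of a missing score among otherwise scored reads is an assumption, since the text above does not specify it.

```r
# Hypothetical helper, for illustration only.
filterByMapq <- function(mapq, minMapq = 37) {
  # No usable scores at all: keep every read, as described above.
  if (length(mapq) == 0L || all(is.na(mapq))) {
    return(rep(TRUE, length(mapq)))
  }
  # Assumption: a read with a missing score is dropped when other reads have scores.
  !is.na(mapq) & mapq >= minMapq
}

withScores    <- filterByMapq(c(60L, 37L, 36L, 0L))
withoutScores <- filterByMapq(c(NA_integer_, NA_integer_))
```

A read with a score exactly equal to minMapq is kept, because the threshold is a minimum.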

pairedEnds

A logical value or vector specifying whether the BAM files contain paired-end data. Only affects the calculation of the expected variance.

verbose

If TRUE, verbose messages are produced.

Value

Returns a QDNAseqReadCounts object with assay data element counts containing the binned read counts as non-negative integers.
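To make "binned read counts" concrete, here is a minimal base R sketch that counts read start positions in fixed-width bins on a single chromosome. It is an illustration under stated assumptions, not the package's implementation: countReadsInBins() is a hypothetical helper, positions are 1-based, and the bins are 15,000 bp wide.

```r
# Hypothetical fixed-width binning for one chromosome; illustration only.
countReadsInBins <- function(pos, binStarts, binSize) {
  # For each position, the index of the last bin start that is <= pos
  # (0 if the position lies before the first bin).
  idx <- findInterval(pos, binStarts)
  valid <- idx > 0
  # Also require the position to fall before the end of its bin.
  valid[valid] <- pos[valid] <= binStarts[idx[valid]] + binSize - 1
  # tabulate() returns one non-negative integer count per bin.
  tabulate(idx[valid], nbins = length(binStarts))
}

binStarts <- c(1, 15001, 30001)  # three bins: 1-15000, 15001-30000, 30001-45000
pos <- c(10, 14999, 15000, 15001, 29999, 44999, 45001, 30001)

counts <- countReadsInBins(pos, binStarts, binSize = 15000)
```

The position 45001 lies beyond the last bin and is not counted, so the seven remaining reads are split across the three bins.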

Author(s)

Ilari Scheinin, Daoud Sie

Examples

## Not run: 
# read all files from the current directory with names ending in .bam
bins <- getBinAnnotations(15)
readCounts <- binReadCounts(bins)

## End(Not run)

ccagc/QDNAseq documentation built on Feb. 2, 2023, 12:56 p.m.