pipe.BAMpileup: Extract Pileup Details from BAM file.

pipe.BAMpileupR Documentation

Extract Pileup Details from BAM file.

Description

High level and low level wrappper functions to invoke SAMTOOLS MPILEUP to get alignment details for a region in a BAM file.

Usage

pipe.BAMpileup(sampleID, geneID = NULL, seqID = NULL, start = NULL, stop = NULL, 
	annotationFile = "Annotation.txt", optionsFile = "Options.txt", 
	results.path = NULL, summarize.calls = TRUE, verbose = FALSE)

BAM.mpileup(files, seqID, fastaFile, start = NULL, stop = NULL, 
	min.depth = 3, max.depth = 10000, min.gap.fraction = 0.25, 
	mpileupArgs = "", summarize.calls = FALSE, verbose = TRUE)

Arguments

sampleID

the SampleID for this sample. This SampleID keys for a row of annotation details in the annotation file, for getting sample-specific details. The SampleID is also used as a sample-specific prefix for all files created during the processing of this sample.

geneID

Optional character string of one GeneID.

seqID

Optional character string of one SeqID. Required explicitily for the low level function.

start
stop

Optional integer for the starting and stopping nucleotide location. Note that any valid combination of the above that is sufficient to define a region of a chromosome is allowed. For example, a GeneID alone also implies its SeqID, start, and stop.

annotationFile

the file of sample annotation details, which specifies all needed sample-specific information about the samples under study. See DuffyNGS_Annotation.

optionsFile

the file of processing options, which specifies all processing parameters that are not sample specific. See DuffyNGS_Options.

results.path

The top level folder where all results have been written to. Default is taken from the options file.

summarize.calls

Logical. Controls whether the very low level details of the base calls at each nucleotide location are summarized into tabular form. FALSE leaves the raw large cryptic strings as made by SAMTOOLS MPILEUP. When TRUE, they are summarized into a consensus base call and a short text string count summary of A/C/G/T/Indel calls.

fastaFile

Full pathname to the genomic DNA FASTA file that made the Bowtie index that the BAM file was aligned against.

min.depth
max.depth
min.gap.fraction

Bounds passed to SAMTOOLS MPILEUP, to control how it performs and reports its assessment of the BAM file.

mpileupArgs

Other optional arguments passed to SAMTOOLS MPILEUP.

Details

These functions give two different levels of control for extracting aligned read pileup information from BAM files, via the SAMTOOLS MPILEUP utility. The high level function gives an easy way to extract details for single genes or regions, while the low level function gives finer control.

Value

A data frame of details about read coverage and base depth in the specified region of a chromosome. Only base locations with 1+ read of coverage are returned, so there can be gaps with no information. Columns include:

SEQ_ID

chromosome name as a character string

POSITION

chromosome location as an integer

REF_BASE

the expected base call from the reference genome

DEPTH

the depth of coverage at this position, as an integer

CALL_BASE

when summarize is TRUE, the one consensus base that was most frequently observed. If FALSE, the the full cryptic string of base calls as made by MPILEUP

CALL_SCORE

only when summarize is FALSE, the full cryptic string of Phred quality scores

BASE_TABLE

only when summarize is TRUE, the short sorted table of observed base counts

Note

Note that the high level function only operates on a single sample. While the low level function can be given a vector of BAM files, and then returns depth and call details for multiple files at one time.

Author(s)

Bob Morrison

References

SAMTOOLS www.htslib.org/doc/samtools.html

See Also

For various low level functions that manipulate the MPILEUP calls, see MPU.callBases.


robertdouglasmorrison/DuffyNGS documentation built on March 24, 2024, 4:16 p.m.