getStrandFromBamFile: Get the strand information of all windows from bam files

View source: R/getStrandFromBamFile.R

Description

Get the number of positive/negative reads and the coverage of all sliding windows from the input bam files.

Usage

getStrandFromBamFile(files, sequences, mapqFilter = 0,
  yieldSize = 1e+06, winWidth = 1000L, winStep = 100L,
  readProp = 0.5, paired)

Arguments

files

the input bam files. Your bam files should be sorted and have their index files located at the same path.

sequences

character vector used to restrict analysed alignments to a subset of chromosomes (i.e. sequences) within the provided bam file. These correspond to chromosomes/scaffolds of the reference genome to which the reads were mapped. If absent, the whole bam file will be read. NB: This must match the chromosomes as defined in your reference genome. If the reference chromosomes were specified using the 'chr' prefix, ensure the supplied vector matches this specification.

mapqFilter

every read that has mapping quality below mapqFilter will be removed before any analysis.

yieldSize

the number of reads in each block when reading the bam file, 1e6 by default. The bam file is read block by block, and this value is passed to the same parameter of the scanBam function.

winWidth

the width of the sliding window, 1000 by default.

winStep

the step length used to slide the window, 100 by default.

readProp

A read is considered to be included in a window if at least readProp of it is in the window. Specified as a proportion. 0.5 by default.

paired

if TRUE then the input bam file will be treated as containing paired-end reads. If missing, the first 100,000 reads will be inspected to determine whether the input bam file is paired-end or single-end.
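The interplay of winWidth, winStep and readProp can be illustrated with a small base R sketch. It is not the package's internal code: the 1-based window start, the read coordinates and the variable names other than the three parameters are assumptions made for illustration only.

```r
# Illustrative sketch only; not how strandCheckR computes windows internally.
winWidth <- 1000L
winStep  <- 100L
readProp <- 0.5

# First five windows on a sequence (assumes windows start at position 1)
winStart <- seq(from = 1L, by = winStep, length.out = 5L)
winEnd   <- winStart + winWidth - 1L

# A hypothetical read aligned to positions 980..1079 (100 bases)
readStart <- 980L
readEnd   <- 1079L
readLen   <- readEnd - readStart + 1L

# Number of read bases that fall inside each window
overlap <- pmax(0L, pmin(readEnd, winEnd) - pmax(readStart, winStart) + 1L)

# The read is counted in a window when at least readProp of it lies inside
inWindow <- overlap / readLen >= readProp

data.frame(Start = winStart, End = winEnd, Overlap = overlap, Included = inWindow)
```

In this sketch only 21 of the read's 100 bases fall in the first window, so the read is not counted there, while it lies entirely within the four following windows.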

Details

This function moves along the specified chromosomes (i.e. sequences) using a sliding window approach, and counts the number of reads in each window which align to the +/- strands of the reference genome. As well as the number of reads, the total coverage for each strand is also returned for each window, representing the total number of bases covered.

Average coverage for the entire window can be simply calculated by dividing the total coverage by the window size.
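As a minimal sketch of that calculation, the snippet below divides the per-strand coverage of one window by the window size. The numbers are made up, and a plain data.frame stands in for the returned DataFrame; the column names CovPos and CovNeg are those listed in the Value section, while AvgCovPos and AvgCovNeg are hypothetical names introduced here.

```r
# Hypothetical values for a single window; not real output of the function
winWidth <- 1000L
win <- data.frame(Start = 1L, End = 1000L, CovPos = 4200, CovNeg = 800)

# Average per-base coverage on each strand = total coverage / window size
win$AvgCovPos <- win$CovPos / winWidth
win$AvgCovNeg <- win$CovNeg / winWidth

win
```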

Value

a DataFrame object containing the number of positive/negative reads and coverage of each window sliding across the bam file. The returned DataFrame has 10 columns:

Type: can be either SE if the input file contains single-end reads, or R1/R2 if the input file contains paired-end reads.

Seq: the reference sequence (chromosome/scaffold) that the reads were mapped to.

Start: the start position of the sliding window.

End: the end position of the sliding window.

NbPos/NbNeg: number of positive/negative reads that overlap the sliding window.

CovPos/CovNeg: number of bases coming from positive/negative reads that were mapped in the sliding window.

MaxCoverage: the maximum coverage within the sliding window.

File: the name of the input file.
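To show how these columns might be used downstream, the sketch below computes the proportion of positive-strand reads per window. It assumes a plain data.frame with made-up values in place of the returned DataFrame, and PropPos is a hypothetical column name added for illustration.

```r
# Made-up stand-in for the returned DataFrame (two windows, single-end reads)
win <- data.frame(
  Type = "SE", Seq = "10",
  Start = c(1L, 101L), End = c(1000L, 1100L),
  NbPos = c(45L, 2L), NbNeg = c(5L, 38L),
  CovPos = c(4500, 200), CovNeg = c(500, 3800),
  MaxCoverage = c(12L, 9L), File = "s1.sorted.bam"
)

# Proportion of reads in each window that map to the positive strand
win$PropPos <- win$NbPos / (win$NbPos + win$NbNeg)

win[, c("Seq", "Start", "End", "NbPos", "NbNeg", "PropPos")]
```

A proportion close to 0 or 1 indicates that the reads in a window come almost entirely from one strand.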

See Also

filterDNA, plotHist, plotWin

Examples

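library(strandCheckR)
# Example bam file shipped with the package
file <- system.file('extdata', 's1.sorted.bam', package = 'strandCheckR')
# Count reads and coverage per strand in sliding windows on sequence '10'
win <- getStrandFromBamFile(file, sequences = '10')
win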

UofABioinformaticsHub/strandCheckR documentation built on Aug. 15, 2021, 9:08 a.m.