getAlignGal: Import and process alignments in BAM/SAM/BED format

View source: R/getAlignGal.R

Description

Import and process single-end or paired-end alignments in a BAM/SAM/BED file, retaining only the valid alignments as defined by the arguments below. Multihits (the same read mapped to multiple loci) are flagged for subsequent disambiguation with the function disambiguateMultihits. The final output is a GAlignments object.

Usage

getAlignGal(alignFilePath, format, genomeBuild, 
	deleteGeneratedBAM = FALSE, reverseComplement = FALSE, 
	returnDuplicate = FALSE, flagMultiHits = TRUE, 
	returnOnlyUniqueHits = FALSE, paired = FALSE, ...)

Arguments

alignFilePath

Path to the alignment file.

format

The alignment format can be determined automatically from the file extension or specified by the user. The supported formats are BAM, SAM, and BED.
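
As an illustration of extension-based detection, here is a minimal sketch in base R. The helper name guess_align_format is hypothetical and is not part of RIPSeeker; the package's own detection logic may differ.

```r
# Hypothetical helper (not a RIPSeeker function): infer the alignment
# format from the file extension, accepting only BAM, SAM, and BED.
guess_align_format <- function(path) {
  ext <- tolower(tools::file_ext(path))
  supported <- c("bam", "sam", "bed")
  if (!ext %in% supported) {
    stop("Cannot infer format from extension '", ext,
         "'; please specify 'format' explicitly.")
  }
  toupper(ext)
}

guess_align_format("extdata/PRC2.bam")
```

An unrecognised or missing extension raises an error here, which mirrors the case where the user would have to supply format explicitly.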

genomeBuild

Genome build used to obtain the chromosome information from the online UCSC database in order to construct the GAlignments object. Since the BAM/SAM header normally provides the chromosome information, this argument needs to be set only when the header information is absent from a BAM/SAM file or when a BED file is used. Examples of common genomeBuild values are "mm9" for the mouse and "hg19" for the human reference genome. Note that the genome build must match the one used for the alignment to obtain correct results. For instance, the user should specify "mm10" if the alignments are based on the "mm10" rather than the "mm9" genome build.

deleteGeneratedBAM

Binary indicator of whether the BAM file converted from the original SAM input file should be deleted from the local disk (Default: FALSE).

reverseComplement

Binary indicator of whether the reads were sequenced from the opposite strand of the original RNA molecule. reverseComplement applies only to strand-specific sequencing, in which case only the strand generated during second strand synthesis is sequenced. Thus, if reverseComplement=TRUE, the strand signs of the alignments are switched (i.e., + to -, - to +, and * unchanged); otherwise (reverseComplement=FALSE), the original strand signs are retained.
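
The strand switch described above can be sketched in base R as a lookup on a character vector. The helper flip_strand is hypothetical and only illustrates the mapping; it is not the code used inside getAlignGal.

```r
# Hypothetical helper: switch strand signs as reverseComplement=TRUE would
# (+ becomes -, - becomes +, and * is left unchanged).
flip_strand <- function(strand) {
  flipped <- c("+" = "-", "-" = "+", "*" = "*")
  unname(flipped[as.character(strand)])
}

flip_strand(c("+", "-", "*", "+"))
```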

returnDuplicate

Indicator (TRUE, FALSE, NA) of whether the duplicate alignments should be returned (Default: FALSE). Duplicate reads are a set of reads that align to exactly the same genomic coordinate. Because transcripts are usually hundreds or thousands of base pairs long and thus much longer than the read (25-100 nt), the chance that the same 25-100 nt portion of a transcript is sequenced multiple times is very small, so such duplicates are very likely due to PCR artifacts. This argument is passed to 'isDuplicate' in scanBamFlag, where TRUE selects only duplicates, FALSE excludes them, and NA ignores the duplicate flag.

flagMultiHits

Binary indicator of whether to add a binary column named "uniqueHits" that indicates whether each aligned read is a unique hit (uniqueHits==TRUE) or a multihit (uniqueHits==FALSE). Multihits are multiple alignments of the same read that arise from gene duplications or repetitive elements of the genome. Multihits typically constitute a substantial proportion of the total mapped reads. Rather than being removed, these multihits are flagged (flagMultiHits=TRUE by default) and assigned to a unique region in a later step by disambiguateMultihits.

returnOnlyUniqueHits

Binary indicator to return only the unique hits and discard all of the multihits (Default: FALSE).
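
To make the flagging and filtering concrete, here is a minimal base R sketch on a toy table. It assumes that multihits are recognised by a read name occurring more than once; the data frame and its values are made up for illustration and are not RIPSeeker output.

```r
# Toy alignment table; read names and coordinates are invented.
aln <- data.frame(
  read = c("r1", "r2", "r2", "r3", "r4", "r4", "r4"),
  chr  = c("chr1", "chr1", "chr5", "chr2", "chr3", "chr7", "chrX"),
  pos  = c(100L, 250L, 900L, 40L, 10L, 77L, 5L),
  stringsAsFactors = FALSE
)

# A read is a unique hit when its name appears exactly once
# (assumption: one row per reported alignment).
aln$uniqueHits <- !(duplicated(aln$read) |
                    duplicated(aln$read, fromLast = TRUE))

# Analogue of returnOnlyUniqueHits=TRUE: drop every multihit.
unique_only <- aln[aln$uniqueHits, ]
```

Keeping the flag instead of filtering leaves the multihit rows available for a later disambiguation step.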

paired

Binary indicator of whether the alignments are paired-end (Default: FALSE). For paired-end alignments, properly paired reads are combined into a single alignment record, using the CIGAR operation 'N' to indicate the number of bases between the mate pairs (i.e., the length of the insert fragment). In other words, the paired-end alignments are treated as gapped alignments of long fragments (see galp2gal).
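
The idea of joining two mates with an 'N' gap can be sketched as follows. This is a simplified illustration, assuming ungapped mates on the same chromosome with the first mate upstream of the second; merge_mates_cigar is a hypothetical helper and does not reproduce galp2gal.

```r
# Hypothetical helper: build one CIGAR string for a mate pair, using 'N'
# for the unsequenced bases between the end of mate 1 and the start of mate 2.
merge_mates_cigar <- function(start1, width1, start2, width2) {
  gap <- start2 - (start1 + width1)  # bases strictly between the two mates
  stopifnot(gap >= 0)                # overlapping mates are not handled here
  if (gap == 0) {
    return(paste0(width1 + width2, "M"))
  }
  paste0(width1, "M", gap, "N", width2, "M")
}

# Mate 1 covers 1000-1049 and mate 2 covers 1200-1249.
merge_mates_cigar(1000L, 50L, 1200L, 50L)
```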

...

Extra arguments are ignored.

Details

A BAM file is imported using readGAlignments for single-end or readGAlignmentPairs for paired-end alignments. A SAM file is first converted to BAM and then imported as above. A BED file is first imported with import as a GRanges object and subsequently converted to GAlignments via the constructor function GAlignments.

Value

alignGal

GAlignments object containing the processed alignments, with the values slot holding the "uniqueHits" binary flag (see flagMultiHits above) and the metadata saved as a list containing the argument settings for reverseComplement, returnDuplicate, flagMultiHits, and returnOnlyUniqueHits.

Author(s)

Yue Li

References

P. Aboyoun, H. Pages and M. Lawrence. GenomicRanges: Representation and manipulation of genomic intervals. R package version 1.8.9.

Michael Lawrence, Vince Carey and Robert Gentleman. rtracklayer: R interface to genome browsers and their annotation tracks. R package version 1.16.3.

See Also

combineAlignGals, readGAlignments, readGAlignmentPairs, import

Examples

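# Locate the example data shipped with the RIPSeeker package
extdata.dir <- system.file("extdata", package="RIPSeeker") 

# List all BAM files under the extdata directory
bamFiles <- list.files(extdata.dir, "\\.bam$", recursive=TRUE, full.names=TRUE)

# Keep only the PRC2 alignment files
bamFiles <- grep("PRC2", bamFiles, value=TRUE)

# Import the first file, switching the strand signs of the alignments
alignGal <- getAlignGal(bamFiles[1], reverseComplement=TRUE, genomeBuild="mm9")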

RIPSeeker documentation built on Oct. 31, 2019, 7:29 a.m.