clip.bam: Deals with soft-clipping encoded in the CIGAR string and...



Description

The function inspects the CIGAR string and soft-clips the sequences in the "seq" column accordingly, storing the results (and various other statistics) in additional columns.

Usage

clip.bam(vdf = NULL)

Arguments

vdf

A data frame as returned by read.bam

Details

Aligned sequences are stored in BAM files with the caveat that they may have been soft-clipped - i.e. the clipped sequence was not used in the alignment, but is included in the BAM file. This function looks at the CIGAR string and clips data in the "seq" column appropriately.

For example, consider the CIGAR string "12S25M" and the sequence read "CACCCGAGAATACCCCAGAACCATTATGCTGTGACTT". The read is 37bp long; however, the CIGAR string tells us that only 25 bases "matched" (25M), i.e. only 25 were used in the alignment. The CIGAR string also tells us that 12bp were soft-clipped (12S). We can tell that they were clipped from the start of the read because 12S occurs before 25M.
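The arithmetic in this example can be checked with a few lines of base R. This is only an illustrative sketch, not the code used inside clip.bam; all variable names are made up for the example.

```r
# Illustrative only: these variable names are hypothetical, not part of viRome.
cigar <- "12S25M"
read  <- "CACCCGAGAATACCCCAGAACCATTATGCTGTGACTT"

# Split the CIGAR string into its <length><operation> pairs: "12S", "25M".
ops  <- regmatches(cigar, gregexpr("[0-9]+[A-Z=]", cigar))[[1]]
len  <- as.integer(sub("[A-Z=]$", "", ops))
type <- sub("^[0-9]+", "", ops)

# A leading S clips the start of the read; a trailing S clips the end.
n     <- length(ops)
left  <- if (type[1] == "S") len[1] else 0L
right <- if (n > 1 && type[n] == "S") len[n] else 0L

# Keep only the bases that were used in the alignment.
aligned <- substr(read, left + 1, nchar(read) - right)

nchar(read)     # 37
left            # 12
nchar(aligned)  # 25
```

Here the first 12 bases are dropped and the remaining 25 are kept, matching the 12S and 25M operations.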

This function first removes any "hard-clipping" notation from the CIGAR string; this information is not needed, because hard-clipped bases are not present in the stored sequence. The function then finds all reads that are marked as "soft-clipped" and examines whether they are clipped at the beginning or the end of the read. The total amount of clipped sequence, the amounts clipped from the left and right sides, the clipped sequence and the length of the clipped sequence are then all stored in additional columns.
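The steps described above could be sketched in base R roughly as follows. This is not the viRome source: clip_sketch is a hypothetical helper, the input column name "cigar" is an assumption ("seq" is the column named in this page), and the toy data frame is invented purely for illustration.

```r
# Hypothetical helper written for illustration; not the viRome implementation.
# Assumes character columns named "cigar" (an assumption) and "seq".
clip_sketch <- function(vdf) {
  # Drop hard-clip operations such as "3H"; those bases are not in seq.
  cig <- gsub("[0-9]+H", "", vdf$cigar)

  # Length of the soft clip matched by `pattern`; no match (or NA) leaves 0.
  clip_len <- function(pattern) {
    n   <- integer(length(cig))
    m   <- regexpr(pattern, cig)
    hit <- !is.na(m) & m > 0
    n[hit] <- as.integer(sub("S", "", regmatches(cig, m)))
    n
  }
  left  <- clip_len("^[0-9]+S")  # soft clip at the start of the read
  right <- clip_len("[0-9]+S$")  # soft clip at the end of the read

  vdf$softclip  <- left + right
  vdf$leftclip  <- left
  vdf$rightclip <- right
  vdf$clipseq   <- substr(vdf$seq, left + 1L, nchar(vdf$seq) - right)
  vdf$cliplen   <- nchar(vdf$clipseq)
  vdf
}

# Toy input (invented): one row per clipping pattern.
toy <- data.frame(
  cigar = c("12S25M", "20M5S", "3H2S10M4S", "30M"),
  seq   = c("CACCCGAGAATACCCCAGAACCATTATGCTGTGACTT",
            paste0(strrep("A", 20), strrep("T", 5)),
            paste0("GG", strrep("C", 10), "TTTT"),
            strrep("A", 30)),
  stringsAsFactors = FALSE
)

out <- clip_sketch(toy)
out[, c("cigar", "softclip", "leftclip", "rightclip", "cliplen")]
```

The sketch does not handle a CIGAR string that consists of a single soft-clip operation, where the same clip would be counted on both sides.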

Value

The same data.frame that was passed in, with the following additional fields:

softclip

Total number of bases soft clipped from the read

leftclip

Number of bases clipped from the left of the read

rightclip

Number of bases clipped from the right of the read

clipseq

The sequence left after clipping

cliplen

The length of clipseq, i.e. of the sequence left after clipping

Author(s)

Mick Watson

See Also

read.bam

Examples

 ## Not run: infile <- system.file("data/SRR389184_vs_SINV_sorted.bam", package="viRome")
 ## Not run: bam <- read.bam(bamfile=infile, chr="SINV", start=1, end=11703, removeN=TRUE)
 ## Not run: bam[1:10,]
 ## Not run: bamc <- clip.bam(bam)
 ## Not run: bamc[1:10,]

mw55309/viRome_legacy documentation built on Dec. 21, 2021, 11:05 p.m.