The function inspects the CIGAR string and soft-clips the sequences in the "seq" column, storing the results (and various other statistics) in additional columns.
vdf |
A data frame as read from |
Aligned sequences are stored in BAM files with the caveat that they may have been soft-clipped - i.e. the clipped sequence was not used in the alignment, but is included in the BAM file. This function looks at the CIGAR string and clips data in the "seq" column appropriately.
For example, consider the CIGAR string "12S25M" and the sequence read "CACCCGAGAATACCCCAGAACCATTATGCTGTGACTT". The sequence read is 37bp long; however, the CIGAR string tells us that only 25 bases "matched" (25M), i.e. only 25 were used in the alignment. The CIGAR string also tells us that 12bp were soft-clipped (12S). We can tell that they were soft-clipped from the start of the read because 12S occurs before 25M.
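The worked example above can be reproduced with a few lines of base R. This is only an illustrative sketch, not the code used inside clip.bam; the variable names (ops, len, op, left, right, aligned) are made up for this example.

```r
cigar <- "12S25M"
read  <- "CACCCGAGAATACCCCAGAACCATTATGCTGTGACTT"

# Split the CIGAR string into its operations, e.g. "12S" and "25M".
ops <- regmatches(cigar, gregexpr("[0-9]+[A-Z=]", cigar))[[1]]
len <- as.integer(sub("[A-Z=]$", "", ops))  # numeric part of each operation
op  <- sub("^[0-9]+", "", ops)              # operation letter

# A leading S clips from the left of the read; a trailing S clips from the right.
left  <- if (op[1] == "S") len[1] else 0L
right <- if (length(op) > 1 && op[length(op)] == "S") len[length(op)] else 0L

# Keep only the bases that took part in the alignment.
aligned <- substr(read, left + 1, nchar(read) - right)

nchar(read)     # length of the full read
left            # bases soft-clipped from the start
nchar(aligned)  # bases remaining after clipping
```

Here the 12 leading bases are dropped and the remaining 25 bases correspond to the 25M part of the CIGAR string.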
This function first removes from the CIGAR string any notation of "hard-clipping" - we do not need this information. The function then finds all reads that are marked as "soft-clipped", and examines whether they are marked as clipped from the beginning or the end of the read. The total amount of clipped sequence, the amounts clipped from the left and right sides, the clipped sequence and the length of the clipped sequence are then all stored in additional columns.
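The steps described above could be sketched for a whole data frame roughly as follows. This is a minimal approximation under stated assumptions, not the viRome source: soft_clip_sketch and its internal helper grab are hypothetical names, and the input is assumed to have character columns "cigar" and "seq". The output column names follow the fields documented for clip.bam.

```r
# Hypothetical helper, NOT the viRome implementation.
soft_clip_sketch <- function(vdf) {
  # Drop hard-clip operations (e.g. "5H"); hard-clipped bases are not in seq.
  cigar <- gsub("[0-9]+H", "", vdf$cigar)
  seq   <- as.character(vdf$seq)

  # Return the captured number where the pattern matches, 0 elsewhere.
  grab <- function(pattern) {
    hit <- grepl(pattern, cigar)
    out <- integer(length(cigar))
    out[hit] <- as.integer(sub(pattern, "\\1", cigar[hit]))
    out
  }
  left  <- grab("^([0-9]+)S.*$")        # soft clip at the start of the read
  right <- grab("^.*[^0-9]([0-9]+)S$")  # soft clip at the end of the read

  vdf$softclip  <- left + right
  vdf$leftclip  <- left
  vdf$rightclip <- right
  vdf$clipseq   <- substr(seq, left + 1, nchar(seq) - right)
  vdf$cliplen   <- nchar(vdf$clipseq)
  vdf
}

# Toy input covering left, right, hard plus soft, and no clipping.
toy <- data.frame(
  cigar = c("12S25M", "20M5S", "5H3S10M2S", "8M"),
  seq   = c("CACCCGAGAATACCCCAGAACCATTATGCTGTGACTT",
            paste0(strrep("ACGT", 5), "TTTTT"),
            paste0("GGG", "ACGTACGTAC", "TT"),
            "ACGTACGT"),
  stringsAsFactors = FALSE
)
res <- soft_clip_sketch(toy)
res[, c("cigar", "softclip", "leftclip", "rightclip", "cliplen")]
```

The sketch only handles soft clips at the very start or end of the CIGAR string, which is where the SAM format allows them once hard clips are removed.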
The same data.frame that was passed in, with the following additional fields:
softclip |
Total number of bases soft clipped from the read |
leftclip |
Number of bases clipped from the left of the read |
rightclip |
Number of bases clipped from the right of the read |
clipseq |
The sequence left after clipping |
cliplen |
The length of the sequence left after clipping (clipseq) |
Mick Watson
## Not run:
# Locate the example BAM file shipped with the package
infile <- system.file("data/SRR389184_vs_SINV_sorted.bam", package="viRome")
# Read the alignments against the SINV reference
bam <- read.bam(bamfile=infile, chr="SINV", start=1, end=11703, removeN=TRUE)
bam[1:10,]
# Soft-clip the reads and inspect the additional columns
bamc <- clip.bam(bam)
bamc[1:10,]
## End(Not run)