Description Usage Arguments Details Value Author(s) See Also Examples
View source: R/pileLettersAt.R
pileLettersAt
extracts the letters/nucleotides of a set of
reads that align to a set of genomic positions of interest.
The extracted letters are returned as "piles of letters" (one per
genomic position of interest) stored in an XStringSet
(typically DNAStringSet) object.
1 | pileLettersAt(x, seqnames, pos, cigar, at)
|
x |
An XStringSet (typically DNAStringSet) object containing N unaligned read sequences (a.k.a. the query sequences) reported with respect to the + strand. |
seqnames |
A factor-Rle parallel to |
pos |
An integer vector parallel to |
cigar |
A character vector parallel to |
at |
A GPos object containing the genomic positions
of interest. If |
x
, seqnames
, pos
, cigar
must be 4 parallel
vectors describing N aligned reads.
An XStringSet (typically DNAStringSet)
object parallel to at
(i.e. with 1 string per genomic
position).
Hervé Pagès
The pileup
and applyPileups
functions defined in the
Rsamtools package, as well as the SAMtools mpileup command
(available at http://samtools.sourceforge.net/ as part of the
SAMtools project), for more powerful flexible alternatives.
The stackStringsFromGAlignments
function
for stacking the read sequences (or their quality strings)
stored in a GAlignments object or a BAM file.
DNAStringSet objects in the Biostrings package.
GPos objects in the GenomicRanges package.
GAlignments objects.
cigar-utils for the CIGAR utility functions used internally
by pileLettersAt
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 | ## Input
## - A BAM file:
bamfile <- BamFile(system.file("extdata", "ex1.bam", package="Rsamtools"))
seqinfo(bamfile) # to see the seqlevels and seqlengths
stackStringsFromBam(bamfile, param="seq1:1-21") # a quick look at
# the reads
## - A GPos object containing Genomic Positions Of Interest:
my_GPOI <- GPos(c("seq1:1-5", "seq1:21-21", "seq1:1575-1575",
"seq2:1513-1514"))
## Some preliminary massage on 'my_GPOI'
seqinfo(my_GPOI) <- merge(seqinfo(my_GPOI), seqinfo(bamfile))
seqlevels(my_GPOI) <- seqlevelsInUse(my_GPOI)
## Load the BAM file in a GAlignments object. Note that we load only
## the reads aligned to the sequences in 'seqlevels(my_GPOI)'. Also,
## in order to be consistent with applyPileups() and SAMtools (m)pileup,
## we filter out the following BAM records:
## - secondary alignments (flag bit 0x100);
## - reads not passing quality controls (flag bit 0x200);
## - PCR or optical duplicates (flag bit 0x400).
## See ?ScanBamParam and the SAM Spec for more information.
which <- as(seqinfo(my_GPOI), "GRanges")
flag <- scanBamFlag(isSecondaryAlignment=FALSE,
isNotPassingQualityControls=FALSE,
isDuplicate=FALSE)
what <- c("seq", "qual")
param <- ScanBamParam(flag=flag, what=c("seq", "qual"), which=which)
gal <- readGAlignments(bamfile, param=param)
seqlevels(gal) <- seqlevels(my_GPOI)
## Extract the read sequences (a.k.a. query sequences) and quality
## strings. Both are reported with respect to the + strand.
qseq <- mcols(gal)$seq
qual <- mcols(gal)$qual
nucl_piles <- pileLettersAt(qseq, seqnames(gal), start(gal), cigar(gal),
my_GPOI)
qual_piles <- pileLettersAt(qual, seqnames(gal), start(gal), cigar(gal),
my_GPOI)
mcols(my_GPOI)$nucl_piles <- nucl_piles
mcols(my_GPOI)$qual_piles <- qual_piles
my_GPOI
## Finally, to summarize A/C/G/T frequencies at each position:
alphabetFrequency(nucl_piles, baseOnly=TRUE)
## Note that the pileup() function defined in the Rsamtools package
## can be used to obtain a similar result:
scanbam_param <- ScanBamParam(flag=flag, which=my_GPOI)
pileup_param <- PileupParam(max_depth=5000,
min_base_quality=0,
distinguish_strands=FALSE)
pileup(bamfile, scanBamParam=scanbam_param, pileupParam=pileup_param)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.