readParam | R Documentation |
Class to specify read loading parameters
Each readParam object stores a number of parameters, each pertaining to the extraction of reads from a BAM file. Slots are defined as:
pe
:A string indicating whether paired-end data is present; set to "none"
, "both"
, "first"
or "second"
.
max.frag
:An integer scalar, specifying the maximum fragment length corresponding to a read pair.
dedup
:A logical scalar indicating whether marked duplicate reads should be ignored.
minq
:An integer scalar, specifying the minimum mapping quality score for an aligned read.
forward
:A logical scalar indicating whether only forward reads should be extracted.
restrict
:A character vector containing the names of allowable chromosomes from which reads will be extracted.
discard
:A GRanges object containing intervals in which any alignments will be discarded.
Marked duplicate reads will be removed with dedup=TRUE
.
This may be necessary when many rounds of PCR have been performed during library preparation.
However, it is not recommended for routine counting as it will interfere with the downstream statistical methods.
Note that the duplicate field must be set beforehand in the BAM file for this argument to have any effect.
Reads can also be filtered by their mapping quality scores if minq
is specified at a non-NA
value.
This is generally recommended to remove low-confidence alignments.
The exact threshold for minq
will depend on the range of scores provided by the aligner.
If minq=NA
, no filtering on the score will be performed.
If restrict
is supplied, reads will only be extracted for the specified chromosomes.
This is useful to restrict the analysis to interesting chromosomes, e.g., no contigs/scaffolds or mitochondria.
Conversely, if discard
is set, a read will be removed if the corresponding alignment is wholly contained within the supplied ranges.
This is useful for removing reads in repeat regions.
Note that secondary or supplementary alignments are ignored in all functions. The former usually refer to alternative mapping locations for the same read, while the latter refer to chimeric reads. Neither are of much use in a typical ChIP-seq analysis and will be discarded if they are present in the BAM file.
For pe="both"
, reads are extracted with the previously described filters, i.e., discard
, minq
, dedup
.
Extracted reads are then grouped into proper pairs.
Proper pairs are those where the two reads are close together (on the same chromosome, obviously) and in an inward-facing orientation.
The fragment interval is defined as that bounded by the 5' ends of the two reads in a proper pair.
The fragment size is defined as the length of the interval bounded by the 5' ends of two inward-facing reads.
Those pairs with fragment sizes above max.frag
are removed, as they are more likely to be the result of mapping errors than genuinely large fragment sizes.
Users should run getPESizes
to pick an appropriate value for their data sets, though thresholds of around 500-1000 bp are usually fine.
Paired-end data can also be treated as single-end data by only using one read from each pair with pe="first"
or "second"
.
This is useful for poor-quality data where the paired-end procedure has obviously failed, e.g., with many interchromosomal read pairs or pairs with large fragment lengths.
Treating the data as single-end may allow the analysis to be salvaged.
In all cases, users should ensure that each BAM file containing paired-end data is properly synchronized prior to count loading. This can be done using standard tools like FixMateInformation from the Picard suite (http://broadinstitute.github.io/picard).
If pe="none"
, reads are assumed to be single-end.
Read extraction from BAM files is performed with the same quality filters described above.
If forward
is NA
, reads are extracted from all strands.
Otherwise, reads are only extracted from the forward or reverse strands for TRUE
or FALSE
, respectively.
This may be useful for applications requiring strand-specific counting.
A special case is forward=NULL
- see strandedCounts
for more details.
Any soft clipping in alignments are ignored during extraction (this is also true for paired-end data). Soft clips are presumed to be sequencing artifacts, e.g., when the adaptor or barcodes are not properly removed from the read sequence. They are not relevant to computing genomic coverage. Thus, in this package, any references to the length or 5'/3' ends of the read will actually refer to that of the alignment. This is often more appropriate, e.g., the 5' end of the alignment represents the end of the fragment after clipping of the artifacts.
readParam(pe="none", max.frag=500, dedup=FALSE, minq=NA, forward=NA, restrict=NULL, discard=GRanges())
:
Creates a readParam object.
Each argument is placed in the corresponding slot, with coercion into the appropriate type.
In the code snippets below, x
is a readParam object.
x$name
:Returns the value in slot name
.
In the code snippets below, x
is a readParam object.
show(x)
:Describes the parameter settings in plain English.
reform(x, ...)
:Creates a new readParam object, based on the existing x
.
Any named arguments in ...
are used to modify the values of the slots in the new object, with type coercion as necessary.
Aaron Lun
windowCounts
,
regionCounts
,
correlateReads
,
getPESizes
,
BiocParallelParam
blah <- readParam()
blah <- readParam(discard=GRanges("chrA", IRanges(1, 10)))
blah <- readParam(restrict='chr2')
blah$pe
blah$dedup
# Use 'reform' if only some arguments need to be changed.
blah
reform(blah, dedup=TRUE)
reform(blah, pe="both", max.frag=212.0)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.