View source: R/rearrangement-utils.R
seqJunctionsInferredByPairedTags2 | R Documentation |
Identifies genomic intervals containing clusters of improper reads and constructs a Rearrangement object containing the unlinked clusters and the improper read pairs.
seqJunctionsInferredByPairedTags2(preprocess, param)
preprocess |
a list as created by |
param |
A |
Read pairs are selected with parameters set in the RearrangementParams object (denoted params) as follows:
1. The distance between the first and last read for a given pair must be at least rp_separation(params).
2. Duplicate read pairs are dropped.
3. For reads passing (1) and (2), the total number of reads aligned to each bin in bins is counted. We exclude bins for which the number of aligned reads is less than minNumberTagsPerCluster(params). We use the remaining bins to subset the read pairs object, keeping only read pairs for which the first or the last read overlaps a bin. Steps 1-3 are performed by the function filterPairedReads.
4. Clusters of improper reads that pass (1), (2), and (3) are identified. In particular, we apply the function unpairThenReduce that first uncouples the read pairs and creates a single GRanges object. The GRanges object is then reduced with argument min.gapwidth set to minGapWidth(params) (default is 1kb). The interval for each cluster is given by the reduced ranges. We keep only those intervals that have a width of at least minClusterSize(params) (default 115bp) and no larger than maxClusterSize(params) (default 5000bp). In addition, we keep only those intervals for which the number of reads belonging to the interval is at least minNumberTagsPerCluster(params) (default 5). Step 4 is performed by the function clusterTags.
5. Given a set of unlinked clusters and improper read pairs (filtered by steps 1-4), the Rearrangement constructor does the following:
i. annotates the read pairs with a unique id for cluster membership
ii. links the clusters (called linkedBins) (linkClustersByReadPairs)
iii. partitions the improper read pairs according to whether they link two clusters (a read pair can belong to multiple paired clusters)
iv. maps each tag to a cluster
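The filtering and clustering in steps 1, 2, and 4 can be sketched with plain GenomicRanges calls. This is a minimal illustration on toy data, not the trellis implementation: the read coordinates, the lower-case parameter variables, and the way separation and duplicates are computed are all assumptions made for the example, and the per-bin counting of step 3 is omitted. The 10kb separation threshold is arbitrary; the other thresholds mirror the defaults quoted above.

```r
library(GenomicRanges)

## Toy improper read pairs (hypothetical data): 'first' and 'last' are
## parallel GRanges with one element per pair. Pair 2 duplicates pair 1,
## and the reads of pair 7 lie close together.
first <- GRanges("chr1",
                 IRanges(start = c(1000, 1000, 1050, 1100, 1150, 1200, 50000),
                         width = 100))
last  <- GRanges("chr1",
                 IRanges(start = c(20000, 20000, 20050, 20100, 20150, 20200, 50300),
                         width = 100))

## Hypothetical stand-ins for values stored in a RearrangementParams object
rp_separation    <- 10000L  # arbitrary value chosen for this sketch
min_gap_width    <- 1000L   # 1kb
min_cluster_size <- 115L
max_cluster_size <- 5000L
min_tags         <- 5L

## Step 1 (assumed definition): distance between the start positions of a pair
sep <- abs(start(last) - start(first))

## Step 2 (assumed definition): a pair is a duplicate if both start
## positions have been seen before
dup <- duplicated(paste(start(first), start(last)))

keep <- sep >= rp_separation & !dup

## Step 4: uncouple the pairs into a single GRanges, then merge reads
## separated by less than min_gap_width
tags     <- c(first[keep], last[keep])
clusters <- reduce(tags, min.gapwidth = min_gap_width)

## Keep clusters of acceptable width that contain enough reads
n_tags <- countOverlaps(clusters, tags)
ok <- width(clusters) >= min_cluster_size &
      width(clusters) <= max_cluster_size &
      n_tags >= min_tags
clusters <- clusters[ok]
clusters
```

In this toy example the duplicate pair and the closely spaced pair are dropped, and the ten remaining reads collapse into two clusters, one for each end of the candidate rearrangement.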
REFACTORING: Rearrangement should do nothing and should be able to construct an empty Rearrangement object if no data is provided. Move the functions that do step 5 out of the constructor.
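The bookkeeping in step 5 (mapping each tag to a cluster and finding the read pairs that link two clusters) might look roughly like the sketch below. It uses only GenomicRanges functions on toy data; the variable names are hypothetical, and it is not the code run by the Rearrangement constructor or by linkClustersByReadPairs. For simplicity each tag is assigned to the first cluster it overlaps.

```r
library(GenomicRanges)

## Toy clusters and improper read pairs (hypothetical data)
clusters <- GRanges("chr1",
                    IRanges(start = c(1000, 20000, 80000),
                            end   = c(1299, 20299, 80299)))
first <- GRanges("chr1", IRanges(start = c(1000, 1100, 1200, 80050), width = 100))
last  <- GRanges("chr1", IRanges(start = c(20000, 20100, 20200, 90000), width = 100))

## Map each tag to a cluster index (NA when the tag overlaps no cluster)
first_id <- findOverlaps(first, clusters, select = "first")
last_id  <- findOverlaps(last,  clusters, select = "first")

## A read pair links two clusters when both reads fall in clusters
## and those clusters differ
links_two <- !is.na(first_id) & !is.na(last_id) & first_id != last_id

## Count supporting read pairs for each linked pair of clusters
link_key    <- paste(pmin(first_id, last_id), pmax(first_id, last_id), sep = "-")
link_counts <- table(link_key[links_two])
link_counts
```

Here three read pairs connect clusters 1 and 2, while the fourth pair has only one read in a cluster and therefore links nothing.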
library(trellis)
data(pdata, package="trellis")
rp <- RearrangementParams()
##
## The file of improper read pairs is large, so this is slow
##
r <- seqJunctionsInferredByPairedTags2(pdata, param=rp)
r
## improper read pairs that link the clusters
head(improper(r))
## The linked tag cluster intervals
linkedBins(r)