seqJunctionsInferredByPairedTags2: Constructs a Rearrangement object of unlinked tag-clusters

View source: R/rearrangement-utils.R

seqJunctionsInferredByPairedTags2    R Documentation

Constructs a Rearrangement object of unlinked tag-clusters

Description

Identifies genomic intervals containing clusters of improper reads and constructs a Rearrangement object containing the unlinked clusters and the improper read pairs.

Usage

seqJunctionsInferredByPairedTags2(preprocess, param)

Arguments

preprocess

A list as created by preprocessData

param

A RearrangementParams object

Details

Read pairs are selected using the parameters set in the RearrangementParams object (the param argument, denoted params below) as follows:

1. The distance between the first and last read of a given pair must be at least rp_separation(params).

2. Duplicate read pairs are dropped.

3. For read pairs passing (1) and (2), the total number of reads aligned to each bin in bins is counted. Bins with fewer than minNumberTagsPerCluster(params) aligned reads are excluded. The remaining bins are used to subset the read pairs object, keeping only read pairs for which the first or the last read overlaps a bin. Steps 1-3 are performed by the function filterPairedReads.

4. Clusters of improper reads that pass (1), (2), and (3) are identified. In particular, the function unpairThenReduce first uncouples the read pairs into a single GRanges object. This GRanges object is then reduced with argument min.gapwidth set to minGapWidth(params) (default 1kb). The reduced ranges give the interval for each cluster. Only intervals with a width of at least minClusterSize(params) (default 115bp) and no larger than maxClusterSize(params) (default 5000bp) are kept. In addition, the number of reads belonging to an interval must be at least minNumberTagsPerCluster(params) (default 5). Step 4 is performed by the function clusterTags.

5. Given a set of unlinked clusters and improper read pairs (filtered by steps 1-4), the Rearrangement constructor does the following:

i. annotates the read pairs with a unique id for cluster membership

ii. links the clusters by read pairs using linkClustersByReadPairs (the linked clusters are called linkedBins)

iii. partitions the improper read pairs according to whether they link two clusters (a read pair can belong to multiple paired clusters)

iv. maps each tag to a cluster
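Steps 1, 2, and 4 above can be sketched with plain GenomicRanges calls. This is a minimal illustration on toy data, not the trellis implementation: the read coordinates, the threshold variables (rp_sep, min_gap, min_size, max_size, min_tags), and the way duplicates are detected are all assumptions made for the example, and the bin-count filter of step 3 is omitted.

```r
library(GenomicRanges)

## Toy improper read pairs (hypothetical data): element i of `first` and
## element i of `last` are the two reads of pair i.
first <- GRanges("chr1",
                 IRanges(start = c(100, 150, 150, 220, 300, 350, 9000, 200000),
                         width = 100))
last  <- GRanges("chr1",
                 IRanges(start = c(50100, 50150, 50150, 50200, 50300, 50330,
                                   9300, 300000),
                         width = 100))

## Hypothetical thresholds standing in for the accessors on `params`
rp_sep   <- 10000   # rp_separation(params)
min_gap  <- 1000    # minGapWidth(params)
min_size <- 115     # minClusterSize(params)
max_size <- 5000    # maxClusterSize(params)
min_tags <- 5       # minNumberTagsPerCluster(params)

## Step 1: keep pairs whose two reads are at least rp_sep apart
far_enough <- distance(first, last) >= rp_sep

## Step 2: drop duplicate pairs (assumed here to mean identical coordinates)
is_dup <- duplicated(data.frame(s1 = start(first), e1 = end(first),
                                s2 = start(last),  e2 = end(last)))
keep <- far_enough & !is_dup

## Step 4: uncouple the pairs into one GRanges of tags, then reduce
tags     <- c(first[keep], last[keep])
clusters <- reduce(tags, min.gapwidth = min_gap)

## Keep clusters of acceptable width that contain enough tags
ok_width <- width(clusters) >= min_size & width(clusters) <= max_size
ok_tags  <- countOverlaps(clusters, tags) >= min_tags
clusters <- clusters[ok_width & ok_tags]
clusters
```

In this toy input, the pair whose reads are only a few hundred bases apart fails the separation filter, one pair is an exact duplicate, and the isolated pair near 200 kb and 300 kb forms clusters that are too narrow and have too few tags.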

REFACTORING: The Rearrangement constructor should do nothing beyond construction, and it should be able to construct an empty Rearrangement object if no data is provided. The functions that perform step 5 should be moved out of the constructor.
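The linking described in step 5 can be illustrated outside the constructor with a short sketch. The code below is an assumption-laden example, not the logic of linkClustersByReadPairs: the cluster and read coordinates are made up, each read is assigned to at most one cluster, and the "i-j" link labels are a hypothetical naming scheme.

```r
library(GenomicRanges)

## Hypothetical unlinked clusters (as would come out of step 4)
clusters <- GRanges("chr1",
                    IRanges(start = c(100, 50100, 90000),
                            end   = c(449, 50429, 90300)))

## Hypothetical improper read pairs: element i of `first` pairs with
## element i of `last`
first <- GRanges("chr1",
                 IRanges(start = c(100, 150, 220, 300, 90010), width = 100))
last  <- GRanges("chr1",
                 IRanges(start = c(50100, 50150, 50200, 50300, 120000),
                         width = 100))

## Map each tag to a cluster index (NA if the tag overlaps no cluster)
cl_first <- findOverlaps(first, clusters, select = "first")
cl_last  <- findOverlaps(last,  clusters, select = "first")

## A pair links two clusters when both reads fall in different clusters
links <- !is.na(cl_first) & !is.na(cl_last) & cl_first != cl_last

## Label each link by the two cluster indices, then partition the pairs
link_id <- paste(pmin(cl_first, cl_last), pmax(cl_first, cl_last), sep = "-")
pairs_by_link <- split(which(links), link_id[links])
pairs_by_link
```

Here the first four pairs connect clusters 1 and 2, while the fifth pair has one read outside every cluster and therefore links nothing.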

Examples

  library(trellis)
  data(pdata, package="trellis")
  rp <- RearrangementParams()
  ##
  ## The file of improper read pairs is large, so this is slow
  ##
  r <- seqJunctionsInferredByPairedTags2(pdata,
                                         param=rp)
  r
  ## improper read pairs that link the clusters
  head(improper(r))
  ## The linked tag cluster intervals
  linkedBins(r)

cancer-genomics/trellis documentation built on Feb. 2, 2023, 7:04 p.m.