findCandidates2: Finds candidate somatic rearrangements
In cancer-genomics/trellis: Somatic structural variant analysis

findCandidates2

R Documentation

Finds candidate somatic rearrangements

Description

This function identifies clusters of improper reads that are linked by the mate information from paired-end sequencing platforms such as Illumina HiSeq.

Usage

findCandidates2(preprocess, rp = RearrangementParams())

Arguments

`preprocess`	A list of preprocessed data, as constructed by `preprocessData`
`rp`	A `RearrangementParams` object

Details

All reads from improper read pairs in which both mates are mapped and the mates are separated by at least 10kb are read from the AlignmentViews object. A cluster of reads (all involved in improper pairs) is defined as follows:

genomic intervals demarcating improper read clusters are obtained by applying reduce to a GRanges representation of all improper reads
genomic intervals must be at least 115bp and no larger than 5000bp (default settings)
each cluster must contain at least 5 reads
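The three cluster rules above can be illustrated with a minimal base-R sketch. It is not the trellis implementation: the function cluster_reads and its arguments are hypothetical, reads are assumed to lie on a single chromosome as 1-based closed intervals, strand is ignored, and overlapping or abutting reads are merged the way GenomicRanges::reduce merges ranges by default.

```r
## Hypothetical sketch of the cluster rules; not trellis code.
## start, end: 1-based closed coordinates of improper reads on one chromosome.
cluster_reads <- function(start, end, min_width = 115, max_width = 5000,
                          min_reads = 5) {
  stopifnot(length(start) == length(end), length(start) > 0)
  o <- order(start, end)
  s <- start[o]
  e <- end[o]
  ## Largest end coordinate among the reads sorted before read i
  prev_max <- c(-Inf, cummax(e)[-length(e)])
  ## Read i starts a new cluster unless it overlaps or abuts an earlier read,
  ## matching the default merging behaviour of GenomicRanges::reduce
  cluster_id <- cumsum(s > prev_max + 1)
  clusters <- data.frame(
    start = as.vector(tapply(s, cluster_id, min)),
    end = as.vector(tapply(e, cluster_id, max)),
    n_reads = tabulate(cluster_id)
  )
  clusters$width <- clusters$end - clusters$start + 1
  ## Keep clusters that satisfy the width and read-count rules
  keep <- clusters$width >= min_width & clusters$width <= max_width &
    clusters$n_reads >= min_reads
  clusters[keep, , drop = FALSE]
}

## Six overlapping 100bp reads near 1000 and two near 9000
starts <- c(1000, 1010, 1020, 1030, 1040, 1050, 9000, 9010)
cluster_reads(starts, starts + 99)
## returns one row: start 1000, end 1149, n_reads 6, width 150
```

The second group of reads is discarded because it spans only 110bp and contains two reads, failing both the width and the read-count rule.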

Non-overlapping clusters that are linked by multiple improper read pairs are suggestive of a rearrangement. Linked tag clusters are identified by the function seqJunctionsInferredByPairedTags. The genomic intervals defined by the linked tag clusters (also referred to as linked bins) are represented as a GRanges object with a variable called linked.to in mcols; the linked.to column is itself a GRanges object.

The GRanges object of the linked clusters, the improper read pairs supporting the link, and the set of all tags that map to either linked genomic interval are encapsulated in a Rearrangement object. Statistics calculated on each Rearrangement object include the fraction of all reads that link the two clusters (fractionLinkingTags), the types of rearrangements supported (rearrangementType), the modal rearrangement, and the percentage of read pairs supporting the modal rearrangement. The collection of all linked clusters for a given sample is represented as a RearrangementList.
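The link-counting step and the per-link statistics can be sketched in base R. This is a hypothetical illustration, not the code behind seqJunctionsInferredByPairedTags or the Rearrangement class: the function link_clusters, the input columns cluster1, cluster2 and type, the type labels, the threshold min_linking_pairs, and the definition of the linking fraction (linking reads divided by all reads in the two clusters) are all assumptions.

```r
## Hypothetical sketch of linking clusters and summarising each link; not
## trellis code. `pairs` has one row per improper read pair with the cluster
## ids of its two reads (NA when a read is outside every cluster) and an
## assumed rearrangement-type label.
link_clusters <- function(pairs, reads_per_cluster, min_linking_pairs = 2) {
  ## Keep pairs whose reads fall in two different clusters
  ok <- !is.na(pairs$cluster1) & !is.na(pairs$cluster2) &
    pairs$cluster1 != pairs$cluster2
  if (!any(ok)) return(data.frame())
  pairs <- pairs[ok, , drop = FALSE]
  ## Order the ids so that (a, b) and (b, a) are the same link
  a <- pmin(pairs$cluster1, pairs$cluster2)
  b <- pmax(pairs$cluster1, pairs$cluster2)
  key <- paste(a, b, sep = "-")
  links <- lapply(split(seq_along(key), key), function(idx) {
    n <- length(idx)
    ca <- a[idx[1]]
    cb <- b[idx[1]]
    ## Modal type; ties go to the first label in sorted order
    types <- table(pairs$type[idx])
    data.frame(
      cluster_a = ca,
      cluster_b = cb,
      n_linking_pairs = n,
      ## Assumed definition: each linking pair puts one read in each cluster
      fraction_linking = unname(2 * n / (reads_per_cluster[ca] + reads_per_cluster[cb])),
      modal_type = names(types)[which.max(types)],
      percent_modal = 100 * max(types) / n,
      stringsAsFactors = FALSE
    )
  })
  links <- do.call(rbind, links)
  ## "Multiple" linking pairs: require at least min_linking_pairs
  links[links$n_linking_pairs >= min_linking_pairs, , drop = FALSE]
}

pairs <- data.frame(
  cluster1 = c(1, 1, 2, 1, 3, NA, 2),
  cluster2 = c(2, 2, 1, 3, 3, 1, 1),
  type = c("del", "del", "inv", "del", "del", "del", "del"),
  stringsAsFactors = FALSE
)
reads_per_cluster <- c(10, 8, 6)  # reads in clusters 1, 2, 3
link_clusters(pairs, reads_per_cluster)
```

With this toy input, clusters 1 and 2 are joined by four pairs (three labelled "del"), so the sketch reports a modal type of "del" supported by 75% of the linking pairs; the single pair joining clusters 1 and 3 falls below the threshold.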

Examples

## Load the package and the list of preprocessed data (see preprocessData)
library(trellis)
data(pdata, package="trellis")
## Parameters for finding candidate rearrangements
rparam <- RearrangementParams(min_number_tags_per_cluster=5,
                              rp_separation=10e3)
## List of candidate rearrangements
rlist <- findCandidates2(pdata, rparam)
rlist

cancer-genomics/trellis documentation built on Aug. 20, 2024, 5:48 p.m.