detectBreakpoints: Clustering and consensus breakpoint detection for chimeric...
In R453Plus1Toolbox: A package for importing and analyzing data from Roche's Genome Sequencer System

Description Usage Arguments Details Value Author(s) See Also Examples

Given a set of chimeric reads, this methods computes all putative breakpoints. First, chimeric reads are clustered such that all reads spanning the same breakpoint form a cluster. Then, a consensus breakpoint sequence and breakpoint position is computed for each cluster.

1	detectBreakpoints(chimericReads, bpDist=100, minClusterSize=4, removeSoftClips=TRUE, bsGenome)

`chimericReads`	A list storing chimeric reads as returned by `filterChimericReads`. The list must have the format as defined by the `scanBam` method.
`bpDist`	The maximum distance in base pairs between the breakpoints of two chimeric reads at which the reads are merge to a cluster.
`minClusterSize`	Cluster whose size is below minClusterSize are be excluded from breakpoint detection.
`removeSoftClips`	If true, soft-clipped bases at the beginning or the end of a sequence are removed (see details below).
`bsGenome`	A `bsGenome` instance providing the reference sequences. If missing, the library `BSgenome.Hsapiens.UCSC.hg19` is used by default.

This method is usually invoked after calling filterChimericReads and before calling mergeBreakpoints. It first forms clusters of chimeric reads (reads with exactly two local alignments) that span the same breakpoint and than computes a consensus breakpoint sequence for each cluster.

To carry out a hierarchical clustering, a measure for the distance between two chimeric reads must be defined. If reads span different chromosomes, their distance is set to infinity. The strand information of the local alignments may also indicate that two chimeric reads do not span the same breakpoint even if they span the same chromosomes. For example, the first reads has two local alignments on the positive strand whereas the second read has one local alignment on the positive strand and the other on the negative strand. In this case, the distance is set to inifinty, too. Finally, the distance measure distinguishes between the two breakpoints (sometimes called the pathogenic and the reciproce breakpoint) that originate from the same structual variant. The distance between a read from the pathogenic and a read from the reciproce breakpoint is infinity so that two different clusters will emerge. These two related breakpoints can be merge later using the mergeBreakpoints method. We observed that the breakpoints of these two cases often differ by a few ten or even a few hundred basepairs.

If the chromosome and strand information between two reads x and y are coherent, the Euclidian distance is used:

d(x,y) = (bp(x, ChrA) - bp(y, ChrA))^2 + (bp(x, ChrB) - bp(y, ChrB))^2

where bp gives the coordinates of the breakpoint for the given read and chromosome. Hierarchical clustering is applied with complete linkage and the dendrogram is cutted at a height of bpDist to obtain the final clusters. The bpDist argument does usually not influence the result, because we observed that reads spanning the same breakpoint have very little variation (only a few base pairs) in their local alignments due to sequencing errors or due to ambiguity caused by same/similar sequence of both chromosome near the breakpoint.

Although the given set of reads may belong to the same chimeric DNA, their individual breakpoints may differ in a few base pairs. Furthermore, a single read may have more than one possible breakpoint if a (small) part of the read was aligned to both parts.
The following step determines a consensus breakpoint for each cluster. It uses the supplied bsGenome to construct a chimeric reference sequence for all possible breakpoints over all reads within each cluster. After the reads were realigned to the chimeric reference sequences, the one that yields the highest alignment score is taken to represent best the chimeric DNA and its breakpoints.

As a preprocessing step, detectBreakpoints offers to remove soft clips occuring after the alignment:
Some reads may contain soft-clipped bases (e.g. linker sequences) at the beginning of the first part of the read or at the end of the second part. By default, detectBreakpoints removes these unaligned subsequences and adjusts the cigar string, the sequence, the sequence width (qwidth) and the local start/end coordinates.

detectBreakpoints returns an object of class breakpoints, which is a list of breakpoint clusters, which gives access to all alignments and consensus breakpoints:

`seqs`	This `IRanges` `DataFrame` is mainly a rearranged version of the alignment input in `chimericReads`. In addition, it shows the corresponding breakpoints and local alignment coordinates.
`commonBps`	A dataframe listing the breakpoints for both parts of the chimeric reference, the associated chromosome, strand and the reference sequence itself, including positions "localStart"/"localEnd" indicating which part of the reference belongs to which breakpoint.
`commonAlign`	An object of class `PairwiseAlignmentsSingleSubject` of the Biostrings package that contains the alignment to the (best) consensus reference sequence.
`alignedReads`	On the basis of `commonAlign` and `commonBps`, `alignedReads` is an instance of class `AlignedRead` containing all aligned reads including their associated chromosomes, strands, and positions. Since the reference is a chimeric sequence each read has two chromosome and two strand entries.

Hans-Ulrich Klein, Christoph Bartenhagen

filterChimericReads mergeBreakpoints plotChimericReads

# Construct a small example with three chimeric reads
# (=6 local alignments) in bam format as given by
# aligners such as BWA-SW.
# The first two reads originate from the same case but
# from different strands. The third read originate from
# the reciprocal breakpoint.
library("BSgenome.Scerevisiae.UCSC.sacCer2")
bamReads = list()
bamReads[[1]] = list(
    qname=c("seq1", "seq1", "seq2", "seq2", "seq3", "seq3"),
    flag = as.integer(c(0, 0, 16, 16, 0, 0)),
    rname = factor(c("II", "III", "III", "II", "III", "II")),
    strand = factor(c("+", "+", "-", "-", "+", "+")),
    pos = as.integer(c(99951, 200000, 200000, 99951, 199950, 100001)),
    qwidth = as.integer(c(100, 100, 100, 100, 100, 100)),
    cigar = c("50M50S","50S50M","50S50M","50M50S","50M50S", "50S50M"),
    seq = DNAStringSet(c(
        paste(substr(Scerevisiae$chrII, start=99951, stop=100000),
              substr(Scerevisiae$chrIII, start=200000, stop=200049),
              sep=""),
        paste(substr(Scerevisiae$chrII, start=99951, stop=100000),
              substr(Scerevisiae$chrIII, start=200000, stop=200049),
              sep=""),
        paste(substr(Scerevisiae$chrIII, start=200000, stop=200049),
              substr(Scerevisiae$chrII, start=99951, stop=100000),
              sep=""),
        paste(substr(Scerevisiae$chrIII, start=200000, stop=200049),
              substr(Scerevisiae$chrII, start=99951, stop=100000),
              sep=""),
        paste(substr(Scerevisiae$chrIII, start=199950, stop=199999),
              substr(Scerevisiae$chrII, start=100001, stop=100050),
              sep=""),
        paste(substr(Scerevisiae$chrIII, start=199950, stop=199999),
              substr(Scerevisiae$chrII, start=100001, stop=100050),
              sep="")))
)

bps = detectBreakpoints(bamReads, minClusterSize=1, bsGenome=Scerevisiae)
summary(bps)
table(bps)

mergeBreakpoints(bps)

R453Plus1Toolbox documentation built on Nov. 8, 2020, 5:59 p.m.

R453Plus1Toolbox index

A package for importing and analyzing data from Roche's Genome Sequencer System

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

R453Plus1Toolbox
A package for importing and analyzing data from Roche's Genome Sequencer System

detectBreakpoints: Clustering and consensus breakpoint detection for chimeric...
In R453Plus1Toolbox: A package for importing and analyzing data from Roche's Genome Sequencer System

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Related to detectBreakpoints in R453Plus1Toolbox...

R Package Documentation

Browse R Packages

We want your feedback!

R453Plus1Toolbox A package for importing and analyzing data from Roche's Genome Sequencer System

detectBreakpoints: Clustering and consensus breakpoint detection for chimeric... In R453Plus1Toolbox: A package for importing and analyzing data from Roche's Genome Sequencer System

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Related to detectBreakpoints in R453Plus1Toolbox...

R Package Documentation

Browse R Packages

We want your feedback!

R453Plus1Toolbox
A package for importing and analyzing data from Roche's Genome Sequencer System

detectBreakpoints: Clustering and consensus breakpoint detection for chimeric...
In R453Plus1Toolbox: A package for importing and analyzing data from Roche's Genome Sequencer System