getClusters: Identify clusters containing high-confidence substitutions...

Description Usage Arguments Value Note Author(s) References See Also Examples

View source: R/getClusters.R

Description

Identifies clusters using either the mini-rank norm (MRN) algorithm (default and recommended to achieve highest sensitivity) or via a continuous wavelet transform (CWT) based approach. The former employs thresholding of background coverage differences and finds the optimal cluster boundaries by exhaustively evaluating all putative clusters using a rank-based approach. This method has higher sensitivity and an approximately 10-fold faster running time than the CWT-based cluster identification algorithm. The latter, maintained for compatibility with wavClusteR, computes the CWT on a 1 kb window of the coverage function centered at a high-confidence substitution site, and identifies cluster boundaries by extending away from peak positions.

Usage

1
2
getClusters(highConfSub, coverage, sortedBam, method = 'mrn', cores =
1, threshold, step = 1, snr = 3)

Arguments

highConfSub

GRanges object containing high-confidence substitution sites as returned by the getHighConfSub function

coverage

An Rle object containing the coverage at each genomic position as returned by a call to coverage

sortedBam

a GRanges object containing all aligned reads, including read sequence (qseq) and MD tag (MD), as returned by the readSortedBam function

method

a character, either set to "mrn" or to "cwt" to compute clusters using the mini-rank norm or the wavelet transform-based algorithm, respectively. Default is "mrn" (recommended).

cores

integer, the number of cores to be used for parallel evaluation. Default is 1.

threshold

numeric, if method = "mrn", the difference in coverage to be considered noise. If not specified, a Gaussian mixture model is used to learn a threshold from the data. Empirically, 10% of the minimum coverage required at substitutions (see argument minCov in the getHighConfSub function) might suffice to provide highly resolved clusters. However, if minCov is much lower than the median strand-specific coverage at substitutions m, which can be computed using summary(elementMetadata(highConfSub)[, 'coverage'])['Median']), 10% of m might represent an optimal choice.

step

numeric, if method = "cwt", step size of window shift. If two high-confidence substitution sites are located within a distance less than step, the wavelet transform is computed only once. Default: 1, i.e. each high-confidence substitution site is considered independently.

snr

numeric, if method = "cwt", signal-to-noise ratio controlling the peak calling as performed by wavCWTPeaks implemented in the wmtsa package. Default: 3.

Value

GRanges object containing the identified cluster boundaries.

Note

Clusters returned by this function need to be further merged by the function filterClusters, which also computes all relevant cluster statistics.

Author(s)

Federico Comoglio and Cem Sievers

References

William Constantine and Donald Percival (2011), wmtsa: Wavelet Methods for Time Series Analysis, http://CRAN.R-project.org/package=wmtsa

Sievers C, Schlumpf T, Sawarkar R, Comoglio F and Paro R. (2012) Mixture models and wavelet transforms reveal high confidence RNA-protein interaction sites in MOV10 PAR-CLIP data, Nucleic Acids Res. 40(20):e160. doi: 10.1093/nar/gks697

Comoglio F, Sievers C and Paro R (2015) Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data, BMC Bioinformatics 16, 32.

See Also

getHighConfSub, filterClusters

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
filename <- system.file( "extdata", "example.bam", package = "wavClusteR" )
example <- readSortedBam( filename = filename )
countTable <- getAllSub( example, minCov = 10, cores = 1 )
highConfSub <- getHighConfSub( countTable, supportStart = 0.2, supportEnd = 0.7, substitution = "TC" )
coverage <- coverage( example )
clusters <- getClusters( highConfSub = highConfSub, 
                         coverage = coverage, 
                         sortedBam = example, 
	                 method = 'mrn', 
	                 cores = 1, 
	                 threshold = 2 ) 

Bioconductor-mirror/wavClusteR documentation built on June 1, 2017, 8:10 p.m.