getClusters: Identify clusters containing high-confidence substitutions...

Description Usage Arguments Value Note Author(s) References See Also Examples

View source: R/getClusters.R

Description

Identifies clusters using the mini-rank norm (MRN) algorithm, which employs thresholding of background coverage differences and finds the optimal cluster boundaries by exhaustively evaluating all putative clusters using a rank-based approach. This method has higher sensitivity and an approximately 10-fold faster running time than the CWT-based cluster identification algorithm.

Usage

1
2
getClusters(highConfSub, coverage, sortedBam, cores =
1, threshold)

Arguments

highConfSub

GRanges object containing high-confidence substitution sites as returned by the getHighConfSub function

coverage

An Rle object containing the coverage at each genomic position as returned by a call to coverage

sortedBam

a GRanges object containing all aligned reads, including read sequence (qseq) and MD tag (MD), as returned by the readSortedBam function

cores

integer, the number of cores to be used for parallel evaluation. Default is 1.

threshold

numeric, the difference in coverage to be considered noise. If not specified, a Gaussian mixture model is used to learn a threshold from the data. Empirically, 10% of the minimum coverage required at substitutions (see argument minCov in the getHighConfSub function) might suffice to provide highly resolved clusters. However, if minCov is much lower than the median strand-specific coverage at substitutions m, which can be computed using summary(elementMetadata(highConfSub)[, 'coverage'])['Median']), 10% of m might represent an optimal choice.

Value

GRanges object containing the identified cluster boundaries.

Note

Clusters returned by this function need to be further merged by the function filterClusters, which also computes all relevant cluster statistics.

Author(s)

Federico Comoglio and Cem Sievers

References

Sievers C, Schlumpf T, Sawarkar R, Comoglio F and Paro R. (2012) Mixture models and wavelet transforms reveal high confidence RNA-protein interaction sites in MOV10 PAR-CLIP data, Nucleic Acids Res. 40(20):e160. doi: 10.1093/nar/gks697

Comoglio F, Sievers C and Paro R (2015) Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data, BMC Bioinformatics 16, 32.

See Also

getHighConfSub, filterClusters

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
filename <- system.file( "extdata", "example.bam", package = "wavClusteR" )
example <- readSortedBam( filename = filename )
countTable <- getAllSub( example, minCov = 10, cores = 1 )
highConfSub <- getHighConfSub( countTable, supportStart = 0.2, supportEnd = 0.7, substitution = "TC" )
coverage <- coverage( example )
clusters <- getClusters( highConfSub = highConfSub, 
                         coverage = coverage, 
                         sortedBam = example, 
	                 cores = 1, 
	                 threshold = 2 ) 

FedericoComoglio/wavClusteR documentation built on Oct. 29, 2020, 2:44 p.m.