paraclu: Parametric clustering

paracluR Documentation

Parametric clustering

Description

"paraclu" is an implementation of Paraclu algorithm for parametric clustering of data attached to sequences (Frith et al., Genome Research, 2007). Since Paraclu finds clusters within clusters (unlike distclu), additional parameters (minStability, maxLength and reduceToNonoverlapping) can be specified to simplify the output by discarding too big clusters, and to reduce the clusters to a final set of non-overlapping clusters.

Usage

paraclu(
  object,
  minStability = 1,
  maxLength = 500,
  keepSingletonsAbove = 0,
  reduceToNonoverlapping = TRUE,
  useMulticore = FALSE,
  nrCores = NULL
)

## S4 method for signature 'Pairs'
paraclu(
  object,
  minStability = 1,
  maxLength = 500,
  keepSingletonsAbove = 0,
  reduceToNonoverlapping = TRUE,
  useMulticore = FALSE,
  nrCores = NULL
)

## S4 method for signature 'CTSS'
paraclu(
  object,
  minStability = 1,
  maxLength = 500,
  keepSingletonsAbove = 0,
  reduceToNonoverlapping = TRUE,
  useMulticore = FALSE,
  nrCores = NULL
)

## S4 method for signature 'GRanges'
paraclu(
  object,
  minStability = 1,
  maxLength = 500,
  keepSingletonsAbove = 0,
  reduceToNonoverlapping = TRUE,
  useMulticore = FALSE,
  nrCores = NULL
)

## S4 method for signature 'SummarizedExperiment'
paraclu(
  object,
  minStability = 1,
  maxLength = 500,
  keepSingletonsAbove = 0,
  reduceToNonoverlapping = TRUE,
  useMulticore = FALSE,
  nrCores = NULL
)

## S4 method for signature 'CAGEexp'
paraclu(
  object,
  minStability = 1,
  maxLength = 500,
  keepSingletonsAbove = 0,
  reduceToNonoverlapping = TRUE,
  useMulticore = FALSE,
  nrCores = NULL
)

Arguments

object

A CTSS, or a S4Vectors::Pairs object with positions first and scores second.

minStability

Minimal stability of the cluster, where stability is defined as ratio between maximal and minimal density value for which this cluster is maximal scoring. For definition of stability refer to Frith et al., Genome Research, 2007. Clusters with stability ⁠< minStability⁠ will be discarded.

maxLength

Maximal length of cluster in base-pairs. Clusters with length ⁠> maxLength⁠ will be discarded.

keepSingletonsAbove

Remove "singleton" tag clusters of width 1 with signal ⁠< keepSingletonsAbove⁠. Default value 0 results in keeping all TCs by default. Setting it to Inf removes all singletons.

reduceToNonoverlapping

Logical, should smaller clusters contained within bigger cluster be removed to make a final set of tag clusters non-overlapping.

useMulticore

Logical, should multicore be used. useMulticore = TRUE has no effect on non-Unix-like platforms.

nrCores

Number of cores to use when useMulticore = TRUE. Default value NULL uses all detected cores.

Details

Clustering is done for every CAGE dataset within the CAGEr object separately, resulting in a different set of tag clusters for every CAGE dataset. TCs from different datasets can further be aggregated into a single referent set of consensus clusters by calling the aggregateTagClusters function.

Value

Running Paraclu on a Pairs object containing positions and scores returns an IRanges object containing the start and end positions of the clusters, as well as the minimum and maximum density in min_d and max_d metadata columns.

Running Paraclu on a CTSS object dispatches the computation on each strand of each sequence level of the object, collects the IRanges and assemble them back in a TagClusters object after filtering them by size and by expression following the minStability, maxLength, keepSingletonsAbove and reduceToNonoverlapping parameters.

Running Paraclu on a RangedSummarizedExperiment object will loop on each sample, and return the results as a GRangesList of TagClusters.

Running Paraclu on a CAGEexp returnts is with the clusters stored as a GRangesList of TagClusters objects in its metadata slot tagClusters.

Author(s)

Vanja Haberle

Charles Plessy

References

MC Frith, E Valen, A Krogh, Y Hayashizaki, P Carninci, A Sandelin. A code for transcription initiation in mammalian genomes. Genome Research 2008 18(1):1-12)

See Also

aggregateTagClusters

Other CAGEr clustering methods: consensusClustersTpm(), distclu()

Other CAGEr object modifiers: CTSStoGenes(), CustomConsensusClusters(), aggregateTagClusters(), annotateCTSS(), cumulativeCTSSdistribution(), distclu(), getCTSS(), normalizeTagCount(), quantilePositions(), quickEnhancers(), resetCAGEexp(), summariseChrExpr()

Other CAGEr clusters functions: CTSScumulativesTagClusters(), CustomConsensusClusters(), aggregateTagClusters(), consensusClustersDESeq2(), consensusClustersGR(), cumulativeCTSSdistribution(), distclu(), plotInterquantileWidth(), quantilePositions(), tagClustersGR()

Examples

(ctss <- CTSSnormalizedTpmGR(exampleCAGEexp,1))
(pair <- Pairs(pos(ctss), score(ctss)))
CAGEr:::.paraclu_params(first(pair), second(pair))
CAGEr:::.paraclu(first(pair)[1:10], second(pair)[1:10])
paraclu(pair[1:10])
paraclu(ctss[1:10])
paraclu(CTSStagCountSE(exampleCAGEexp)[1:25,])
ce <- paraclu( exampleCAGEexp,
             , keepSingletonsAbove = 100
             , maxLength = 500, minStability = 1
             , reduceToNonoverlapping = TRUE)
tagClustersGR(ce, "Zf.30p.dome")


charles-plessy/CAGEr documentation built on Aug. 2, 2024, 4:35 p.m.