clusterCTSS: Cluster CTSSs into tag clusters

clusterCTSS  R Documentation

Cluster CTSSs into tag clusters


Description

Clusters individual CAGE transcription start sites (CTSSs) along the genome into tag clusters (TCs) using a specified ab initio method, or assigns them to predefined genomic regions.


Usage

clusterCTSS(
  object,
  threshold = 1,
  nrPassThreshold = 1,
  thresholdIsTpm = TRUE,
  method = c("distclu", "paraclu", "custom"),
  maxDist = 20,
  removeSingletons = FALSE,
  keepSingletonsAbove = Inf,
  minStability = 1,
  maxLength = 500,
  reduceToNonoverlapping = TRUE,
  customClusters = NULL,
  useMulticore = FALSE,
  nrCores = NULL
)

## S4 method for signature 'CAGEexp'
clusterCTSS(
  object,
  threshold = 1,
  nrPassThreshold = 1,
  thresholdIsTpm = TRUE,
  method = c("distclu", "paraclu", "custom"),
  maxDist = 20,
  removeSingletons = FALSE,
  keepSingletonsAbove = Inf,
  minStability = 1,
  maxLength = 500,
  reduceToNonoverlapping = TRUE,
  customClusters = NULL,
  useMulticore = FALSE,
  nrCores = NULL
)



Arguments

object

A CAGEr object.

threshold, nrPassThreshold

Ignore CTSSs whose signal reaches threshold (signal >= threshold) in fewer than nrPassThreshold experiments.


thresholdIsTpm

Logical indicating whether threshold is expressed in raw tag counts (FALSE) or normalized signal (TRUE).


method

Method to be used for clustering: "distclu", "paraclu" or "custom". See Details.


maxDist

Maximal distance between two neighbouring CTSSs for them to be part of the same cluster. Used only when method = "distclu", otherwise ignored.


removeSingletons

Logical indicating whether tag clusters containing only one CTSS should be removed. Ignored when method = "custom".


keepSingletonsAbove

Controls which singleton tag clusters will be removed. When removeSingletons = TRUE, only singletons with signal < keepSingletonsAbove will be removed. This is useful to avoid removing highly supported singleton tag clusters. The default value Inf results in removing all singleton TCs when removeSingletons = TRUE. Ignored when method = "custom".


minStability

Minimal stability of the cluster, where stability is defined as the ratio between the maximal and minimal density values for which this cluster is maximal scoring. For the definition of stability, refer to Frith et al., Genome Research, 2007. Clusters with stability < minStability will be discarded. Used only when method = "paraclu".


maxLength

Maximal length of a cluster in base pairs. Clusters with length > maxLength will be discarded. Ignored when method = "custom".


reduceToNonoverlapping

Logical indicating whether smaller clusters contained within a bigger cluster should be removed, so that the final set of tag clusters is non-overlapping. Used only when method = "paraclu".


customClusters

Genomic coordinates of predefined regions to be used to segment the CTSSs. The format is either a GRanges object or a data.frame with the following columns: chr (chromosome name), start (0-based start coordinate), end (end coordinate), strand (either "+" or "-"). Used only when method = "custom".


useMulticore

Logical indicating whether multicore processing should be used. useMulticore = TRUE has no effect on non-Unix-like platforms.


nrCores

Number of cores to use when useMulticore = TRUE. The default value NULL uses all detected cores.
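
The interaction of threshold and nrPassThreshold can be illustrated with a minimal base R sketch. It assumes the reading that a CTSS is kept when its signal is at least threshold in at least nrPassThreshold experiments. The matrix and its values are invented for illustration, and this is not CAGEr's internal code.

```r
# Toy matrix of normalized CTSS signal: rows are CTSSs, columns are experiments.
# All names and values are hypothetical.
signal <- matrix(
  c(0.5, 2.0, 0.0,
    1.5, 0.2, 3.0,
    0.1, 0.3, 0.4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ctss1", "ctss2", "ctss3"), c("expA", "expB", "expC"))
)

threshold       <- 1
nrPassThreshold <- 2

# Count, per CTSS, the experiments in which the signal reaches the threshold.
n_pass <- rowSums(signal >= threshold)

# Assumption: a CTSS is kept when it passes in at least nrPassThreshold experiments.
keep      <- n_pass >= nrPassThreshold
kept_ctss <- rownames(signal)[keep]
```

With these toy values only ctss2 passes in two experiments, so it is the only CTSS that would go on to clustering.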


Details

The "distclu" method is an implementation of simple distance-based clustering of data attached to sequences, where two neighbouring TSSs are joined together if they are closer than some specified distance (see distclu-functions for implementation details).
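
The idea behind distance-based clustering can be sketched in a few lines of base R. This is an illustration only, not the code in distclu-functions: the data frame, its column names and the treatment of a gap exactly equal to maxDist are assumptions, and the last step mimics the singleton handling described for removeSingletons and keepSingletonsAbove.

```r
# Toy CTSS positions on one chromosome and strand (hypothetical data).
ctss <- data.frame(
  pos   = c(100, 105, 118, 200, 210, 300, 500),
  score = c(5, 20, 3, 8, 8, 2, 150)
)
maxDist <- 20

ctss <- ctss[order(ctss$pos), ]

# Start a new cluster whenever the gap to the previous CTSS exceeds maxDist.
# Assumption: a gap of exactly maxDist still joins the two CTSSs.
new_cluster  <- c(TRUE, diff(ctss$pos) > maxDist)
ctss$cluster <- cumsum(new_cluster)

# Summarise each cluster: boundaries, number of CTSSs and total signal.
clusters <- data.frame(
  cluster = sort(unique(ctss$cluster)),
  start   = as.vector(tapply(ctss$pos, ctss$cluster, min)),
  end     = as.vector(tapply(ctss$pos, ctss$cluster, max)),
  nr_ctss = as.vector(tapply(ctss$pos, ctss$cluster, length)),
  score   = as.vector(tapply(ctss$score, ctss$cluster, sum))
)

# Drop singletons whose signal is below keepSingletonsAbove.
removeSingletons    <- TRUE
keepSingletonsAbove <- 100
is_weak_singleton <- clusters$nr_ctss == 1 & clusters$score < keepSingletonsAbove
if (removeSingletons) clusters <- clusters[!is_weak_singleton, ]
```

In this toy input the singleton at position 300 is dropped, while the strongly supported singleton at position 500 is kept.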

"paraclu" is an implementation of the Paraclu algorithm for parametric clustering of data attached to sequences (Frith et al., Genome Research, 2007). Since Paraclu finds clusters within clusters (unlike distclu), additional parameters (removeSingletons, keepSingletonsAbove, minStability, maxLength and reduceToNonoverlapping) can be specified to simplify the output by discarding clusters that are too small (singletons) or too big, and to reduce the clusters to a final set of non-overlapping clusters.
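
The post-processing of nested clusters can be sketched as follows. This does not implement Paraclu itself. It starts from a hypothetical table of nested clusters with made-up density bounds (min_d, max_d) and applies the stability, length and containment filters. The column names, the definition of length as end - start + 1 and the order of the filters are assumptions made for this sketch.

```r
# Hypothetical Paraclu-style output: nested clusters with the density range
# (min_d, max_d) over which each cluster is maximal scoring.
pc <- data.frame(
  start = c(100, 100, 150, 400, 900),
  end   = c(180, 120, 180, 1000, 905),
  min_d = c(0.5, 1.0, 2.0, 0.1, 1.0),
  max_d = c(2.0, 4.0, 2.2, 0.5, 3.0)
)
minStability <- 2    # stricter than the documented default of 1, to show an effect
maxLength    <- 500

# Stability is the ratio between the maximal and minimal density values.
stability <- pc$max_d / pc$min_d
len       <- pc$end - pc$start + 1   # assumed definition of cluster length

# Discard unstable and overly long clusters.
pc <- pc[stability >= minStability & len <= maxLength, ]

# reduceToNonoverlapping: drop clusters contained within a bigger cluster.
contained <- vapply(seq_len(nrow(pc)), function(i) {
  any(pc$start <= pc$start[i] & pc$end >= pc$end[i] &
        (pc$end - pc$start) > (pc$end[i] - pc$start[i]))
}, logical(1))
pc <- pc[!contained, ]
```

Here the cluster 150-180 fails the stability filter, 400-1000 exceeds maxLength, and 100-120 is removed because it lies inside 100-180, leaving the clusters 100-180 and 900-905.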

Clustering is done separately for every CAGE dataset within the CAGEr object, resulting in a different set of tag clusters for each dataset. TCs from different datasets can then be aggregated into a single reference set of consensus clusters by calling the aggregateTagClusters function.


Value

Returns the CAGEexp object, in which the results are stored as a GRangesList of TagClusters objects in the metadata slot tagClusters. The TagClusters objects will contain a filteredCTSSidx column if appropriate. The clustering method name is saved in the metadata slot of the GRangesList.


Author(s)

Vanja Haberle


References

Frith et al. (2007) A code for transcription initiation in mammalian genomes, Genome Research 18(1):1-12.

See Also

tagClustersGR, aggregateTagClusters and CTSSclusteringMethod.

Other CAGEr object modifiers: CTSStoGenes(), CustomConsensusClusters(), aggregateTagClusters(), annotateCTSS(), cumulativeCTSSdistribution(), getCTSS(), normalizeTagCount(), quantilePositions(), quickEnhancers(), resetCAGEexp(), summariseChrExpr()

Other CAGEr clusters functions: CTSSclusteringMethod(), CTSScumulativesTagClusters(), CustomConsensusClusters(), aggregateTagClusters(), consensusClustersDESeq2(), consensusClustersGR(), cumulativeCTSSdistribution(), plotInterquantileWidth(), quantilePositions(), tagClustersGR()


Examples

# Using 'distclu'; note the 'maxDist' argument
ce <- clusterCTSS( exampleCAGEexp, threshold = 50, thresholdIsTpm = TRUE
           , nrPassThreshold = 1, method = "distclu", maxDist = 20
           , removeSingletons = TRUE, keepSingletonsAbove = 100)
tagClustersGR(ce, "Zf.30p.dome")

# Using 'paraclu'; note the 'maxLength' and 'minStability' arguments
ce <- clusterCTSS( exampleCAGEexp, threshold = 50, thresholdIsTpm = TRUE
           , nrPassThreshold = 1, method = "paraclu"
           , removeSingletons = TRUE, keepSingletonsAbove = 100
           , maxLength = 500, minStability = 1
           , reduceToNonoverlapping = TRUE)
tagClustersGR(ce, "Zf.30p.dome")

charles-plessy/CAGEr documentation built on Nov. 4, 2023, 11:57 a.m.