distclu-functions: Private functions for distance clustering.

distclu-functionsR Documentation

Private functions for distance clustering.

Description

The flow of data is that a CTSS object of CTSSes is progressively deconstructed, and data to form the clusters is progressively integrated in a data.table object, which is finally converted to GRanges at the end. Doing the whole clustering with GRanges is more elegant, but looping on a GRangesList was just too slow. Maybe the operation on the data.table is more efficient because it is vectorised.

.cluster.ctss.strand does the strandless distance clustering of strandless CTSS positions from a single chromosome. Input does not need to be sorted, but pay attention that the output is sorted.

.cluster.ctss.chr does the stranded distance clustering of CTSS on a single chromosome, by dispatching both strands to .cluster.ctss.strand and merging the results, taking care keep the cluster IDs unique. Be careful that this function does not look at the score.

.ctss2clusters does the stranded distance clustering of CTSS.

.summarize.clusters calculates the number of CTSS, and the position and score of a main peak, for each cluster.

.distclu receives the data from the main clusterCTSS and dispatches each for (possibly parallel) processing.

Usage

.cluster.ctss.strand(ctss.ipos.chr, max.dist)

.cluster.ctss.chr(ctss.chr, max.dist)

.ctss2clusters(ctss, max.dist = 20, useMulticore = FALSE, nrCores = NULL)

.summarize.clusters(
  ctss.clustered,
  removeSingletons = FALSE,
  keepSingletonsAbove = Inf
)

.distclu(
  se,
  max.dist = 20,
  removeSingletons = FALSE,
  keepSingletonsAbove = Inf,
  useMulticore = FALSE,
  nrCores = NULL
)

Arguments

ctss.ipos.chr

A IPos object.

max.dist

See clusterCTSS().

ctss.chr

A CTSS.chr object.

ctss

A CTSS object with a score column.

useMulticore, nrCores

See clusterCTSS.

ctss.clustered

A data.table object representing the cluster ID (id), chromosome coordinates (chr, pos, strand) and the score of each CTSS.

removeSingletons

Remove “singleton” clusters that span only a single nucleotide ? (default = FALSE).

keepSingletonsAbove

Even if removeSingletons = TRUE, keep singletons when their score is aboove threshold (default = Inf).

se

A SummarizedExperiment object representing the CTSSes and their expression in each sample.

Value

.cluster.ctss.strand returns an data.table object containing arbitrary cluster IDs (as integers) for each CTSS.

.cluster.ctss.chr returns a data.table object representing the chromosome coordinates (chr, pos, strand) of each CTSS, with their cluster ID (id).

.ctss2clusters returns a data.table object representing the cluster ID (id), chromosome coordinates (chr, pos, strand) and the score of each CTSS.

.summarize.clusters returns GRanges describing the clusters.

.distclu returns GRanges describing the clusters.

Examples

# Get example data
library(IRanges)
library(GenomicRanges)

#.cluster.ctss.strand
ctss.ipos.chr <- IPos(c(1,3,4,12,14,25,28))
CAGEr:::.cluster.ctss.strand(ctss.ipos.chr, 5)

ctss.chr <- CTSScoordinatesGR(exampleCAGEexp)
ctss.chr <- ctss.chr[strand(ctss.chr) == "+"]
ctss.ipos.chr <- ranges(ctss.chr)
# Same result if not sorted
identical(
  CAGEr:::.cluster.ctss.strand(ctss.ipos.chr, 20),
  CAGEr:::.cluster.ctss.strand(ctss.ipos.chr[sample(seq_along(ctss.ipos.chr))], 20)
)
# Returns an emtpy data.table object if given an empty IRanges object.
identical(CAGEr:::.cluster.ctss.strand(IPos(), 20), data.table::data.table())

#.cluster.ctss.chr
ctss.chr <- as(CTSScoordinatesGR(exampleCAGEexp), "CTSS.chr")
CAGEr:::.cluster.ctss.chr(ctss.chr, 20)

# .ctss2clusters
ctss <- CTSScoordinatesGR(exampleCAGEexp)
score(ctss) <- CTSSnormalizedTpmDF(exampleCAGEexp)[[1]]
seqnames(ctss)[rep(c(TRUE,FALSE), length(ctss) / 2)] <- "chr16"
ctss
clusters <- CAGEr:::.ctss2clusters(ctss, 20)
clusters

# .summarize.clusters
CAGEr:::.summarize.clusters(clusters)
CAGEr:::.summarize.clusters(clusters, removeSingletons = TRUE)
CAGEr:::.summarize.clusters(clusters, removeSingletons = TRUE, keepSingletonsAbove = 5)

# .distclu
CAGEr:::.distclu(CTSStagCountSE(exampleCAGEexp))
## Not run: 
CAGEr:::.distclu(CTSStagCountSE(exampleCAGEexp), useMulticore = TRUE)

## End(Not run)

charles-plessy/CAGEr documentation built on Nov. 4, 2023, 11:57 a.m.