distclu-functions: Private functions for distance clustering.
In CAGEr: Analysis of CAGE (Cap Analysis of Gene Expression) sequencing data for precise mapping of transcription start sites and promoterome mining

Description Usage Arguments Value Examples

The flow of data is that a CTSS object of CTSSes is progressively deconstructed, and data to form the clusters is progressively integrated in a data.table object, which is finally converted to GRanges at the end. Doing the whole clustering with GRanges is more elegant, but looping on a GRangesList was just too slow. Maybe the operation on the data.table is more efficient because it is vectorised.

.cluster.ctss.strand does the strandless distance clustering of strandless CTSS positions from a single chromosome. Input does not need to be sorted, but pay attention that the output is sorted.

.cluster.ctss.chr does the stranded distance clustering of CTSS on a single chromosome, by dispatching both strands to .cluster.ctss.strand and merging the results, taking care keep the cluster IDs unique. Be careful that this function does not look at the score.

.ctss2clusters does the stranded distance clustering of CTSS.

.summarize.clusters calculates the number of CTSS, and the position and score of a main peak, for each cluster.

.distclu receives the data from the main clusterCTSS and dispatches each for (possibly parallel) processing.

.cluster.ctss.strand(ctss.ipos.chr, max.dist)

.cluster.ctss.chr(ctss.chr, max.dist)

.ctss2clusters(ctss, max.dist = 20, useMulticore = FALSE, nrCores = NULL)

.summarize.clusters(
  ctss.clustered,
  removeSingletons = FALSE,
  keepSingletonsAbove = Inf
)

.distclu(
  se,
  max.dist = 20,
  removeSingletons = FALSE,
  keepSingletonsAbove = Inf,
  useMulticore = FALSE,
  nrCores = NULL
)

`ctss.ipos.chr`	A IPos object.
`max.dist`	See `clusterCTSS()`.
`ctss.chr`	A CTSS.chr object.
`ctss`	A CTSS object with a score column.
`useMulticore, nrCores`	See clusterCTSS.
`ctss.clustered`	A `data.table` object representing the cluster ID (`id`), chromosome coordinates (`chr`, `pos`, `strand`) and the `score` of each CTSS.
`removeSingletons`	Remove “singleton” clusters that span only a single nucleotide ? (default = FALSE).
`keepSingletonsAbove`	Even if `removeSingletons = TRUE`, keep singletons when their score is aboove threshold (default = `Inf`).
`se`	A `SummarizedExperiment` object representing the CTSSes and their expression in each sample.

.cluster.ctss.strand returns an data.table object containing arbitrary cluster IDs (as integers) for each CTSS.

.cluster.ctss.chr returns a data.table object representing the chromosome coordinates (chr, pos, strand) of each CTSS, with their cluster ID (id).

.ctss2clusters returns a data.table object representing the cluster ID (id), chromosome coordinates (chr, pos, strand) and the score of each CTSS.

.summarize.clusters returns GRanges describing the clusters.

.distclu returns GRanges describing the clusters.

# Get example data
library(IRanges)
library(GenomicRanges)

#.cluster.ctss.strand
ctss.ipos.chr <- IPos(c(1,3,4,12,14,25,28))
CAGEr:::.cluster.ctss.strand(ctss.ipos.chr, 5)

ctss.chr <- CTSScoordinatesGR(exampleCAGEexp)
ctss.chr <- ctss.chr[strand(ctss.chr) == "+"]
ctss.ipos.chr <- ranges(ctss.chr)
# Same result if not sorted
identical(
  CAGEr:::.cluster.ctss.strand(ctss.ipos.chr, 20),
  CAGEr:::.cluster.ctss.strand(ctss.ipos.chr[sample(seq_along(ctss.ipos.chr))], 20)
)
# Returns an emtpy data.table object if given an empty IRanges object.
identical(CAGEr:::.cluster.ctss.strand(IPos(), 20), data.table::data.table())

#.cluster.ctss.chr
ctss.chr <- as(CTSScoordinatesGR(exampleCAGEexp), "CTSS.chr")
CAGEr:::.cluster.ctss.chr(ctss.chr, 20)

# .ctss2clusters
ctss <- CTSScoordinatesGR(exampleCAGEexp)
score(ctss) <- CTSSnormalizedTpmDF(exampleCAGEexp)[[1]]
seqnames(ctss)[rep(c(TRUE,FALSE), length(ctss) / 2)] <- "chr16"
ctss
clusters <- CAGEr:::.ctss2clusters(ctss, 20)
clusters

# .summarize.clusters
CAGEr:::.summarize.clusters(clusters)
CAGEr:::.summarize.clusters(clusters, removeSingletons = TRUE)
CAGEr:::.summarize.clusters(clusters, removeSingletons = TRUE, keepSingletonsAbove = 5)

# .distclu
CAGEr:::.distclu(CTSStagCountSE(exampleCAGEexp))
## Not run: 
CAGEr:::.distclu(CTSStagCountSE(exampleCAGEexp), useMulticore = TRUE)

## End(Not run)

CAGEr:::.distclu(CTSStagCountSE(exampleCAGEset))

CAGEr documentation built on Jan. 17, 2021, 2 a.m.

CAGEr index

Package overview CAGEr: an R package for CAGE (Cap Analysis of Gene Expression) data analysis and promoterome mining Use of CAGE resources with CAGEr

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

CAGEr
Analysis of CAGE (Cap Analysis of Gene Expression) sequencing data for precise mapping of transcription start sites and promoterome mining

distclu-functions: Private functions for distance clustering.
In CAGEr: Analysis of CAGE (Cap Analysis of Gene Expression) sequencing data for precise mapping of transcription start sites and promoterome mining

Description

Usage

Arguments

Value

Examples

Related to distclu-functions in CAGEr...

R Package Documentation

Browse R Packages

We want your feedback!

CAGEr Analysis of CAGE (Cap Analysis of Gene Expression) sequencing data for precise mapping of transcription start sites and promoterome mining

distclu-functions: Private functions for distance clustering. In CAGEr: Analysis of CAGE (Cap Analysis of Gene Expression) sequencing data for precise mapping of transcription start sites and promoterome mining

Description

Usage

Arguments

Value

Examples

Related to distclu-functions in CAGEr...

R Package Documentation

Browse R Packages

We want your feedback!

CAGEr
Analysis of CAGE (Cap Analysis of Gene Expression) sequencing data for precise mapping of transcription start sites and promoterome mining

distclu-functions: Private functions for distance clustering.
In CAGEr: Analysis of CAGE (Cap Analysis of Gene Expression) sequencing data for precise mapping of transcription start sites and promoterome mining