CAGE data based expression clustering

Description

Performs clustering of CAGE-derived expression across multiple experiments, at the level of either individual TSSs or entire clusters of TSSs.

Usage

getExpressionProfiles(object, what, tpmThreshold = 5, 
                      nrPassThreshold = 1, method = "som", 
                      xDim = 5, yDim = 5)

Arguments

object

A CAGEset object

what

The level at which expression clustering is performed. Can be either "CTSS" to cluster individual CTSSs or "consensusClusters" to cluster consensus clusters. See Details.

tpmThreshold, nrPassThreshold

Only CTSSs or consensus clusters (depending on the what parameter) with normalized CAGE signal >= tpmThreshold in >= nrPassThreshold experiments will be included in expression clustering.
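The filtering rule above can be illustrated with a small base R sketch. This is not the CAGEr implementation; the matrix tpm and its values are made up for illustration, and only the threshold logic described above is shown.

```r
# Hypothetical normalized CAGE signal: rows are features (CTSSs or
# consensus clusters), columns are experiments. Values are invented.
tpm <- matrix(c( 0, 2,  7,
                 6, 9,  1,
                 1, 1,  1,
                12, 0, 30),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("feature", 1:4), paste0("exp", 1:3)))

tpmThreshold <- 5
nrPassThreshold <- 2

# For each feature, count the experiments in which signal >= tpmThreshold.
nrPassing <- rowSums(tpm >= tpmThreshold)

# Keep features that pass in at least nrPassThreshold experiments.
keep <- nrPassing >= nrPassThreshold
filtered <- tpm[keep, , drop = FALSE]
```

With these values, only feature2 and feature4 reach the threshold in two experiments, so filtered has two rows.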

method

Method to be used for expression clustering. Can be either "som" to use the self-organizing map (SOM) algorithm (Toronen et al., FEBS Letters 1999) implemented in the som function from the som package, or "kmeans" to use the K-means algorithm implemented in the kmeans function from the stats package.

xDim, yDim

When method = "kmeans", xDim specifies the number of clusters returned by the K-means algorithm and yDim is ignored. When method = "som", xDim specifies the first and yDim the second dimension of the self-organizing map, which results in a total of xDim * yDim clusters returned by SOM.
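As a rough illustration of how xDim and yDim translate into a number of clusters, the sketch below uses only base R and the stats package. The input matrix x is random placeholder data, and the SOM grid is merely enumerated with expand.grid rather than fitted, so this shows the counting only, not the clustering done by CAGEr.

```r
set.seed(1)

# Placeholder feature matrix: 200 features across 4 experiments.
x <- matrix(rnorm(200 * 4), ncol = 4)

xDim <- 5
yDim <- 5

# method = "kmeans": xDim is used as the number of centers, yDim is unused.
km <- kmeans(x, centers = xDim)
nKmeansClusters <- nrow(km$centers)

# method = "som": an xDim-by-yDim map has one cluster per map node.
somNodes <- expand.grid(x = seq_len(xDim), y = seq_len(yDim))
nSomClusters <- nrow(somNodes)
```

Here nKmeansClusters is 5 and nSomClusters is 25, so the same xDim and yDim values give very different cluster counts depending on method.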

Details

Expression clustering can be done at the level of individual CTSSs, in which case the feature vector used as input for the clustering algorithm contains the log-transformed and scaled (divided by standard deviation) normalized CAGE signal at an individual TSS across multiple experiments. Only TSSs with normalized CAGE signal >= tpmThreshold in at least nrPassThreshold CAGE experiments are used for expression clustering.

Alternatively, CTSSs along the genome can be spatially clustered into tag clusters for each experiment separately using the clusterCTSS function, and then aggregated across experiments into consensus clusters using the aggregateTagClusters function. Once the consensus clusters have been created, expression clustering can be performed at the level of these wider genomic regions (representing entire promoters rather than individual TSSs). In that case the feature vector used as input for the clustering algorithm contains the normalized CAGE signal within the entire consensus cluster across multiple experiments, and the thresholds in tpmThreshold and nrPassThreshold are applied to entire consensus clusters.
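The construction of feature vectors described above can be sketched in base R as follows. This is an approximation under stated assumptions, not the CAGEr source: the pseudocount of 1 in the log transform, the choice of per-feature standard deviation for scaling, and the random tpm matrix are all assumptions made for illustration.

```r
set.seed(42)

# Hypothetical normalized CAGE signal for 20 features in 3 experiments.
tpm <- matrix(rexp(60, rate = 0.1), nrow = 20,
              dimnames = list(paste0("feature", 1:20), paste0("exp", 1:3)))

# Log-transform; the pseudocount of 1 is an assumption to avoid log(0).
logged <- log(tpm + 1)

# Scale each feature vector by its own standard deviation (assumed to be
# per feature, i.e. per row). apply() returns features in columns, so
# transpose back to features in rows.
scaled <- t(apply(logged, 1, function(v) v / sd(v)))

# Cluster the scaled feature vectors, here with K-means from stats.
km <- kmeans(scaled, centers = 3, nstart = 10)
expressionClass <- km$cluster
```

After scaling, every feature vector has unit standard deviation, so clustering is driven by the shape of the expression profile across experiments rather than by its spread.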

Value

If what = "CTSS", the slots CTSSexpressionClusteringMethod and CTSSexpressionClasses of the provided CAGEset object are populated with the results of expression clustering; if what = "consensusClusters", the slots consensusClustersExpressionClusteringMethod and consensusClustersExpressionClasses are populated instead. Labels of expression classes (clusters) can be retrieved using the expressionClasses function, and elements belonging to a specific expression class can be selected using the extractExpressionClass function.

Author(s)

Vanja Haberle

References

Toronen et al. (1999) Analysis of gene expression data using self-organizing maps, FEBS Letters 451:142-146.

See Also

plotExpressionProfiles
expressionClasses
extractExpressionClass

Examples

load(system.file("data", "exampleCAGEset.RData", package="CAGEr"))

getExpressionProfiles(object = exampleCAGEset, what = "CTSS",
                      tpmThreshold = 50, nrPassThreshold = 1,
                      method = "som", xDim = 3, yDim = 3)