Normalizes the raw CAGE tag count per CTSS in all experiments to the same referent distribution. Either a simple tags per million normalization or normalization to a referent power-law distribution (Balwierz et al., Genome Biology 2009) can be specified.
normalizeTagCount(object, method = c("powerLaw", "simpleTpm", "none"),
  fitInRange = c(10, 1000), alpha = 1.25, T = 10^6)

## S4 method for signature 'CAGEset'
normalizeTagCount(object, method = c("powerLaw", "simpleTpm", "none"),
  fitInRange = c(10, 1000), alpha = 1.25, T = 10^6)

## S4 method for signature 'CAGEexp'
normalizeTagCount(object, method = c("powerLaw", "simpleTpm", "none"),
  fitInRange = c(10, 1000), alpha = 1.25, T = 10^6)
Method to be used for normalization. Can be either "powerLaw", "simpleTpm" or "none".
An integer vector with two values specifying a range of tag count
values to be used for fitting a power-law distribution to reverse cumulatives.
Used only when method = "powerLaw".
Total number of CAGE tags in the referent power-law distribution. Setting T = 10^6 expresses the normalized values as tags per million (tpm).
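As a rough illustration of what a simple tags per million normalization amounts to, the sketch below rescales each experiment (column) of a small made-up count matrix so that its tags sum to one million. The matrix counts and the helper simple_tpm are hypothetical names used only for this example; they are not part of the package.

```r
# Hypothetical raw tag counts: rows are CTSSs, columns are experiments.
counts <- matrix(c(10, 0, 30, 60,
                    5, 5, 20, 70),
                 ncol = 2,
                 dimnames = list(NULL, c("sample1", "sample2")))

# Divide each column by its total and multiply by one million,
# so that every experiment sums to 10^6 tags.
simple_tpm <- function(m) {
  sweep(m, 2, colSums(m), "/") * 1e6
}

tpm <- simple_tpm(counts)
colSums(tpm)
```

After this scaling every column has the same total, but the shape of each distribution is left unchanged, which is the main difference from the power-law approach.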
It has been shown that many CAGE datasets follow a power-law distribution
(Balwierz et al., Genome Biology 2009). Plotting the number of CAGE tags
(X-axis) against the number of TSSs that are supported by at least that number of tags
(Y-axis) results in a distribution that can be approximated by a power-law. On a
log-log scale this theoretical referent distribution can be described by a
monotonically decreasing linear function
y = -1 * alpha * x + beta, which is
fully determined by the slope
-1 * alpha and the total number of tags T (which together with
alpha determines the value of
beta). Thus, by specifying alpha and
T, a desired referent power-law distribution can be
selected. However, real CAGE datasets deviate from the power-law in the areas of very
low and very high number of tags, so it is advisable to discard these areas before
fitting a power-law distribution.
The fitInRange parameter specifies a
range of values (lower and upper limit of the number of CAGE tags) that will be used to
fit a power-law. Plotting the reverse cumulatives of each dataset
can help in choosing the best range of values. After fitting a power-law
distribution to each CAGE dataset individually, all datasets are normalized to a
referent distribution specified by alpha and T. When
T = 10^6,
normalized values are expressed as tags per million (tpm).
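The procedure described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's code: it assumes the referent reverse cumulative has the form r0 * t^(-alpha), with r0 chosen as T / zeta(alpha) so that the referent tags sum to T, and it maps each tag count onto the referent value with the same reverse cumulative. The helper names (zeta_approx, fit_power_law, normalize_to_power_law), the argument total_tags (standing in for T) and the synthetic data are all hypothetical.

```r
# Approximate the Riemann zeta function for s > 1 using a partial sum plus
# an Euler-Maclaurin tail correction (base R has no zeta function).
zeta_approx <- function(s, n = 1e5) {
  k <- seq_len(n - 1)
  sum(k^(-s)) + n^(1 - s) / (s - 1) + 0.5 * n^(-s)
}

# Fit a straight line to log(reverse cumulative) versus log(tag count),
# using only tag counts that fall inside fit_range.
fit_power_law <- function(tags, fit_range = c(10, 1000)) {
  tab <- table(tags)
  n_tags <- as.numeric(names(tab))               # distinct tag counts, ascending
  rev_cum <- rev(cumsum(rev(as.numeric(tab))))   # CTSSs with >= n_tags tags
  keep <- n_tags >= fit_range[1] & n_tags <= fit_range[2]
  x <- log(n_tags[keep])
  y <- log(rev_cum[keep])
  fit <- lm(y ~ x)
  c(slope = unname(coef(fit)["x"]),
    intercept = unname(coef(fit)["(Intercept)"]))
}

# Map raw tag counts onto the referent distribution by equating the fitted
# reverse cumulative exp(b) * t^a with the referent r0 * t_norm^(-alpha).
normalize_to_power_law <- function(tags, coefs, alpha = 1.25, total_tags = 1e6) {
  a <- coefs[["slope"]]
  b <- coefs[["intercept"]]
  r0 <- total_tags / zeta_approx(alpha)
  lambda <- (r0 / exp(b))^(1 / alpha)
  lambda * tags^(-a / alpha)
}

# Synthetic tag counts whose reverse cumulative follows a power law
# with exponent 1.1 (deterministic, no random numbers involved).
n <- 1e5
u <- (seq_len(n) - 0.5) / n
tags <- floor(u^(-1 / 1.1))

coefs <- fit_power_law(tags, fit_range = c(10, 1000))
tags_norm <- normalize_to_power_law(tags, coefs, alpha = 1.25, total_tags = 1e6)
coefs
```

Restricting the fit to fit_range keeps the sparse, noisy extremes of the reverse cumulative from dominating the regression, which mirrors the advice above to discard the areas of very low and very high tag counts.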
The normalizedTpmMatrix of the provided object
will be populated with normalized CAGE signal values per CTSS across all
experiments, or with the raw tag counts (in case
method = "none").
Balwierz et al. (2009) Methods for analyzing deep sequencing expression data: constructing the human and mouse promoterome with deepCAGE data, Genome Biology 10(7):R79.