normalizeTagCount | R Documentation
Normalizes raw CAGE tag counts per CTSS in all experiments to the same referent distribution. Either a simple tags-per-million normalization or normalization to a referent power-law distribution (Balwierz et al., Genome Biology 2009) can be specified.
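The simple tags-per-million option amounts to dividing each experiment's counts by that experiment's total and multiplying by one million. A minimal base R sketch, assuming a plain numeric matrix with CTSSs in rows and experiments in columns; simple_tpm() is a hypothetical helper written for illustration, not part of CAGEr:

```r
# Hypothetical helper for illustration; not CAGEr's implementation.
# counts: numeric matrix, one row per CTSS and one column per experiment.
simple_tpm <- function(counts) {
  counts <- as.matrix(counts)
  # Divide every column by its library size (column total), then scale to 10^6.
  sweep(counts, 2, colSums(counts), "/") * 1e6
}

raw <- matrix(c(1, 3, 2, 2), nrow = 2)
tpm <- simple_tpm(raw)  # column 1 becomes 250000 and 750000; each column sums to 1e6
```

An experiment whose total is zero would yield NaN in this sketch, so empty columns should be dropped first.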
normalizeTagCount(
object,
method = c("powerLaw", "simpleTpm", "none"),
fitInRange = c(10, 1000),
alpha = 1.25,
T = 10^6
)
## S4 method for signature 'CAGEexp'
normalizeTagCount(
object,
method = c("powerLaw", "simpleTpm", "none"),
fitInRange = c(10, 1000),
alpha = 1.25,
T = 10^6
)
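object | A CAGEexp object.
method | Method to be used for normalization. One of "powerLaw" (normalization to a referent power-law distribution), "simpleTpm" (conversion of tag counts to tags per million) or "none" (the raw tag counts are kept).
fitInRange | An integer vector with two values specifying the range of tag count values to be used for fitting a power-law distribution to reverse cumulatives. Used only when method = "powerLaw".
alpha | -1 * alpha is the slope of the referent power-law distribution on a log-log scale. Used only when method = "powerLaw".
T | Total number of CAGE tags in the referent power-law distribution. Setting T = 10^6 results in normalized values expressed as tags per million (tpm). Used only when method = "powerLaw".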
It has been shown that many CAGE datasets follow a power-law distribution (Balwierz et al., Genome Biology 2009). Plotting the number of CAGE tags (X-axis) against the number of TSSs that are supported by >= that number of tags (Y-axis) results in a distribution that can be approximated by a power law. On a log-log scale this theoretical referent distribution can be described by a monotonically decreasing linear function y = -1 * alpha * x + beta, where x and y are the logarithms of the number of tags and of the number of TSSs, respectively. The function is fully determined by alpha (the negative of its slope) and the total number of tags T (which, together with alpha, determines the value of beta). Thus, by specifying the parameters alpha and T, a desired referent power-law distribution can be selected.
However, real CAGE datasets deviate from the power law in the regions of very low and very high tag counts, so it is advisable to discard these regions before fitting a power-law distribution. The fitInRange parameter specifies the range of values (lower and upper limit of the number of CAGE tags) that will be used to fit the power law. Plotting reverse cumulatives with the plotReverseCumulatives function can help in choosing the best range of values. After a power-law distribution has been fitted to each CAGE dataset individually, all datasets are normalized to the referent distribution specified by alpha and T. When T = 10^6, normalized values are expressed as tags per million (tpm).
The normalizedTpmMatrix slot of the provided CAGEexp object is filled with the normalized CAGE signal values per CTSS across all experiments, or with the raw tag counts when method = "none". The modified object is returned.
Vanja Haberle
Balwierz et al. (2009) Methods for analyzing deep sequencing expression data: constructing the human and mouse promoterome with deepCAGE data, Genome Biology 10(7):R79.
plotReverseCumulatives, CTSSnormalizedTpmDF
Other CAGEr object modifiers:
CTSStoGenes(), CustomConsensusClusters(), aggregateTagClusters(), annotateCTSS(), cumulativeCTSSdistribution(), distclu(), getCTSS(), paraclu(), quantilePositions(), quickEnhancers(), resetCAGEexp(), summariseChrExpr()
Other CAGEr normalised data functions:
plotReverseCumulatives()
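library(CAGEr)  # provides normalizeTagCount() and the exampleCAGEexp dataset
ce1 <- normalizeTagCount(exampleCAGEexp, method = "simpleTpm")  # tags per million
ce2 <- normalizeTagCount(exampleCAGEexp, method = "powerLaw")   # referent power law (alpha = 1.25, T = 10^6)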