Normalizes raw CAGE tag counts per CTSS in all experiments to the same referent distribution. Either a simple tags-per-million normalization or normalization to a referent power-law distribution (Balwierz et al., Genome Biology 2009) can be specified.
normalizeTagCount(object, method = "powerLaw", fitInRange = c(10, 1000),
                  alpha = 1.25, T = 10^6)
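object
    A CAGEset object.
method
    Method to be used for normalization: a simple tags-per-million normalization, normalization to a referent power-law distribution ("powerLaw", the default), or "none" to keep the raw tag counts.
fitInRange
    An integer vector with two values specifying the range of tag count values used for fitting a power-law distribution to reverse cumulatives. Used only when method = "powerLaw".
alpha
    Determines the slope (-1 * alpha) of the referent power-law distribution on a log-log scale; see Details.
T
    Total number of CAGE tags in the referent power-law distribution. Setting T = 10^6 yields normalized values expressed as tags per million (tpm).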
It has been shown that many CAGE datasets follow a power-law distribution (Balwierz et al., Genome Biology 2009). Plotting the number of CAGE tags (X-axis) against the number of TSSs supported by at least that number of tags (Y-axis) gives a reverse cumulative distribution that can be approximated by a power law. On a log-log scale this theoretical referent distribution is described by a monotonically decreasing linear function y = -1 * alpha * x + beta, which is fully determined by alpha (the slope is -1 * alpha) and the total number of tags T (which together with alpha determines the value of beta). Thus, a desired referent power-law distribution is selected by specifying the parameters alpha and T.
However, real CAGE datasets deviate from the power law at very low and very high numbers of tags, so it is advisable to exclude these regions before fitting. The fitInRange parameter specifies the range of values (lower and upper limit of the number of CAGE tags) used to fit the power law. Plotting reverse cumulatives with the plotReverseCumulatives function can help in choosing the best range. After a power-law distribution is fitted to each CAGE dataset individually, all datasets are normalized to the referent distribution specified by alpha and T. When T = 10^6, normalized values are expressed as tags per million (tpm).
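The procedure described above can be sketched in base R. This is an illustrative sketch under stated assumptions, not CAGEr's internal code: all function names (zeta_approx, reverse_cumulative, fit_power_law, normalize_power_law) are hypothetical, and the sketch assumes that the referent reverse cumulative r(t) = r0 * t^(-alpha) holds for every integer t >= 1, so that the total number of tags is T = r0 * zeta(alpha) and hence beta = log(T / zeta(alpha)). A sample fitted as log(r) = a * log(t) + b is then mapped onto the referent by t' = lambda * t^(-a / alpha) with lambda = (T / (zeta(alpha) * exp(b)))^(1 / alpha). The synthetic input at the end is made up for demonstration only.

```r
# Illustrative sketch only; all function names here are hypothetical.

# Riemann zeta for s > 1: partial sum plus an Euler-Maclaurin tail correction.
zeta_approx <- function(s, n = 1000) {
  k <- seq_len(n - 1)
  sum(k^(-s)) + n^(1 - s) / (s - 1) + 0.5 * n^(-s)
}

# Reverse cumulative: for each observed tag count t, the number of
# CTSSs supported by at least t tags.
reverse_cumulative <- function(counts) {
  counts <- counts[counts > 0]
  tab <- table(counts)
  data.frame(
    nr_tags = as.numeric(names(tab)),
    n_ctss  = rev(cumsum(rev(as.vector(tab))))
  )
}

# Fit log(n_ctss) = a * log(nr_tags) + b using only tag counts in fit_range.
fit_power_law <- function(counts, fit_range = c(10, 1000)) {
  rc <- reverse_cumulative(counts)
  rc <- rc[rc$nr_tags >= fit_range[1] & rc$nr_tags <= fit_range[2], ]
  fit <- lm(log(n_ctss) ~ log(nr_tags), data = rc)
  unname(coef(fit)[c(2, 1)])  # c(slope a, intercept b)
}

# Map raw counts onto the referent r(t) = (T / zeta(alpha)) * t^(-alpha).
normalize_power_law <- function(counts, fit_range = c(10, 1000),
                                alpha = 1.25, T = 10^6) {
  ab <- fit_power_law(counts, fit_range)
  a <- ab[1]
  b <- ab[2]
  lambda <- (T / (zeta_approx(alpha) * exp(b)))^(1 / alpha)
  beta <- -a / alpha
  lambda * counts^beta
}

# Synthetic sample whose reverse cumulative follows 1e5 * t^(-1.1).
t_vals  <- 1:2000
R_true  <- round(1e5 * t_vals^(-1.1))
n_exact <- c(-diff(R_true), R_true[length(R_true)])  # CTSSs with exactly t tags
counts  <- rep(t_vals, times = n_exact)

ab   <- fit_power_law(counts)        # fitted slope and intercept
norm <- normalize_power_law(counts)  # values on the referent scale
```

Restricting the fit to fit_range mirrors the advice above to exclude the regions of very low and very high tag counts, where real data deviate from the power law.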
The normalizedTpmMatrix slot of the provided CAGEset object is filled with normalized CAGE signal values per CTSS across all experiments, or with the raw tag counts when method = "none".
Vanja Haberle
Balwierz et al. (2009) Methods for analyzing deep sequencing expression data: constructing the human and mouse promoterome with deepCAGE data, Genome Biology 10(7):R79.
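# Load the example CAGEset object shipped with the CAGEr package
load(system.file("data", "exampleCAGEset.RData", package="CAGEr"))
# Normalize to the referent power-law distribution using the default
# fitInRange, alpha and T
normalizeTagCount(exampleCAGEset, method = "powerLaw")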