normalizeTagCount: Normalizing raw CAGE tag count
In charles-plessy/CAGEr: Analysis of CAGE (Cap Analysis of Gene Expression) sequencing data for precise mapping of transcription start sites and promoterome mining

normalizeTagCount

R Documentation

Normalizing raw CAGE tag count

Description

Normalizes raw CAGE tag counts per CTSS in all experiments to the same referent distribution. Either a simple tags-per-million normalization or normalization to a referent power-law distribution (Balwierz et al., Genome Biology 2009) can be specified.

Usage

normalizeTagCount(
  object,
  method = c("powerLaw", "simpleTpm", "none"),
  fitInRange = c(10, 1000),
  alpha = 1.25,
  T = 10^6
)

## S4 method for signature 'CAGEexp'
normalizeTagCount(
  object,
  method = c("powerLaw", "simpleTpm", "none"),
  fitInRange = c(10, 1000),
  alpha = 1.25,
  T = 10^6
)

Arguments

`object`	A `CAGEexp` object
`method`	Method to be used for normalization. Can be `"simpleTpm"` to convert tag counts to tags per million, `"powerLaw"` to normalize to a referent power-law distribution, or `"none"` to keep using the raw tag counts in downstream analyses.
`fitInRange`	An integer vector with two values specifying a range of tag count values to be used for fitting a power-law distribution to reverse cumulatives. Used only when `method = "powerLaw"`, otherwise ignored. See Details.
`alpha`	`-1 * alpha` will be the slope of the referent power-law distribution in the log-log representation. Used only when `method = "powerLaw"`, otherwise ignored. See Details.
`T`	Total number of CAGE tags in the referent power-law distribution. Setting `T = 10^6` results in normalized values that correspond to tags per million in the referent distribution. Used only when `method = "powerLaw"`, otherwise ignored. See Details.
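
As an illustration of what the `"simpleTpm"` option means, tags-per-million scaling divides each sample's counts by that sample's total and multiplies by 10^6. The following is a minimal base R sketch on a made-up matrix; the object names `raw` and `tpm` are hypothetical, and this is not the CAGEr implementation.

```r
# Hypothetical toy matrix: rows are CTSSs, columns are samples.
raw <- matrix(c(5, 0, 15, 30, 10, 60), nrow = 3,
              dimnames = list(paste0("ctss", 1:3), c("sampleA", "sampleB")))

# Divide every column by its library size, then scale to one million tags.
tpm <- sweep(raw, 2, colSums(raw), "/") * 1e6

# After scaling, every sample sums to one million.
colSums(tpm)
```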

Details

It has been shown that many CAGE datasets follow a power-law distribution (Balwierz et al., Genome Biology 2009). Plotting the number of CAGE tags (X-axis) against the number of TSSs that are supported by at least that number of tags (Y-axis) results in a reverse cumulative distribution that can be approximated by a power law. On a log-log scale, the theoretical referent distribution is described by a monotonically decreasing linear function y = -1 * alpha * x + beta, which is fully determined by the slope parameter alpha and the total number of tags T (which together with alpha determines the value of beta). Thus, a desired referent power-law distribution can be selected by specifying the parameters alpha and T.

However, real CAGE datasets deviate from the power law at very low and very high numbers of tags, so it is advisable to exclude these regions before fitting a power-law distribution. The fitInRange parameter specifies the range of values (lower and upper limit of the number of CAGE tags) used to fit the power law. Plotting reverse cumulatives with the plotReverseCumulatives function can help in choosing the best range of values.

After a power-law distribution has been fitted to each CAGE dataset individually, all datasets are normalized to the referent distribution specified by alpha and T. When T = 10^6, normalized values are expressed as tags per million (tpm).
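
The procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the CAGEr implementation. It assumes that the referent reverse cumulative is n(t) = n0 * t^(-alpha) over integer tag counts t >= 1, so that the total number of tags is n0 * zeta(alpha) and beta = log(T / zeta(alpha)) on a natural-log scale. The function names `zeta_approx`, `fit_reverse_cumulative` and `normalize_to_reference` and the argument `total_tags` (standing in for `T`) are hypothetical.

```r
# Hypothetical helper: Riemann zeta for s > 1, computed as a truncated sum
# plus an Euler-Maclaurin tail correction (base R has no zeta function).
zeta_approx <- function(s, n = 1000) {
  k <- seq_len(n)
  sum(k^(-s)) + n^(1 - s) / (s - 1) - n^(-s) / 2 + s * n^(-s - 1) / 12
}

# Fit a line to the reverse cumulative distribution on a log-log scale,
# using only tag counts inside fit_range. Returns c(intercept b, slope a).
fit_reverse_cumulative <- function(counts, fit_range = c(10, 1000)) {
  counts <- counts[counts > 0]
  tab <- table(counts)                          # sorted by tag count
  tags <- as.numeric(names(tab))
  rev_cum <- rev(cumsum(rev(as.numeric(tab))))  # TSSs with >= that many tags
  keep <- tags >= min(fit_range) & tags <= max(fit_range)
  stopifnot(sum(keep) >= 2)                     # need two points for a line
  x <- log(tags[keep])
  y <- log(rev_cum[keep])
  unname(coef(lm(y ~ x)))
}

# Map counts so that the fitted line exp(b) * t^a lands on the referent
# line (total_tags / zeta(alpha)) * v^(-alpha).
normalize_to_reference <- function(counts, coefs, alpha = 1.25,
                                   total_tags = 10^6) {
  b <- coefs[1]
  a <- coefs[2]
  lambda <- (total_tags / (zeta_approx(alpha) * exp(b)))^(1 / alpha)
  lambda * counts^(-a / alpha)
}

# Usage on simulated heavy-tailed counts (made-up data).
set.seed(1)
counts <- floor(runif(5000)^(-1 / 1.1))
coefs <- fit_reverse_cumulative(counts, fit_range = c(10, 1000))
normalized <- normalize_to_reference(counts, coefs)
```

A sample whose fitted line already coincides with the referent line is left unchanged by this mapping, which is a quick sanity check on the formula.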

Value

The normalizedTpmMatrix slot of the provided CAGEexp object is filled with normalized CAGE signal values per CTSS across all experiments, or with the raw tag counts if method = "none".

Author(s)

Vanja Haberle

References

Balwierz et al. (2009) Methods for analyzing deep sequencing expression data: constructing the human and mouse promoterome with deepCAGE data, Genome Biology 10(7):R79.

Examples

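library(CAGEr)  # attach the package before running the examples
ce1 <- normalizeTagCount(exampleCAGEexp, method = "simpleTpm")  # tags per million
ce2 <- normalizeTagCount(exampleCAGEexp, method = "powerLaw")   # default fitInRange, alpha and T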

charles-plessy/CAGEr documentation built on Oct. 27, 2024, 10:11 p.m.
