This function implements several methods for between-sample normalization of count data. Although these methods were developed for RNA-seq data, they are also useful for normalizing ChIP-seq data once reads have been counted within regions or bins. Some methods may also be applied to count data after within-sample normalization (e.g. TPM or RPKM values).
object |
An object of class |
method |
Normalization method, either "scale", "scaleMedianRegion", "quantile" or "tmm". |
isLogScale |
Indicates whether the raw data in |
trim |
Only used if |
totalCounts |
Only used if |
The following normalization methods are implemented:
scale: Samples are scaled by a factor such that all samples have the same number N of reads after normalization, where N is the median number of reads observed across all samples. If the argument totalCounts is missing, the total number of reads per sample is calculated from the given data. Otherwise, the values in totalCounts are used.
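scaleMedianRegion: The scaling factor s_j for the j-th sample is defined as
s_j = median_i \frac{k_{ij}}{\left(\prod_{v=1}^m k_{iv}\right)^{1/m}},
where k_{ij} is the value of region i in sample j and m is the number of samples, so that the denominator is the geometric mean of region i across all samples. See Anders and Huber (2010) for details.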
quantile: Quantile normalization is applied to the ChIP-seq values such that each sample has the same cdf after normalization.
tmm: The trimmed mean of M-values (tmm) normalization was proposed by Robinson and Oshlack (2010). Here, the logarithm of the scaling factor for sample i is calculated as the trimmed mean of
\log(k_{i,j}/m_{j}),
where k_{i,j} is the value of region j in sample i and m_{j} denotes the geometric mean of region j across samples. The argument trim defaults to 0.3, so that the smallest 15% and the largest 15% of the log ratios are trimmed before calculating the mean.
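The four methods can be illustrated with a minimal base R sketch. This is not the package's internal code; the matrix counts and all variable names are hypothetical. The scaleMedianRegion part uses the geometric-mean denominator of Anders and Huber (2010). The tmm part assumes that trim = 0.3 removes 15% of the log ratios from each tail, which corresponds to trim = 0.15 in base R's mean().

```r
# Hypothetical regions-by-samples matrix of positive counts (filled by column).
counts <- matrix(c(10, 20, 30, 40,
                   20, 40, 60, 80,
                   12, 18, 33, 37),
                 nrow = 4,
                 dimnames = list(paste0("region", 1:4),
                                 paste0("sample", 1:3)))

# scale: bring every sample to the median total count N.
totals <- colSums(counts)
scaled <- sweep(counts, 2, median(totals) / totals, "*")

# scaleMedianRegion: median ratio to the per-region geometric mean.
geoMeans <- exp(rowMeans(log(counts)))
sizeFactors <- apply(counts / geoMeans, 2, median)
medianScaled <- sweep(counts, 2, sizeFactors, "/")

# tmm: trimmed mean of the log ratios to the geometric mean.
# Assumption: trim is the total trimmed fraction, split evenly over both tails.
trim <- 0.3
logFactors <- apply(log(counts / geoMeans), 2, mean, trim = trim / 2)
tmmScaled <- sweep(counts, 2, exp(logFactors), "/")

# quantile: replace each value by the mean of all values sharing its rank.
sortedMeans <- rowMeans(apply(counts, 2, sort))
quantNorm <- apply(counts, 2,
                   function(x) sortedMeans[rank(x, ties.method = "first")])
dimnames(quantNorm) <- dimnames(counts)
```

In this toy matrix, sample2 is exactly twice sample1, so both scaleMedianRegion and tmm assign it a scaling factor twice as large, and the two samples coincide after either normalization.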
An object of the same class as the input object with the normalized data.
Hans-Ulrich Klein (hklein@broadinstitute.org)
Anders and Huber. Differential expression analysis for sequence count data. Genome Biol. 2010;11(10):R106.
Robinson and Oshlack. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010;11(3):R25.
# Assumption: ChIPseqSet, chipVals and normalize come from the epigenomix package
library(epigenomix)

# Simulate counts for 20 regions in two samples with different depths
set.seed(1234)
chip <- matrix(c(rpois(20, lambda=10), rpois(20, lambda=20)), nrow=20,
               dimnames=list(paste("feature", 1:20, sep=""), c("sample1", "sample2")))
rowRanges <- GRanges(IRanges(start=1:20, end=1:20),
                     seqnames=c(rep("1", 20)))
names(rowRanges) <- rownames(chip)
cSet <- ChIPseqSet(chipVals=chip, rowRanges=rowRanges)

# tmm: compare the trimmed means of the log values of both samples
tmmSet <- normalize(cSet, method="tmm", trim=0.3)
mean(log(chipVals(tmmSet))[, 1], trim=0.3) -
  mean(log(chipVals(tmmSet))[, 2], trim=0.3) < 0.01

# quantile: compare the quantiles of both samples
quantSet <- normalize(cSet, method="quantile")
all(quantile(chipVals(quantSet)[, 1]) == quantile(chipVals(quantSet)[, 2]))