normalizeCounts: Count data normalization



Description

Normalize count data to remove systematic technical effects.

Usage

normalizeCounts(counts, group=rep.int(1,ncol(counts)), method=c("TMM", "cqn"),
                common.disp = FALSE, prior.df=8, annot=NULL, lib.sizes=NULL, verbose=TRUE)

Arguments

counts

numeric data.frame or matrix containing the count data.

group

vector giving the experimental group/condition for each sample/library. This argument is only relevant when method="TMM".

method

specific method to use in order to normalize the input matrix of counts. By default this is set to TMM (Robinson and Oshlack, 2010) using the implementation available in the edgeR package. The other option is cqn (Hansen, Irizarry and Wu, 2012).

common.disp

logical indicating whether a common dispersion (TRUE) or tagwise dispersions (FALSE, the default) should be estimated and used when adjusting counts. This argument is only relevant when method="TMM".

prior.df

argument passed to the call of estimateTagwiseDisp, defining the prior degrees of freedom. It is used in calculating 'prior.n' which, in turn, defines the amount of shrinkage of the estimated tagwise dispersions toward the common one. By default prior.df=8, thus assuming no shrinkage toward that common dispersion. This argument is not used if common.disp=TRUE and is only relevant when method="TMM".

annot

matrix or data frame with row names matching at least part of the row names in the counts input matrix, containing feature/tag/gene lengths in bp in its first column and a second covariate, such as G+C content, in its second column. These two pieces of information are passed to the arguments lengths and x when calling cqn. This argument is only relevant when method="cqn".

lib.sizes

vector of the total number of reads to be considered per sample/library. If lib.sizes=NULL (default) then these quantities are estimated as the column sums in the input matrix of counts.

verbose

logical indicating whether progress should be reported.
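
As an illustration of the shapes these arguments expect, here is a minimal base R sketch. The toy count matrix, the gene and sample names, and the length and G+C values are all hypothetical, chosen only to show how annot and lib.sizes line up with counts; a matrix this small is not meant for a real normalization run.

```r
# Toy count matrix (hypothetical data): 4 genes x 3 samples.
counts <- matrix(c(5, 0, 12, 7,
                   3, 1, 20, 9,
                   8, 2, 15, 4),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("Sample", 1:3)))

# Library sizes as column sums, which is what lib.sizes=NULL falls back to.
lib.sizes <- colSums(counts)

# Hypothetical annotation: length in bp in the first column,
# G+C content in the second. Row names must match (at least part of)
# the row names of the count matrix.
annot <- data.frame(length = c(1200, 800, 2500, 1500),
                    gc     = c(0.45, 0.52, 0.38, 0.60),
                    row.names = rownames(counts))

# Check which features have annotation available.
annotated <- rownames(counts) %in% rownames(annot)

# With tweeDEseq and cqn installed and a realistically sized data set,
# the corresponding call would look like:
# normCounts <- normalizeCounts(counts, method = "cqn", annot = annot,
#                               lib.sizes = lib.sizes)
```

Checking the row-name overlap up front makes it easier to spot features that lack length or G+C information before the cqn step is attempted.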

Details

This function encapsulates calls to RNA-seq normalization procedures available in the edgeR and cqn packages in order to try to remove systematic technical effects from raw counts.

By default, the TMM method described in Robinson and Oshlack (2010) is employed to calculate normalization factors, which are applied to estimate effective library sizes. Then the common dispersion and, only when the argument common.disp=FALSE, the tagwise dispersions are calculated (Robinson and Smyth, Bioinformatics 2007). Finally, counts are adjusted so that library sizes are approximately equal for the given dispersion values (Robinson and Smyth, Biostatistics 2008).

When the argument method="cqn" is set, conditional quantile normalization (Hansen, Irizarry and Wu, 2012) is applied, which aims at adjusting for tag/feature/gene length and another covariate such as G+C content. This information should be provided through the annot argument. This procedure calculates, for every gene and every sample, an offset to apply to the log2 reads per million (RPM). The function normalizeCounts() adds this offset to the log2 RPM values calculated from the input count data matrix, unlogs them and rolls these normalized RPM values back into integer counts.

Details on these two normalization procedures are given in the documentation of the edgeR and cqn Bioconductor packages.
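
The roll-back step described for method="cqn" can be sketched in base R as follows. This is only an illustration of the arithmetic under stated assumptions, not the package's implementation: the counts and the offset matrix are hypothetical (in practice the offsets would come from a cqn fit), and no pseudocount is added, so the sketch assumes strictly positive counts.

```r
# Toy counts (hypothetical): 3 genes x 2 samples, all strictly positive.
counts <- matrix(c(10, 20, 70,
                   30, 30, 40),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
lib.sizes <- colSums(counts)

# Hypothetical per-gene, per-sample offsets on the log2 scale,
# standing in for the offsets a cqn fit would provide.
offset <- matrix(c(1, 0, -1,
                   0, 0,  0),
                 nrow = 3, dimnames = dimnames(counts))

# log2 reads per million for every gene and sample.
log2rpm <- log2(sweep(counts, 2, lib.sizes / 1e6, "/"))

# Add the offset on the log2 scale.
adjusted <- log2rpm + offset

# Unlog, scale RPM back to the original library sizes, round to integers.
normCounts <- round(sweep(2^adjusted, 2, lib.sizes / 1e6, "*"))
```

An offset of 1 doubles a count and an offset of -1 halves it, while a zero offset leaves the count unchanged, which is why the adjustment is applied before unlogging.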

Value

A matrix of normalized counts.

Author(s)

J.R. Gonzalez and R. Castelo

References

K.D. Hansen, R.A. Irizarry and Z. Wu. Removing technical variability in RNA-seq data using conditional quantile normalization. Biostatistics, 2012.

M.D. Robinson and A. Oshlack. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol, 11:R25, 2010.

M.D. Robinson and G.K. Smyth. Moderated statistical tests for assessing differences in tag abundance. Bioinformatics, 23:2881-2887, 2007.

M.D. Robinson and G.K. Smyth. Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics, 9:321-332, 2008.

See Also

filterCounts

Examples

# Load the package providing rPT() and normalizeCounts()
library(tweeDEseq)

# Generate a random matrix of counts
counts <- matrix(rPT(n=1000, a=0.5, mu=10, D=5), ncol = 40)

colSums(counts)
counts[1:5, 1:5]

# Normalize counts
normCounts <- normalizeCounts(counts, rep(c(1,2), 20))

colSums(normCounts)
normCounts[1:5, 1:5]

Example output

 [1] 300 299 304 221 202 224 190 194 247 281 255 301 312 241 255 285 218 264 221
[20] 296 275 235 271 328 212 311 217 202 253 218 291 338 311 268 251 221 205 260
[39] 238 240
     [,1] [,2] [,3] [,4] [,5]
[1,]   25    9   29   13   13
[2,]   10   13    9    1    4
[3,]   34    7    4   16    0
[4,]   13   16   15    8   11
[5,]   29   36   20   11    3
Using edgeR-TMM normalization.
Calculating normalization factors with the TMM method.
Estimating common dispersion.
Estimating tagwise dispersions.
Calculating effective library sizes.
Adjusting counts to effective library sizes using tagwise dispersions.
 Sample1  Sample2  Sample3  Sample4  Sample5  Sample6  Sample7  Sample8 
     259      263      296      276      284      245      249      256 
 Sample9 Sample10 Sample11 Sample12 Sample13 Sample14 Sample15 Sample16 
     233      229      256      204      280      218      217      289 
Sample17 Sample18 Sample19 Sample20 Sample21 Sample22 Sample23 Sample24 
     277      238      221      297      220      231      216      287 
Sample25 Sample26 Sample27 Sample28 Sample29 Sample30 Sample31 Sample32 
     295      278      267      250      232      219      254      317 
Sample33 Sample34 Sample35 Sample36 Sample37 Sample38 Sample39 Sample40 
     235      296      258      232      205      238      302      258 
  Sample1 Sample2 Sample3 Sample4 Sample5
1      22       8      28      16      18
2       9      11       9       1       6
3      30       6       4      20       0
4      11      14      14      10      15
5      25      32      19      14       4

tweeDEseq documentation built on Nov. 8, 2020, 5:59 p.m.