Calculate normalization factors

Description

This function calculates normalization factors for a TCC-class object using the specified multi-step normalization method. The procedure can generally be described as the STEP1-(STEP2-STEP3)n pipeline, where n is the number of iterations.

Usage

## S4 method for signature 'TCC'
calcNormFactors(tcc, norm.method = NULL, test.method = NULL,
                iteration = TRUE,  FDR = NULL, floorPDEG = NULL, 
                increment = FALSE, ...)

Arguments

tcc

TCC-class object.

norm.method

character specifying the normalization method used in both STEP1 and STEP3. Possible values are "tmm" for the TMM normalization method implemented in the edgeR package, "edger" (same as "tmm"), "deseq2" for the method implemented in the DESeq2 package, and "deseq" for the method implemented in the DESeq package. The default is "tmm" when analyzing count data with multiple replicates (i.e., min(table(tcc$group[, 1])) > 1) and "deseq" when analyzing count data without replicates (i.e., min(table(tcc$group[, 1])) == 1).

test.method

character specifying the method for identifying differentially expressed genes (DEGs) used in STEP2: one of "edger", "deseq", "deseq2", "bayseq", "samseq", "voom" and "wad". See the "Details" field in estimateDE for details. The default is "edger" when analyzing count data with multiple replicates (i.e., min(table(tcc$group[, 1])) > 1), and "deseq" (two groups) or "deseq2" (more than two groups) when analyzing count data without replicates (i.e., min(table(tcc$group[, 1])) == 1).

iteration

logical or numeric value specifying the number of iterations (n) in the proposed normalization pipeline: the STEP1-(STEP2-STEP3)n pipeline. If FALSE or 0 is specified, normalization is performed only by the method in STEP1. If TRUE or 1 is specified, the three-step normalization pipeline is performed. Integers higher than 1 indicate the number of iterations of the pipeline.

FDR

numeric value (between 0 and 1) specifying the threshold for determining potential DEGs after STEP2.

floorPDEG

numeric value (between 0 and 1) specifying the minimum proportion of genes to be eliminated as potential DEGs before performing STEP3.

increment

logical value. If increment = TRUE, the DEGES pipeline is run again starting from the current iterated result.

...

arguments to identify potential DEGs at STEP2. See the "Arguments" field in estimateDE for details.
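
The default rules stated above for norm.method, test.method and iteration can be sketched in base R as follows. This is an illustration only and not the package's internal code: the helper names resolve_defaults and resolve_iteration are hypothetical, and the group vector stands in for tcc$group[, 1].

```r
# Hypothetical helper: pick default methods from the group labels,
# following the rules given for norm.method and test.method.
resolve_defaults <- function(group) {
  has_replicates <- min(table(group)) > 1
  n_groups <- length(unique(group))
  if (has_replicates) {
    list(norm.method = "tmm", test.method = "edger")
  } else {
    list(norm.method = "deseq",
         test.method = if (n_groups > 2) "deseq2" else "deseq")
  }
}

# Hypothetical helper: map the iteration argument to the number n of
# (STEP2-STEP3) cycles. FALSE or 0 gives 0 (STEP1 only); TRUE gives 1.
resolve_iteration <- function(iteration) {
  as.integer(iteration)
}

# Usage with the group layouts that appear in the Examples section
resolve_defaults(c(1, 1, 1, 2, 2, 2))  # data with replicates
resolve_defaults(c(1, 2))              # data without replicates
resolve_iteration(TRUE)
```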

Details

The calcNormFactors function is the main function in the TCC package. Because this pipeline applies a DEG identification method at STEP2, the multi-step strategy can eliminate the negative effect of potential DEGs before the second normalization at STEP3. To fully utilize the DEG elimination strategy (DEGES), we strongly recommend against using iteration = 0 or iteration = FALSE. This function internally calls functions implemented in other R packages according to the specified norm.method value.

  • norm.method = "tmm"
    The calcNormFactors function implemented in edgeR is used for obtaining the TMM normalization factors at both STEP1 and STEP3.

  • norm.method = "deseq2"
    The estimateSizeFactors function implemented in DESeq2 is used for obtaining the size factors at both STEP1 and STEP3. The size factors are internally converted to normalization factors that are comparable to the TMM normalization factors.

  • norm.method = "deseq"
    The estimateSizeFactors function implemented in DESeq is used for obtaining the size factors at both STEP1 and STEP3. The size factors are internally converted to normalization factors that are comparable to the TMM normalization factors.
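
The control flow of the STEP1-(STEP2-STEP3)n pipeline described in these Details can be sketched in base R. The sketch below is a toy under stated assumptions, not the TCC implementation: toy_normalize and toy_detect are hypothetical stand-ins for the edgeR/DESeq routines that calcNormFactors actually calls, and the fold-change cutoff is arbitrary. Only the loop structure is meant to carry over.

```r
# Toy stand-in for STEP1/STEP3 (assumption, not TMM or DESeq):
# factors proportional to column sums, scaled to have mean 1.
toy_normalize <- function(counts) {
  f <- colSums(counts)
  f / mean(f)
}

# Toy stand-in for STEP2 (assumption, not a real DE test): flag genes
# whose absolute log2 fold change between two groups, after scaling by
# the current factors, is at least `cutoff`.
toy_detect <- function(counts, group, factors, cutoff = 1) {
  scaled <- sweep(counts, 2, factors, "/")
  g <- unique(group)
  m1 <- rowMeans(scaled[, group == g[1], drop = FALSE]) + 1
  m2 <- rowMeans(scaled[, group == g[2], drop = FALSE]) + 1
  abs(log2(m1 / m2)) >= cutoff
}

# STEP1, then n rounds of (STEP2, STEP3). With iteration = 0 the loop
# body never runs, so only STEP1 is applied.
deges_sketch <- function(counts, group, iteration = 1) {
  factors <- toy_normalize(counts)                             # STEP1
  for (i in seq_len(as.integer(iteration))) {
    is_deg  <- toy_detect(counts, group, factors)              # STEP2
    factors <- toy_normalize(counts[!is_deg, , drop = FALSE])  # STEP3
  }
  factors
}

# Toy data: nine unchanged genes and one gene that is 4x higher in group 2
counts <- rbind(matrix(100, nrow = 9, ncol = 6),
                c(100, 100, 100, 400, 400, 400))
group <- c(1, 1, 1, 2, 2, 2)
deges_sketch(counts, group, iteration = 0)  # skewed by the changed gene
deges_sketch(counts, group, iteration = 1)  # all six factors equal 1 here
```

In this toy data the single changed gene inflates the STEP1 factors for the second group; once it is flagged at STEP2 and excluded at STEP3, the factors even out, which is the effect DEGES aims for.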

Value

After performing the calcNormFactors function, the calculated normalization factors are populated in the norm.factors field (i.e., tcc$norm.factors). Parameters used for DEGES normalization (e.g., potential DEGs identified in STEP2, execution times for the identification, etc.) are stored in the DEGES field (i.e., tcc$DEGES) as follows:

iteration

the iteration number n for the STEP1-(STEP2-STEP3)n pipeline.

pipeline

the DEGES normalization pipeline.

threshold

it stores (i) the type of threshold (threshold$type), (ii) the threshold value (threshold$input), and (iii) the proportion of potential DEGs actually used (threshold$PDEG). These values depend on whether the proportion of DEGs identified in STEP2 is higher or lower than the value given by floorPDEG. Consider, for example, calling calcNormFactors with FDR = 0.1 and floorPDEG = 0.05. If the proportion of DEGs identified in STEP2 at FDR = 0.1 is 0.14 (i.e., higher than the floorPDEG of 0.05), the threshold fields will be threshold$type = "FDR", threshold$input = 0.1, and threshold$PDEG = 0.14. If the proportion is 0.03 (i.e., lower than the floorPDEG of 0.05), the threshold fields will be threshold$type = "floorPDEG", threshold$input = 0.05, and threshold$PDEG = 0.05.

potDEG

numeric binary vector (0 for non-DEG or 1 for DEG) after the evaluation of the percentage of DEGs identified in STEP2 against the predefined floorPDEG value. If the percentage (e.g., 2%) is lower than the floorPDEG value (e.g., 17%), 17% of the elements are set to 1 (DEG).

prePotDEG

numeric binary vector (0 for non-DEG or 1 for DEG) before the evaluation of the percentage of DEGs identified in STEP2 with the predefined floorPDEG value. Regardless of the floorPDEG value, the percentage of elements with 1 is always the same as that of DEGs identified in STEP2.

execution.time

computation time required for normalization.
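
The relationship between the threshold, potDEG and prePotDEG fields described above can be sketched in base R. This is a minimal sketch, not the package's code: the helper select_pot_deg and its q_values input (per-gene FDR estimates from STEP2) are hypothetical, and three details are assumptions made for illustration, namely the strict comparison q < FDR, the use of the FDR threshold when the proportion exactly equals floorPDEG, and filling up to floorPDEG with the genes that have the smallest q-values.

```r
# Hypothetical helper mirroring the threshold rules in the Value section.
select_pot_deg <- function(q_values, FDR = 0.1, floorPDEG = 0.05) {
  # Genes passing the FDR threshold, before the floorPDEG check
  pre_pot_deg <- as.numeric(q_values < FDR)
  pdeg <- mean(pre_pot_deg)

  if (pdeg >= floorPDEG) {
    # Enough DEGs were found: keep the FDR-based selection
    pot_deg <- pre_pot_deg
    threshold <- list(type = "FDR", input = FDR, PDEG = pdeg)
  } else {
    # Too few DEGs: flag the top floorPDEG fraction of genes instead
    # (assumption: ranked by q-value, ties broken by position)
    n_top <- round(floorPDEG * length(q_values))
    pot_deg <- as.numeric(rank(q_values, ties.method = "first") <= n_top)
    threshold <- list(type = "floorPDEG", input = floorPDEG, PDEG = floorPDEG)
  }

  list(threshold = threshold, potDEG = pot_deg, prePotDEG = pre_pot_deg)
}

# 100 genes, 14 of which pass FDR = 0.1 (the first case in the text)
res_high <- select_pot_deg(c(rep(0.01, 14), rep(0.5, 86)))
str(res_high$threshold)

# 100 genes, only 3 of which pass FDR = 0.1 (the second case in the text)
res_low <- select_pot_deg(c(rep(0.01, 3), seq(0.2, 0.9, length.out = 97)))
str(res_low$threshold)
c(potDEG = sum(res_low$potDEG), prePotDEG = sum(res_low$prePotDEG))
```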

Examples

data(hypoData)
group <- c(1, 1, 1, 2, 2, 2)

# Calculating normalization factors using the DEGES/edgeR method 
# (the TMM-edgeR-TMM pipeline).
tcc <- new("TCC", hypoData, group)
tcc <- calcNormFactors(tcc, norm.method = "tmm", test.method = "edger",
                       iteration = 1, FDR = 0.1, floorPDEG = 0.05)
tcc$norm.factors

# Calculating normalization factors using the iterative DEGES/edgeR method 
# (iDEGES/edgeR) with n = 3.
tcc <- new("TCC", hypoData, group)
tcc <- calcNormFactors(tcc, norm.method = "tmm", test.method = "edger",
                       iteration = 3, FDR = 0.1, floorPDEG = 0.05)
tcc$norm.factors

# Calculating normalization factors for simulation data without replicates.
tcc <- simulateReadCounts(replicates = c(1, 1))
tcc <- calcNormFactors(tcc, norm.method = "deseq", test.method = "deseq",
                       iteration = 1, FDR = 0.1, floorPDEG = 0.05)
tcc$norm.factors