normalize.mediancd: MedianCD (Median Condition-Decomposition) Normalization


Description

Normalizes gene expression data following the MedianCD algorithm. It also provides the estimated normalization factors and the no-variation features (e.g. genes) detected.

Usage

normalize.mediancd(expression.data, expression.condition,
  normalization.probability = 0.5, restrict.feature = NULL,
  search.h0.feature = TRUE, convergence.threshold = c(0.001, 0.1),
  p.value.graph = NULL, verbose = FALSE)

Arguments

expression.data

Numeric matrix with expression data. Rows correspond to features, for example genes. Columns correspond to samples. If rows and/or columns do not have names, they are assigned names with format "feature.[number]" and/or "sample.[number]", respectively.
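The default naming described above can be reproduced in base R as follows. This is only a sketch of the documented behavior, not the package's internal code, and it assumes that "[number]" stands for the row or column index.

```r
# Hypothetical illustration: assign default names to an unnamed matrix,
# assuming "[number]" is the row or column index.
expr.data <- matrix( rnorm( 12 ), nrow = 4 )
if ( is.null( rownames( expr.data ) ) )
    rownames( expr.data ) <- paste0( "feature.", seq_len( nrow( expr.data ) ) )
if ( is.null( colnames( expr.data ) ) )
    colnames( expr.data ) <- paste0( "sample.", seq_len( ncol( expr.data ) ) )
dimnames( expr.data )
```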

expression.condition

Character or numeric vector defining the experimental conditions. It can also be a factor. The length of expression.condition must be equal to the number of columns of expression.data, that is, to the number of samples. It can contain NA values, meaning samples to be ignored in the normalization.
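As an illustration, a condition vector with one ignored sample might look like the sketch below. The condition labels and the variable sample.used are made up for this example.

```r
# Hypothetical condition vector for seven samples; the sixth sample is
# marked NA and would therefore be ignored in the normalization.
expr.condition <- c( "control", "control", "control",
    "treated", "treated", NA, "treated" )
sample.used <- ! is.na( expr.condition )
table( expr.condition, useNA = "ifany" )   # samples per condition
sum( sample.used )                         # samples taking part
```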

normalization.probability

Probability of the quantile at which the data are normalized. It must be a numeric value in the interval [0,1].
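For intuition, the default probability of 0.5 corresponds to the median. The sketch below uses the base R function quantile with its default definition; whether the package computes quantiles in exactly the same way is an assumption.

```r
# Base R illustration: the quantile at probability 0.5 is the sample median.
set.seed( 1 )
x <- rnorm( 11 )
q.half <- unname( quantile( x, probs = 0.5 ) )
q.upper <- unname( quantile( x, probs = 0.75 ) )   # a higher reference point
all.equal( q.half, median( x ) )
```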

restrict.feature

When restrict.feature is NULL, all features can be used for normalizing. Otherwise, restrict.feature is expected to be a character or numeric vector identifying the features, by name or index respectively, that can be used for normalizing.
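The two ways of identifying features can be sketched in base R as follows. The gene names and chosen indices are hypothetical, and the final call is left as a comment because it requires the package and a matching expression.condition.

```r
# Hypothetical data with named features and samples.
expr.data <- matrix( rnorm( 50 ), nrow = 10,
    dimnames = list( paste0( "gene.", 1:10 ), paste0( "sample.", 1:5 ) ) )

# The same subset of features, identified by index and by name.
restrict.by.index <- c( 2, 5, 7 )
restrict.by.name <- rownames( expr.data )[ restrict.by.index ]
identical( expr.data[ restrict.by.index, ], expr.data[ restrict.by.name, ] )

# Either vector could then be passed as the restrict.feature argument:
# normalize.mediancd( expr.data, expr.condition,
#     restrict.feature = restrict.by.name )
```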

search.h0.feature

Logical value indicating whether no-variation features should be searched for the final between-condition normalization, thus restricting the set of features used in this normalization. When search.h0.feature is FALSE, the complete set of features used in the within-condition normalizations is also used for normalizing between conditions.

convergence.threshold

Numeric vector with two elements, defining convergence parameters for the algorithm. The format is (single.step.search.h0.feature, multiple.step.search.h0.feature). Using values different from the default ones is only advisable when the implementation of convergence is understood in detail.

p.value.graph

When not NULL, it provides a character string with the name of a directory to save graphs displaying the distributions of p-values used for the identification of no-variation features. The directory can be entered as an absolute or relative path, and it is created if it does not exist. To save in the current working directory, use p.value.graph="." or p.value.graph="".

verbose

Logical value indicating whether convergence information should be printed to the console.

Details

Only features (e.g. genes) with no missing values in any sample are used for normalizing. First, the samples of each experimental condition are normalized separately. Then, condition means are normalized while iteratively searching for no-variation features, that is, for a subset of features that do not show evidence of being differentially expressed across conditions. After this, the final normalization of condition means is carried out using only the detected no-variation features. For C experimental conditions, C+2 normalizations are carried out, all of them using Median normalization. Finally, the normalization factors obtained in normalizations 1, 2, …, C, and C+2 are combined to obtain the overall normalization factors, which are used to normalize the expression data. See the reference below for more details.
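The within-condition step can be pictured with the base R sketch below. It is not the package's implementation: it assumes that each offset is the sample median centered on the mean of the sample medians of its condition, and it leaves out the between-condition step and the search for no-variation features. The names complete.feature, within.offset and within.normalized are introduced only for this illustration.

```r
set.seed( 1 )
gene.n <- 200
sample.n <- 6
expr.data <- matrix( rnorm( gene.n * sample.n ), nrow = gene.n )
expr.data <- sweep( expr.data, 2, rnorm( sample.n ), "+" )
expr.data[ 1, 2 ] <- NA    # this feature is excluded from the estimation
expr.condition <- rep( c( "a", "b" ), each = 3 )

# Only features without missing values are used to estimate offsets.
complete.feature <- rowSums( is.na( expr.data ) ) == 0

# Assumed within-condition offsets: sample medians, centered per condition.
sample.median <- apply( expr.data[ complete.feature, ], 2, median )
within.offset <- sample.median - ave( sample.median, expr.condition )

# Subtract the offsets; medians now agree within each condition.
within.normalized <- sweep( expr.data, 2, within.offset, "-" )
apply( within.normalized[ complete.feature, ], 2, median )
```

With this choice the offsets sum to zero within each condition, so differences between conditions are left untouched for the later between-condition step.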

If expression.condition indicates that all samples correspond to the same experimental condition, then only one Median normalization is performed, without searching for no-variation features.

Value

List with the elements

data

Matrix with normalized data.

offset

Vector of detected normalization factors, with one factor per sample.

h0.feature

Vector of detected no-variation features.

within.condition.offset

Normalization factors detected in the within-condition normalizations.

between.condition.offset

Normalization factors detected in the between-condition normalization.

h0.feature.convergence

List with convergence information for the detection of no-variation features.

Author(s)

Carlos P. Roca, carlosproca@gmail.com

References

Roca, Gomes, Amorim & Scott-Fordsmand: Variation-preserving normalization unveils blind spots in gene expression profiling. Sci. Rep. 7, 42460; doi:10.1038/srep42460 (2017).

See Also

normalize.svcd, which implements SVCD normalization.

Examples

# no offset
gene.n <- 1000
sample.n <- 9
expr.data <- matrix( rnorm( gene.n * sample.n ), nrow = gene.n )
expr.condition <- rep( c( 1, 2, 3 ), each = 3 )
normalize.result <- normalize.mediancd( expr.data, expr.condition )
sd( normalize.result$offset )
length( normalize.result$h0.feature )

## Not run: 
# with offset
gene.n <- 10000
sample.n <- 9
expr.data <- matrix( rnorm( gene.n * sample.n ), nrow = gene.n )
expr.condition <- rep( c( "treatment.a", "treatment.b", "control" ), 
    each = 3 )
offset.added <- rnorm( sample.n )
expr.data <- sweep( expr.data, 2, offset.added, "+" )
normalize.result <- normalize.mediancd( expr.data, expr.condition, 
    p.value.graph = "mediancd_p_value", verbose = TRUE )
sd( normalize.result$offset - offset.added )
length( normalize.result$h0.feature )

## End(Not run)

carlosproca/cdnormbio documentation built on May 13, 2019, 12:49 p.m.