normalize.svcd: SVCD (Standard-Vector Condition-Decomposition) Normalization


Description

Normalizes gene expression data following the SVCD algorithm. It also provides the estimated normalization factors and the no-variation features (e.g. genes) detected.

Usage

normalize.svcd(expression.data, expression.condition, restrict.feature = NULL,
  search.h0.feature = TRUE, convergence.threshold = c(0.01, 0.1, 0.01, 1),
  stdvec.graph = NULL, p.value.graph = NULL, verbose = FALSE)

Arguments

expression.data

Numeric matrix with expression data. Rows correspond to features, for example genes. Columns correspond to samples. If rows and/or columns do not have names, they are assigned names with format "feature.[number]" and/or "sample.[number]", respectively.

expression.condition

Character or numeric vector defining the experimental conditions. It can also be a factor. The length of expression.condition must be equal to the number of columns of expression.data, that is, to the number of samples. It can contain NA values, meaning samples to be ignored in the normalization.

restrict.feature

When restrict.feature is NULL, all features can be used for normalizing. Otherwise, restrict.feature is expected to be a character or numeric vector identifying the features, by name or index respectively, that can be used for normalizing.

search.h0.feature

Logical value indicating whether no-variation features should be searched for the final between-condition normalization, thus restricting the set of features used in this normalization. When search.h0.feature is FALSE, the complete set of features used in the within-condition normalizations is also used for normalizing between conditions.

convergence.threshold

Numeric vector with four elements, defining convergence parameters for the algorithm. The format is (single.step, multiple.step, single.step.search.h0.feature, multiple.step.search.h0.feature). Using values different from the default ones is only advisable when the implementation of convergence is understood in detail.

stdvec.graph

When not NULL, it provides a character string with the name of a directory to save graphs displaying the convergence of standard vectors, grouping conditions in sets of three. The directory can be entered as an absolute or relative path, and it is created if it does not exist. To save in the current working directory, use stdvec.graph="." or stdvec.graph="". Generating the graphs of standard vectors requires the package plotrix.

p.value.graph

When not NULL, it provides a character string with the name of a directory to save graphs displaying the distributions of p-values used for the identification of no-variation features. The directory can be entered as an absolute or relative path, and it is created if it does not exist. To save in the current working directory, use p.value.graph="." or p.value.graph="".

verbose

Logical value indicating whether convergence information should be printed to the console.
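
As an illustration of the arguments above, the following minimal sketch prepares inputs in the documented shapes using base R only. The simulated values, the condition labels, and the choice of the first 50 features are assumptions made for the example; the row and column names merely imitate the documented "feature.[number]" and "sample.[number]" pattern. The call to normalize.svcd is left as a comment so that the block runs without the package installed.

```r
set.seed( 1 )
gene.n <- 200
sample.n <- 8

# expression.data: numeric matrix, features in rows and samples in columns
expr.data <- matrix( rnorm( gene.n * sample.n ), nrow = gene.n )
rownames( expr.data ) <- paste0( "feature.", seq_len( gene.n ) )
colnames( expr.data ) <- paste0( "sample.", seq_len( sample.n ) )

# expression.condition: one entry per sample; NA marks samples to be ignored
expr.condition <- factor( c( "control", "control", "control",
    "treated", "treated", "treated", NA, NA ) )
stopifnot( length( expr.condition ) == ncol( expr.data ) )

# restrict.feature: the same features identified by index or by name
restrict.by.index <- 1:50
restrict.by.name <- rownames( expr.data )[ restrict.by.index ]

# With the package loaded, the call would then take the form:
# normalize.result <- normalize.svcd( expr.data, expr.condition,
#     restrict.feature = restrict.by.name )
```

Identifying features by name rather than by index is usually more robust when the rows of the matrix may later be reordered or filtered.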

Details

Only features (e.g. genes) with no missing values in any sample are used for normalizing. First, the samples of each experimental condition are normalized separately (within-condition normalizations). Then, condition means are normalized while searching for no-variation features, that is, for a subset of features that show no evidence of being differentially expressed across conditions. After this, the final normalization of condition means is carried out using only the detected no-variation features (between-condition normalization). For C experimental conditions, C+2 normalizations are therefore carried out, all of them using Standard-Vector normalization. Finally, the normalization factors obtained in normalizations 1, 2, …, C and C+2 are combined to obtain the overall normalization factors and to normalize the expression data. See the reference below for more details.

If expression.condition indicates that all samples correspond to the same experimental condition, then only one Standard-Vector normalization is performed, without searching for no-variation features.
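
The feature filter and the count of normalizations described above can be sketched in base R. This is not the package's internal code: the toy matrix is invented for the example, and the sketch assumes that missing values are checked over all samples of the matrix.

```r
# Toy data: 6 features x 4 samples, with missing values in two features
expr.data <- matrix( as.numeric( 1:24 ), nrow = 6,
    dimnames = list( paste0( "feature.", 1:6 ), paste0( "sample.", 1:4 ) ) )
expr.data[ 2, 3 ] <- NA
expr.data[ 5, 1 ] <- NA

# Features with no missing value in any sample are the only candidates
complete.feature <- rownames( expr.data )[ rowSums( is.na( expr.data ) ) == 0 ]

# Number of Standard-Vector normalizations: C + 2 for C conditions,
# or a single one when all samples belong to the same condition
expr.condition <- c( "a", "a", "b", "b" )
condition.n <- length( unique( na.omit( expr.condition ) ) )
normalization.n <- if ( condition.n == 1 ) 1 else condition.n + 2
```

Here features 2 and 5 are dropped, and two conditions lead to four normalizations: two within conditions, one while searching for no-variation features, and the final one between conditions.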

Value

List with the elements

data

Matrix with normalized data.

offset

Vector of detected normalization factors, with one factor per sample.

h0.feature

Vector of detected no-variation features.

within.condition.offset

Normalization factors detected in the within-condition normalizations.

between.condition.offset

Normalization factors detected in the between-condition normalization.

within.condition.convergence

List with convergence information for the within-condition normalizations.

between.condition.convergence

List with convergence information for the between-condition normalization.

h0.feature.convergence

List with convergence information for the detection of no-variation features.
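
A quick way to inspect the returned list is a small summary helper. The sketch below is hypothetical: summarize.svcd.result is not part of the package, and mock.result is a hand-built stand-in with three of the documented elements, not real output of normalize.svcd.

```r
# Hypothetical helper: condenses the main documented elements of the result
summarize.svcd.result <- function( normalize.result ) {
    list(
        feature.n = nrow( normalize.result$data ),
        sample.n = ncol( normalize.result$data ),
        offset.sd = sd( normalize.result$offset ),
        h0.feature.n = length( normalize.result$h0.feature ) )
}

# Hand-built stand-in for a result list (assumed shapes, invented values)
mock.result <- list(
    data = matrix( 0, nrow = 5, ncol = 4 ),
    offset = c( -0.1, 0.1, -0.2, 0.2 ),
    h0.feature = c( "feature.1", "feature.4" ) )

result.summary <- summarize.svcd.result( mock.result )
```

With a real result, the same helper could be applied directly to the value returned by normalize.svcd.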

Author(s)

Carlos P. Roca, carlosproca@gmail.com

References

Roca, Gomes, Amorim & Scott-Fordsmand: Variation-preserving normalization unveils blind spots in gene expression profiling. Sci. Rep. 7, 42460; doi:10.1038/srep42460 (2017).

See Also

normalize.mediancd: implements MedianCD normalization.

Examples

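# no offset: the data are pure noise, so the estimated offsets should be small
library( cdnormbio )
gene.n <- 1000
sample.n <- 9
expr.data <- matrix( rnorm( gene.n * sample.n ), nrow = gene.n )
expr.condition <- rep( c( 1, 2, 3 ), each = 3 )
normalize.result <- normalize.svcd( expr.data, expr.condition )
# spread of the estimated offsets and number of no-variation features detected
sd( normalize.result$offset )
length( normalize.result$h0.feature )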

## Not run: 
# with offset: a known per-sample offset is added and compared with the estimate
gene.n <- 10000
sample.n <- 9
expr.data <- matrix( rnorm( gene.n * sample.n ), nrow = gene.n )
expr.condition <- rep( c( "treatment.a", "treatment.b", "control" ),
    each = 3 )
offset.added <- rnorm( sample.n )
expr.data <- sweep( expr.data, 2, offset.added, "+" )
# graphs are saved in the given directories; stdvec.graph requires plotrix
normalize.result <- normalize.svcd( expr.data, expr.condition,
    stdvec.graph = "svcd_stdvec", p.value.graph = "svcd_p_value",
    verbose = TRUE )
# spread of the difference between estimated and added offsets
sd( normalize.result$offset - offset.added )
length( normalize.result$h0.feature )

## End(Not run)

carlosproca/cdnormbio documentation built on May 13, 2019, 12:49 p.m.