normalize.svcd: SVCD (Standard-Vector Condition-Decomposition) Normalization
In carlosproca/cdnormbio: Condition-Decomposition Normalization for Biological Applications

Description Usage Arguments Details Value Author(s) References See Also Examples

Normalizes gene expression data following the SVCD algorithm. It also provides the estimated normalization factors and the no-variation features (e.g. genes) detected.

normalize.svcd(expression.data, expression.condition, restrict.feature = NULL,
  search.h0.feature = TRUE, convergence.threshold = c(0.01, 0.1, 0.01, 1),
  stdvec.graph = NULL, p.value.graph = NULL, verbose = FALSE)

`expression.data`	Numeric matrix with expression data. Rows correspond to features, for example genes. Columns correspond to samples. If rows and/or columns do not have names, they are assigned names with format `"feature.[number]"` and/or `"sample.[number]"`, respectively.
`expression.condition`	Character or numeric vector defining the experimental conditions. It can also be a factor. The length of `expression.condition` must be equal to the number of columns of `expression.data`, that is, to the number of samples. It can contain `NA` values, meaning samples to be ignored in the normalization.
`restrict.feature`	When `restrict.feature` is `NULL`, all features can be used for normalizing. Otherwise, `restrict.feature` is expected to be a character or numeric vector identifying the features, by name or index respectively, that can be used for normalizing.
`search.h0.feature`	Logical value indicating whether no-variation features should be searched for the final between-condition normalization, thus restricting the set of features used in this normalization. When `search.h0.feature` is `FALSE`, the complete set of features used in the within-condition normalizations is also used for normalizing between conditions.
`convergence.threshold`	Numeric vector with four elements, defining convergence parameters for the algorithm. The format is (single.step, multiple.step, single.step.search.h0.feature, multiple.step.search.h0.feature). Using values different from the default ones is only advisable when the implementation of convergence is understood in detail.
`stdvec.graph`	When not `NULL`, it provides a character string with the name of a directory to save graphs displaying the convergence of standard vectors, grouping conditions in sets of three. The directory can be entered as an absolute or relative path, and it is created if it does not exist. To save in the current working directory, use `stdvec.graph="."` or `stdvec.graph=""`. Generating the graphs of standard vectors requires the package plotrix.
`p.value.graph`	When not `NULL`, it provides a character string with the name of a directory to save graphs displaying the distributions of p-values used for the identification of no-variation features. The directory can be entered as an absolute or relative path, and it is created if it does not exist. To save in the current working directory, use `p.value.graph="."` or `p.value.graph=""`.
`verbose`	Logical value indicating whether convergence information should be printed to the console.
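
As a minimal sketch of how the inputs described above might be prepared, the following base R snippet builds a small expression matrix, a condition vector with one ignored sample, and two equivalent forms of `restrict.feature`. The data are simulated, and names such as `restrict.by.name` and `used.sample` are hypothetical, chosen only for illustration. It assumes that the documented name format expands to `feature.1`, `feature.2`, and so on.

```r
# Illustrative input preparation; uses base R only, no call to cdnormbio.
set.seed(42)
expr.data <- matrix(rnorm(20 * 6), nrow = 20)

# Explicit names, assuming the documented default format
# "feature.[number]" / "sample.[number]" means feature.1, sample.1, ...
rownames(expr.data) <- paste0("feature.", seq_len(nrow(expr.data)))
colnames(expr.data) <- paste0("sample.", seq_len(ncol(expr.data)))

# One condition label per sample; NA marks a sample to be ignored.
expr.condition <- factor(c("control", "control", "control",
                           "treated", "treated", NA))
stopifnot(length(expr.condition) == ncol(expr.data))

# restrict.feature can identify features by name or by index.
restrict.by.name <- rownames(expr.data)[1:10]
restrict.by.index <- 1:10
stopifnot(identical(rownames(expr.data)[restrict.by.index], restrict.by.name))

# Samples that would take part in the normalization (hypothetical helper name).
used.sample <- colnames(expr.data)[!is.na(expr.condition)]
```

With inputs shaped like this, a call such as `normalize.svcd(expr.data, expr.condition, restrict.feature = restrict.by.name)` follows the usage shown above.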

Only features (e.g. genes) with no missing values in any sample are used for normalizing. First, the samples of each experimental condition are normalized separately. Then, condition means are normalized while searching for no-variation features, that is, for a subset of features that do not show evidence of being differentially expressed across conditions. After this, condition means are normalized again, this time using only the detected no-variation features. For C experimental conditions, C+2 normalizations are carried out, all of them using Standard-Vector normalization. Finally, the normalization factors obtained in normalizations 1, 2, …, C and C+2 are combined to obtain the overall normalization factors, which are then used to normalize the expression data. See the reference below for more details.
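
The structure of these steps can be sketched in base R. This is not the package's implementation: column medians stand in for Standard-Vector normalization, the search for no-variation features is omitted, and combining within- and between-condition factors by addition is an assumption made for illustration. All variable names are hypothetical.

```r
# Structural sketch only; median offsets replace Standard-Vector normalization.
set.seed(1)
expr.data <- matrix(rnorm(200 * 6), nrow = 200)
expr.condition <- rep(c("a", "b"), each = 3)
expr.data[1:5, 2] <- NA   # a few features with missing values

# Only features with no missing values in any sample are used.
complete.feature <- which(complete.cases(expr.data))
norm.data <- expr.data[complete.feature, ]

condition.levels <- unique(expr.condition)
condition.n <- length(condition.levels)
normalization.n <- condition.n + 2   # C within-condition + 2 between-condition

# Stand-in for normalizations 1..C: one offset per sample, estimated
# separately within each condition and centered to mean zero.
within.offset <- numeric(ncol(norm.data))
for (cond in condition.levels) {
  idx <- which(expr.condition == cond)
  med <- apply(norm.data[, idx, drop = FALSE], 2, median)
  within.offset[idx] <- med - mean(med)
}

# Stand-in for normalization C+2: one offset per condition, estimated
# from the condition means of the within-condition normalized data.
within.normalized <- sweep(norm.data, 2, within.offset)
condition.mean <- sapply(condition.levels, function(cond)
  rowMeans(within.normalized[, expr.condition == cond, drop = FALSE]))
between.offset <- apply(condition.mean, 2, median)
between.offset <- between.offset - mean(between.offset)

# Assumed combination: each sample's overall offset is its within-condition
# offset plus the between-condition offset of its condition.
overall.offset <- within.offset + unname(between.offset[expr.condition])
normalized.data <- sweep(expr.data, 2, overall.offset)
```

In the real function, normalization C+1 additionally selects the no-variation features, and only those features enter normalization C+2.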

If `expression.condition` indicates that all samples correspond to the same experimental condition, then only one Standard-Vector normalization is performed, without searching for no-variation features.

List with the elements

`data`	Matrix with normalized data.
`offset`	Vector of detected normalization factors, with one factor per sample.
`h0.feature`	Vector of detected no-variation features.
`within.condition.offset`	Normalization factors detected in the within-condition normalizations.
`between.condition.offset`	Normalization factors detected in the between-condition normalization.
`within.condition.convergence`	List with convergence information for the within-condition normalizations.
`between.condition.convergence`	List with convergence information for the between-condition normalization.
`h0.feature.convergence`	List with convergence information for the detection of no-variation features.

Carlos P. Roca, carlosproca@gmail.com

Roca, Gomes, Amorim & Scott-Fordsmand: Variation-preserving normalization unveils blind spots in gene expression profiling. Sci. Rep. 7, 42460; doi:10.1038/srep42460 (2017).

`normalize.mediancd`, which implements MedianCD normalization.

# no offset: no per-sample offsets are added, so estimated offsets should be close to zero
gene.n <- 1000
sample.n <- 9
expr.data <- matrix( rnorm( gene.n * sample.n ), nrow = gene.n )
expr.condition <- rep( c( 1, 2, 3 ), each = 3 )
normalize.result <- normalize.svcd( expr.data, expr.condition )
sd( normalize.result$offset )
length( normalize.result$h0.feature )

## Not run: 
# with offset: known per-sample offsets are added, then compared with the estimated ones
gene.n <- 10000
sample.n <- 9
expr.data <- matrix( rnorm( gene.n * sample.n ), nrow = gene.n )
expr.condition <- rep( c( "treatment.a", "treatment.b", "control" ), 
    each = 3 )
offset.added <- rnorm( sample.n )
expr.data <- sweep( expr.data, 2, offset.added, "+" )
normalize.result <- normalize.svcd( expr.data, expr.condition, 
    stdvec.graph = "svcd_stdvec", p.value.graph = "svcd_p_value", 
    verbose = TRUE )
sd( normalize.result$offset - offset.added )
length( normalize.result$h0.feature )

## End(Not run)