samesum_log2_norm: Normalized log2 counts using samesum method
In MonashBioinformaticsPlatform/varistran: Variance Stabilizing Transformations appropriate for RNA-Seq expression data

samesum_log2_norm

R Documentation

Normalized log2 counts using samesum method

Description

This is a method of computing log counts while dealing sensibly with zeros and differing library sizes. It should cope well with very sparse data. The input is transformed using log2(x/scale+1) with a different scale for each sample. The scale for each sample is chosen so all of the samples add up to the same total. Finding these scale values is done efficiently using Newton's method.

Usage

samesum_log2_norm(x, scale = 2, sum = NA, tol = 1e-09, verbose = FALSE)

samesum_log2_cpm(x, scale = 2, sum = NA, tol = 1e-09, verbose = FALSE)

Arguments

`x`	A matrix or a sparse matrix. Usually these will be counts, but any matrix of non-negative values could potentially be used. We regard columns as samples.
`scale`	Target scale. Aim to give each sample close to this value of scale. The default value of 2 should be reasonable for count data, but see discussion below.
`sum`	Target sum. Aim to give each sample exactly this sum. `scale` is ignored if this parameter is used. This could be used to provide a transformation that is consistent across datasets.
`tol`	Tolerance when optimizing scale values. 1/scale is increased from zero using Newton's method until the sum is at least `target_sum*(1-tol)`.
`verbose`	If TRUE, output some debugging messages.

Details

You can either specify the argument scale, and it will try to give individual samples scale values close to this value, or directly specify the target sum with sum.

samesum_log2_cpm adds a constant to the result to produce values that can be treated as log2 Counts Per Million (CPM). The result will have the property that mean(colSums(2^lcpm-2^attr(lcpm,"zero"))) equals one million.

Unlike using library size normalization, the samesum method is robust to variations in high abundance features, because it works on the log transformed values. It is also robust to noise from low abundance features because of the +1 in the transformation.

This method has some similarity to Centered Log Ratio (CLR) transformation. Where the CLR ensures each sample adds to zero, the samesum transformation ensures each sample adds to the same non-zero value. Like the CLR, distances between samesum transformed data are a good way to measure sample similarity. Transformed values could be used with PCA or with non-linear dimension reduction methods.

Samesum transformed data can also be used in differential abundance analysis using the limma-trend or limma-vooma methods. As usual, the assumption here is that there is a background of unchanging features that largely determine the scaling of each sample. This is usually true for RNA-Seq differential expression analysis, but seems to be questioned more often in metagenomics.

The default value of 2 for scale should be reasonable for count data, but is not variance stabilizing. limma-trend or limma-vooma will take this into account. For visualization and exploratory methods you could try a larger scale value, which will damp down variation in low abundance rows and provide better variance stabilization. The scale parameter acts similarly to a pseudocount in log transformation methods with a pseudocount parameter (such as edgeR's cpm function).

When different samples have different library sizes, some of them may be given a scale much smaller than the target scale. When this happens, these samples may appear as outliers in data visualization, purely as an artifact of the transformation. A warning is produced if any samples are given a scale less than 0.5. Consider excluding these samples from your analysis. If all the samples need to be used, you can increase the target scale to avoid this problem. This will allow you to compare all samples, but at the cost of losing some ability to resolve fine differences between samples.

Value

A matrix or a sparse matrix, depending on the input.

The return value will have some attributes accessible using attr().

"samples" is a data frame with per-sample information. The most important value for the transformation is the "scale" column. There is also "true_library_size", which is the sum of the input column, and "adjusted_library_size" if you want to use this method just as a library size adjustment method.
"zero" is the value that zero is transformed to. This will be zero for samesum_log2_norm.
"sum" is the value each column of the output sums to (but not including the CPM adjustment if using samesum_log2_cpm).

Examples


set.seed(563)
x <- matrix(rpois(15, 10), ncol=3)

samesum_log2_norm(x)

samesum_log2_cpm(x)

MonashBioinformaticsPlatform/varistran documentation built on June 9, 2025, 8:36 a.m.

MonashBioinformaticsPlatform/varistran index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

MonashBioinformaticsPlatform/varistran
Variance Stabilizing Transformations appropriate for RNA-Seq expression data

samesum_log2_norm: Normalized log2 counts using samesum method
In MonashBioinformaticsPlatform/varistran: Variance Stabilizing Transformations appropriate for RNA-Seq expression data

Normalized log2 counts using samesum method

Description

Usage

Arguments

Details

Value

Examples

Related to samesum_log2_norm in MonashBioinformaticsPlatform/varistran...

R Package Documentation

Browse R Packages

We want your feedback!

MonashBioinformaticsPlatform/varistran Variance Stabilizing Transformations appropriate for RNA-Seq expression data

samesum_log2_norm: Normalized log2 counts using samesum method In MonashBioinformaticsPlatform/varistran: Variance Stabilizing Transformations appropriate for RNA-Seq expression data

Normalized log2 counts using samesum method

Description

Usage

Arguments

Details

Value

Examples

Related to samesum_log2_norm in MonashBioinformaticsPlatform/varistran...

R Package Documentation

Browse R Packages

We want your feedback!

MonashBioinformaticsPlatform/varistran
Variance Stabilizing Transformations appropriate for RNA-Seq expression data

samesum_log2_norm: Normalized log2 counts using samesum method
In MonashBioinformaticsPlatform/varistran: Variance Stabilizing Transformations appropriate for RNA-Seq expression data