
R package that accompanies our paper 'Comparison of transformations for single-cell RNA-seq data' (https://www.nature.com/articles/s41592-023-01814-1).

`transformGamPoi` provides methods to stabilize the variance of single-cell count data:

- acosh transformation based on the delta method
- shifted logarithm (log(x + c)) with a pseudo-count c, so that it approximates the acosh transformation
- randomized quantile and Pearson residuals
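To make the relation between the first two transformations concrete, here is a minimal base R sketch. It assumes a Gamma-Poisson model with overdispersion `alpha` (so that the variance is `mu + alpha * mu^2`) and a pseudo-count tied to the overdispersion as `c = 1 / (4 * alpha)`. The function names `acosh_sketch()` and `shifted_log_sketch()` are hypothetical and are not part of the package, and size factors are ignored.

```r
# Illustrative re-implementation in base R; NOT the package's code.
# `alpha` is the assumed Gamma-Poisson overdispersion: Var(y) = mu + alpha * mu^2.
acosh_sketch <- function(y, alpha) {
  # The derivative of this function is 1 / sqrt(y + alpha * y^2),
  # which is what the delta method asks for.
  acosh(2 * alpha * y + 1) / sqrt(alpha)
}

# Shifted log with pseudo-count c = 1 / (4 * alpha):
# log(4 * alpha * y + 1) equals log(y + c) - log(c); the result is
# rescaled by 1 / sqrt(alpha) to match the acosh version.
shifted_log_sketch <- function(y, alpha) {
  log(4 * alpha * y + 1) / sqrt(alpha)
}

y <- c(0, 1, 10, 100, 1000)
alpha <- 0.1
round(cbind(y,
            acosh = acosh_sketch(y, alpha),
            shifted_log = shifted_log_sketch(y, alpha)), 3)
```

Both functions map a count of zero to zero, and their outputs move closer together as the counts grow. For small counts the two curves differ noticeably.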

You can install the current development version of `transformGamPoi` by typing the following into the *R* console:

```
# install.packages("devtools")
devtools::install_github("const-ae/transformGamPoi")
```

The installation should only take a few seconds and work across all major operating systems (macOS, Linux, Windows).

Let's compare the different variance-stabilizing transformations.

We start by loading the `transformGamPoi` package and setting a seed to make sure the results are reproducible.

```
library(transformGamPoi)
set.seed(1)
```

We then load some example data, which we subset to 1000 genes and 500 cells:

```
library(SingleCellExperiment)  # assumed source of counts() and rowSums2()
sce <- TENxPBMCData::TENxPBMCData("pbmc4k")
sce_red <- sce[sample(which(rowSums2(counts(sce)) > 0), 1000),
               sample(ncol(sce), 500)]
```

We calculate the different variance-stabilizing transformations. We can either use the generic `transformGamPoi()` method and specify the `transformation`, or we use the low-level functions `acosh_transform()`, `shifted_log_transform()`, and `residual_transform()`, which provide more settings. All functions return a matrix, which we can for example insert back into the `SingleCellExperiment` object:

```
assay(sce_red, "acosh") <- transformGamPoi(sce_red, transformation = "acosh")
assay(sce_red, "shifted_log") <- shifted_log_transform(sce_red, overdispersion = 0.1)
# For large datasets, we can also do the processing without
# loading the full dataset into memory (on_disk = TRUE)
assay(sce_red, "rand_quant") <- residual_transform(sce_red, "randomized_quantile", on_disk = FALSE)
assay(sce_red, "pearson") <- residual_transform(sce_red, "pearson", clipping = TRUE, on_disk = FALSE)
```
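For readers who want to see what the two residual types compute, the following base R sketch works through them on simulated counts. It is a rough illustration under several assumptions and does not reproduce the package's model fitting: the mean matrix `Mu` is a crude stand-in (gene mean times a per-cell size factor), the overdispersion `alpha` is fixed by hand, and the clipping bound of `sqrt(ncol(Y))` is an assumption about what `clipping = TRUE` does. All variable names are hypothetical.

```r
set.seed(1)
# Toy count matrix: 5 genes x 200 cells (simulated, for illustration only)
alpha <- 0.1
Y <- matrix(rnbinom(5 * 200, mu = 5, size = 1 / alpha), nrow = 5)

# Crude mean estimate: gene mean times a per-cell size factor.
# This is a stand-in for a properly fitted Gamma-Poisson model.
size_factors <- colSums(Y) / mean(colSums(Y))
Mu <- outer(rowMeans(Y), size_factors)

# Pearson residuals: (y - mu) / sqrt(mu + alpha * mu^2)
pearson <- (Y - Mu) / sqrt(Mu + alpha * Mu^2)

# Clip to +/- sqrt(number of cells); assumed to mirror `clipping = TRUE`
bound <- sqrt(ncol(Y))
pearson_clipped <- pmin(pmax(pearson, -bound), bound)

# Randomized quantile residuals: draw u uniformly between the CDF just
# below the observed count and the CDF at the observed count, then map
# u to the standard normal scale.
lower <- pnbinom(Y - 1, mu = Mu, size = 1 / alpha)
upper <- pnbinom(Y, mu = Mu, size = 1 / alpha)
u <- runif(length(Y), min = lower, max = upper)
rand_quant <- matrix(qnorm(u), nrow = nrow(Y))
```

The randomization step is what makes the quantile residuals continuous even though the counts are discrete, which is also why setting a seed matters for reproducibility.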

Finally, we compare the variance of the genes after transformation using a scatter plot:

```
par(pch = 20, cex = 1.15)
mus <- rowMeans2(counts(sce_red))
plot(mus, rowVars(assay(sce_red, "acosh")), log = "x", col = "#1b9e77aa", cex = 0.6,
     xlab = "Log Gene Means", ylab = "Variance after transformation")
points(mus, rowVars(assay(sce_red, "shifted_log")), col = "#d95f02aa", cex = 0.6)
points(mus, rowVars(assay(sce_red, "pearson")), col = "#7570b3aa", cex = 0.6)
points(mus, rowVars(assay(sce_red, "rand_quant")), col = "#e7298aaa", cex = 0.6)
legend("topleft", legend = c("acosh", "shifted log", "Pearson Resid.", "Rand. Quantile Resid."),
       col = c("#1b9e77", "#d95f02", "#7570b3", "#e7298a"), pch = 16)
```
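The same comparison can be tried without downloading any data. The sketch below simulates Gamma-Poisson counts at a few hypothetical mean values and compares the variance before and after an acosh transformation written out in base R. The overdispersion `alpha`, the means, and the number of cells are all assumptions chosen for illustration, and the inline formula is a stand-in for the package function.

```r
set.seed(1)
alpha <- 0.1                 # assumed overdispersion
mus <- c(1, 10, 100, 1000)   # hypothetical gene means
n_cells <- 10000

res <- sapply(mus, function(mu) {
  # Simulate Gamma-Poisson (negative binomial) counts for one gene
  y <- rnbinom(n_cells, mu = mu, size = 1 / alpha)
  c(raw = var(y),
    acosh = var(acosh(2 * alpha * y + 1) / sqrt(alpha)))
})
colnames(res) <- mus
round(res, 2)
```

The raw variance grows by orders of magnitude with the mean, while the variance after the transformation should stay on a similar scale across the larger means.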

There are a number of preprocessing methods and packages out there. Of particular interest are

- sctransform by Christoph Hafemeister and the Satija lab. They are the original developers of the Pearson residual variance-stabilizing transformation approach for single-cell data.
- scuttle::logNormCounts() by Aaron Lun. This is an alternative to `shifted_log_transform()` and plays nicely together with the Bioconductor universe. For more information, I highly recommend taking a look at the normalization section of the OSCA book.
- Sanity by Jérémie Breda *et al.* This method is not directly concerned with variance stabilization, but still provides an interesting approach for single-cell data preprocessing.

```
sessionInfo()
```
