SCTransform: Perform sctransform-based normalization

SCTransform R Documentation

Description

Perform a variance-stabilizing transformation on UMI counts using sctransform::vst (https://github.com/satijalab/sctransform). This replaces the NormalizeData, FindVariableFeatures, ScaleData workflow by fitting a regularized negative binomial model per gene and storing the results in a new assay (named "SCT" by default), whose contents are described under Details.

Usage

SCTransform(object, ...)

## Default S3 method:
SCTransform(
  object,
  cell.attr,
  reference.SCT.model = NULL,
  do.correct.umi = TRUE,
  ncells = 5000,
  residual.features = NULL,
  variable.features.n = 3000,
  variable.features.rv.th = 1.3,
  vars.to.regress = NULL,
  latent.data = NULL,
  do.scale = FALSE,
  do.center = TRUE,
  clip.range = c(-sqrt(x = ncol(x = umi)/30), sqrt(x = ncol(x = umi)/30)),
  vst.flavor = "v2",
  conserve.memory = FALSE,
  return.only.var.genes = TRUE,
  seed.use = 1448145,
  verbose = TRUE,
  ...
)

## S3 method for class 'Assay'
SCTransform(
  object,
  cell.attr,
  reference.SCT.model = NULL,
  do.correct.umi = TRUE,
  ncells = 5000,
  residual.features = NULL,
  variable.features.n = 3000,
  variable.features.rv.th = 1.3,
  vars.to.regress = NULL,
  latent.data = NULL,
  do.scale = FALSE,
  do.center = TRUE,
  clip.range = c(-sqrt(x = ncol(x = object)/30), sqrt(x = ncol(x = object)/30)),
  vst.flavor = "v2",
  conserve.memory = FALSE,
  return.only.var.genes = TRUE,
  seed.use = 1448145,
  verbose = TRUE,
  ...
)

## S3 method for class 'Seurat'
SCTransform(
  object,
  assay = "RNA",
  new.assay.name = "SCT",
  reference.SCT.model = NULL,
  do.correct.umi = TRUE,
  ncells = 5000,
  residual.features = NULL,
  variable.features.n = 3000,
  variable.features.rv.th = 1.3,
  vars.to.regress = NULL,
  do.scale = FALSE,
  do.center = TRUE,
  clip.range = c(-sqrt(x = ncol(x = object[[assay]])/30), sqrt(x = ncol(x =
    object[[assay]])/30)),
  vst.flavor = "v2",
  conserve.memory = FALSE,
  return.only.var.genes = TRUE,
  seed.use = 1448145,
  verbose = TRUE,
  ...
)

## S3 method for class 'IterableMatrix'
SCTransform(
  object,
  cell.attr,
  reference.SCT.model = NULL,
  do.correct.umi = TRUE,
  ncells = 5000,
  residual.features = NULL,
  variable.features.n = 3000,
  variable.features.rv.th = 1.3,
  vars.to.regress = NULL,
  latent.data = NULL,
  do.scale = FALSE,
  do.center = TRUE,
  clip.range = c(-sqrt(x = ncol(x = object)/30), sqrt(x = ncol(x = object)/30)),
  vst.flavor = "v2",
  conserve.memory = FALSE,
  return.only.var.genes = TRUE,
  seed.use = 1448145,
  verbose = TRUE,
  ...
)
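
As a usage illustration, a call to the Seurat method might look like the sketch below. The wrapper name run_sct and the metadata column percent.mt are hypothetical; the column is assumed to already exist in the object's metadata. The function is only defined here, not run, so it needs Seurat installed only when it is actually called.

```r
# Hypothetical wrapper (not part of Seurat) around the Seurat method.
# Assumes `obj` is a Seurat object with UMI counts in the "RNA" assay and a
# metadata column named "percent.mt" (an assumed name for this sketch).
run_sct <- function(obj, regress = "percent.mt") {
  Seurat::SCTransform(
    object = obj,
    assay = "RNA",            # assay holding the raw UMI counts
    new.assay.name = "SCT",   # name of the assay that receives the results
    vars.to.regress = regress,
    vst.flavor = "v2",
    verbose = FALSE
  )
}
```

Calling run_sct(obj) would then return the object with the new assay added, provided the assumptions above hold.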

Arguments

object

A Seurat object or UMI count matrix.

...

Additional arguments passed to sctransform::vst.

cell.attr

Optional metadata frame (cells × attributes).

reference.SCT.model

Pre‐fitted SCT model (supports only log_umi as latent variable). If provided, computes residuals via that model. When residual.features is NULL, uses the model’s top variable.features.n; otherwise, sets the assay’s variable features to residual.features.

do.correct.umi

Logical; if TRUE (default), stores corrected UMIs in counts.

ncells

Integer; number of cells to subsample when fitting NB regression (default: 5000).

residual.features

Character vector of genes to compute residuals for. Default NULL (all genes). If set, these become the assay’s variable features.

variable.features.n

Integer; when residual.features is NULL, select this many top features by residual variance (default: 3000).

variable.features.rv.th

Numeric; if variable.features.n is NULL, select features exceeding this residual‐variance threshold (default: 1.3).

vars.to.regress

Character vector of metadata columns (e.g. percent.mito) to regress out in a second, non‐regularized model.

latent.data

Numeric matrix (cells × latent covariates) to regress out.

do.scale

Logical; if TRUE, scale residuals to unit variance (default: FALSE).

do.center

Logical; if TRUE, center residuals to mean zero (default: TRUE).

clip.range

Numeric vector of length 2; range to clip residuals (default c(-sqrt(n/30), sqrt(n/30)), with n = number of cells).

vst.flavor

Character; if "v2" (default), sets method = "glmGamPoi_offset", n_cells = 2000, and exclude_poisson = TRUE, so that only theta and the intercept are fitted.

conserve.memory

Logical; if TRUE, never builds the full residual matrix (slower but memory‐efficient; forces return.only.var.genes=TRUE; default: FALSE).

return.only.var.genes

Logical; if TRUE (default), scale.data is subset to variable features only.

seed.use

Integer; random seed for reproducibility (default: 1448145). Set to NULL to skip setting a seed.

verbose

Logical; whether to print progress messages (default: TRUE).

assay

Character; name of the assay to pull the count data from (default: "RNA").

new.assay.name

Character; name for the new assay containing the normalized data (default: "SCT").
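
The interplay of variable.features.n, variable.features.rv.th, and clip.range can be sketched in base R. This is an illustration of the rules as described above, not Seurat's implementation: the helper select_variable_features and all numbers are made up, and the handling of values exactly at the threshold is an assumption.

```r
# Hypothetical residual variances for five genes (made-up numbers).
residual_variance <- c(geneA = 4.2, geneB = 0.9, geneC = 1.5,
                       geneD = 2.8, geneE = 1.1)

# Illustrative helper, not part of Seurat: follows the documented rule.
select_variable_features <- function(rv, n = 3000, rv.th = 1.3) {
  ranked <- sort(rv, decreasing = TRUE)
  if (!is.null(n)) {
    # A fixed number was requested: take the top n by residual variance.
    return(names(head(ranked, n)))
  }
  # Otherwise keep every feature exceeding the threshold.
  names(ranked[ranked > rv.th])
}

top2     <- select_variable_features(residual_variance, n = 2)
above_th <- select_variable_features(residual_variance, n = NULL)

# Default clip range for a dataset with 3000 cells: sqrt(3000 / 30) = 10.
n_cells    <- 3000
clip_range <- c(-sqrt(n_cells / 30), sqrt(n_cells / 30))

# Clipping caps residuals at the limits of the range.
res     <- c(-25, -3, 0, 7, 40)
clipped <- pmin(pmax(res, clip_range[1]), clip_range[2])
```

With these inputs, top2 holds geneA and geneD, above_th additionally holds geneC, and the two extreme residuals are capped at -10 and 10.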

Details

SCTransform returns a new assay (default name “SCT”), in which:

- counts: depth-corrected UMI counts (as if each cell had uniform sequencing depth; controlled by do.correct.umi).
- data: log1p of corrected counts.
- scale.data: Pearson residuals from the fitted NB model (optionally centered and/or scaled).
- misc: intermediate outputs from sctransform::vst.
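
To make the relationship between these layers concrete, here is a minimal base R sketch for a single gene. It uses the standard Pearson residual for a negative binomial model, (y - mu) / sqrt(mu + mu^2 / theta); the counts, fitted means, theta, and corrected counts are made-up values, and sctransform's regularization and clipping steps are deliberately left out.

```r
# Made-up inputs for one gene across four cells (hypothetical values).
y     <- c(0, 3, 1, 10)   # observed UMI counts
mu    <- c(0.5, 2, 1, 4)  # fitted means from the NB model
theta <- 2                # NB dispersion parameter

# Pearson residual under a negative binomial model:
# the variance of an NB variable is mu + mu^2 / theta.
pearson <- (y - mu) / sqrt(mu + mu^2 / theta)

# do.center = TRUE corresponds to subtracting the per-gene mean.
centered <- pearson - mean(pearson)

# The data layer is log1p of the corrected counts (made-up values here).
corrected  <- c(0, 2, 1, 6)
data_layer <- log1p(corrected)
```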

When multiple counts layers exist (e.g. after split()), each layer is modeled independently. A consensus variable‐feature set is then defined by ranking features by how often they’re called “variable” across different layers (ties broken by median rank).
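
The ranking rule described above can be sketched in base R. This is an illustration under stated assumptions, not the Seurat code: the per-layer feature lists are made up, each list is assumed to be ordered from most to least variable, and the median rank is taken only over layers in which the feature appears.

```r
# Hypothetical per-layer variable features, most variable first.
layer_features <- list(
  layer1 = c("geneA", "geneB", "geneC"),
  layer2 = c("geneB", "geneA", "geneD"),
  layer3 = c("geneB", "geneD", "geneE")
)

all_features <- unique(unlist(layer_features))

# Number of layers in which each feature is called variable.
n_layers <- vapply(all_features, function(f) {
  sum(vapply(layer_features, function(v) f %in% v, logical(1)))
}, numeric(1))

# Median rank across the layers where the feature appears (tie-breaker).
median_rank <- vapply(all_features, function(f) {
  ranks <- vapply(layer_features, function(v) match(f, v), integer(1))
  median(ranks, na.rm = TRUE)
}, numeric(1))

# Sort by frequency (descending), then by median rank (ascending).
consensus <- all_features[order(-n_layers, median_rank)]
```

Here geneB ranks first because it is variable in all three layers; geneA and geneD each appear in two layers, and geneA wins the tie on median rank (1.5 versus 2.5).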

By default, sctransform::vst drops features expressed in fewer than five cells. In the multi-layer case, this can exclude consensus variable features from the output's scale.data when a feature is "variable" across many layers but sparsely expressed in at least one.
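
One way to spot features at risk of being dropped is to count, per layer, the cells in which each feature is detected. The base R sketch below does this for two small made-up count matrices (genes in rows, cells in columns); the matrices and the variable names are hypothetical, and the threshold of five mirrors the default described above.

```r
min_cells <- 5  # features detected in fewer cells than this are dropped

# Two made-up count layers with six cells each.
layer1 <- rbind(geneA = c(1, 2, 0, 3, 1, 4),
                geneB = c(0, 0, 1, 0, 0, 0))
layer2 <- rbind(geneA = c(2, 1, 1, 0, 5, 1),
                geneB = c(1, 2, 1, 3, 1, 0))
layers <- list(layer1 = layer1, layer2 = layer2)

# Genes x layers matrix: number of cells with a non-zero count.
detected <- sapply(layers, function(m) rowSums(m > 0))

# Features falling below the threshold in at least one layer.
at_risk <- rownames(detected)[apply(detected < min_cells, 1, any)]
```

In this example geneB is detected in only one cell of layer1, so it is flagged even though it is well expressed in layer2.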

Value

A Seurat object with a new SCT assay containing: counts (corrected UMIs), data (log1p counts), and scale.data (Pearson residuals), plus misc for intermediate vst outputs.
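
The returned layers could be retrieved along the lines of the sketch below. The helper inspect_sct is hypothetical, and the sketch assumes the Seurat v5 accessors from SeuratObject (GetAssayData with a layer argument, and VariableFeatures); older versions use different argument names. The function is only defined here, so SeuratObject is needed only when it is called.

```r
# Hypothetical helper (not part of Seurat). Assumes `obj` is a Seurat object
# returned by SCTransform and that SeuratObject v5 accessors are available.
inspect_sct <- function(obj, assay = "SCT") {
  list(
    # Depth-corrected UMI counts.
    corrected_counts = SeuratObject::GetAssayData(obj, assay = assay, layer = "counts"),
    # log1p of the corrected counts.
    log_normalized = SeuratObject::GetAssayData(obj, assay = assay, layer = "data"),
    # Pearson residuals (variable features only by default).
    residuals = SeuratObject::GetAssayData(obj, assay = assay, layer = "scale.data"),
    # Features selected as variable.
    variable_features = SeuratObject::VariableFeatures(obj, assay = assay)
  )
}
```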

See Also

vst, get_residuals, correct_counts


Seurat documentation built on June 8, 2025, 12:24 p.m.