Demonstration of workflow
In scregclust: Reconstructing the Regulatory Programs of Target Genes in scRNA-Seq Data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

The methods below are described in our article

Larsson I & Held F, et al. (2023) Reconstructing the regulatory programs underlying the phenotypic plasticity of neural cancers. Preprint available at bioRxiv; 2023.03.10.532041.

Here we demonstrate the scregclust workflow using the PBMC data from 10X Genomics (available here). This is the same data used in an introductory vignette for the Seurat package. We use Seurat for pre-processing of the data.

# Load required packages
library(Seurat)
library(scregclust)

Download the data

We are focusing here on the filtered feature barcode matrix available as an HDF5 file from the website linked above. The data can be downloaded manually or using R.

However you obtain the data, the code below assumes that the HDF5 file containing it is placed in the same folder as this script with the name pbmc_granulocyte_sorted_3k_filtered_feature_bc_matrix.h5.

url <- paste0(
  "https://cf.10xgenomics.com/samples/cell-arc/2.0.0/",
  "pbmc_granulocyte_sorted_3k/",
  "pbmc_granulocyte_sorted_3k_filtered_feature_bc_matrix.h5"
)
data_path <- file.path(
  tempdir(), "pbmc_granulocyte_sorted_3k_filtered_feature_bc_matrix.h5"
)

download.file(url, data_path, cacheOK = FALSE, mode = "wb")

Load the data in Seurat and preprocess

To perform preprocessing use Seurat to load the data. The file ships with two modalities, "Gene Expression" and "Peaks". We only use the former.

pbmc_data <- Read10X_h5(
  data_path,
  use.names = TRUE,
  unique.features = TRUE
)[["Gene Expression"]]

We create a Seurat object and follow the Seurat vignette to subset the cells and features (genes).

pbmc <- CreateSeuratObject(
  counts = pbmc_data, min.cells = 3, min.features = 200
)

pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT.")
pbmc <- subset(pbmc, subset = percent.mt < 30 & nFeature_RNA < 6000)

SCTransform is used for variance stabilization of the data and Pearson residuals for the 6000 most variable genes are extracted as matrix z.

pbmc <- SCTransform(pbmc, variable.features.n = 6000)

z <- GetAssayData(pbmc, layer = "scale.data")
dim(z)

Use scregclust for clustering target genes into modules

We then use scregclust_format which extracts gene symbols from the expression matrix and determines which genes are considered regulators. By default, transcription factors are used as regulators. Setting mode to "kinase" uses kinases instead of transcription factors. A list of the regulators used internally is returned by get_regulator_list().

out <- scregclust_format(z, mode = "TF")

The output of scregclust_format is a list with three elements.

genesymbols contains the rownames of z
sample_assignment is initialized to be a vector of 1s of length ncol(z) and can be filled with a known sample grouping. Here, we do not use it and just keep it uniform across all cells.
is_regulator is an indicator vector (elements are 0 or 1) corresponding to the entries of genesymbols with 1 marking that the genesymbol is selected as a regulator according to the model of scregclust_format ("TF" or "kinase") and 0 otherwise.

genesymbols <- out$genesymbols
sample_assignment <- out$sample_assignment
is_regulator <- out$is_regulator

Run scregclust with number of initial modules set to 10 and test several penalties. The penalties provided to penalization are used during selection of regulators associated with each module. An increasing penalty implies the selection of fewer regulators. noise_threshold controls the minimum $R^2$ a gene has to achieve across modules. Otherwise the gene is marked as noise. The run can be reproduced with the command below. A pre-fitted model can be downloaded from GitHub for convenience.

# set.seed(8374)
# fit <- scregclust(
#   z, genesymbols, is_regulator, penalization = seq(0.1, 0.5, 0.05),
#   n_modules = 10L, n_cycles = 50L, noise_threshold = 0.05
# )
# saveRDS(fit, file = "datasets/pbmc_scregclust.rds")

url <- paste0(
  "https://github.com/scmethods/scregclust/raw/main/datasets/",
  "pbmc_scregclust.rds"
)
fit_path <- file.path(tempdir(), "pbmc_scregclust.rds")
download.file(url, fit_path)
fit <- readRDS(fit_path)

Analysis of results

Results can be visualized easily using built-in functions. Metrics for helping in choosing an optimal penalty can be plotted by calling plot on the object returned from scregclust.

#| fig.alt: >
#|   Boxplots of predictive R^2 per module (bottom) and
#|   regulator importance (top) over the penalization parameters
#|   specified during model estimation. A decreasing trend can
#|   be seen in R^2 per module and a slow and steady increase in
#|   regulator importance is followed by an explosive increase from
#|   around 0.4 penalization.
plot(fit)

The results for each penalization parameter are placed in a list, results, attached to the fit object. So fit$results[[1]] contains the results of running scregclust with penalization = 0.1. For each penalization parameter, the algorithm might end up finding multiple optimal configurations. Each configuration describes target genes module assignments and which regulators are associated with which modules. The results for each such configuration are contained in the list output. This means that fit$results[[1]]$output[[1]] contains the results for the first final configuration. More than one may be available.

sapply(fit$results, function(r) length(r$output))

In this example, at most two final configurations were found for each penalization parameters.

To plot the regulator network of the first configuration for penalization = 0.1 the function plot_regulator_network can be used.

#| fig.alt: >
#|   Network visualization of modules (colorful circles) and their top
#|   regulators (grey rectangles). Arrows indicate regulation and their
#|   thickness represents regulation strength. Red arrows indicate positive
#|   regulation and blue arrows indicate negative regulation.
plot_regulator_network(fit$results[[1]]$output[[1]])

Any scripts or data that you put into this service are public.

scregclust documentation built on April 4, 2025, 3:03 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

scregclust
Reconstructing the Regulatory Programs of Target Genes in scRNA-Seq Data

Demonstration of workflow
In scregclust: Reconstructing the Regulatory Programs of Target Genes in scRNA-Seq Data

Download the data

Load the data in Seurat and preprocess

Use scregclust for clustering target genes into modules

Analysis of results

Try the scregclust package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

scregclust Reconstructing the Regulatory Programs of Target Genes in scRNA-Seq Data

Demonstration of workflow In scregclust: Reconstructing the Regulatory Programs of Target Genes in scRNA-Seq Data

Download the data

Load the data in Seurat and preprocess

Use scregclust for clustering target genes into modules

Analysis of results

Try the scregclust package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

scregclust
Reconstructing the Regulatory Programs of Target Genes in scRNA-Seq Data

Demonstration of workflow
In scregclust: Reconstructing the Regulatory Programs of Target Genes in scRNA-Seq Data