This vignette demonstrates how to run GLM-PCA, a generalization of PCA to non-normally distributed data such as count matrices, on a Seurat object. If you use GLM-PCA, please cite:

Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model

F. William Townes, Stephanie C. Hicks, Martin J. Aryee & Rafael A. Irizarry

Genome Biology, 2019

doi: https://doi.org/10.1186/s13059-019-1861-6

GitHub: https://github.com/willtownes/glmpca

CRAN: https://cran.r-project.org/web/packages/glmpca/index.html

knitr::opts_chunk$set(
  message = FALSE,
  warning = FALSE,
  fig.width = 10
)

Prerequisites to install:

library(Seurat)
library(SeuratData)
library(SeuratWrappers)
library(glmpca)
library(scry)
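
The library() calls above assume the packages are already installed. As a rough sketch of one way to install them, assuming the usual sources (Seurat and glmpca from CRAN, scry from Bioconductor, SeuratData and SeuratWrappers from the satijalab GitHub repositories), something like the following may work; adjust it to your own setup.

```r
# One-time installation sketch; the package sources are assumptions about a typical setup
install.packages(c("Seurat", "glmpca", "remotes", "BiocManager"))
BiocManager::install("scry")                            # scry is distributed via Bioconductor
remotes::install_github("satijalab/seurat-data")        # provides SeuratData
remotes::install_github("satijalab/seurat-wrappers")    # provides SeuratWrappers
```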

GLM-PCA on PBMC3k

To learn more about this dataset, type ?pbmc3k

InstallData("pbmc3k")
data("pbmc3k")

# Initial processing: rank genes by deviance and keep the top 2000 as variable features
# (in Seurat v5, use layer = "counts"; the slot argument is deprecated)
m <- GetAssayData(pbmc3k, slot = "counts", assay = "RNA")
devs <- scry::devianceFeatureSelection(m)
dev_ranked_genes <- rownames(m)[order(devs, decreasing = TRUE)]
topdev <- head(dev_ranked_genes, 2000)
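
To give a feel for what ranking genes by deviance means, here is a minimal base R sketch on a toy matrix. It computes a binomial deviance for each gene against a null model in which the gene takes the same proportion of counts in every cell. This is an illustration under that assumption, not scry's implementation; the toy data and the names binomial_deviance, toy_devs and toy_ranked are hypothetical.

```r
# Toy count matrix (hypothetical data): genes in rows, cells in columns
counts <- rbind(
  flat   = c(5, 5, 5, 5, 5, 5),
  marker = c(0, 0, 0, 20, 25, 30),
  noisy  = c(4, 6, 5, 5, 7, 3)
)
n <- colSums(counts)  # total counts per cell

# Binomial deviance of one gene against a constant-proportion null model
binomial_deviance <- function(y, n) {
  pi_hat <- sum(y) / sum(n)  # pooled proportion of all counts assigned to this gene
  # obs * log(obs / expected), using the convention 0 * log(0) = 0
  term <- function(obs, expected) ifelse(obs > 0, obs * log(obs / expected), 0)
  2 * sum(term(y, n * pi_hat) + term(n - y, n * (1 - pi_hat)))
}

toy_devs <- apply(counts, 1, binomial_deviance, n = n)

# Rank genes from most to least deviant and keep the top two
toy_ranked <- names(sort(toy_devs, decreasing = TRUE))
head(toy_ranked, 2)
```

The gene that is absent from half of the cells departs most from the constant-proportion model, so it ranks first. This is the same property that makes deviance useful for picking informative genes.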

# Run GLM-PCA on the Seurat object.
# RunGLMPCA uses a Poisson model by default and reads the raw data in the counts slot.
# We choose 10 dimensions for computational efficiency.

ndims <- 10
pbmc3k <- RunGLMPCA(pbmc3k, features = topdev, L = ndims)
pbmc3k <- FindNeighbors(pbmc3k, reduction = 'glmpca', dims = 1:ndims, verbose = FALSE)
pbmc3k <- FindClusters(pbmc3k, verbose = FALSE)
pbmc3k <- RunUMAP(pbmc3k, reduction = 'glmpca', dims = 1:ndims, verbose = FALSE)
# Marker genes to visualize later with FeaturePlot
features.plot <- c('CD3D', 'MS4A1', 'CD8A', 'GZMK', 'GZMB', 'FCGR3A')
# Plot the UMAP embedding colored by cluster
DimPlot(pbmc3k)
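
To see what a GLM-PCA fit contains outside of Seurat, the glmpca package can also be called directly on a count matrix with features in rows and observations in columns. The sketch below uses a small simulated matrix (hypothetical data, with two groups of cells) and assumes the fitted object exposes factors and loadings components as described in the glmpca documentation.

```r
library(glmpca)

set.seed(42)
# Simulated counts (hypothetical data): 50 genes x 40 cells, two cell groups
n_genes <- 50
n_cells <- 40
group <- rep(c(1, 2), each = n_cells / 2)

# Baseline Poisson rate of 5; genes 1-10 are more highly expressed in group 2
rate <- matrix(5, n_genes, n_cells)
rate[1:10, group == 2] <- 20
Y <- matrix(rpois(n_genes * n_cells, rate), n_genes, n_cells)

# Fit a two-dimensional GLM-PCA with a Poisson likelihood
fit <- glmpca(Y, L = 2, fam = "poi")

dim(fit$factors)   # one row per cell, one column per latent dimension
dim(fit$loadings)  # one row per gene, one column per latent dimension
```

The factors play the role of cell embeddings and the loadings the role of gene weights, analogous to the scores and rotation of an ordinary PCA.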

Do the learned clusters overlap with the original annotation?

with(pbmc3k[[]], table(seurat_annotations, seurat_clusters))
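
Raw counts in a cross-tabulation can be hard to compare when cell types differ in size. One option, sketched here on hypothetical labels in base R, is to convert each row to proportions so that it shows how an annotated cell type is distributed across clusters.

```r
# Hypothetical labels for eight cells
annotation <- c("B", "B", "CD4 T", "CD4 T", "CD4 T", "NK", "NK", "NK")
cluster    <- c("1", "1", "0", "0", "2", "2", "2", "2")

tab <- table(annotation, cluster)

# Fraction of each annotated cell type that falls in each cluster (rows sum to 1)
frac <- prop.table(tab, margin = 1)
round(frac, 2)

# Cluster receiving the largest share of each annotated cell type
best <- colnames(frac)[apply(frac, 1, which.max)]
names(best) <- rownames(frac)
best
```

On the real object, the same idea would be prop.table(table(pbmc3k$seurat_annotations, pbmc3k$seurat_clusters), margin = 1). A row dominated by a single cluster indicates that the clustering recovers that annotated cell type.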
# Log-normalize the counts so that FeaturePlot shows normalized marker expression
pbmc3k <- NormalizeData(pbmc3k, verbose = FALSE)
FeaturePlot(pbmc3k, features.plot, ncol = 2)


satijalab/seurat-wrappers documentation built on April 10, 2024, 3:25 p.m.