This vignette demonstrates how to run GLM-PCA, which implements a generalized version of PCA for non-normally distributed data, on a Seurat object. If you use this, please cite:
Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model
F. William Townes, Stephanie C. Hicks, Martin J. Aryee & Rafael A. Irizarry
Genome Biology, 2019
doi: https://doi.org/10.1186/s13059-019-1861-6
GitHub: https://github.com/willtownes/glmpca CRAN: https://cran.r-project.org/web/packages/glmpca/index.html
knitr::opts_chunk$set( message = FALSE, warning = FALSE, fig.width = 10 )
Prerequisites to install:
library(Seurat) library(SeuratData) library(SeuratWrappers) library(glmpca) library(scry)
To learn more about this dataset, type ?pbmc3k
InstallData("pbmc3k") data("pbmc3k") # Initial processing to select variable features m <- GetAssayData(pbmc3k, slot = "counts", assay = "RNA") devs <- scry::devianceFeatureSelection(m) dev_ranked_genes <- rownames(pbmc3k)[order(devs, decreasing = TRUE)] topdev <- head(dev_ranked_genes, 2000) # run GLM-PCA on Seurat object. # Uses Poisson model by default # Note that data in the counts slot is used # We choose 10 dimensions for computational efficiency ndims <- 10 pbmc3k <- RunGLMPCA(pbmc3k, features = topdev, L = ndims) pbmc3k <- FindNeighbors(pbmc3k, reduction = 'glmpca', dims = 1:ndims, verbose = FALSE) pbmc3k <- FindClusters(pbmc3k, verbose = FALSE) pbmc3k <- RunUMAP(pbmc3k, reduction = 'glmpca', dims = 1:ndims, verbose = FALSE)
# visualize markers features.plot <- c('CD3D', 'MS4A1', 'CD8A', 'GZMK', 'GZMB', 'FCGR3A') DimPlot(pbmc3k)
Do the learned clusters overlap with the original annotation?
with(pbmc3k[[]], table(seurat_annotations, seurat_clusters))
pbmc3k <- NormalizeData(pbmc3k, verbose = FALSE) FeaturePlot(pbmc3k, features.plot, ncol = 2)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.