In califano-lab/PISCES: Protein Activity Inferecne in Single Cells

Introduction

NOTE: PISCES is transitioning to a new framework that better incorporates the Seurat pipeline to allow greater ease of use. A fully updatd version of this vignette is in the works, with a simplified version posted in the interim. Please excuse any bugs during this time.

The pipeline for Protein Activity Inference in Single Cells (PISCES) is a regulatory-network-based methdology for the analysis of single cell gene expression profiles.

PISCES transforms highly variable and noisy single cell gene expression profiles into robust and reproducible protein activity profiles. PISCES is centered around two key algorithms: the Algorithm for the Reconstruction of Accurate Cellular Networks ARACNe [1]; and the algorithm for Virtual Inference of Protein-activity by Enriched Regulon analysis (VIPER/metaVIPER) [2,3].

Briefly, the ARACNe algorithm is one of the most widely used methods for inferring transcriptional interactions from gene expression data. The VIPER algorithm uses the expression of the ARACNe-inferred regulatory targets of a given protein, such as the targets of a transcription factor (TF), as an accurate reporter of its activity. Typically, PISCES can accurately assess the activity of up to 6000 regulatory proteins from single cell gene expression profiles, significantly increasing the ability to analyze the biological function and relevance of gene products whose mRNAs are undetectable in individual cells (e.g. dropout effect).

Setup

PISCES is implemented in R and requires the following packages:

viper (v1.18.1)
Seurat
igraph
biomaRt
uwot
Matrix
ggplot2
ggpubr
pheatmap
RColorBrewer

You can install these packages, along with PISCES itself, with the following code:

## CRAN packages
install.packages("BiocManager", "devtools",
                 "ggplot2", "ggpubr", "RColorBrewer", "pheatmap",
                 "Seurat", "uwot", "igraph")
## BioConductor packages
BiocManager::install("viper")
BiocManager::install("biomaRt")
## GitHub packages
devtools::install_github("califano-lab/PISCES")

A greatly simplified workflow is presented below. A fully developed vignette for the new PISCES framework can be expected by 1/14/22:

## load data from 10x
exp.mat <- Read10X(paste('PATH_TO_YOUR_DATA/filtered_feature_bc_matrix/', sep = ''))
## create SEURAT object and perform QC; change these thesholds based on the quality of your data
pisces.obj <- CreateSeuratObject(counts = exp.mat, min.cells = 3, min.features = 200)
mt.features <- intersect(mt.genes$hum.symb, rownames(pisces.obj))
pisces.obj[["percent.mt"]] <- PercentageFeatureSet(object = pisces.obj, features = mt.features)
pisces.obj <- SCTransform(pisces.obj, variable.features.n = 5000, vars.to.regress = 'percent.mt')

## cluster to identify cell types if your data is heterogeneous; not necessary for experimentally labeled / sorted data
## the goal is to separate the data into distinct cell types for network generation
# Seurat clustering
pisces.obj <- RunPCA(pisces.obj, verbose = FALSE)
pisces.obj <- RunUMAP(pisces.obj, dims = 1:30, verbose = FALSE)
pisces.obj <- FindNeighbors(pisces.obj, dims = 1:30, verbose = FALSE)
pisces.obj <- FindClusters(pisces.obj, verbose = FALSE)
# PISCES clustering (in gene expression space)
pisces.obj <- CorDist(pisces.obj, pca.feats = 10)
pisces.obj <- LouvainKRange(pisces.obj, kmax = 100)

## if your data has fewer than 10K UMIs / cell, we recommend generating metacells
## the clustering vector can be any of those generated in the previous step; here we use the PISCES clustering
gexp.dist <- pisces.obj@assays$SCT@misc$dist.mat
pisces.metacell.mats <- MetaCells(as.matrix(pisces.obj@assays$RNA@counts),
                                  dist.mat = gexp.dist,
                                  clust.vect = pisces.obj@assays$SCT@misc$pisces.cluster)
dir.create('PATH_TO_YOUR_RESULTS/pisces-clust_aracne-mats/')
for (mcn in names(pisces.metacell.mats)) {
  saveRDS(pisces.metacell.mats[[mcn]], 
          file = paste('PATH_TO_YOUR_RESULTS/pisces-clust_aracne-mats/YOUR_DATA_NAME_pisces-', 
                       mcn, '_k5-arac.rds', sep = ''))
}
## alternatively, you can simply save matrices for each cluster and use those for network generation w/ ARACNe
## OR, if your data are homogeneous, you can use the entire dataset for either metacell generation or as straight input to ARACNe
## see DATA for ARACNe scripts, which must be run on an HPC architecture

## viper and protein-activity based clustering
## net.list here would be a list of networks generated from ARACNe
pisces.obj <- AddPISCESAssay(pisces.obj)
pisces.obj <- CPMTransform(pisces.obj)
pisces.obj <- GESTransform(pisces.obj)
pisces.obj <- PISCESViper(pisces.obj, net.list, sct.ges = FALSE)
pisces.obj <- CorDist(pisces.obj, pca.feats = 10)
# generate dimensionality reductions for visualizatin
data.obj <- MakeUMAP(data.obj)
data.obj <- MakeMDS(data.obj)
# clustering and MR analysis
data.obj <- LouvainKRange(data.obj, kmin = 5, kmax = 100)
data.obj <- MWUMrs(data.obj)
## the master regulators will be added under data.obj@assays$PISCES@misc$mwuMRs
## the object is a list of lists; a pair of lists for each cluster, one for positive MRS and the other for negative, sorted by log p-value

## scatter plots using ggplot
## generate plots
plot.df <- data.frame('MDS1' = data.obj@assays$PISCES@misc$mds[,1],
                      'MDS2' = data.obj@assays$PISCES@misc$mds[,2],
                      'UMAP1' = data.obj@assays$PISCES@misc$umap[,1],
                      'UMAP2' = data.obj@assays$PISCES@misc$umap[,2],
                      'Cluster' = as.factor(data.obj@assays$PISCES@misc$pisces.cluster))
# umap plot
umap.plot <- ggplot(plot.df, aes(UMAP1, UMAP2)) + geom_point(aes(color = Cluster)) +
  ggtitle(paste(tissue.title, ' (UMAP)', sep = '')) + plot.theme
print(umap.plot)
# mds plot
mds.plot <- ggplot(plot.df, aes(MDS1, MDS2)) + geom_point(aes(color = Cluster)) +
  ggtitle(paste(tissue.title, ' (MDS)', sep = '')) + plot.theme
print(mds.plot)