NOTE: PISCES is transitioning to a new framework that better incorporates the Seurat pipeline to allow greater ease of use. A fully updatd version of this vignette is in the works, with a simplified version posted in the interim. Please excuse any bugs during this time.
The pipeline for Protein Activity Inference in Single Cells (PISCES) is a regulatory-network-based methdology for the analysis of single cell gene expression profiles.
PISCES transforms highly variable and noisy single cell gene expression profiles into robust and reproducible protein activity profiles. PISCES is centered around two key algorithms: the Algorithm for the Reconstruction of Accurate Cellular Networks ARACNe [1]; and the algorithm for Virtual Inference of Protein-activity by Enriched Regulon analysis (VIPER/metaVIPER) [2,3].
Briefly, the ARACNe algorithm is one of the most widely used methods for inferring transcriptional interactions from gene expression data. The VIPER algorithm uses the expression of the ARACNe-inferred regulatory targets of a given protein, such as the targets of a transcription factor (TF), as an accurate reporter of its activity. Typically, PISCES can accurately assess the activity of up to 6000 regulatory proteins from single cell gene expression profiles, significantly increasing the ability to analyze the biological function and relevance of gene products whose mRNAs are undetectable in individual cells (e.g. dropout effect).
PISCES is implemented in R and requires the following packages:
You can install these packages, along with PISCES itself, with the following code:
## CRAN packages install.packages("BiocManager", "devtools", "ggplot2", "ggpubr", "RColorBrewer", "pheatmap", "Seurat", "uwot", "igraph") ## BioConductor packages BiocManager::install("viper") BiocManager::install("biomaRt") ## GitHub packages devtools::install_github("califano-lab/PISCES")
A greatly simplified workflow is presented below. A fully developed vignette for the new PISCES framework can be expected by 1/14/22:
## load data from 10x exp.mat <- Read10X(paste('PATH_TO_YOUR_DATA/filtered_feature_bc_matrix/', sep = '')) ## create SEURAT object and perform QC; change these thesholds based on the quality of your data pisces.obj <- CreateSeuratObject(counts = exp.mat, min.cells = 3, min.features = 200) mt.features <- intersect(mt.genes$hum.symb, rownames(pisces.obj)) pisces.obj[["percent.mt"]] <- PercentageFeatureSet(object = pisces.obj, features = mt.features) pisces.obj <- SCTransform(pisces.obj, variable.features.n = 5000, vars.to.regress = 'percent.mt') ## cluster to identify cell types if your data is heterogeneous; not necessary for experimentally labeled / sorted data ## the goal is to separate the data into distinct cell types for network generation # Seurat clustering pisces.obj <- RunPCA(pisces.obj, verbose = FALSE) pisces.obj <- RunUMAP(pisces.obj, dims = 1:30, verbose = FALSE) pisces.obj <- FindNeighbors(pisces.obj, dims = 1:30, verbose = FALSE) pisces.obj <- FindClusters(pisces.obj, verbose = FALSE) # PISCES clustering (in gene expression space) pisces.obj <- CorDist(pisces.obj, pca.feats = 10) pisces.obj <- LouvainKRange(pisces.obj, kmax = 100) ## if your data has fewer than 10K UMIs / cell, we recommend generating metacells ## the clustering vector can be any of those generated in the previous step; here we use the PISCES clustering gexp.dist <- pisces.obj@assays$SCT@misc$dist.mat pisces.metacell.mats <- MetaCells(as.matrix(pisces.obj@assays$RNA@counts), dist.mat = gexp.dist, clust.vect = pisces.obj@assays$SCT@misc$pisces.cluster) dir.create('PATH_TO_YOUR_RESULTS/pisces-clust_aracne-mats/') for (mcn in names(pisces.metacell.mats)) { saveRDS(pisces.metacell.mats[[mcn]], file = paste('PATH_TO_YOUR_RESULTS/pisces-clust_aracne-mats/YOUR_DATA_NAME_pisces-', mcn, '_k5-arac.rds', sep = '')) } ## alternatively, you can simply save matrices for each cluster and use those for network generation w/ ARACNe ## OR, if your data are homogeneous, you can use the entire dataset for either metacell generation or as straight input to ARACNe ## see DATA for ARACNe scripts, which must be run on an HPC architecture ## viper and protein-activity based clustering ## net.list here would be a list of networks generated from ARACNe pisces.obj <- AddPISCESAssay(pisces.obj) pisces.obj <- CPMTransform(pisces.obj) pisces.obj <- GESTransform(pisces.obj) pisces.obj <- PISCESViper(pisces.obj, net.list, sct.ges = FALSE) pisces.obj <- CorDist(pisces.obj, pca.feats = 10) # generate dimensionality reductions for visualizatin data.obj <- MakeUMAP(data.obj) data.obj <- MakeMDS(data.obj) # clustering and MR analysis data.obj <- LouvainKRange(data.obj, kmin = 5, kmax = 100) data.obj <- MWUMrs(data.obj) ## the master regulators will be added under data.obj@assays$PISCES@misc$mwuMRs ## the object is a list of lists; a pair of lists for each cluster, one for positive MRS and the other for negative, sorted by log p-value ## scatter plots using ggplot ## generate plots plot.df <- data.frame('MDS1' = data.obj@assays$PISCES@misc$mds[,1], 'MDS2' = data.obj@assays$PISCES@misc$mds[,2], 'UMAP1' = data.obj@assays$PISCES@misc$umap[,1], 'UMAP2' = data.obj@assays$PISCES@misc$umap[,2], 'Cluster' = as.factor(data.obj@assays$PISCES@misc$pisces.cluster)) # umap plot umap.plot <- ggplot(plot.df, aes(UMAP1, UMAP2)) + geom_point(aes(color = Cluster)) + ggtitle(paste(tissue.title, ' (UMAP)', sep = '')) + plot.theme print(umap.plot) # mds plot mds.plot <- ggplot(plot.df, aes(MDS1, MDS2)) + geom_point(aes(color = Cluster)) + ggtitle(paste(tissue.title, ' (MDS)', sep = '')) + plot.theme print(mds.plot)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.