knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
First, install the supporting Python package ctpnetpy. See the source code of the package here
pip install cTPnet
If you run into problems with PyTorch, refer to the PyTorch website for more details.
Next, open R and install the R package cTPnet:
devtools::install_github("zhouzilu/cTPnet")
Download the pretrained model from weights.
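Before moving on, it can help to confirm that R can see everything installed so far. The following is a minimal sketch, not part of the cTPnet package: the helper name check_ctpnet_setup, the placeholder weights path, and the assumption that the Python module is imported under the name "cTPnet" are all illustrative and should be adjusted to your setup.

```r
# Hypothetical helper (not part of cTPnet): report whether the pieces
# needed for prediction appear to be in place.
check_ctpnet_setup <- function(model_file_path, py_module = "cTPnet") {
  # reticulate is needed to reach Python from R
  has_reticulate <- requireNamespace("reticulate", quietly = TRUE)
  # py_module_available() returns TRUE/FALSE; any error is treated as FALSE.
  # The module name "cTPnet" is an assumption based on the pip package name.
  has_module <- has_reticulate &&
    isTRUE(tryCatch(reticulate::py_module_available(py_module),
                    error = function(e) FALSE))
  c(reticulate    = has_reticulate,
    python_module = has_module,
    weights_file  = file.exists(model_file_path))
}

# Placeholder path; replace with the location of the downloaded weights
status <- check_ctpnet_setup("path/to/cTPnet_weight_24")
print(status)
```

Any FALSE entry points at the step to revisit: the R-side reticulate install, the pip install, or the weights download.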
In addition, if you want to denoise your raw scRNA-seq counts, please follow the SAVER-X installation pipeline below, adapted from https://github.com/jingshuw/SAVERX.
Install supporting Python package sctransfer.
pip install sctransfer
Install the R package.
devtools::install_github("jingshuw/SAVERX")
Download the pretrained SAVER-X model from weights.
Currently, SAVER-X does not support very large data sets (a test with 270,000 cells failed with 200 GB of RAM). cTP-net, on the other hand, can predict surface protein abundance relatively accurately without denoising.
If you have any questions or problems when using cTPnet or ctpnetpy, please feel free to open a new issue here. You can also email the maintainers of the corresponding packages:
Zilu Zhou (zhouzilu at pennmedicine dot upenn dot edu)
Genomics and Computational Biology Graduate Group, UPenn
Nancy R. Zhang (nzh at wharton dot upenn dot edu)
Department of Statistics, UPenn
To accurately impute surface protein abundance from scRNA-seq data, cTP-net employs two steps: (1) denoising of the scRNA-seq count matrix and (2) imputation based on the denoised data through a transcriptome-protein mapping (Figure 1). The initial denoising, by SAVER-X, produces more accurate estimates of the RNA transcript relative abundances for each cell. Compared to the raw counts, the denoised relative expression values correlate significantly better with their cognate protein measurements.
knitr::include_graphics("https://raw.githubusercontent.com/zhouzilu/cTPnet/master/figure/FIG_pkg.jpg")
Figure 1. (a) Overview of cTP-net analysis pipeline, which learns a mapping from the denoised scRNA-seq data to the relative abundance of surface proteins, capturing multi-gene features that reflect the cellular environment and related processes. (b) For three example proteins, cross-cell scatter and correlation of CITE-seq measured abundances vs. (1) raw RNA count, (2) SAVER-X denoised RNA level, and (3) cTP-net predicted protein abundance.
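The effect of sampling noise on the RNA-protein correlation can be illustrated with a small simulation. This is only a toy sketch on simulated data: the distributions, the capture rate of 0.2, and the use of the true expression as a stand-in for an ideal denoiser are all assumptions made for illustration, and SAVER-X itself is not run here.

```r
set.seed(42)
n_cells <- 2000

# Simulated "true" relative expression of one gene across cells
true_expr <- rgamma(n_cells, shape = 2, rate = 1)

# Simulated protein level: follows the true expression plus measurement noise
protein <- true_expr + rnorm(n_cells, sd = 0.5)

# Simulated raw counts: sparse Poisson sampling of the true expression
# (0.2 is an assumed capture rate)
raw_counts <- rpois(n_cells, lambda = 0.2 * true_expr)

# Correlation with protein for the noisy counts vs. the noise-free expression
cor_raw  <- cor(raw_counts, protein)
cor_true <- cor(true_expr, protein)
print(c(raw = cor_raw, noise_free = cor_true))
```

In this toy setting the noisy counts correlate less well with the protein than the noise-free expression does, which is the gap that a denoising step aims to narrow.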
Please refer to the SAVER-X package for detailed instructions. For this vignette, we load a demo data set (17009 genes $\times$ 2000 cells) of bone marrow mononuclear cells that has already been denoised with SAVER-X.
library(cTPnet)
library(Seurat)
library(reticulate)
# Set python path and virtual environment using reticulate
use_virtualenv("C:/Users/zhouzilu/Documents/test_ctpnet")
# The above line has to be called right after loading the reticulate library!
data("cTPnet_demo")
head(demo_data[,1:6])
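When substituting your own data for the demo matrix, a few basic checks on its layout can catch problems early. The sketch below uses only base R; the helper check_expression_matrix and the toy matrix are hypothetical, and the rows-versus-columns test is only a heuristic for spotting a transposed matrix, not a rule enforced by cTPnet.

```r
# Hypothetical helper: basic sanity checks on a genes x cells matrix.
# Returns a character vector of problems (empty if none were found).
check_expression_matrix <- function(mat) {
  mat <- as.matrix(mat)
  problems <- character(0)
  if (is.null(rownames(mat))) problems <- c(problems, "rows have no gene names")
  if (is.null(colnames(mat))) problems <- c(problems, "columns have no cell names")
  if (anyDuplicated(rownames(mat)) > 0) problems <- c(problems, "duplicated gene names")
  if (anyNA(mat)) problems <- c(problems, "missing values present")
  if (any(mat < 0, na.rm = TRUE)) problems <- c(problems, "negative values present")
  # Heuristic only: expression matrices usually have more genes than cells
  if (nrow(mat) < ncol(mat)) {
    problems <- c(problems, "fewer rows than columns; matrix may be transposed")
  }
  problems
}

# Toy matrix with 5 genes and 3 cells (simulated values)
set.seed(1)
toy <- matrix(rexp(5 * 3), nrow = 5,
              dimnames = list(paste0("GENE", 1:5), paste0("cell", 1:3)))

ok_result  <- check_expression_matrix(toy)     # no problems expected
bad_result <- check_expression_matrix(t(toy))  # flags the transposed layout
print(ok_result)
print(bad_result)
```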
Let's create a Seurat object demo and generate the prediction.
model_file_path <- "C:/Users/zhouzilu/Documents/cTPnet_weight_24"
data_type <- 'Seurat3'
demo <- CreateSeuratObject(demo_data)
demo <- cTPnet(demo, data_type, model_file_path)
# standard log-normalization
demo <- NormalizeData(demo, verbose = FALSE)
# select variable features (Seurat defaults)
demo <- FindVariableFeatures(demo, verbose = FALSE)
# standard scaling (no regression)
demo <- ScaleData(demo, verbose = FALSE)
# Run PCA; the first 25 PCs are used for tSNE visualization and graph-based clustering
demo <- RunPCA(demo, verbose = FALSE)
ElbowPlot(demo, ndims = 25)
demo <- FindNeighbors(demo, dims = 1:25, k.param = 20)
demo <- FindClusters(demo, resolution = 0.8)
demo <- RunTSNE(demo, dims = 1:25, method = "FIt-SNE", max_iter = 2000)
DimPlot(demo, label = TRUE, pt.size = 0.5)
FeaturePlot(demo, features = c(
  "ctpnet_CD34", "ctpnet_CD4", "ctpnet_CD8",
  "CD34", "CD4", "CD8A",
  "ctpnet_CD16", "ctpnet_CD11c", "ctpnet_CD19",
  "FCGR3A", "ITGAX", "CD19",
  "ctpnet_CD45RA", "ctpnet_CD45RO", "ctpnet_CD27",
  "PTPRC", "PTPRC", "CD27"
), min.cutoff = "q25", max.cutoff = "q95", ncol = 3, pt.size = 0.5)
The cell type information can be easily determined by canonical immunophenotypes (i.e. surface protein markers).
# CD4 and CD8 are markers for CD4 T cells and CD8 T cells
# CD45RA and CD45RO are markers for naive T cells and differentiated T cells
# CD19 is a marker for B cells
# CD27 is a marker for memory B cells
# CD16 is a marker for NK cells
# CD34 is a marker for developing precursor cells
# CD11c is a marker for traditional monocytes
new.cluster.ids <- c("Mono", "naive CD4/CD8 T", "Mono", "CD8 T", "naive CD4 T",
                     "CD4 T", "naive CD8 T", "Pre.", "B", "NK",
                     "memory B", "Pre.", "Unknown", "CD16+ Mono", "Unknown")
names(new.cluster.ids) <- levels(demo)
demo <- RenameIdents(demo, new.cluster.ids)
DimPlot(demo, label = TRUE, pt.size = 0.5)
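The manual annotation above can be supported by a simple table lookup. The following base R sketch is illustrative only: the cluster-by-marker averages are made-up numbers, not cTPnet output, and the marker-to-cell-type lookup covers just four of the markers listed in the comments. In practice such a table would be computed from the predicted protein values per cluster.

```r
# Hypothetical cluster-by-marker averages (made-up numbers for illustration)
avg <- matrix(
  c(0.2, 2.1, 0.1, 0.3,
    1.9, 0.2, 0.2, 0.1,
    0.1, 0.3, 2.4, 0.2,
    0.2, 0.1, 0.3, 1.8),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("cluster", 0:3), c("CD4", "CD8", "CD19", "CD16"))
)

# Standardize each marker across clusters so that markers are comparable
z <- scale(avg)

# Marker with the highest standardized level in each cluster
top_marker <- colnames(z)[apply(z, 1, which.max)]
names(top_marker) <- rownames(z)

# Lookup from marker to cell type, following the marker comments above
marker_to_type <- c(CD4 = "CD4 T", CD8 = "CD8 T", CD19 = "B", CD16 = "NK")
suggested <- marker_to_type[top_marker]
names(suggested) <- names(top_marker)
print(suggested)
```

A single top marker is a coarse rule: populations such as naive versus differentiated T cells need combinations of markers (for example CD45RA and CD45RO), so the suggestions are a starting point for manual review rather than a final annotation.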
RidgePlot(demo, features = c("ctpnet_CD3", "ctpnet_CD11c", "ctpnet_CD8", "ctpnet_CD16"), ncol = 2)
sessionInfo()
Surface protein imputation from single cell transcriptomes by deep neural networks
Zilu Zhou, Chengzhong Ye, Jingshu Wang, Nancy R. Zhang
bioRxiv 671180; doi: https://doi.org/10.1101/671180