knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

1. Installation

1.1 Install cTP-net

1.1.1 Support Python package

First, install the supporting Python package ctpnetpy. See the source code of the package here

pip install cTPnet

If there is problem with PyTorch, refer to pytorch website for more details.

1.1.2 R package

Next, open R and install the R package cTPnet

devtools::install_github("zhouzilu/cTPnet")

1.1.3 Pretrained model

Download the pretrained model from weights.

1.2 Install SAVER-X (Highly recommended)

In addition, if you want to denoise your raw scRNA counts, please follow the SAVER-X installation pipeline. Modified from https://github.com/jingshuw/SAVERX

1.2.1 Support Python package

Install supporting Python package sctransfer.

pip install sctransfer

1.2.2 R package

Install R pacakge.

devtools::install_github("jingshuw/SAVERX")

1.2.3 Pretrained model

Download the pretrained model from weights.

Currently, SAVER-X do not support for super large data sets (test failed for 270,000 cells and 200GB RAM). cTP-net, on the other hand, can predict surface protein abundance relatively accurate without denoising.

2. Questions & issues

If you have any questions or problems when using cTPnet or ctpnetpy, please feel free to open a new issue here. You can also email the maintainers of the corresponding packages --

3. cTP-net analysis pipeline

To accurately impute surface protein abundance from scRNA-seq data, cTP-net employs two steps: (1) denoising of the scRNA-seq count matrix and (2) imputation based on the denoised data through a transcriptome-protein mapping (Figure 1). The initial denoising, by SAVERX, produces more accurate estimates of the RNA transcript relative abundances for each cell. Compared to the raw counts, the denoised relative expression values have significantly improved correlation with their cognate protein measurement.

knitr::include_graphics("https://raw.githubusercontent.com/zhouzilu/cTPnet/master/figure/FIG_pkg.jpg")

Figure 1. (a) Overview of cTP-net analysis pipeline, which learns a mapping from the denoised scRNA-seq data to the relative abundance of surface proteins, capturing multi-gene features that reflect the cellular environment and related processes. (b) For three example proteins, cross-cell scatter and correlation of CITE-seq measured abundances vs. (1) raw RNA count, (2) SAVER-X denoised RNA level, and (3) cTP-net predicted protein abundance.

3.1 Raw counts denoising with SAVER-X

Please refer to SAVER-X package for detailed instruction. As for this vignette, we load a demo data set (17009 genes $\times$ 2000 cells) from Bone Marrow Mononuclear Cell that has been already denoised with SAVER-X.

library(cTPnet)
library(Seurat)
library(reticulate)
# Set python path and virtual environment using reticulate
use_virtualenv("C:/Users/zhouzilu/Documents/test_ctpnet")
# The above line has to be called right after loading reticulate library !
data("cTPnet_demo")
head(demo_data[,1:6])

3.2 Immunophenotype (surface protein) imputation

3.2.1 Seurat v2 pipeline

Let's create a seurat object demo and generate the prediction.

model_file_path="C:/Users/zhouzilu/Documents/cTPnet_weight_24"
data_type='Seurat3'
demo = CreateSeuratObject(demo_data)
demo = cTPnet(demo,data_type,model_file_path)

3.2.2 Following analysis (Modified from Seurat v3.0)

# standard log-normalization
demo <- NormalizeData(demo, display.progress = FALSE)
# choose ~1k variable features
demo <- FindVariableFeatures(demo, do.plot = FALSE)

# standard scaling (no regression)
demo <- ScaleData(demo, display.progress = FALSE)

# Run PCA, select 13 PCs for tSNE visualization and graph-based clustering
demo <- RunPCA(demo, verbose = FALSE)
ElbowPlot(demo, ndims = 25)

demo <- FindNeighbors(demo, dims = 1:25, k.param = 20)
demo <- FindClusters(demo, resolution = 0.8)
demo <- RunTSNE(demo, dims = 1:25, method = "FIt-SNE", max_iter=2000)
DimPlot(demo, label = TRUE, pt.size = 0.5)

3.2.3 Visualize imputed protein levels on RNA clusters

FeaturePlot(demo, features = c(
  "ctpnet_CD34", "ctpnet_CD4", "ctpnet_CD8", 
  "CD34", "CD4", "CD8A",
  "ctpnet_CD16", "ctpnet_CD11c", "ctpnet_CD19", 
  "FCGR3A",'ITGAX','CD19',
  "ctpnet_CD45RA", "ctpnet_CD45RO", "ctpnet_CD27", 
  "PTPRC",'PTPRC','CD27'
     ), min.cutoff = "q25", max.cutoff = "q95", ncol = 3, pt.size=0.5)

3.2.4 Determine the cell markers with helps from imputed proteins

The cell type information can be easily determined by canonical immunophenotypes (i.e. surface protein markers).

# CD4 and CD8 are markers for CD4 T cells and CD8 T cells
# CD45RA and CD45RO are markers for naive T cells and differentiated T cells
# CD19 is marker for B cells
# CD27 is marker for memory B cells
# CD16 is marker for NK cells
# CD34 is marker for developing precursor cells
# CD11c is for tradiational monocyte
new.cluster.ids <- c("Mono","naive CD4/CD8 T", "Mono", "CD8 T", "naive CD4 T", "CD4 T", "naive CD8 T", "Pre.", "B", "NK", "memory B", "Pre.", "Unknown", "CD16+ Mono", "Unknown")
names(new.cluster.ids) <- levels(demo)
demo <- RenameIdents(demo, new.cluster.ids)
DimPlot(demo, label = TRUE, pt.size = 0.5)
RidgePlot(demo, features = c("ctpnet_CD3", "ctpnet_CD11c", "ctpnet_CD8", "ctpnet_CD16"), ncol = 2)

4. Session info

sessionInfo()

5. References

Surface protein imputation from single cell transcriptomes by deep neural networks

Zilu Zhou, Chengzhong Ye, Jingshu Wang, Nancy R. Zhang

bioRxiv 671180; doi: https://doi.org/10.1101/671180



zhouzilu/cTPnet documentation built on July 13, 2022, 2:54 a.m.