knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) options(mc.cores=2)
1 Institut de Recherche en Cancérologie de Montpellier, Inserm, Montpellier, France ; Institut régional du Cancer Montpellier, Montpellier, France ; Université de Montpellier, Montpellier, France
This guide provides an overview of the SingleCellSignalR package, a comprehensive framework to obtain cellular network maps from scRNA-seq data. SingleCellSignalR comes with a complete pipeline integrating existing methods to cluster individual cell transcriptomes and identify cell subpopulations as well as novel cellular network-specific algorithms. More advanced users can substitute their own logic or alternative tools at various stages of data processing. SingleCellSignalR also maps cell subpopulation internal network linked to genes of interest through the integration of regulated KEGG and Reactome pathways together with ligands and receptors involved in inferred cell-cell interactions. The cellular networks can be exported in text files and graphML objects to be further explored with Cytoscape (www.cytoscape.org), yEd (www.yworks.com), or similar software tools.
Independent of the chosen scRNA-seq platform, deep or shallower, data comes as a table of read or unique molecule identifier (UMI) counts, one column per individual cell and one row per gene. Initial processing is required to prepare such data for subsequent analysis and we decided to propose a generic solution for the sake of convenience, though users can easily substitute their own computations. Gene names (HUGO symbols) are provided in the first column of the table.
Each analysis is organized around a working directory (or project folder):
The file containing the read counts should be placed in the working directory.
> file <- "example_data.txt"
Data processing can then start:
> data <- data_prepare(file = file) log-Normalization 14223 genes 400 cells Zero rate = 89.6%
library(SingleCellSignalR) data(example_dataset, package = "SingleCellSignalR") data = example_dataset genes = data$genes rownames(data) = genes data = data[,-1]
The data_prepare() function eliminates non expressed genes before performing read count normalization.
Normalized data are submitted to a clustering algorithm to identify cell subpopulations:
clust <- clustering(data = data, n.cluster = 4, n = 10, method = "simlr",write = FALSE,pdf=FALSE)
clust <- clustering(data = data,n.cluster = 4, n = 10, method = "kmeans",write = FALSE,pdf=FALSE)
We set the method argument to simlr
, which caused the SIMLR() function of the SIMLR package [1] to be used. The SIMLR_Estimate_Number_of_Clusters() function determined the number of clusters, between 2 and n (n=10 above).
Next, differentially expressed genes in one cluster compared to the others are identified using the cluster_analysis() function, which relies on edgeR. A result table is automatically created in the cluster-analysis folder:
clust.ana <- cluster_analysis(data = data, genes = rownames(data), cluster = clust$cluster, write = FALSE)
Once the preliminary steps illustrated above are completed, SingleCellSignalR can be used to generate cellular interaction lists using the cell_signaling() function:
signal <- cell_signaling(data = data, genes = rownames(data), cluster = clust$cluster, write = FALSE)
An intercellular network can also be generated to map the overall ligand/receptor interactions invoking the inter_network() function:
inter.net <- inter_network(data = data, signal = signal, genes = genes, cluster = clust$cluster, write = FALSE)
At this point the intercellular network have been generated and exported in text and graphML formats in the networks folder.
A summary of the interactions between cell clusters can be output in the form of a chord diagram by the visualize_interactions() function:
visualize_interactions(signal = signal)
This function will create a plot in the R plot window.
The details of the interactions between two clusters, for example cluster 1 and 2, can also be shown in the plot window with the visualize_interactions() function. Note that in the example below we ask for the display of two pairs of cell clusters, pair 1 that contains interactions from cluster 1 to 2, and pair 4 from cluster 2 to 1. (names(signal)
returns the cell cluster names in each pair, see function visualize_interactions() details.)
visualize_interactions(signal = signal,show.in=c(1,4))
And these plots can be saved into pdf files in the images folder using the write.in
argument of the visualize_interactions() function.
> visualize_interactions(signal = signal,write.in=c(1,4))
red
red
red
SingleCellSignalR package functions have many arguments parameters that can be changed by the user to fit her needs (see Reference Manual for more details). Furthermore, several handy functions that were not illustrated above are provided to generate additional plots or reports.
data(example_dataset, package = "SingleCellSignalR") data = example_dataset genes = data$genes rownames(data) = genes data = data[,-1]
After running the example in the Quick Start section, the user can define cell clusters after the output of the cell_classifier(). The demo data set is comprised of a subset of the 10x PBMC dataset [3], i.e. immune cells. The t-SNE map calculated with the clustering() function will also be used. For this example we will set the plot.details
argument to TRUE to monitor the choice of the threshold of gene signature scores.
class = cell_classifier(data=data, genes=rownames(data), markers = markers(c("immune")), tsne=clust$`t-SNE`,plot.details=TRUE,write = FALSE)
Let us use the cell clustering obtained with the cell_classifier() function. Although "undefined" cells may be interesting in some cases, here they form a heterogeneous cluster because they represent cells that seem to be in a transition between two states ("T-cells" and "Cytotoxic cells", or "Neutrophils" and "Macrophages", see heatmap above). We discard these cells.
# Define the cluster vector and the cluster names cluster <- class$cluster c.names <- class$c.names # Remove undefined cells data <- data[,cluster!=(max(cluster))] tsne <- clust$`t-SNE`[cluster!=(max(cluster)),] c.names <- c.names[-max(cluster)] cluster <- cluster[cluster!=(max(cluster))]
Then the analysis can be carried on.
clust.ana <- cluster_analysis(data = data, genes = rownames(data), cluster = cluster, c.names = c.names, write = FALSE)
Once the cluster analysis is done, the cell_signaling(), inter_network() functions can be used.
signal <- cell_signaling(data = data, genes = genes, cluster = cluster, c.names = c.names, write = FALSE) inter.net <- inter_network(data = data, signal = signal, genes = genes, cluster = cluster, write = FALSE)
If we take a look at signal[[6]]
(or signal[["B-cells-Macrophages"]]
)
signal[[6]]
We can be interested in genes participating in pathways with a receptor of interest inside a cluster of interest. Let us say ASGR1 in "Macrophages".
intra = intra_network(goi = "ASGR1",data = data,genes = rownames(data),cluster = cluster, coi = "Macrophages", c.names = c.names, signal = signal,write=FALSE)
Now, let us take an overview of the signaling between the cell types.
visualize_interactions(signal)
Let us get deeper and look at the signaling between "T-cells" and "B-cells" for example.
visualize_interactions(signal, show.in=c(1,6))
The following command will save these plots in the images folder.
> visualize_interactions(signal, write.in=c(1,6))
For this example we use the scRNAseq dataset from Tirosh et al. [4]. We use only the data from patient 80.
> file <- "patient_80.txt" > data <- data_prepare(file = file) log-Normalization 19452 genes 480 cells Zero rate = 75.5%
Remark: One can notice that the zero rate is lower than in the previous example which reflects the fact that the sequencing is deeper.
We know that this dataset is composed of melanoma cells and their microenvironment, we hence define our markers table using the markers() function.
> my.markers <- markers(category = c("immune", "tme", "melanoma")) > head(my.markers) T-cells B-cells Macrophages Cytotoxic cells DC Mast cells Neutrophils NK cells Treg Endothelial cells CAFs melanoma 1 CD2 CD19 CD163 PRF1 CCL13 TPSB2 FPR1 XCL1 FOXP3 PECAM1 FAP MIA 2 CD3D CD79A CD14 GZMA CD209 TPSAB1 SIGLEC5 XCL2 VWF THY1 TYR 3 CD3E CD79B CSF1R GZMB HSD11B1 CPA3 CSF3R NCR1 CDH5 DCN SLC45A2 4 CD3G BLK C1QC NKG7 MS4A2 FCAR KIR2DL3 CLDN5 COL1A1 CDH19 5 CD8A MS4A1 VSIG4 GZMH HDC FCGR3B KIR3DL1 PLVAP COL1A2 PMEL 6 SIRPG BANK1 C1QA KLRK1 CEACAM3 KIR3DL2 ECSCR COL6A1 SLC24A5
Let us perform the clustering. For this example, we set the method argument to "kmeans" and the n argument to 12.
> clust <- clustering(data = data,n = 12, method = "kmeans") Estimating the number of clusters Estimated number of clusters = 6 6 clusters detected cluster 1 -> 157 cells cluster 2 -> 15 cells cluster 3 -> 137 cells cluster 4 -> 6 cells cluster 5 -> 114 cells cluster 6 -> 51 cells
Now we take advantage of the markers
argument of the cluster_analysis() function using my.markers obtained above with the markers() function.
> clust.ana <- cluster_analysis(data = data, genes = rownames(data), cluster = clust$cluster, markers = my.markers) edgeR differential gene expression (dge) processing: Looking for differentially expressed genes in cluster 1 Looking for differentially expressed genes in cluster 2 Looking for differentially expressed genes in cluster 3 Looking for differentially expressed genes in cluster 4 Looking for differentially expressed genes in cluster 5 Looking for differentially expressed genes in cluster 6
We can see that the clusters 2 and 5 are well defined, they are respectively cancer associated fibroblasts (CAFs) and melanoma cells. The cluster 6 is also clearly composed of endothelial cells. Clusters 1 and 2 are immune cells but the clustering did not succeed in sorting them correctly and cluster 4 counts only 6 cells. Those do not seem to be homogeneous and we decide to remove them.
> data <- data[,clust$cluster!=4] > cluster <- clust$cluster[clust$cluster!=4] > cluster[cluster>4] <- cluster[cluster>4] - 1
Then we can name our clusters manually before pursuing the analysis.
> c.names <- c("Immune 1", "CAFs", "Immune 2", "melanoma", "endothelial") > signal <- cell_signaling(data = data, genes = rownames(data), cluster = cluster, c.names = c.names) Paracrine signaling: Checking for signaling between cell types 78 interactions from Immune 1 to CAFs 30 interactions from Immune 1 to melanoma 11 interactions from Immune 1 to endothelial 67 interactions from CAFs to Immune 1 85 interactions from CAFs to Immune 2 86 interactions from CAFs to melanoma 78 interactions from CAFs to endothelial 84 interactions from Immune 2 to CAFs 55 interactions from Immune 2 to melanoma 44 interactions from Immune 2 to endothelial 12 interactions from melanoma to Immune 1 33 interactions from melanoma to CAFs 19 interactions from melanoma to Immune 2 12 interactions from melanoma to endothelial 53 interactions from endothelial to Immune 1 33 interactions from endothelial to CAFs 69 interactions from endothelial to Immune 2 28 interactions from endothelial to melanoma
Remark: the names of the dge tables in the cluster_analysis folder must be changed according to the cluster names (c.names).
And now visualize!
> visualize_interactions(signal, show.in = c(12,18))
Remark: We observe that in the chord diagrams above, the "specific" interactions were highlighted with a thick black line.
Let us look at one of these specific interactions using the expression_plot_2() function.
> expression_plot_2(data,"NID1","COL13A1",clust$`t-SNE`)
red
red
red
Thank you for reading this guide and for using SingleCellSignalR.
Wang B, Zhu J, Pierson E, Ramazzotti D, Batzoglou S. Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning. Nat Methods. 2017;14:414-6.
McCarthy DJ, Chen Y, Smyth GK. Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res. 2012;40:4288-97.
8k PBMCs from a Healthy Donor [Internet]. 2017. Available from: https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/pbmc8k
Tirosh I, Izar B, Prakadan SM, Wadsworth MH, Treacy D, Trombetta JJ, et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science. 2016;352:189-96.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.