Recent advances in single-cell technologies enable spatial expression profiling at the cell level, making it possible to elucidate spatial changes of cell-specific genomic features. The gene co-expression network is an important feature that encodes the gene-gene marginal dependence structure and allows for the functional annotation of highly connected genes.
The R package CSSN applies a simple and computationally efficient two-step algorithm to recover spatially-varying cell-specific gene co-expression networks for single-cell spatial expression data. Moreover, there is a prediction algorithm for network prediction. CSSN is capable of (a) estimate gene co-expression network for each cell, (b) predict gene co-expression networks for cells that are not detected.
Here we describe the steps taken to process the original CSSN source code, get networks for each cell or user defined cells and make network predictions.
knitr::opts_chunk$set(tidy = FALSE, cache = FALSE, dev = "png", message = FALSE, error = FALSE, warning = TRUE)
Load libraries needed for analysis.
library(CSSN) library(stringr) library(ggplot2) library(pheatmap)
To fit the two-step algorithm, we developed a spatial single-cell spatial expression data matrix and a matrix with cells' cell type and 2-D coordinates information. The two matrix are the data input of CSSNEst function, which aims to obtain gene co-expression networks of cells.
# example_data includes two matrices, X and cell.info. dir <- system.file(package = 'CSSN') #directory for the example data load(paste0(dir,"/example_data.RData")) # X is the single-cell spatial expression data, continuous. dim(X) # cell.info is the matrix contains each cell's cell type information and their 2-D coordinates. dim(cell.info) # Look at the expression data. head(X[,1:5]) # Look at the cell information matrix. head(cell.info) # Cell type number table(cell.info[,1]) # number of cells n <- ncol(X) # number of genes G <- nrow(X) print(c(n, G))
The example data
example_data consist of two data matrices,
X is $10\times 42$ a gene expression matrix, indicating there are 10 genes and 42 cells in our example data. And
cell.info is a $42 \times 3$ matrix storing cell type information and centroid coordinates of each cell. The first column of
cell.info represents cell types, and the second and third columns are Centroid_X and Centroid_Y coordinates. CT represents cell type of cells, and we set 2 cell types in our example data, the number of which are 18 and 24 respectively.
The cells' spatial pattern is of interest to us, and the spatial distribution of different cell types are vital for biological and medical analysis. So we need to have a sight of
#---- set spatial pattern manually---- pal <- c(rgb(221, 160, 221, maxColorValue = 255), rgb(0, 206, 209, maxColorValue = 255)) pal <- setNames(pal, c("1", "2")) #-----Cell's Spatial Pattern------ cell.type <- as.vector(cell.info[,1]) gg <- ggplot(cell.info, aes(x = X, y = Y, col = as.factor(cell.type), shape = as.factor(cell.type))) pl <- gg + geom_point(size = 2.5) + scale_color_manual(values = c(pal, pal)) + theme_bw()+ theme(legend.text=element_text(size=20), axis.title.x=element_text(size=16), axis.title.y=element_text(size=16), axis.text.x = element_text(size = 12,face = "bold"), axis.text.y = element_text(size = 12,face = "bold") ) + labs(x = "X", y = "Y") + guides(color = guide_legend(title = "Cell Type", title.theme = element_text(size = 25), override.aes = list(size = 5) ), shape = guide_legend(title = "Cell Type", title.theme = element_text(size = 25), override.aes = list(size = 5))) pl
Once we have the gene expression data and the cell information matrix, we can apply the two-step algorithm to obatin gene co-expression network through function
CSSNEst. The first argument,
CSSNEst should be a matrix, notice that dataframe is not applicable.
cell.info should be a matrix or dataframe, notice that the first column represents cell type information, the latter two columns stand for coordinates. The third parameter of
CSSNEst is the degree of freedom in the Inverse Wishart prior of each $\Sigma_i$, we recommend each element of
nu to be $2G$, where $G$ is the number of genes. The fourth argument
d is the thresholding parameter between 0 and 1, and we choose the default value 0.1. The
m.info is cell density parameter, we choose the default value 70. The sixth argument
is.all is a bool variable, if TRUE, the
CSSNEst returns gene co-expression networks of all cells, if FALSE, the function returns networks of assigned cells in
indx.cell is the pre-selected cell index used for outputing. And the eighth parameter is
out.corr, a bool variable deciding whether the
CSSNEst function returns gene correlation matrix of cells. We apply the two-step algorithm as follows:
# Two-step algorithm nu <- rep(2*G, n) # Output all cells' gene co-expression networks. Result <- CSSNEst(X, cell.info, nu = nu, d = 0.1, m.info = 70, is.scale = TRUE, is.all = TRUE) # Result is a list of length n, where n is the number of cell. #-----The first cell's estimated gene co-expression networks----- colors_func <- colorRampPalette(c('white', "black")) colors <- colors_func(2) par(mfrow=c(1,2)) pheatmap(Result[], color = colors, legend_breaks = c(0,1), cluster_cols = F, cluster_rows = F, show_rownames = F, show_colnames = F, width = 2.8, height = 2.8 # filename = filename[i] )
The black square in the heatmaps indicates that there exist an edge between the two genes in the cell, and the white square stands for non edge existing. Moverover, if we are interested in only a subset of cells' gene co-expression networks, we can define a index vector of cells. And if the user need the original gene correlation matrix, we can also use
out.corr argument to output matrices of cells. Then we get a list of
CSSNEst, including gene co-expression networks of cells and the original gene correlation matrix.
indx.cell <- c(1,3,7,10) result <- CSSNEst(X, cell.info, nu = nu, d = 0.1, m.info = 70, is.scale = TRUE, is.all = FALSE, indx.cell = indx.cell, output.corr = TRUE) result$`Correlation Matrix`[] result$`Gene Networks`[]
The privilege of our package is predicting or interpolating gene co-expression networks on tissue positions where cells are not captured. Notice that the input of
GN, must be the gene co-expression networks of all cells. The second argument of
cell.info, is the same as that of
CSSNEst. The third argument,
miss.indx is the coordinates of missing cells, the format of which is matrix. The last parameter,
m.info is also the same as above. The function
CSSNPredict returns a list of length
miss.num is the number of missing cells. And each element of the result is a gene co-expression matrix.
# generate coordinates of missing cells set.seed(1) miss.num <- 5 miss.x <- runif(miss.num, min(cell.info[,2]), max(cell.info[,2])) miss.y <- runif(miss.num, min(cell.info[,3]), max(cell.info[,3])) miss.indx <- cbind(miss.x, miss.y) # Apply prediction algorithm pre <- CSSNPredict(Result, cell.info, miss.indx) length(pre) dim(pre[])
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.