Recent advances in single-cell technologies enable spatial expression profiling at the cell level, making it possible to elucidate spatial changes of cell-specific genomic features. The gene co-expression network is an important feature that encodes the gene-gene marginal dependence structure and allows for the functional annotation of highly connected genes.

The R package *CSSN* applies a simple and computationally efficient two-step algorithm to recover spatially-varying cell-specific gene co-expression networks for single-cell spatial expression data. Moreover, there is a prediction algorithm for network prediction. CSSN is capable of (a) estimate gene co-expression network for each cell, (b) predict gene co-expression networks for cells that are not detected.

Here we describe the steps taken to process the original CSSN source code, get networks for each cell or user defined cells and make network predictions.

knitr::opts_chunk$set(tidy = FALSE, cache = FALSE, dev = "png", message = FALSE, error = FALSE, warning = TRUE)

Load libraries needed for analysis.

library(CSSN) library(stringr) library(ggplot2) library(pheatmap)

To fit the two-step algorithm, we developed a spatial single-cell spatial expression data matrix and a matrix with cells' cell type and 2-D coordinates information. The two matrix are the data input of CSSNEst function, which aims to obtain gene co-expression networks of cells.

# example_data includes two matrices, X and cell.info. dir <- system.file(package = 'CSSN') #directory for the example data load(paste0(dir,"/example_data.RData")) # X is the single-cell spatial expression data, continuous. dim(X) # cell.info is the matrix contains each cell's cell type information and their 2-D coordinates. dim(cell.info) # Look at the expression data. head(X[,1:5]) # Look at the cell information matrix. head(cell.info) # Cell type number table(cell.info[,1]) # number of cells n <- ncol(X) # number of genes G <- nrow(X) print(c(n, G))

The example data `example_data`

consist of two data matrices, `X`

and `cell.info`

. The `X`

is $10\times 42$ a gene expression matrix, indicating there are 10 genes and 42 cells in our example data. And `cell.info`

is a $42 \times 3$ matrix storing cell type information and centroid coordinates of each cell. The first column of `cell.info`

represents cell types, and the second and third columns are *Centroid_X* and *Centroid_Y* coordinates. *CT* represents cell type of cells, and we set 2 cell types in our example data, the number of which are 18 and 24 respectively.

The cells' spatial pattern is of interest to us, and the spatial distribution of different cell types are vital for biological and medical analysis. So we need to have a sight of

#---- set spatial pattern manually---- pal <- c(rgb(221, 160, 221, maxColorValue = 255), rgb(0, 206, 209, maxColorValue = 255)) pal <- setNames(pal, c("1", "2")) #-----Cell's Spatial Pattern------ cell.type <- as.vector(cell.info[,1]) gg <- ggplot(cell.info, aes(x = X, y = Y, col = as.factor(cell.type), shape = as.factor(cell.type))) pl <- gg + geom_point(size = 2.5) + scale_color_manual(values = c(pal[1], pal[2])) + theme_bw()+ theme(legend.text=element_text(size=20), axis.title.x=element_text(size=16), axis.title.y=element_text(size=16), axis.text.x = element_text(size = 12,face = "bold"), axis.text.y = element_text(size = 12,face = "bold") ) + labs(x = "X", y = "Y") + guides(color = guide_legend(title = "Cell Type", title.theme = element_text(size = 25), override.aes = list(size = 5) ), shape = guide_legend(title = "Cell Type", title.theme = element_text(size = 25), override.aes = list(size = 5))) pl

Once we have the gene expression data and the cell information matrix, we can apply the two-step algorithm to obatin gene co-expression network through function `CSSNEst`

. The first argument, `X`

, of `CSSNEst`

should be a matrix, notice that *dataframe* is not applicable. `cell.info`

should be a matrix or dataframe, notice that the **first** column represents cell type information, the latter two columns stand for coordinates. The third parameter of `CSSNEst`

is the degree of freedom in the Inverse Wishart prior of each $\Sigma_i$, we recommend each element of `nu`

to be $2G$, where $G$ is the number of genes. The fourth argument `d`

is the thresholding parameter between 0 and 1, and we choose the default value 0.1. The `m.info`

is cell density parameter, we choose the default value 70. The sixth argument `is.all`

is a bool variable, if TRUE, the `CSSNEst`

returns gene co-expression networks of all cells, if FALSE, the function returns networks of assigned cells in `indx.cell`

, where `indx.cell`

is the pre-selected cell index used for outputing. And the eighth parameter is `out.corr`

, a bool variable deciding whether the `CSSNEst`

function returns gene correlation matrix of cells. We apply the two-step algorithm as follows:

# Two-step algorithm nu <- rep(2*G, n) # Output all cells' gene co-expression networks. Result <- CSSNEst(X, cell.info, nu = nu, d = 0.1, m.info = 70, is.scale = TRUE, is.all = TRUE) # Result is a list of length n, where n is the number of cell. #-----The first cell's estimated gene co-expression networks----- colors_func <- colorRampPalette(c('white', "black")) colors <- colors_func(2) par(mfrow=c(1,2)) pheatmap(Result[[1]], color = colors, legend_breaks = c(0,1), cluster_cols = F, cluster_rows = F, show_rownames = F, show_colnames = F, width = 2.8, height = 2.8 # filename = filename[i] )

The black square in the heatmaps indicates that there exist an edge between the two genes in the cell, and the white square stands for non edge existing. Moverover, if we are interested in only a subset of cells' gene co-expression networks, we can define a index vector of cells. And if the user need the original gene correlation matrix, we can also use `out.corr`

argument to output matrices of cells. Then we get a list of `CSSNEst`

, including gene co-expression networks of cells and the original gene correlation matrix.

indx.cell <- c(1,3,7,10) result <- CSSNEst(X, cell.info, nu = nu, d = 0.1, m.info = 70, is.scale = TRUE, is.all = FALSE, indx.cell = indx.cell, output.corr = TRUE) result$`Correlation Matrix`[[1]] result$`Gene Networks`[[1]]

The privilege of our package is predicting or interpolating gene co-expression networks on tissue positions where cells are not captured. Notice that the input of `CSSNPredict`

, `GN`

, must be the gene co-expression networks of all cells. The second argument of `CSSNPredict`

, `cell.info`

, is the same as that of `CSSNEst`

. The third argument, `miss.indx`

is the coordinates of missing cells, the format of which is matrix. The last parameter, `m.info`

is also the same as above. The function `CSSNPredict`

returns a list of length `miss.num`

, where `miss.num`

is the number of missing cells. And each element of the result is a gene co-expression matrix.

# generate coordinates of missing cells set.seed(1) miss.num <- 5 miss.x <- runif(miss.num, min(cell.info[,2]), max(cell.info[,2])) miss.y <- runif(miss.num, min(cell.info[,3]), max(cell.info[,3])) miss.indx <- cbind(miss.x, miss.y) # Apply prediction algorithm pre <- CSSNPredict(Result, cell.info, miss.indx) length(pre) dim(pre[[1]])

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.