Introduction

Recent advances in single-cell technologies enable spatial expression profiling at the cell level, making it possible to elucidate spatial changes of cell-specific genomic features. The gene co-expression network is an important feature that encodes the gene-gene marginal dependence structure and allows for the functional annotation of highly connected genes.

The R package CSSN applies a simple and computationally efficient two-step algorithm to recover spatially-varying cell-specific gene co-expression networks for single-cell spatial expression data. Moreover, there is a prediction algorithm for network prediction. CSSN is capable of (a) estimate gene co-expression network for each cell, (b) predict gene co-expression networks for cells that are not detected.

Here we describe the steps taken to process the original CSSN source code, get networks for each cell or user defined cells and make network predictions.

knitr::opts_chunk$set(tidy = FALSE,
                      cache = FALSE,
                      dev = "png",
                      message = FALSE, error = FALSE, warning = TRUE)

Pipline

Load libraries needed for analysis.

library(CSSN)
library(stringr)
library(ggplot2)
library(pheatmap)

Data Preparation

To fit the two-step algorithm, we developed a spatial single-cell spatial expression data matrix and a matrix with cells' cell type and 2-D coordinates information. The two matrix are the data input of CSSNEst function, which aims to obtain gene co-expression networks of cells.

# example_data includes two matrices, X and cell.info.
dir <- system.file(package = 'CSSN') #directory for the example data
load(paste0(dir,"/example_data.RData"))
# X is the single-cell spatial expression data, continuous.
dim(X)
# cell.info is the matrix contains each cell's cell type information and their 2-D coordinates.
dim(cell.info)

# Look at the expression data.
head(X[,1:5])
# Look at the cell information matrix.
head(cell.info)

# Cell type number
table(cell.info[,1])
# number of cells
n <- ncol(X)
# number of genes
G <- nrow(X)
print(c(n, G))

The example data example_data consist of two data matrices, X and cell.info. The X is $10\times 42$ a gene expression matrix, indicating there are 10 genes and 42 cells in our example data. And cell.info is a $42 \times 3$ matrix storing cell type information and centroid coordinates of each cell. The first column of cell.info represents cell types, and the second and third columns are Centroid_X and Centroid_Y coordinates. CT represents cell type of cells, and we set 2 cell types in our example data, the number of which are 18 and 24 respectively.

Data Visualization

The cells' spatial pattern is of interest to us, and the spatial distribution of different cell types are vital for biological and medical analysis. So we need to have a sight of

#---- set spatial pattern manually----
pal <- c(rgb(221, 160, 221, maxColorValue = 255),
        rgb(0, 206, 209, maxColorValue = 255))
pal <- setNames(pal, c("1", "2"))

#-----Cell's Spatial Pattern------
cell.type <- as.vector(cell.info[,1])
gg <- ggplot(cell.info, aes(x = X, y = Y, col = as.factor(cell.type), shape = as.factor(cell.type)))
pl <- gg + geom_point(size = 2.5) +
 scale_color_manual(values = c(pal[1], pal[2])) +
 theme_bw()+
 theme(legend.text=element_text(size=20),
       axis.title.x=element_text(size=16),
       axis.title.y=element_text(size=16),
       axis.text.x = element_text(size = 12,face = "bold"),
       axis.text.y = element_text(size = 12,face = "bold")
 ) + labs(x = "X", y = "Y") +
 guides(color = guide_legend(title = "Cell Type",
                             title.theme = element_text(size = 25),
                             override.aes = list(size = 5)
 ),
 shape = guide_legend(title = "Cell Type",
                      title.theme = element_text(size = 25),
                      override.aes = list(size = 5)))
pl

Model Fitting

Once we have the gene expression data and the cell information matrix, we can apply the two-step algorithm to obatin gene co-expression network through function CSSNEst. The first argument, X, of CSSNEst should be a matrix, notice that dataframe is not applicable. cell.info should be a matrix or dataframe, notice that the first column represents cell type information, the latter two columns stand for coordinates. The third parameter of CSSNEst is the degree of freedom in the Inverse Wishart prior of each $\Sigma_i$, we recommend each element of nu to be $2G$, where $G$ is the number of genes. The fourth argument d is the thresholding parameter between 0 and 1, and we choose the default value 0.1. The m.info is cell density parameter, we choose the default value 70. The sixth argument is.all is a bool variable, if TRUE, the CSSNEst returns gene co-expression networks of all cells, if FALSE, the function returns networks of assigned cells in indx.cell, where indx.cell is the pre-selected cell index used for outputing. And the eighth parameter is out.corr, a bool variable deciding whether the CSSNEst function returns gene correlation matrix of cells. We apply the two-step algorithm as follows:

# Two-step algorithm
nu <- rep(2*G, n)
# Output all cells' gene co-expression networks.
Result <- CSSNEst(X, cell.info, nu = nu, d = 0.1, m.info = 70, is.scale = TRUE, is.all = TRUE)
# Result is a list of length n, where n is the number of cell.

#-----The first cell's estimated gene co-expression networks-----
colors_func <- colorRampPalette(c('white', "black"))
colors <- colors_func(2)
par(mfrow=c(1,2))
pheatmap(Result[[1]],
                color = colors,
                legend_breaks = c(0,1),
                cluster_cols = F, cluster_rows = F,
                show_rownames = F, show_colnames = F,
                width = 2.8, height = 2.8
                # filename = filename[i]
)

The black square in the heatmaps indicates that there exist an edge between the two genes in the cell, and the white square stands for non edge existing. Moverover, if we are interested in only a subset of cells' gene co-expression networks, we can define a index vector of cells. And if the user need the original gene correlation matrix, we can also use out.corr argument to output matrices of cells. Then we get a list of CSSNEst, including gene co-expression networks of cells and the original gene correlation matrix.

indx.cell <- c(1,3,7,10)
result <- CSSNEst(X, cell.info, nu = nu, d = 0.1, m.info = 70, is.scale = TRUE, is.all = FALSE, indx.cell = indx.cell, output.corr = TRUE)
result$`Correlation Matrix`[[1]]
result$`Gene Networks`[[1]]

Prediction

The privilege of our package is predicting or interpolating gene co-expression networks on tissue positions where cells are not captured. Notice that the input of CSSNPredict, GN, must be the gene co-expression networks of all cells. The second argument of CSSNPredict, cell.info, is the same as that of CSSNEst. The third argument, miss.indx is the coordinates of missing cells, the format of which is matrix. The last parameter, m.info is also the same as above. The function CSSNPredict returns a list of length miss.num, where miss.num is the number of missing cells. And each element of the result is a gene co-expression matrix.

# generate coordinates of missing cells
set.seed(1)
miss.num <- 5
miss.x <- runif(miss.num, min(cell.info[,2]), max(cell.info[,2]))
miss.y <- runif(miss.num, min(cell.info[,3]), max(cell.info[,3]))
miss.indx <- cbind(miss.x, miss.y)

# Apply prediction algorithm
pre <- CSSNPredict(Result, cell.info, miss.indx)
length(pre)
dim(pre[[1]])


jingeyu/CSSN documentation built on Jan. 9, 2022, 7:35 p.m.