Introduction to scLink
In scLink: Inferring Functional Gene Co-Expression Networks from Single Cell Data

A system-level understanding of the regulation and coordination mechanisms of gene expression is essential to understanding the complexity of biological processes in health and disease. With the rapid development of single-cell RNA sequencing technologies, it is now possible to investigate gene interactions in a cell-type-specific manner. Here we introduce the scLink package, which uses statistical network modeling to understand the co-expression relationships among genes and to construct sparse gene co-expression networks from single-cell gene expression data.

Here we demonstrate the functionality of scLink using the example data stored in the package.

library(scLink)
count = readRDS(system.file("extdata", "example.rds", package = "scLink"))
genes = readRDS(system.file("extdata", "genes.rds", package = "scLink"))

The example raw count matrix count has 793 rows representing different cells and 23,341 columns representing different genes. genes is a character vector of 500 genes of interest.

`sclink_norm`

We use the function sclink_norm to process single cell read count for application of the sclink method. The code below will normalize the read count matrix with a library size of $10^6$ and only keep the 500 genes in genes for downstream analysis. Note that the normalized count matrix count.norm is on the $\log_{10}$ scale.

count.norm = sclink_norm(count, scale.factor = 1e6, filter.genes = FALSE, gene.names = genes)

If users do not have a particular gene list for network inference, they can set filter.genes=TRUE to filter for the top $n$ genes with largest average expression values. For example:

count.norm = sclink_norm(count, scale.factor = 1e6, filter.genes = TRUE, n = 500)

`sclink_net`

After the pre-processing step, we use the function sclink_net to calculate the robust correlation matrix and identifed sparse co-expression network of scLink. expr is the normalized count matrix output by sclink_norm or supplied by the users. lda is the candidate regularization parameters used in scLink's graphical model. The users can set ncores to take advantage of parallel computation.

networks = sclink_net(expr = count.norm, ncores = 1, lda = seq(0.5, 0.1, -0.05))

sclink_net returns a list of results. The scLink's robust correlation matrix can be retrieved from the cor element:

networks$cor[1:3,1:3]

The gene co-expression networks and summary statistics can be retrieved from the summary element, which is a list with the same length as lda: each element corresponds to one regularization parameter.

net1 = networks$summary[[1]]
names(net1)
### adjacency matrix
net1$adj[1:3,1:3]
### concentration matrix
net1$Sigma[1:3,1:3]
### BIC 
net1$bic
### number of edges
net1$nedge
### regularization parameter lambda
net1$lambda

`sclink_cor`

Since it is very difficult to infer co-expression relationships for lowly expressed genes in single-cell data, we suggest the filtering step as used in sclink_norm to select genes. This also reduces the computational burden. However, if the users would like to infer gene networks for a large gene list (e.g., $> 5000$ genes), we suggest that the users first use sclink_cor to investigate the correlation structures among these genes.

corr = sclink_cor(expr = count.norm, ncores = 1)

If the correlation matrix suggests obvious gene modules, then the users can apply sclink_net separately on these modules to reduce computation time and increase overall accuracy.