A system-level understanding of the regulation and coordination mechanisms of gene expression is essential to understanding the complexity of biological processes in health and disease. With the rapid development of single-cell RNA sequencing technologies, it is now possible to investigate gene interactions in a cell-type-specific manner. Here we introduce the scLink package, which uses statistical network modeling to understand the co-expression relationships among genes and to construct sparse gene co-expression networks from single-cell gene expression data.
Here we demonstrate the functionality of scLink using the example data stored in the package.
library(scLink) count = readRDS(system.file("extdata", "example.rds", package = "scLink")) genes = readRDS(system.file("extdata", "genes.rds", package = "scLink"))
The example raw count matrix count has 793 rows representing different cells and 23,341 columns representing different genes.
genes is a character vector of 500 genes of interest.
sclink_normWe use the function sclink_norm to process single cell read count for application of the sclink method. The code below will normalize the read count matrix with a library size of $10^6$ and only keep the 500 genes in genes for downstream analysis. Note that the normalized count matrix count.norm is on the $\log_{10}$ scale.
count.norm = sclink_norm(count, scale.factor = 1e6, filter.genes = FALSE, gene.names = genes)
If users do not have a particular gene list for network inference, they can set filter.genes=TRUE to filter for the top $n$ genes with largest average expression values. For example:
count.norm = sclink_norm(count, scale.factor = 1e6, filter.genes = TRUE, n = 500)
sclink_netAfter the pre-processing step, we use the function sclink_net to calculate the robust correlation matrix and identifed sparse co-expression network of scLink.
expr is the normalized count matrix output by sclink_norm or supplied by the users. lda is the candidate regularization parameters used in scLink's graphical model. The users can set ncores to take advantage of parallel computation.
networks = sclink_net(expr = count.norm, ncores = 1, lda = seq(0.5, 0.1, -0.05))
sclink_net returns a list of results. The scLink's robust correlation matrix can be retrieved from the cor element:
networks$cor[1:3,1:3]
The gene co-expression networks and summary statistics can be retrieved from the summary element, which is a list with the same length as lda: each element corresponds to one regularization parameter.
net1 = networks$summary[[1]] names(net1) ### adjacency matrix net1$adj[1:3,1:3] ### concentration matrix net1$Sigma[1:3,1:3] ### BIC net1$bic ### number of edges net1$nedge ### regularization parameter lambda net1$lambda
sclink_corSince it is very difficult to infer co-expression relationships for lowly expressed genes in single-cell data, we suggest the filtering step as used in sclink_norm to select genes. This also reduces the computational burden. However, if the users would like to infer gene networks for a large gene list (e.g., $> 5000$ genes), we suggest that the users first use sclink_cor to investigate the correlation structures among these genes.
corr = sclink_cor(expr = count.norm, ncores = 1)
If the correlation matrix suggests obvious gene modules, then the users can apply sclink_net separately on these modules to reduce computation time and increase overall accuracy.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.