A system-level understanding of the regulation and coordination mechanisms of gene expression is essential to understanding the complexity of biological processes in health and disease. With the rapid development of single-cell RNA sequencing technologies, it is now possible to investigate gene interactions in a cell-type-specific manner. Here we introduce the scLink
package, which uses statistical network modeling to understand the co-expression relationships among genes and to construct sparse gene co-expression networks from single-cell gene expression data.
Here we demonstrate the functionality of scLink
using the example data stored in the package.
library(scLink) count = readRDS(system.file("extdata", "example.rds", package = "scLink")) genes = readRDS(system.file("extdata", "genes.rds", package = "scLink"))
The example raw count matrix count
has 793 rows representing different cells and 23,341 columns representing different genes.
genes
is a character vector of 500 genes of interest.
sclink_norm
We use the function sclink_norm
to process single cell read count for application of the sclink method. The code below will normalize the read count matrix with a library size of $10^6$ and only keep the 500 genes in genes
for downstream analysis. Note that the normalized count matrix count.norm
is on the $\log_{10}$ scale.
count.norm = sclink_norm(count, scale.factor = 1e6, filter.genes = FALSE, gene.names = genes)
If users do not have a particular gene list for network inference, they can set filter.genes=TRUE
to filter for the top $n$ genes with largest average expression values. For example:
count.norm = sclink_norm(count, scale.factor = 1e6, filter.genes = TRUE, n = 500)
sclink_net
After the pre-processing step, we use the function sclink_net
to calculate the robust correlation matrix and identifed sparse co-expression network of scLink.
expr
is the normalized count matrix output by sclink_norm
or supplied by the users. lda
is the candidate regularization parameters used in scLink's graphical model. The users can set ncores
to take advantage of parallel computation.
networks = sclink_net(expr = count.norm, ncores = 1, lda = seq(0.5, 0.1, -0.05))
sclink_net
returns a list of results. The scLink's robust correlation matrix can be retrieved from the cor
element:
networks$cor[1:3,1:3]
The gene co-expression networks and summary statistics can be retrieved from the summary
element, which is a list with the same length as lda
: each element corresponds to one regularization parameter.
net1 = networks$summary[[1]] names(net1) ### adjacency matrix net1$adj[1:3,1:3] ### concentration matrix net1$Sigma[1:3,1:3] ### BIC net1$bic ### number of edges net1$nedge ### regularization parameter lambda net1$lambda
sclink_cor
Since it is very difficult to infer co-expression relationships for lowly expressed genes in single-cell data, we suggest the filtering step as used in sclink_norm
to select genes. This also reduces the computational burden. However, if the users would like to infer gene networks for a large gene list (e.g., $> 5000$ genes), we suggest that the users first use sclink_cor
to investigate the correlation structures among these genes.
corr = sclink_cor(expr = count.norm, ncores = 1)
If the correlation matrix suggests obvious gene modules, then the users can apply sclink_net
separately on these modules to reduce computation time and increase overall accuracy.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.