SC-MEB

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

This vignette introduces the R package SC.MEB. Its main function, SC.MEB, implements the SC-MEB model: spatial clustering with a hidden Markov random field using empirical Bayes. The package can be installed with the command:

The package can be loaded with the command:

library("SC.MEB")

Fit SC-MEB using simulated data

We first set the basic parameters:

library(mvtnorm)
library(GiRaF)
library(SingleCellExperiment)
set.seed(100)
G <- 4
Bet <- 1
KK <- 5
p <- 15
mu <- matrix(c( c(-6, rep(-1.5, 14)),
               rep(0, 15),
               c(6, rep(1.5, 14)),
               c(rep(-1.5, 7), rep(1.5, 7), 6),
               c(rep(1.5, 7), rep(-1.5, 7), -6)), ncol = KK)
height <- 50
width <- 50
n <- height * width # number of cells (spots) in each individual

Next, we generate the true cluster labels, the 15-dimensional principal components (PCs), and the position of each spot.

  X <- sampler.mrf(iter = n, sampler = "Gibbs", h = height, w = width, ncolors = KK, 
                   nei = G, param = Bet,initialise = FALSE, view = TRUE)
  x <- c(X) + 1
  y <- matrix(0, nrow = n, ncol = p)

  for(i in 1:n) { # cell
    mu_i <- mu[, x[i]]
    Sigma_i <- ((x[i]==1)*2 + (x[i]==2)*2.5 + (x[i]==3)*3 +
                  (x[i]==4)*3.5 + (x[i]==5)*4)*diag(1, p)*4
    y[i, ] <- rmvnorm(1, mu_i, Sigma_i)
  }
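Because every covariance matrix in the loop above is a scaled identity, the same kind of draw can be produced without a loop. The following is a minimal base-R sketch under that assumption. The labels `x` here are a placeholder drawn uniformly at random, not a Markov random field sample, and `cluster_var` is a name introduced only for this illustration. The random numbers will not match those of the loop even with the same seed.

```r
set.seed(1)
n  <- 2000   # smaller than the vignette's 2500, for illustration only
p  <- 15
KK <- 5
mu <- matrix(c(c(-6, rep(-1.5, 14)),
               rep(0, 15),
               c(6, rep(1.5, 14)),
               c(rep(-1.5, 7), rep(1.5, 7), 6),
               c(rep(1.5, 7), rep(-1.5, 7), -6)), ncol = KK)

# Placeholder labels (assumption): uniform draws, NOT an MRF sample.
x <- sample(seq_len(KK), n, replace = TRUE)

# Per-cluster variance, matching (2, 2.5, 3, 3.5, 4) * 4 in the loop.
cluster_var <- c(2, 2.5, 3, 3.5, 4) * 4

# Standard normal noise; multiplying an n x p matrix by a length-n vector
# recycles down the columns, so row i is scaled by sqrt(cluster_var[x[i]]).
noise <- matrix(rnorm(n * p), nrow = n, ncol = p) * sqrt(cluster_var[x])

# mu[, x] is p x n, so transpose it to get one row of means per cell.
y <- t(mu[, x]) + noise
```

This shortcut only applies when the covariance is diagonal with a single variance per cluster; a general covariance matrix still needs a multivariate sampler such as rmvnorm.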

  pos <- cbind(rep(1:height, width), rep(1:height, each=width))
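The line above enumerates every grid position correctly here because height and width are equal. For a rectangular grid, a sketch like the following, using base R's expand.grid, keeps rows and columns consistent. The 3 by 4 grid is a hypothetical size chosen only to make ordering mistakes visible.

```r
# Hypothetical non-square grid, used only for illustration.
height <- 3
width  <- 4

# expand.grid varies its first factor fastest, so the row index cycles
# through 1..height within each column index.
pos <- as.matrix(expand.grid(row = seq_len(height), col = seq_len(width)))

# Every (row, col) pair should appear exactly once.
stopifnot(nrow(pos) == height * width, anyDuplicated(pos) == 0)
```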

Subsequently, we construct the SingleCellExperiment object based on the above PCA and position.

  # -------------------------------------------------
  # build the metadata in the layout that BayesSpace uses
  counts <- t(y)
  rownames(counts) <- paste0("gene_", seq_len(p))
  colnames(counts) <- paste0("spot_", seq_len(n))

  ## Make array coordinates - filled rectangle
  cdata <- list()
  nrow <- height; ncol <- width
  cdata$row <- rep(seq_len(nrow), each=ncol)
  cdata$col <- rep(seq_len(ncol), nrow)
  cdata <- as.data.frame(do.call(cbind, cdata))
  ## Scale and jitter image coordinates
  #scale.factor <- rnorm(1, 8);  n_spots <- n
  #cdata$imagerow <- scale.factor * cdata$row + rnorm(n_spots)
  #cdata$imagecol <- scale.factor * cdata$col + rnorm(n_spots)
  cdata$imagerow <- cdata$row
  cdata$imagecol <- cdata$col
  ## Make SCE
  ## note: scater::runPCA throws warning on our small sim data, so use prcomp
  sce <- SingleCellExperiment(assays=list(counts=counts), colData=cdata)
  reducedDim(sce, "PCA") <- y
  # sce$spatial.cluster <- floor(runif(ncol(sce), 1, 3))

  metadata(sce)$BayesSpace.data <- list()
  metadata(sce)$BayesSpace.data$platform <- "ST"
  metadata(sce)$BayesSpace.data$is.enhanced <- FALSE

Here, we set the basic parameters for our function SC.MEB:

singlece = sce
d = 15
K = 4:6
bet = seq(0,5,1)
platform = "ST"
maxIter_ICM = 10
maxIter = 50

Here, we briefly explain these parameters.

'singlece' is a SingleCellExperiment object containing the PCA and position information.

'd' is an integer specifying the dimension of the PCA. The default is 15.

'K' is an integer vector specifying the numbers of mixture components (clusters) for which the BIC is to be calculated. The default is K = 2:9.

'platform' is the name of the spatial transcriptomic platform. Specify 'Visium' for hex lattice geometry or 'ST' for square lattice geometry. Specifying this parameter is optional because the information is already included in the object's metadata.

'bet' is a numeric vector specifying the smoothness of the Markov random field. The default is seq(0,5,0.2).

'maxIter_ICM' is the maximum number of iterations of the ICM algorithm. The default is 10.

'maxIter' is the maximum number of iterations of the EM algorithm. The default is 50.
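To get a feel for how much work these settings imply, the candidate values of K and beta can be laid out as a grid. This is only a sketch: it assumes that one model is fitted per (K, beta) pair, and the names `grid` and `n_fits` are introduced here for illustration.

```r
K   <- 4:6
bet <- seq(0, 5, 1)

# One row per candidate (beta, K) combination.
grid   <- expand.grid(bet = bet, K = K)
n_fits <- nrow(grid)   # length(bet) * length(K)

print(n_fits)
```

With the default bet = seq(0,5,0.2) and K = 2:9, the same calculation gives 26 * 8 = 208 combinations, so a coarser grid like the one used in this vignette can shorten the run time considerably.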

Finally, we fit the SC-MEB model with the function SC.MEB.

out = SC.MEB(sce = singlece, d = d, K=K, bet=bet, platform = platform, 
             maxIter_ICM = maxIter_ICM, maxIter = maxIter)
str(out)

Here, we briefly explain the output of SC.MEB.

The item 'best_K' is the optimal K chosen according to the BIC rule.

The item 'best_beta' is the optimal beta chosen according to the BIC rule.

The item 'best_cluster_label' is the optimal clustering result corresponding to optimal K and optimal beta.

The item 'best_BIC' is the optimal BIC corresponding to optimal K and optimal beta.

The item 'best_ell' is the optimal opposite log-likelihood corresponding to optimal K and optimal beta.

The item 'best_mu' is the optimal mean for each component corresponding to optimal K and optimal beta.

The item 'best_sigma' is the optimal variance for each component corresponding to optimal K and optimal beta.

The item 'best_gam' is the optimal posterior probability matrix corresponding to optimal K and optimal beta.

The item 'cluster_label' is a 3-dimensional $n \times b \times q$ array storing the clustering results for every K and beta. Here n is the number of cells, b is the length of the vector 'bet', and q is the length of the vector 'K'.

The item 'BIC' contains the BIC values for each K and beta.

The item 'ell' is the opposite log-likelihood for each beta and K.

The item 'mu' is the mean of each component for each beta and K.

The item 'sigma' is the variance of each component for each beta and K.

The item 'gam' is the posterior probability for each beta and K.
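Since the simulation keeps the true labels in x, the recovered labels can be compared with them. A common label-permutation-invariant measure is the adjusted Rand index (ARI). Below is a minimal base-R sketch; the helper adjusted_rand_index is written for this illustration and is not part of SC.MEB, and the toy vectors stand in for real output. The mclust package provides an equivalent function, adjustedRandIndex.

```r
# Adjusted Rand index between two label vectors of equal length.
adjusted_rand_index <- function(a, b) {
  stopifnot(length(a) == length(b))
  tab   <- table(a, b)                        # contingency table
  comb2 <- function(v) sum(v * (v - 1) / 2)   # sum of choose(v, 2)
  sum_ij   <- comb2(tab)
  sum_a    <- comb2(rowSums(tab))
  sum_b    <- comb2(colSums(tab))
  total    <- comb2(length(a))
  expected <- sum_a * sum_b / total           # index expected by chance
  max_idx  <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_idx - expected)
}

# Toy labels: same partition, different label names.
truth    <- c(1, 1, 2, 2, 3, 3)
relabels <- c(2, 2, 3, 3, 1, 1)
ari_same <- adjusted_rand_index(truth, relabels)

# With the objects from this vignette, and assuming best_cluster_label
# is a length-n vector of labels, the call would be:
# adjusted_rand_index(x, out$best_cluster_label)
```

An ARI of 1 means the two partitions agree exactly up to relabelling, while values near 0 indicate agreement no better than chance.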




SC.MEB documentation built on July 16, 2021, 9:06 a.m.