knitr::opts_chunk$set(echo = TRUE)

This guide will demonstrate the usage of bindSC to intergrate RNA and ATAC under the gene regulation modeling with expression level regulated by both proximal and distal peaks.

Stage 1: load simulation data

User can refer to sim.R for the details of simulaiton process (It will take 5 mins).

library(bindSC)
dim(sim$X)
dim(sim$Y)
dim(sim$Z0)

Usage of BiCCA

?BiCCA

In this example, dataset X is the gene expression matrix and Y is the peak matrix. Z0 is the simulated gene activity matrix. Seurat, Liger and Harmony use (X, Z0) as input and BiCCA uses (X, Z0, Y) as input.

bindSC also requires the modality specfic clustering ID X.clst,Y.clst to test the integraiton performance

The option tolereance is usually set from 0.01 to 0.05 to avoid unnecessary iteration when sample size is large.

This step will take ~10s

out <- BiCCA(X=sim$X, Z0=sim$Z0, Y=sim$Y, 
      K = 5,
      alpha = 0.1,
      lambda = 0.5,
      X.clst = sim$X_meta$Group,
      Y.clst = sim$Y_meta$Group,  
      num.iteration = 100, 
      temp.path = "./out",
      tolerance = 0.01, 
      save = TRUE, 
      block.size =10000)

Show the iteration index delta

plot(out$delta, xlab="step", ylab="delta")

There are three key ouputs.The vectors u and v are the low-dimension reprentation of two modalites which can be used for downstream joint clustering. The matrix Z_est is the updated gene score matrix after iterations.

summary(out)
dim(out$u)
dim(out$r)
dim(out$Z_est)

Stage 2: Comparsion between bindSC and the traditional CCA

Coembedings from bindSC results

cell_type <- c(sim$X_meta$Group, sim$Y_meta$Group)
data_type <- c(rep("A", dim(out$u)[1]), rep("B",dim(out$r)[1]))
bindsc_umap <- umap(rbind(out$u, out$r))
plt_dt <- data.frame("UMAP1"=bindsc_umap$layout[,1],
                     "UMAP2"=bindsc_umap$layout[,2],
                     "cell_type"= as.factor(cell_type),
                     "data_type"=data_type)
p<-ggscatter(plt_dt, x = "UMAP1", y = "UMAP2",
     color = "cell_type", palette = c("darkseagreen4","lightpink","darkorchid1"), 
     repel = FALSE,size=1, alpha=1,legend.title = "", title="bindSC", font.title=16) + facet_grid(~data_type) 
p

Performance on traditional CCA method (results are from the first iteration).

out <- readRDS(file="./out/iteration1.rds")
cca_umap <- umap(rbind(out$u, out$r))
plt_dt <- data.frame("UMAP1"=cca_umap$layout[,1],
                     "UMAP2"=cca_umap$layout[,2],
                     "cell_type"= as.factor(cell_type),
                     "data_type"=data_type)
# remove some outliers from CCA
p<-ggscatter(plt_dt[(plt_dt$UMAP1>(-20) & plt_dt$UMAP2<20), ], x = "UMAP1", y = "UMAP2",
     color = "cell_type", palette = c("darkseagreen4","lightpink","darkorchid1"), 
     repel = FALSE,size=1, alpha=1,legend.title = "", title="CCA", font.title=16) + facet_grid(~data_type)
p
sessionInfo()


KChen-lab/bindSC documentation built on Sept. 29, 2022, 4:24 a.m.