In mariiabilous/SuperCellBM:

An example how to build multiple super-cell-like objects, including 'exact' (Super-cells obtained with the exact coarse-gaining), 'approx' (Super-cells obtained with the aaproximate coarse-graining), 'metacell(.*)' (Metacell build on the same genes as super-cells -- 'metacell_SC_like'; and Metacell build in a default set of genes -- 'metacell_default') for a set of graining levels and random seeds.

# if (!requireNamespace("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")
# 
# BiocManager::install("SingleCellExperiment")
# 
# if (!requireNamespace("remotes")) install.packages("remotes")
# remotes::install_github("GfellerLab/SuperCell")
# remotes::install_github("mariiabilous/SuperCellBM")

library(SingleCellExperiment)
library(SuperCell)
library(SuperCellBM)

Load some default parameters

Such as .gamma.seqfor the set of fraining levels, .seed.seq for the set of random seeds, adata.folder and fig.folder for the folders where to write data and plots. For the full list of the default parameters, see ./examples/config/Tian_config.R.

source("./config/Tcells_config.R")

Flags

Whether to compute super-cell (ToComputeSC) or whether to compute super-cell gene expression (ToComputeSC_GE) or load saved files. Make sure, these file exists :) Flag ToTestPackage is used to run stript in 2 modes: package testing (ToTestPackage == TRUE) or generating super-cell structure and super-cell gene expression for the further analyses (ToTestPackage == FALSE).

ToLoadPreprocData <- T
ToComputeSC       <- F
ToComputeSC_GE    <- F 
ToClusterSinglecell <- F

ToTestPackage     <- F 

filename_suf <- "" # variable to add a suffix to the saved files in case of testing of the package

if(ToTestPackage){
  testing_gamma_seq <- c(1, 10, 100)
  testing_seed_seq <- .seed.seq[1:3]

  warning(paste("The reduced set of graining leveles and seeds will be used, to get real output, turn it ti FALSE"))
  warning(paste("Original set of graining levels is:", paste(.gamma.seq, collapse = ", "), 
                "but used testing set is:", paste(testing_gamma_seq, collapse = ", ")))
  warning(paste("Original set of seeds  is:", paste(.seed.seq, collapse = ", "), 
                "but used testing set is:", paste(testing_seed_seq, collapse = ", ")))

  .gamma.seq <- testing_gamma_seq
  .seed.seq <- testing_seed_seq

  filename_suf = "_testing_package"
} else {
  .seed.seq = .seed.seq[1:5]
}

Load `Tcells` data from 10x Genomics database.

if(!ToLoadPreprocData){
  source("./config/Tcells_load_data.R")
} else {
  sc.GE      <- readRDS(file = file.path(data.folder, "sc_ge.Rds"))
  sc.counts  <- readRDS(file = file.path(data.folder, "sc_counts.Rds"))
  cell.meta  <- readRDS(file = file.path(data.folder, "cell_meta.Rds"))
}

Get and set the main variables, such as single-cell gene expression (sc.GE), single-cell counts (sc.counts), number of single cells (N.c) and total number of genes (N.g). Set matrix column names to cellIDs (cell.ids) and row names to gene names (gene.names).

GT.cell.type                <- cell.meta$GT.cell.type
names(GT.cell.type)         <- rownames(cell.meta)

GT.cell.type.names          <- names(table(cell.meta$GT.cell.type))
GT.cell.type.2.num          <- 1:length(unique(GT.cell.type))
names(GT.cell.type.2.num)   <- GT.cell.type.names
GT.cell.type.num            <- GT.cell.type.2.num[GT.cell.type]
names(GT.cell.type.num)     <- names(GT.cell.type)

cell.ids                    <- colnames(sc.GE)
N.c                         <- length(cell.ids)
gene.names                  <- rownames(sc.GE)
N.g                         <- length(gene.names)

N.clusters                  <- length(unique(GT.cell.type))

mito.genes                  <- grep(pattern = "^MT",     x = gene.names, value = TRUE)
ribo.genes                  <- grep(pattern = "^RP[LS]", x = gene.names, value = TRUE)
mito.ribo.genes             <- c(mito.genes, ribo.genes)
length(mito.ribo.genes)
.genes.omit                  <- mito.ribo.genes


## uncomment this when needed 

# pal.GT                      <- color.tsne.  
# scales::show_col(pal.GT)
# scales::show_col(color.tsne)

Compute Super-cells structure

for the Exact, Aprox (Super-cells obtained with the exact or approximate coarse-graining), Subsampling or Random (random grouping of cells into super-cells).

filename <- paste0('initial', filename_suf)

SC.list <- compute_supercells(
  sc.GE,
  ToComputeSC = ToComputeSC,
  data.folder = data.folder,
  filename = filename,
  gamma.seq = .gamma.seq,
  n.var.genes = .N.var.genes,
  k.knn = .k.knn,
  n.pc = .N.comp,
  approx.N = .approx.N,
  fast.pca = TRUE,
  genes.use = .genes.use, 
  genes.exclude = .genes.omit,
  seed.seq = .seed.seq,
  verbose = TRUE
  )

cat(paste("Super-cell computed for:", paste(names(SC.list), collapse = ", "), 
          "\nat graining levels:", paste(names(SC.list[['Approx']]), collapse = ", "),
          "\nfor seeds:", paste(names(SC.list[['Approx']][[1]]), collapse = ", "), "\n",
          "\nand saved to / loaded from", paste0(filename, ".Rds")))

Compute metacells in two settings:

'metacell_default' - when metacell is computed with the default parameters (from the tutorial), using gene set filtered by MC
'metacell_SC_like' - when metacell is computed with the same default parameters, but at the same set of genes as Super-cells

SC.mc <- compute_supercells_metacells(
  sc.counts = sc.counts, 
  gamma.seq = .gamma.seq,
  SC.list = SC.list,
  proj.name = proj.name,
  ToComputeSC = ToComputeSC, 
  mc.k.knn = 100,
  T_vm_def = 0.08,
  MC.folder = "MC", 
  MC_gene_settings = c('Metacell_default', 'Metacell_SC_like') # do not change
)

Get actual graining levels obtained with Metacell

additional_gamma_seq <- get_actual_gammas_metacell(SC.mc)

cat(paste("Metacells were computed in", length(names(SC.mc)), "settings:", paste(names(SC.mc), collapse = ", "), 
          "\nfor Gammas:", paste(names(SC.mc[[1]]), collapse = ", "), 
          "\nbut actual gammas are:", paste(additional_gamma_seq, collapse = ", ")
))


# manually expand MC because later we will have 2 different setting for MC profile: fp - footpring of MC, av - averaged 
SC.mc.fp <- SC.mc
names(SC.mc.fp) <- sapply(names(SC.mc), FUN = function(x){paste0(x, '_fp')})

SC.mc.av <- SC.mc
names(SC.mc.av) <- sapply(names(SC.mc), FUN = function(x){paste0(x, '_av')})

SC.mc.expanded <- c(SC.mc.fp, SC.mc.av)

names(SC.mc.expanded)
rm(SC.mc.fp, SC.mc.av, SC.mc)

Compute super-cells (Exact, Approx, Subsampling and Random) at the addidional grainig levels obtained with Metacell.

So that we can dirrectly compare the results of Super-cells and Metacells at the same graining levels.

filename <- paste0('additional_gammas', filename_suf)

SC.list <- compute_supercells_additional_gammas(
  SC.list,
  additional_gamma_seq = additional_gamma_seq,
  ToComputeSC = ToComputeSC,
  data.folder = data.folder,
  filename = filename,
  approx.N = .approx.N,
  fast.pca = TRUE
)

cat(paste("Super-cells of methods:", paste(names(SC.list), collapse = ", "), 
      "\nwere computed at aggitional graining levels:", paste(additional_gamma_seq, collapse = ", "), 
      "\nand added to SC.list"
      ))

Concatenate Metacells to the list of Super-cells

SC.list <- c(SC.list, SC.mc.expanded)
rm(SC.mc.expanded)

filename <- paste0("all", filename_suf)
saveRDS(SC.list, file = file.path(data.folder, "SC", paste0(filename, ".Rds")))

cat(paste(
  "Metacell data added to SC.list \nand now it contains:",
  paste(names(SC.list), collapse = ", "),
  "\nSC.list was saved to", file.path(data.folder, "SC", paste0(filename, ".Rds"))
))

SC.list <-  readRDS(file = file.path(data.folder, "SC", paste0(filename, ".Rds")))

START FROM HERE!!!

genes.use <- SC.list$Exact$`10`$`12345`$genes.use
if(length(.N.comp) == 1) {N.comp <- 1:.N.comp} else {N.comp <- .N.comp}

X.for.pca       <- scale(Matrix::t(sc.GE[genes.use,]))
sc.pca          <- irlba::irlba(X.for.pca, nv = max(N.comp, 25))
sc.pca$x        <- sc.pca$u %*% diag(sc.pca$d)

if(ToClusterSinglecell){
  sc.dist <- dist(sc.pca$x[,N.comp])
  sc.clustering.hcl <- hclust(sc.dist, method = "ward.D2")

  saveRDS(sc.clustering.hcl, file = file.path(data.folder, "sc_clustering_hcl.Rds"))
} else {
  sc.clustering.hcl <- readRDS(file.path(data.folder, "sc_clustering_hcl.Rds"))
}

sc.clustering <- cutree(sc.clustering.hcl, k = N.clusters)

Annotate super-cells to the cell lines based on the cell line annotation of single-cell and compute super-cell purity in terms of this annotation

SC.list <- annotate_supercells_to_cluster(
  SC.list = SC.list,
  sc.annotation = sc.clustering, 
  annotation.name = 'sc_clustering'
  )

purity.df <- plot_annotation_purity(
  SC.list, 
  annotation.name = 'sc_clustering', w = 2.5, h = 2.5,
  )

Compute GE for Super-cell data

GE profile for the super-cell data is computede:

for super-cells (Exact, Approx) by averaging gene expression within super-cells,
for Random, also averaging gene expression within super-cells
for the Subsampling, sc.GE matrix is just subsampled,
for Metacells, gene expression is computed in 2 ways: 1) the same as super-cells (averaging gene expression within Metacells) -> metacell_(.*)_av 2) using the default output of Metacell maned footprint -> metacell_(.*)_fp

filename <- paste0("all", filename_suf)

SC.GE.list <- compute_supercells_GE(
  sc.GE = sc.GE, 
  SC.list = SC.list,
  ToComputeSC_GE = ToComputeSC_GE, 
  data.folder = data.folder,
  filename = filename,
  verbose = FALSE
)

cat(paste("Gene expression profile computed for:", paste(names(SC.GE.list), collapse = ", "), 
    "\nat graining levels:", paste(sort(as.numeric(names(SC.GE.list[['Approx']]))), collapse = ", "),
    "\nfor seeds:", paste(names(SC.GE.list[['Approx']][[1]]), collapse = ", "),
    "\nand saved to / loaded from", paste0(filename, ".Rds")
    ))

dim(SC.GE.list$Metacell_SC_like_fp$`100`$`12345`)
dim(SC.list$Metacell_default_fp$`100`$`12345`$mc_info$mc@mc_fp)

dim(SC.GE.list$Exact$`100`$`12345`)
dim(SC.list$Metacell_default_av$`100`$`12345`$mc_info$mc@mc_fp)

Dimensionality reduction of Super-cell data

filename <- paste0("all_PCA", filename_suf)

SC.list <- compute_supercells_PCA(
  SC.list = SC.list,
  SC.GE.list = SC.GE.list,
  N.comp = .N.comp
)

saveRDS(SC.list, file = file.path(data.folder, 'SC', paste0(filename, '.Rds')))

filename <- paste0("all_PCA", filename_suf)

SC.list <- readRDS(file = file.path(data.folder, 'SC', paste0(filename, '.Rds')))

Clustering

SC.list <- compute_supercells_clustering(
  SC.list,
  N.comp = 10,
  N.clusters.seq = c(2:10),
  pca_name = 'SC_PCA'
)

filename <- paste0("all_PCA_clust", filename_suf)

saveRDS(SC.list, file = file.path(data.folder, 'SC', paste0(filename, '.Rds')))

filename <- paste0("all_PCA_clust", filename_suf)

SC.list <- readRDS(file = file.path(data.folder, 'SC', paste0(filename, '.Rds')))

devtools::document()
devtools::install()

detach('package:SuperCellBM', unload = TRUE)
library(SuperCellBM)

alt.clusterings <- compute_alternative_clustering(
  sc.pca = sc.pca$x,
  N.comp = .N.comp,
  N.clusters.seq = .N.clusters.seq,
  hclust_methods = c("average", "mcquitty", "median"),
  seed.seq = .seed.seq,
  verbose = T
)
saveRDS(sc.clustering, file = file.path(data.folder, "sc_alternative_clustering.Rds"))

Consistency between super-cell clustering and cell lines

clust.consistency <- compute_consistency_of_supercell_clustering(
  SC.list = SC.list, 
  sc.annotation = sc.clustering, 
  sc.clustering = sc.clustering,
  sc.alternative.clustering = alt.clusterings
)

clust.consistency_summary <- plot_clustering_consistency(
  clust.consistency, 
  min.value.alt.clustering = 0.,
  error_bars = "extr"
)
clust.consistency_summary

cur.SC <- SC.list$Exact$`100`$`12345`
set.seed(12345)
lay <- igraph::layout.fruchterman.reingold(cur.SC$graph.supercells)

supercell_plot(cur.SC$graph.supercells, group = cur.SC$hclust$`4`, lay = lay)
supercell_plot(cur.SC$graph.supercells, group = cur.SC$sc_clustering, lay = lay)

set.seed(123)
supercell_plot(SC.list.pca$Exact$`20`$`12345`$graph.supercells)

Final

print("Done! Congrats!")

if(ToTestPackage){
  warning(paste("(!) Script was run in a test mode, to get real cell_line super-cell data, run this script with ToTestPackage <- FALSE"))
}

mariiabilous/SuperCellBM documentation built on Jan. 28, 2022, 7:45 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

mariiabilous/SuperCellBM

In mariiabilous/SuperCellBM:

Load some default parameters

Flags

Load `Tcells` data from 10x Genomics database.

Compute Super-cells structure

Compute metacells in two settings:

Get actual graining levels obtained with Metacell

Compute super-cells (Exact, Approx, Subsampling and Random) at the addidional grainig levels obtained with Metacell.

Concatenate Metacells to the list of Super-cells

START FROM HERE!!!

Annotate super-cells to the cell lines based on the cell line annotation of single-cell and compute super-cell purity in terms of this annotation

Compute GE for Super-cell data

Dimensionality reduction of Super-cell data

Clustering

Consistency between super-cell clustering and cell lines

Final

R Package Documentation

Browse R Packages

We want your feedback!

mariiabilous/SuperCellBM

In mariiabilous/SuperCellBM:

Load some default parameters

Flags

Load Tcells data from 10x Genomics database.

Compute Super-cells structure

Compute metacells in two settings:

Get actual graining levels obtained with Metacell

Compute super-cells (Exact, Approx, Subsampling and Random) at the addidional grainig levels obtained with Metacell.

Concatenate Metacells to the list of Super-cells

START FROM HERE!!!

Annotate super-cells to the cell lines based on the cell line annotation of single-cell and compute super-cell purity in terms of this annotation

Compute GE for Super-cell data

Dimensionality reduction of Super-cell data

Clustering

Consistency between super-cell clustering and cell lines

Final

R Package Documentation

Browse R Packages

We want your feedback!

Load `Tcells` data from 10x Genomics database.