In pengminshi/MRtree: Multi-resolution Reconciled Tree

```{css, echo=FALSE} pre code { white-space: pre !important; overflow-x: scroll !important; word-break: keep-all !important; word-wrap: initial !important; }

```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>",
                      echo=TRUE, message=FALSE, warnings=FALSE
                      )

Introduction

A wealth of clustering algorithms are available for Single-cell RNA sequencing (scRNA-seq), but it remains challenging to compare and characterize the features across different scales of resolution. To resolve this challenge Multi-resolution Reconciled Tree (MRtree), builds a hierarchical tree structure based on multi-resolution partitions that are highly flexible and can work with most scRNA-seq clustering algorithms. This tutorial will show the usage of the MRtree R package and how to couple it with different clustering methods.

Usage

The main function to perform MRtree analysis is mrtree(), where the data input is one of the following:

An n-by-m matrix of labels, where n rows correspond to the n samples (cells), m columns correspond to m clusterings, one per resolution.
A data frame, with the columns giving the clustering results with different resolutions specified by prefix, prefix， and suffix, suffix,. For example, by specifying prefix=’K_‘, columns in the data frame starting with K_ will be considered.
Seurat object with multiresolution clustering results. The clusterings are by default given stored in the object with column prefix RNA_snn_res. as the output of Seurat Louvain clustering. Need to modify the prefix if using a different assay and different neighbor graph or method.
SingleCellExperiment (SC3) object with multiresolution clustering results. By default, the clusterings are in the columns with the prefix sc3_ and suffix _clusters as output from SC3 clustering.

sc_clustering_methods.R file provide a few clustering wrappers (note that some of the dependent packages are not imported in package installation, to avoid redundant dependencies, so make sure the respective clustering package is installed before the analysis, see instructions below), including:

sc_clustering.seurat, graph clustering provided by Seurat package (@RN226). Make sure to install Seurat and its dependent R packages (R >= 4.0.0).
sc_clustering.sc3, Consensus clustering provided by SC3 package (@RN33). Make sure to install SC3 and SingleCellExperiment R packages from Bioconductor (R >= 3.3)
sc_clustering.soup, Semi-soft clustering provided in SOUP package (@RN195). Make sure to install the SOUP package following the instructions on \url{https://github.com/lingxuez/SOUPR} (R >= 3.4.2).
sc_clustering.umap_kmeans, UMAP dimension reduction followed by K-means clustering in the low dimensional space. Make sure to install uwot package for performing UMAP and ADPclust for selecting the number of clusters for comparison (R ≥ 3.0.0).
sc_clustering.simlr, Single-cell Interpretation via Multi-kernel LeaRning (SIMLR) (@wang2018simlr). Make sure to install SIMLR package from Bioconductor (R >= 4.0.0)

To use other clustering methods, we first generate the clusterings at multiple resolutions. Save the results into a label matrix, where rows are samples and columns are clusterings.

The output of the main function mrtree contains:

labelmat.mrtree The reconciled tree represented using the label matrix, where the first and last column corresponds to the top and bottom layer in the tree. The repeated columns are removed to leave the unique layers of the tree.
labelmat.recon The reconciled tree is stored in a label matrix, where the columns match with the columns from the raw input label matrix. This enables matching columns between initial raw clustering and final clusters in MRtree-constructed tree to calculate the stability metric.
labelmat.flat The initial flat clustering label matrix as input.
labelmat.ccs The results from with-resolution concensus clustering if performed ('concensus=TRUE').
resolutions Resolution parameters if provided by the input object.
paths The remaining tree paths stored as a label matrix.
params Store the parameters to reproduce the results.

Example with simulated data

In this tutorial, we illustrate the usage of MRtree functions using a simulated dataset. The data is simulated from SymSim by supplying a tree structure as shown below, containing UMI counts of 500 cells on 500 genes. If you want generate the simulated data using the simulation functions, make sure to install SymSim and the dependent packages listed on \url{https://github.com/YosefLab/SymSim}. Otherwise you can use the saved data example data_example. Tree structure used for simulating single cell data using SymSim packages

library(mrtree)

# # The data simulation can be performed using SymSim package with the following wrapper function,
# # The following simulation code take some time to run (skip to load data with data("data_example"))
# tree = tree1(plot=T)$phylo.tree
# symsim.param.zeisel = list(gene_effects_sd=2, sigma=0.2, scale_s=0.4, alpha_mean=0.5, alpha_sd=0.025,
#                            depth_mean=2e5,depth_sd=1.5e4, nPCR1=14, prop_hge=0.01, mean_hge=3)
# ngenes = 500
# ncells = 500
# truek = 8; min_popsize = floor(ncells/truek)
# i_minpop =  which(tree$tip.label=='Microglia')
# 
# 
# seed =  42
# simu.out = generateDataSymSim(ncells=ncells, ngenes=ngenes, tree=tree,
#                               params=symsim.param.zeisel, plot.tsne=T,
#                               i_minpop=i_minpop, min_popsize=min_popsize,
#                               nevf=150, n_de_evf=40, sigma=0.9, seed=seed)
# 
# simu.out$tsne_UMI_counts
# dat = simu.out
# # data_example = simu.out; usethis::use_data(data_example)

data("data_example", package = 'mrtree') # load data
dat = data_example
# tsne plot of 500 cells
dat$tsne_UMI_counts

MRtree with Seurat clustering

As the first step, we conduct multi-resolution flat clustering using Seurat with a range of resolution parameters. We recommend choosing the maximum resolution (gamma.max) between $2$ and $3$. The initial multi-resolution clustering results can be visualized using a cluster tree, where each layer corresponds to one clustering, the tree nodes represent clusters, and edges between clusters in adjacent layers indicate shared data points between two clusters.

set.seed(42)

metadata = dat$metadata
rownames(metadata) = dat$metadata$cellid
ref.labels = dat$metadata$type

# specify the resolution parameters
# resolutions = seq(0.1, sqrt(3), 0.05)^2

# alternatively and preferrably, we provide a sampling tool to 
# sample resolution parameters to uniformly cover different scales
A = seurat_get_nn_graph(counts=dat$counts, metadata=metadata, npc=10)
resolutions = modularity_event_sampling(A=A, n.res=30, 
                                        gamma.min=0.01, gamma.max=2.5) # sample based on the similarity matrix

# clustering using Suerat 
seurat.out = sc_clustering.seurat(counts=dat$counts, resolutions=resolutions, 
                                  metadata=metadata, npcs=10,
                                  return.seurat.object=TRUE, vars.to.regress=NULL,
                                  find.variable.features=FALSE,verbose=FALSE)

# initial cluster tree from Seurat flat clustering
plot_clustree(labelmat=seurat.out$seurat.clusters, prefix ='RNA_snn_res.', 
              ref.labels = ref.labels, plot.ref = FALSE)

Then we apply MRtree to obtain the hierarchical cluster tree. The resulted tree structure can be visualized using a dendrogram, with a pie chart on each tree node detailing the cluster composition given the known true labels.

out = mrtree(seurat.out$obj, n.cores = 2, consensus=FALSE, augment.path=FALSE)
# if there are few partitions per k, within resolution consensus step can speed up the algorithm
# weight per sample is encoraged if the classes are imbalanced

plot_tree(labelmat=out$labelmat.mrtree, ref.labels=ref.labels, plot.piechart = TRUE,
          node.size = 0.4, tip.label.dist = 10, bottom.margin=30 )

We evaluate the per-resolution clustering performance with a novel metric, Adjusted Multiresolution Rand Index (AMRI). The metric is adapted from Adjusted Rand Index (ARI) to address the problem of ARI biasing for higher resolution.

ks.flat = apply(out$labelmat.flat, 2, FUN=function(x) length(unique(x)))
ks.mrtree = apply(out$labelmat.mrtree, 2, FUN=function(x) length(unique(x)))
amri.flat = sapply(1:ncol(out$labelmat.flat), function(i) AMRI(out$labelmat.flat[,i], ref.labels)$amri)
amri.flat = aggregate(amri.flat, by=list(k=ks.flat), FUN=mean)
amri.recon = sapply(1:ncol(out$labelmat.mrtree), function(i) AMRI(out$labelmat.mrtree[,i], ref.labels)$amri)

df = rbind(data.frame(k=amri.flat$k, amri=amri.flat$x, method='Seurat flat'), 
           data.frame(k=ks.mrtree, amri=amri.recon, method='MRtree'))
ggplot2::ggplot(data=df, aes(x=k, y=amri, color=method)) + geom_line() + theme_bw()

We calculate the similarity between the initial flat clustering and MRtree clusters across resolution on the x-axis. Lower similarity indicates the selected clustering algorithm is not able to generate stable clusters at the given resolution. In this example, the changing point is observed at $k=7$, while the stability drops steeply when $k>8$.

stab.out = stability_plot(out)
stab.out$plot

MRtree with SC3 clustering

Next, we apply MRtree together with SC3, which is itself a consensus method, despite that a user-provided resolution parameter is still required.

ks = 2:10
metadata = dat$metadata
rownames(metadata) = dat$metadata$cellid
ref.labels = dat$metadata$type

set.seed(1)
clust.out =  sc_clustering.sc3(exprs=dat$counts, Ks=ks, type = 'counts', colData=metadata,
                               biology = F, gene_filter = F,  n_cores=2,  pct_dropout_min=-1)

Plot the initial clustering reults from SC3 via clustree:

plot_clustree(labelmat=clust.out$sce@colData, prefix='sc3_', suffix = "_clusters",
              ref.labels = ref.labels, plot.ref = F)

Run MRtree and plot the resulted hierarchical cluster tree:

out = mrtree(clust.out$sce, n.cores = 2, prefix='sc3_', suffix = "_clusters", 
             consensus=F, sample.weighted=F)

plot_tree(labelmat=out$labelmat.mrtree, ref.labels=ref.labels, plot.piechart = T,
          node.size = 0.4, tip.label.dist = 10, bottom.margin=30 )

We evaluate the per-resolution clustering performance via AMRI:

ks.flat = apply(out$labelmat.flat, 2, FUN=function(x) length(unique(x)))
ks.mrtree = apply(out$labelmat.mrtree, 2, FUN=function(x) length(unique(x)))
amri.flat = sapply(1:ncol(out$labelmat.flat), function(i) AMRI(out$labelmat.flat[,i], ref.labels)$amri)
amri.flat = aggregate(amri.flat, by=list(k=ks.flat), FUN=mean)
amri.recon = sapply(1:ncol(out$labelmat.mrtree), function(i) AMRI(out$labelmat.mrtree[,i], ref.labels)$amri)

df = rbind(data.frame(k=amri.flat$k, amri=amri.flat$x, method='SC3 flat'), 
           data.frame(k=ks.mrtree, amri=amri.recon, method='MRtree'))
ggplot2::ggplot(data=df, aes(x=k, y=amri, color=method)) + geom_line() + theme_bw()

We calculate the similarity between the initial flat clustering and MRtree clusters across scales.

stab.out = stability_plot(out)
stab.out$plot

Reference

pengminshi/MRtree documentation built on March 6, 2023, 4:20 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

pengminshi/MRtree
Multi-resolution Reconciled Tree

In pengminshi/MRtree: Multi-resolution Reconciled Tree

Introduction

Usage

Example with simulated data

MRtree with Seurat clustering

MRtree with SC3 clustering

Reference

R Package Documentation

Browse R Packages

We want your feedback!

pengminshi/MRtree Multi-resolution Reconciled Tree

In pengminshi/MRtree: Multi-resolution Reconciled Tree

Introduction

Usage

Example with simulated data

MRtree with Seurat clustering

MRtree with SC3 clustering

Reference

R Package Documentation

Browse R Packages

We want your feedback!

pengminshi/MRtree
Multi-resolution Reconciled Tree