The Gruffi R package helps you (1) to identify stressed cells in single-cell RNA-seq datasets using granular funcitonal filtering, or (2) you can use it to analyze any gene set's pathway activity (GO-term or custom defined), and define a cell population based on their combined activity.
Gruffi
integrates into any Seurat
analysis pipelione & it comes with a graphical user interface.
v1.5.*
is released, that fixes critical problems which arose with changes in dependencies and made the installation / pipeline break. Additionally it contains updates necessary to work with Seurat v5 objects.BioMart
or AnnotationDbi
, which is helpful, as BioMart connection is not always working. For the same reason, it now allows the usage of alternative mirrors for BioMart (thanks to @zuzkamat
).Added functionality
FindThresholdsAuto()
that is a shiny free version FindThresholdsShiny()
, which allows an automated workflow without the need for graphical output (thanks to @heyfl
). We still strongly recommend the shiny based workflow that enforces users to look at the results and intermediate steps instead of blidnly taking a TRUE / FALSE column.r
GrScoreUMAP(obj = combined.obj, colname = 'RNA_snn_res.6_cl.av_GO.0042063', miscname = 'thresh.stress.ident1')
GrScoreHistogram(obj = combined.obj, colname = 'RNA_snn_res.6_cl.av_GO.0042063', miscname = 'thresh.stress.ident1')
You can install dependencies from CRAN, Bioconductor and GitHub via devtools:
# Install CRAN & Bioconductor dependencies
install.packages('BiocManager')
BiocManager::install("DOSE")
BiocManager::install("org.Hs.eg.db")
BiocManager::install("sparseMatrixStats")
BiocManager::install("biomaRt")
BiocManager::install("raster")
BiocManager::install("rgl")
# Install custom dependencies
install.packages('devtools')
devtools::install_github(repo = "vertesy/Stringendo", upgrade = F)
devtools::install_github(repo = "vertesy/CodeAndRoll2", upgrade = F)
devtools::install_github(repo = "vertesy/ReadWriter", upgrade = F)
devtools::install_github(repo = "vertesy/MarkdownHelpers", upgrade = F)
devtools::install_github(repo = "vertesy/Markdownreports", upgrade = F)
devtools::install_github(repo = "vertesy/ggExpress", upgrade = F)
devtools::install_github(repo = "vertesy/Seurat.utils", upgrade = F)
# Install gruffi
devtools::install_github(repo = "jn-goe/gruffi", upgrade = F)
NOTE: If you type 'all' when R asks to update dependencies, you may get into installation errors / infinite loops. If updating fails, type 'no' when prompted.
Load your Seurat single-cell object of interest (in the following combined.obj
), perform basic analysis steps (Integration, PCA, Clustering) and calculate 2D and 3D umaps.
library(gruffi)
combined.obj <- Seurat.utils::SetupReductionsNtoKdimensions(obj = combined.obj, nPCs = 50, dimensions=3:2, reduction="umap")
# Note that this will recalculate 2D and 3D umaps, and back them up in combined.obj@misc$reductions.backup.
# See FAQ.md, if you don't want to overwrite the 2D UMAP.
If you want to store multiple UMAP's, you have to keep them in backup slot (in combined.obj@misc). Use combined.obj <- RecallReduction(obj = combined.obj, dim = 2, reduction="umap")
to access them.
Prepare GO-terms and gene-sets
go1 <- "GO:0006096" # Glycolysis
go2 <- "GO:0034976" # ER-stress
go3 <- "GO:0042063" # Gliogenesis, negative filtering
Gruffi works best if you partition cells into groups of 100-200 cells. Find the corresponding clustering resolution by:
combined.obj <- AutoFindGranuleResolution(obj = combined.obj)
The optimal resolution found is stored in:
(granule.res.4.gruffi <- GetGruffiClusteringName(combined.obj)) # Recalled from @misc$gruffi$'optimal.granule.res.
Some granules have too few cells, therfore their scoring is not robust statistically. Use ReassignSmallClusters
to assign these cells to the nearest large-enough granule:
combined.obj <- ReassignSmallClusters(combined.obj, ident = granule.res.4.gruffi) # Will be stored in meta data column as "seurat_clusters.reassigned".
````
Above, granules with <30 cells are cell-by-cell re-assigned to a neighboring granule (by default based on Euclidean distance between the mean of cell groups in 3dim UMAP space). The reassigned granules are suffixed as :
```r
(granule.res.4.gruffi <- GetGruffiClusteringName(combined.obj))
After finding the right granule resolution, first GO scores per cells, then average GO-scores per granule (normalized for respective cell numbers) are computed. Within the GO score computation an ensembl database will be loaded (that you created above as ensembl
).
# Glycolytic process GO:0006096
combined.obj <- AssignGranuleAverageScoresFromGOterm(obj = combined.obj, GO_term = go1, save.UMAP = TRUE, new_GO_term_computation = T, clustering = granule.res.4.gruffi, plot.each.gene = F)
# ER stress GO:0034976
combined.obj <- AssignGranuleAverageScoresFromGOterm(obj = combined.obj, GO_term = go2, save.UMAP = TRUE, new_GO_term_computation = T, clustering = granule.res.4.gruffi, plot.each.gene = F)
# Gliogenesis GO:0042063
combined.obj <- AssignGranuleAverageScoresFromGOterm(obj = combined.obj, GO_term = go3, save.UMAP = TRUE, new_GO_term_computation = T, clustering = granule.res.4.gruffi, plot.each.gene = F)
These functions store the resulting scores in combined.obj@meta.data
.
We will now call a Shiny Interface to auto-estimate and/or manually adjust the stress assignment threeholds via FindThresholdsShiny()
.
Example code for filtering cells high in glycolytic process and ER stress but low in gliogenesis:
# Create score names:
(i1 <- ParseGruffiGranuleScoreName(goID = go1))
(i2 <- ParseGruffiGranuleScoreName(goID = go2))
(i3 <- ParseGruffiGranuleScoreName(goID = go3))
# Call Shiny app
combined.obj <- FindThresholdsShiny(obj = combined.obj,
stress.ident1 = i1,
stress.ident2 = i2,
notstress.ident3 = i3)
"Dont forget to click the button in the app: Save New Thresholds"
The interface allows automatic estimation and manual adjustment of thresholds for stress scores:
After pushing the Save new thresholds button in the Shiny graphical user interface, thresholds are saved in combined.obj@misc$gruffi
and the stress assignment is stored as a new @meta.data
column, is.Stressed
. Check results with:
StressUMAP(combined.obj)
StressBarplotPerCluster(combined.obj, group.by = GetClusteringRuns()[1])
GrScoreHistogram(combined.obj, colname = i1, miscname = "thresh.stress.ident1")
# and more
cellIDs.keep <- which_names(!combined.obj$'is.Stressed')
subset.obj <- subset(x = combined.obj, cells = cellIDs.keep)
StressUMAP(subset.obj, title = "No Stressed Cells Remain After Gruffi", suffix = "after.cleanup")
For Errors, Problems & Questions: please see the FAQ or raise an issue in the github repository. We try to answer.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.