A single-cell package for Functional Annotation + Gene Module Summarization.
Bowerbirds are famous for the elaborate and sometimes whimsical structures that males build to court females.
Picture credits: A satin bowerbird’s bower, decorated with blue objects. © Stefan Marks / Flickr
I'm not sure if this mating ritual is in anyway similar to my attempts to visualise pathway analyses and their association with celltype identities/states... who cares. It's just a name.
knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
# because currently private, requires specifying personal github access token devtools::install_github('clatworthylab/bowerbird', auth_token = "insert_your_personal_github_access_token") # or clone this repo and do devtools::install('path/of/folder/to/bowerbird')
Pathway analyis is tricky; often there's too many pathways/terms to use and it's not easy to distinguish how important every element is. bowerbird
attempts to summarise your pathways of interest into potentially meaningful functional annotation modules which you can then take forward to pathway analyses, or the reverse situation where you have a set of pathways that were determined a priori to be significant and you wanted to find out how to interpret it. To do this, we compute a shortest nearest neighbor graph using the genes in the gene sets/pathways and learn a representative summary term from the name of the gene sets that cluster together.
library(bowerbird) bwr <- bower(gmt_file) bwr <- snn_graph(bwr) bwr <- find_clusters(bwr) bwr <- summarize_clusters(bwr) bwr <- enrich_genesets(sce, bwr)
library(bowerbird) gmt_file <- system.file("extdata", "h.all.v7.4.symbols.gmt", package = "bowerbird") # read in a .gmt file bwr <- bower(gmt_file) # this performs a read_geneset step internally, which accepts .gmt, .gmx, .csv, .tsv, .txt, or R objects as list or data.frame format. bwr <- snn_graph(bwr) bwr <- find_clusters(bwr) bwr <- summarize_clusters(bwr)
Input formats accepts .gmt, .gmx, .csv, .tsv, .txt, or R objects as list or data.frame format.
csv_file <- system.file("extdata", "h.all.v7.4.symbols.csv", package = "bowerbird") # read in a .gmt file csv <- read.csv(csv_file) # see first 6 rows of the first 3 columns csv[1:6, 1:3]
And now to process it through.
bwr <- bower(csv) bwr <- snn_graph(bwr) bwr <- find_clusters(bwr) bwr <- summarize_clusters(bwr)
Doing some manual editing of a predefined geneset.
# download from msigdb website file <- system.file("extdata", "c5.go.bp.v7.4.symbols.gmt", package = "bowerbird") # manualy read in to do some fine adjustments/filtering geneset <- read_geneset(file) # reads in gene file manually # do a bit of manual filtering geneset <- geneset[grep('_B_CELL|_T_CELL|NATURAL_KILLER|ANTIBODY|ANTIGEN|LYMPHOCYTE|IMMUNE|INTERFERON|TOLL|INNATE_IMM|ADAPTIVE_IMM', names(geneset))] bwr <- bower(geneset) bwr <- snn_graph(bwr) bwr <- find_clusters(bwr) bwr <- summarize_clusters(bwr)
Or extracting from msigdbr
.
library(msigdbr) GO <- data.frame(msigdbr::msigdbr(category = "C5", subcategory = "GO:BP")) genesets <- GO[grep('_B_CELL|_T_CELL|NATURAL_KILLER|ANTIBODY|ANTIGEN|LYMPHOCYTE|IMMUNE|INTERFERON|TOLL|INNATE_IMM|ADAPTIVE_IMM', GO$gs_name), ] # convert to list gs_list <- lapply(unique(genesets$gs_name), function(x) genesets[genesets$gs_name %in% x, "gene_symbol"]) names(gs_list) <- unique(genesets$gs_name) bwr <- bower(gs_list) bwr <- snn_graph(bwr) bwr <- find_clusters(bwr) bwr <- summarize_clusters(bwr)
So what has actually happened? Well, basically bowerbird
has learnt which gene sets share more genes with each other, clustered them together and learnt a new summary term.
bwr
We can visualise it with a simple plot
plot_graph(bwr, colorby = 'cluster', node.size = 'geneset_size')
We can also compute the enrichment score and superimpose the values onto the visualization. To do this, we would need either a single-cell object (SingleCellExperiment
or Seurat
), or a list containing differential gene testing results.
# load a dummy dataset library(ktplots) data(kidneyimmune)
With popular single-cell methods.
## AUCell bwr <- enrich_genesets(kidneyimmune, bwr, groupby = 'celltype', mode = 'AUCell') ## Seurat::AddModuleScore bwr <- enrich_genesets(kidneyimmune, bwr, groupby = 'celltype', mode = 'Seurat') ## scanpy.tl.score_genes bwr <- enrich_genesets(kidneyimmune, bwr, groupby = 'celltype', mode = 'scanpy')
Visualise as a network plot.
celltypes <- levels(kidneyimmune) # celtypes stored as Idents in seurat object. plot_list <- lapply(celltypes, function(ds){ g <- plot_graph(bwr, colorby = ds) + viridis::scale_color_viridis() + theme(plot.title = element_text(size = 8, face = "bold")) }) cowplot::plot_grid(plotlist = plot_list, scale = 0.9)
fgsea
on deg tables in a list as an alternative to the single-cell methods.Here we use a differential expression result for marker genes as input.
degs <- Seurat::FindAllMarkers(kidneyimmune) degs <- split(degs, degs$cluster) # so in practice, there should be one DEG table per comparison in a list. bwr <- enrich_genesets(degs, bwr, gene_symbol = 'gene', logfoldchanges = 'avg_log2FC', pvals = 'p_val')
Coloured by adjusted p value.
plot_list <- lapply(celltypes, function(ds){ g <- plot_graph(bwr, colorby = ds, mode = 'gsea', gsea.slot = 'padj') + viridis::scale_color_viridis(na.value = '#e7e7e7') + theme(legend.position = 'none', plot.title = element_text(size = 8, face = "bold")) }) cowplot::plot_grid(plotlist = plot_list, scale = 0.9)
Coloured by normalized enrichment score.
plot_list <- lapply(celltypes, function(ds){ g <- plot_graph(bwr, colorby = ds, mode = 'gsea', gsea.slot = 'NES') + viridis::scale_color_viridis(na.value = '#e7e7e7') + theme(legend.position = 'none', plot.title = element_text(size = 8, face = "bold")) }) cowplot::plot_grid(plotlist = plot_list, scale = 0.9)
core
genes for enrichmentSame function as above, just specifying core = TRUE
. This will grab the intersecting (or outersecting rather) genes in a gene set cluster and perform the enrichment, like a summary gene set of the summarised terms.
bwr <- enrich_genesets(kidneyimmune, bwr, core = TRUE, groupby = 'celltype') # if mode is not specified, the default option is to use AUCell pheatmap::pheatmap(bwr@scores, cellheight = 20, cellwidth = 20)
Same for the gsea results. First looking at the NES.
bwr <- enrich_genesets(degs, bwr, core = TRUE, gene_symbol = 'gene', logfoldchanges = 'avg_log2FC', pvals = 'p_val') pheatmap::pheatmap(bwr@scores$NES, cellheight = 20, cellwidth = 20)
And now the FDR, remembering that <0.25
is usually the cut off.
pheatmap::pheatmap(bwr@scores$padj, cellheight = 20, cellwidth = 20, color = viridis::viridis(50, direction = -1))
To visualise as a graph, can consider doing:
bwr2 <- bower(bwr@coregenes) # and so on
Hopefully you find this interesting and potentially useful.
Please email me at kt16@sanger.ac.uk if you have any queries, feedback or ideas and I'll get back to you!
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.