In clatworthylab/bowerbird: A single-cell package for Functional Annotation + Gene Module Summarization.

bowerbird

A single-cell package for Functional Annotation + Gene Module Summarization.

Bowerbirds are famous for the elaborate and sometimes whimsical structures that males build to court females.

Picture credits: A satin bowerbird’s bower, decorated with blue objects. © Stefan Marks / Flickr

I'm not sure if this mating ritual is in anyway similar to my attempts to visualise pathway analyses and their association with celltype identities/states... who cares. It's just a name.

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Installation

# because currently private, requires specifying personal github access token
devtools::install_github('clatworthylab/bowerbird', auth_token = "insert_your_personal_github_access_token")
# or clone this repo and do devtools::install('path/of/folder/to/bowerbird')

Quick Usage

Pathway analyis is tricky; often there's too many pathways/terms to use and it's not easy to distinguish how important every element is. bowerbird attempts to summarise your pathways of interest into potentially meaningful functional annotation modules which you can then take forward to pathway analyses, or the reverse situation where you have a set of pathways that were determined a priori to be significant and you wanted to find out how to interpret it. To do this, we compute a shortest nearest neighbor graph using the genes in the gene sets/pathways and learn a representative summary term from the name of the gene sets that cluster together.

library(bowerbird)
bwr <- bower(gmt_file)
bwr <- snn_graph(bwr)
bwr <- find_clusters(bwr)
bwr <- summarize_clusters(bwr)
bwr <- enrich_genesets(sce, bwr)

Full example

library(bowerbird)
gmt_file <- system.file("extdata", "h.all.v7.4.symbols.gmt", package = "bowerbird") # read in a .gmt file
bwr <- bower(gmt_file) # this performs a read_geneset step internally, which accepts .gmt, .gmx, .csv, .tsv, .txt, or R objects as list or data.frame format.
bwr <- snn_graph(bwr)
bwr <- find_clusters(bwr)
bwr <- summarize_clusters(bwr)

Input formats accepts .gmt, .gmx, .csv, .tsv, .txt, or R objects as list or data.frame format.

csv_file <- system.file("extdata", "h.all.v7.4.symbols.csv", package = "bowerbird") # read in a .gmt file
csv <- read.csv(csv_file)
# see first 6 rows of the first 3 columns
csv[1:6, 1:3]

And now to process it through.

bwr <- bower(csv)
bwr <- snn_graph(bwr)
bwr <- find_clusters(bwr)
bwr <- summarize_clusters(bwr)

Examples of different ways of loading in genesets

Doing some manual editing of a predefined geneset.

# download from msigdb website
file <- system.file("extdata", "c5.go.bp.v7.4.symbols.gmt", package = "bowerbird")
# manualy read in to do some fine adjustments/filtering
geneset <- read_geneset(file) # reads in gene file manually
# do a bit of manual filtering
geneset <- geneset[grep('_B_CELL|_T_CELL|NATURAL_KILLER|ANTIBODY|ANTIGEN|LYMPHOCYTE|IMMUNE|INTERFERON|TOLL|INNATE_IMM|ADAPTIVE_IMM', names(geneset))]

bwr <- bower(geneset)
bwr <- snn_graph(bwr)
bwr <- find_clusters(bwr)
bwr <- summarize_clusters(bwr)

Or extracting from msigdbr.

library(msigdbr)
GO <- data.frame(msigdbr::msigdbr(category = "C5", subcategory = "GO:BP"))
genesets <- GO[grep('_B_CELL|_T_CELL|NATURAL_KILLER|ANTIBODY|ANTIGEN|LYMPHOCYTE|IMMUNE|INTERFERON|TOLL|INNATE_IMM|ADAPTIVE_IMM', GO$gs_name), ]

# convert to list
gs_list <- lapply(unique(genesets$gs_name), function(x) genesets[genesets$gs_name %in% x, "gene_symbol"])
names(gs_list) <- unique(genesets$gs_name)

bwr <- bower(gs_list)
bwr <- snn_graph(bwr)
bwr <- find_clusters(bwr)
bwr <- summarize_clusters(bwr)

So what has actually happened? Well, basically bowerbird has learnt which gene sets share more genes with each other, clustered them together and learnt a new summary term.

bwr

We can visualise it with a simple plot

plot_graph(bwr, colorby = 'cluster', node.size = 'geneset_size')

Geneset average expression testing

We can also compute the enrichment score and superimpose the values onto the visualization. To do this, we would need either a single-cell object (SingleCellExperiment or Seurat), or a list containing differential gene testing results.

# load a dummy dataset
library(ktplots)
data(kidneyimmune)

With popular single-cell methods.

## AUCell
bwr <- enrich_genesets(kidneyimmune, bwr, groupby = 'celltype', mode = 'AUCell')

## Seurat::AddModuleScore
bwr <- enrich_genesets(kidneyimmune, bwr, groupby = 'celltype', mode = 'Seurat')

## scanpy.tl.score_genes
bwr <- enrich_genesets(kidneyimmune, bwr, groupby = 'celltype', mode = 'scanpy')

Visualise as a network plot.

celltypes <- levels(kidneyimmune) # celtypes stored as Idents in seurat object.
plot_list <- lapply(celltypes, function(ds){
  g <- plot_graph(bwr, colorby = ds) + viridis::scale_color_viridis() + theme(plot.title = element_text(size = 8, face = "bold"))
})
cowplot::plot_grid(plotlist = plot_list, scale = 0.9)

Running `fgsea` on deg tables in a list as an alternative to the single-cell methods.

Here we use a differential expression result for marker genes as input.

degs <- Seurat::FindAllMarkers(kidneyimmune)
degs <- split(degs, degs$cluster) # so in practice, there should be one DEG table per comparison in a list.
bwr <- enrich_genesets(degs, bwr, gene_symbol = 'gene', logfoldchanges = 'avg_log2FC',  pvals = 'p_val')

Coloured by adjusted p value.

plot_list <- lapply(celltypes, function(ds){
  g <- plot_graph(bwr, colorby = ds, mode = 'gsea', gsea.slot = 'padj') + viridis::scale_color_viridis(na.value = '#e7e7e7') + theme(legend.position = 'none', plot.title = element_text(size = 8, face = "bold"))
})
cowplot::plot_grid(plotlist = plot_list, scale = 0.9)

Coloured by normalized enrichment score.

plot_list <- lapply(celltypes, function(ds){
  g <- plot_graph(bwr, colorby = ds, mode = 'gsea', gsea.slot = 'NES') + viridis::scale_color_viridis(na.value = '#e7e7e7') + theme(legend.position = 'none', plot.title = element_text(size = 8, face = "bold"))
})
cowplot::plot_grid(plotlist = plot_list, scale = 0.9)

Using `core` genes for enrichment

Same function as above, just specifying core = TRUE. This will grab the intersecting (or outersecting rather) genes in a gene set cluster and perform the enrichment, like a summary gene set of the summarised terms.

bwr <- enrich_genesets(kidneyimmune, bwr, core = TRUE, groupby = 'celltype') # if mode is not specified, the default option is to use AUCell
pheatmap::pheatmap(bwr@scores, cellheight = 20, cellwidth = 20)

Same for the gsea results. First looking at the NES.

bwr <- enrich_genesets(degs, bwr, core = TRUE, gene_symbol = 'gene', logfoldchanges = 'avg_log2FC',  pvals = 'p_val')
pheatmap::pheatmap(bwr@scores$NES, cellheight = 20, cellwidth = 20)

And now the FDR, remembering that <0.25 is usually the cut off.

pheatmap::pheatmap(bwr@scores$padj, cellheight = 20, cellwidth = 20, color = viridis::viridis(50, direction = -1))

To visualise as a graph, can consider doing:

bwr2 <- bower(bwr@coregenes)
# and so on

Hopefully you find this interesting and potentially useful.

Please email me at kt16@sanger.ac.uk if you have any queries, feedback or ideas and I'll get back to you!

clatworthylab/bowerbird documentation built on Dec. 19, 2021, 5:15 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

clatworthylab/bowerbird
A single-cell package for Functional Annotation + Gene Module Summarization.

In clatworthylab/bowerbird: A single-cell package for Functional Annotation + Gene Module Summarization.

bowerbird

Installation

Quick Usage

Full example

Examples of different ways of loading in genesets

Geneset average expression testing

Running `fgsea` on deg tables in a list as an alternative to the single-cell methods.

Using `core` genes for enrichment

R Package Documentation

Browse R Packages

We want your feedback!

clatworthylab/bowerbird A single-cell package for Functional Annotation + Gene Module Summarization.

In clatworthylab/bowerbird: A single-cell package for Functional Annotation + Gene Module Summarization.

bowerbird

Installation

Quick Usage

Full example

Examples of different ways of loading in genesets

Geneset average expression testing

Running fgsea on deg tables in a list as an alternative to the single-cell methods.

Using core genes for enrichment

R Package Documentation

Browse R Packages

We want your feedback!

clatworthylab/bowerbird
A single-cell package for Functional Annotation + Gene Module Summarization.

Running `fgsea` on deg tables in a list as an alternative to the single-cell methods.

Using `core` genes for enrichment