In markziemann/Mitch: Multi-Contrast Gene Set Enrichment Analysis

title: "mitch Workflow" author: "Antony Kaspi & Mark Ziemann" date: "r Sys.Date()" output: rmarkdown::html_vignette: toc: true vignette: > %\VignetteIndexEntry{mitch Workflow} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc}

Background

mitch is an R package for multi-contrast enrichment analysis. At it’s heart, it uses a rank-MANOVA based statistical approach to detect sets of genes that exhibit enrichment in the multidimensional space as compared to the background. The rank-MANOVA concept dates to work by Cox and Mann (https://doi.org/10.1186/1471-2105-13-S16-S12). mitch is useful for pathway analysis of profiling studies with one, two or more contrasts, or in studies with multiple omics profiling, for example proteomic, transcriptomic, epigenomic analysis of the same samples. mitch is perfectly suited for pathway level differential analysis of scRNA-seq data.

The main strengths of mitch are that it can import datasets easily from many upstream tools and has advanced plotting features to visualise these enrichments. mitch consists of five functions. A typical mitch workflow would consist of:

1: Import gene sets with gmt_import()
2: Import profiling data with mitch_import()
3: Calculate enrichments with mitch_calc()
4: And generate plots and reports with mitch_plots(), mitch_report() and networkplot().

if(!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("mitch")

library("mitch")

Importing gene sets

mitch has a function to import GMT files to R lists (adapted from Yu et al, 2012 in the clusterProfiler package). For example we can grab some gene sets from Reactome.org:

download.file("https://reactome.org/download/current/ReactomePathways.gmt.zip",
    destfile="ReactomePathways.gmt.zip")
unzip("ReactomePathways.gmt.zip")
genesets <- gmt_import("ReactomePathways.gmt")

In this cut down example we will be using a sample of 200 Reactome gene sets:

data(genesetsExample)
head(genesetsExample,3)

Importing profiling data

mitch accepts pre-ranked data supplied by the user, but also has a function called mitch_import for importing tables generated by limma, edgeR, DESeq2, ABSSeq, Sleuth, Seurat, Muscat and several other upstream tools. By default, only the genes that are detected in all contrasts are included, but this behaviour can be modified for sparse data setting joinType=full. The below example imports two edgeR tables called "rna" and "k9a" Where gene identifiers are present as row names. Note that if there is more than one profile being imported, they need to be part of a list.

data(rna,k9a)
x <- list("rna"=rna,"k9a"=k9a)
y <- mitch_import(x,"edgeR")
head(y)

mitch can do unidimensional analysis if you provide it a single profile as a dataframe (not in a list).

y <- mitch_import(rna,DEtype="edger")
head(y)

If the gene identifiers are not given in the rownames, then the column can be specified with the geneIDcol parameter like this:

# first rearrange cols
rna_mod <- rna
rna_mod$MyGeneIDs <- rownames(rna_mod)
rownames(rna_mod) <- seq(nrow(rna_mod))
head(rna_mod)
# now import with geneIDcol
y <- mitch_import(rna_mod,DEtype="edgeR",geneIDcol="MyGeneIDs")
head(y)

By default, differential gene activity is scored using a supplied test statistic or directional p-value (D):

D = sgn(logFC) * -log10(p-value)

If this is not desired, then users can perform their own custom scoring procedure and import with DEtype="prescored".

There are many cases where the gene IDs don't match the gene sets. To overcome this, mitch_import also accepts a two-column table (gt here) that relates gene identifiers in the profiling data to those in the gene sets. In this example we can create some fake gene accession numbers to demonstrate this feature.

library("stringi")
# obtain vector of gene names
genenames <- rownames(rna)
# create fake accession numbers
accessions <- paste("Gene0",stri_rand_strings(nrow(rna)*2, 6, pattern = "[0-9]"),sep="")
accessions <- head(unique(accessions),nrow(rna))
# create a gene table file that relates gene names to accession numbers
gt <- data.frame(genenames,accessions)

# now swap gene names for accessions
rna2 <- merge(rna,gt,by.x=0,by.y="genenames")
rownames(rna2) <- rna2$accessions
rna2 <- rna2[,2:5]

k9a2 <- merge(k9a,gt,by.x=0,by.y="genenames")
rownames(k9a2) <- k9a2$accessions
k9a2 <- k9a2[,2:5]

# now have a peek at the input data before importing
head(rna2,3)
head(k9a2,3)
head(gt,3)
x <- list("rna2"=rna2,"k9a2"=k9a2)
y <- mitch_import(x,DEtype="edgeR",geneTable=gt)
head(y,3)

?mitch_import provides more instructions on using this feature.

Calculating enrichment

The mitch_calc function performs multivariate enrichment analysis of the supplied gene sets in the scored profiling data. At its simpest form mitch_calc function accepts the scored data as the first argument and the genesets as the second argument. Users can prioritise enrichments based on small adjusted p-values, by the observed effect size ("s.dist", the magnitude of "s", the enrichment score) or the standard deviation of the s scores. For most analyses, you will see more specific results when prioritising by effect, but it is worth looking at both.

Note that the number of parallel threads ("cores") is set here to cores=2 but the default is to use just one. If you have many available threads, then a good number is 8-12. Note that the more CPU threads used, the more RAM is required, so monitor the available memory on your system.

# prioritisation by significance
res <- mitch_calc(y,genesetsExample,priority="significance",cores=2)

# peek at the results
head(res$enrichment_result)

# prioritisation by effect size
res <- mitch_calc(y,genesetsExample,priority="effect",cores=2)

head(res$enrichment_result)

By default, gene sets with fewer than 10 members present in the profiling data are discarded. This threshold can be modified using the minsetsize option. There is no upper limit of gene set size.

res <- mitch_calc(y,genesetsExample,priority="significance",minsetsize=5,cores=2)

By default, in downstream visualisation steps, charts are made from the top 50 gene sets, but this can be modified using the resrows option.

res<-mitch_calc(y,genesetsExample,priority="significance",resrows=3,cores=2)

Generate a HTML report

The HTML reports contain several plots as raster images and interactive charts which are useful as a first-pass visualisation. These can be generated like this:

mitch_report(res,"myreport.html",overwrite=TRUE)

Generate high resolution plots

In case you want the charts in PDF format, for example for publications, these can be generated as such:

mitch_plots(res,outfile="mycharts.pdf")

Network plot

For one-dimensional analysis, you can generate a network plot to visualise the similarity of the driver gene members for the top (20) up- and downregulated sets meeting a significance cut-off (FDR<0.05). These cut-offs can be altrered.

By default, this includes genes scored in the top and bottom tertiles, otherwise they are discarded. The intensity of the colour is proportional to the s.dist value (enrichment score). Circle size is proportional to the number of genes in the set. Line thickness is proportional to the Jaccard similarity value. This function works best after prioritisation with "effect" when running mitch_calc(). Note that the circle (node) size and the line width shown in the legend is approximate, although the values shown are exactly the smallest and largest respectively. Note that this chart works best when the width is double the height, otherwise many of the long gene set names could be cut off. There is an element of stochasticity with regard to the network projection, so it could be a good idea to repeat it a few times until you get a nice layout.

Lists of enriched gene set members are shown.

networkplot(res,FDR=1,n_sets=10)

network_genes(res,FDR=1,n_sets=10)

Session Info

sessionInfo()

markziemann/Mitch documentation built on June 14, 2025, 10:17 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

markziemann/Mitch
Multi-Contrast Gene Set Enrichment Analysis

In markziemann/Mitch: Multi-Contrast Gene Set Enrichment Analysis

Background

Importing gene sets

Importing profiling data

Calculating enrichment

Generate a HTML report

Generate high resolution plots

Network plot

Session Info

R Package Documentation

Browse R Packages

We want your feedback!

markziemann/Mitch Multi-Contrast Gene Set Enrichment Analysis

In markziemann/Mitch: Multi-Contrast Gene Set Enrichment Analysis

Background

Importing gene sets

Importing profiling data

Calculating enrichment

Generate a HTML report

Generate high resolution plots

Network plot

Session Info

R Package Documentation

Browse R Packages

We want your feedback!

markziemann/Mitch
Multi-Contrast Gene Set Enrichment Analysis