library(BiocStyle)
knitr::include_graphics(system.file('docs', 'enrichMiR_sticker.png', package='enrichMiR'))

Installing

You can install the package using:

BiocManager::install("ETHZ-INS/enrichMiR")



Preparing the data

EnrichMiR requires two main inputs: a target annotation object, and the signal in which you want to look for target enrichment, such as a geneset (versus a background set) or the results of a differential-expression analysis (DEA). Genesets would simply be vectors of gene identifiers, while a DEA would be a table that looks like this:

library("enrichMiR")
data("exampleDEA",package="enrichMiR")
head(exampleDEA)

The row names should be gene identifiers (or also transcript identifiers, depending on the target annotation), and it should minimally have the 'logFC', 'PValue' and 'FDR' columns (similar column names from established RNAseq DEA packages can be automatically recognized).

The other important input is a target annotation, which looks like this:

# we load the TargetScan conserved target annotation for human:
data("consTShs", package="enrichMiR")
consTShs[,1:4]

A target annotation minimally has the columns "set" (indicating the sets to be tested, in this case the miRNA families, denoted by their seed region) and
"feature" (the features for which enrichment is tested, such as genes), together denoting a miRNA-target pair, and optionally the columns "sites" (indicating how many binding sites the feature has for the miRNA) and "score" indicating for instance a repression score. (The objects can also include metadata regarding synonymous feature names and miRNA family members).

Running an enrichment analysis

An enrichment analysis can be launched as follows:

devtools::load_all("../")
er <- testEnrichment(exampleDEA, consTShs)

Unless the tests are specified, this will run the default tests, namely the siteoverlap test respectively on significantly upregulated and downregulated genes, as well as the areamir test on the continous differential expression signal. Printing the object will give an overview of the top results of each test:

er

We see that both the siteoverlap test on downregulated genes and the areamir test give the same top enrichments, while the siteoverlap test on upregulated genes gives no enrichment. An aggregated table for all tests can also be obtained using getResults(er), and more detailed results of a test can be viewed with:

er$siteoverlap.down

The results can alternatively be visualized with the enrichPlot function:

enrichPlot(er$siteoverlap.down)

CD plots

A good way to visualize an enrichment is to plot the cumulative foldchange distributions of different target groups. This can be done with the CDplotWrapper function:

CDplotWrapper(dea=exampleDEA, sets=consTShs, setName="UGGUCCC")

The genes in each sets are ranked according to their foldchange (the x-axis), while the y-axis shows the proportion of genes in the set that have a foldchange below or equal to that given on x. In this way, small shifts in foldchanges can easily be visualized.

In the presence of a real miRNA effect, we expect to see a gradual dose-response pattern, as shown above, meaning that targets with stronger binding sites also have stronger log-foldchanges (i.e. lower logFC, in the case of an increased miRNA activity, as in this example).

There are different ways of splitting the genes into groups, which can be specified using the by argument:

CDplotWrapper(dea=exampleDEA, sets=consTShs, setName="UGGUCCC", by="sites")

Because UTR length is correlated with more binding sites for any miRNA, the number of sites tends to give spurious signals. We therefore recommend splitting by best binding type if this information is available in the target annotation you are using, or otherwise splitting by predicted repression score (if available).

Overview of the different enrichment tests

enrichMiR implements different statistical tests for target enrichment. You can view what tests are possible for a given input using:

availableTests(exampleDEA, consTShs)

Some tests depend on continous inputs (e.g. fold-changes), and hence require the results of a differential expression analysis (DEA) to be provided), while others are binary, i.e. based on the membership in a gene set. When the input is a DEA, each binary tests will be performed on both significantly upregulated and downregulated genes.

Below is a description of the various enrichment tests. We recommend using the siteoverlap (most sensitive) or areamir tests. For a more detailed benchmark, please consult the publication.

enrichMiR:::.testDescription()



Session info

sessionInfo()


ETHZ-INS/enrichMiR documentation built on July 20, 2024, 12:03 a.m.