Intro

shinyngs (pinin4fjords/shinyngs on GitHub) is a package designed to facilitate downstream analysis of RNA-seq and similar matrix data with various exploratory plots. It's a work in progress, with new features added on a regular basis. Individual components (heatmaps, PCA, etc.) can function independently and will be useful outside of the RNA-seq context.

Example: the gene page

Motivation

It's not always trivial to quickly assess the results of a next-generation sequencing experiment. shinyngs is designed to help fix that by providing a way of instantly producing a visual tool for data mining at the end of an analysis pipeline.

Features

Installation

Prerequisites

shinyngs relies heavily on SummarizedExperiment. Formerly found in the GenomicRanges package, it now has its own package on Bioconductor: http://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html. This requires a recent version of R.

Graphical enhancements are provided by shinyBS and shinyjs.

Install with devtools

library(devtools)
install_github('pinin4fjords/shinyngs')

Concepts and data structures

The data structures used by shinyngs build on SummarizedExperiment. One SummarizedExperiment can have multiple 'assays', essentially matrices with samples in columns and 'features' (transcripts or genes) in rows, representing different results for the same features and samples. This is handy for comparing results before and after processing, for example. ExploratorySummarizedExperiment extends SummarizedExperiment to include slots relating to annotation, and associated results of 'tests', providing p values and q values.

ExploratorySummarizedExperimentList is a container for one or more ExploratorySummarizedExperiment objects, and is intended to describe an overall study, i.e. one or more experiments with the same set of samples, but different sets of features in each experiment. The ExploratorySummarizedExperimentList is therefore used to supply study-wide things such as contrasts, gene sets, URL roots for creating links etc.

Simple example working from a SummarizedExperiment

To see how to quickly build an RNA-seq app from a simple SummarizedExperiment, we can use the example data in the airway package. We just convert the RangedSummarizedExperiment to an ExploratorySummarizedExperiment, and add it to a list of such objects, which represent a study.

library(shinyngs)

data(airway, package = 'airway')
ese <- as(airway, 'ExploratorySummarizedExperiment')
eselist <- ExploratorySummarizedExperimentList(ese)

Then we build and run the app. For example, a basic app just for heatmaps:

app <- prepareApp('heatmap', eselist)
shiny::shinyApp(ui = app$ui, server = app$server)

Note the use of prepareApp to generate the proper ui and server, which are then passed to Shiny.

We can build a more comprehensive app with multiple panels aimed at RNA-seq:

app <- prepareApp('rnaseq', eselist)
shiny::shinyApp(ui = app$ui, server = app$server)

The airway package provides some information about the dataset, which we can add to the object before we build the app:

data(airway, package = 'airway')
expinfo <- metadata(airway)[[1]]

eselist <- ExploratorySummarizedExperimentList(
  ese,
  title = expinfo@title,
  author = expinfo@name,
  description = abstract(expinfo)
)
app <- prepareApp('rnaseq', eselist)
shiny::shinyApp(ui = app$ui, server = app$server)

All this app knows about is gene IDs, however, which aren't all that informative for gene expression plots etc. We can add row metadata to fix that:

# Use Biomart to retrieve some annotation, and add it to the object

library(biomaRt)
attributes <- c(
  'ensembl_gene_id', # The sort of ID your results are keyed by
  'entrezgene_id', # Will be used mostly for gene set based stuff (named 'entrezgene' before Ensembl release 97)
  'external_gene_name' # Used to annotate gene names on the plot
)

mart <- useMart(biomart = 'ENSEMBL_MART_ENSEMBL', dataset = 'hsapiens_gene_ensembl', host = 'www.ensembl.org')
annotation <- getBM(attributes = attributes, mart = mart)
annotation <- annotation[order(annotation$entrezgene_id),]

mcols(ese) <- annotation[match(rownames(ese), annotation$ensembl_gene_id),]

# Tell shinyngs what the ids are, and what field to use as a label

ese@idfield <- 'ensembl_gene_id'
ese@labelfield <- 'external_gene_name'

# Re-build the app

eselist <- ExploratorySummarizedExperimentList(
  ese,
  title = expinfo@title,
  author = expinfo@name,
  description = abstract(expinfo)
)
app <- prepareApp('rnaseq', eselist)
shiny::shinyApp(ui = app$ui, server = app$server)

More complex use case: the zhangneurons example dataset

airway is fine, but it contains no information on differential expression. shinyngs provides extra slots for differential analyses, among other things.

An example ExploratorySummarizedExperimentList based on the Zhang et al study of neurons and glia (http://www.jneurosci.org/content/34/36/11929.long) is included in the zhangneurons package, and this can be used to demonstrate available features. The dataset includes transcript- and gene-level quantification estimates (as ExploratorySummarizedExperiments within an ExploratorySummarizedExperimentList), and three levels of processing (raw, filtered, normalised) in the assays slots of each.

Note: this data was generated using Salmon (https://combine-lab.github.io/salmon/) for quantification, and results may therefore be slightly different to the authors' online tool (which did not use Salmon).

Install the data package:

library(devtools)
install_github('pinin4fjords/zhangneurons')

... and load the data.

library(shinyngs)
data("zhangneurons")

The data can then be used to build an application:

app <- prepareApp("rnaseq", zhangneurons)
shiny::shinyApp(app$ui, app$server)

This example generates the full application designed for RNA-seq analysis. Remember that individual components can be created too:

app <- prepareApp("heatmap", zhangneurons)
shiny::shinyApp(app$ui, app$server)

Building an application from a YAML file

An alternative and simple way to create an application is to describe your experiment using a YAML file, and pass the YAML file to shinyngs. This is useful where a pipeline produces many outputs outside of R which then have to be read and compiled.

The eselistFromYAML() function is provided to help construct an ExploratorySummarizedExperimentList object. You might make a file like:

title: My RNA seq experiment
author: Joe Blogs
report: report.md
group_vars:
  - Group
  - Replicate
default_groupvar: Group
experiments:
  Gene:
    coldata:
      file: my.experiment.csv
      id: External
    annotation:
      file: my.annotation.csv
      id: gene_id
      entrez: ~
      label: gene_id
    expression_matrices:
      Raw:
        file: raw_counts.csv
        measure: counts
      Filtered:
        file: filtered_counts.csv
        measure: Counts per million
      Normalised:
        file: normalised_counts.csv
        measure: Counts per million
    read_reports:
      read_attrition: read_attrition.csv
contrasts:
  comparisons:
    0:
    - Group
      control
      TreatmentA
    1:
    - Group
      control
      TreatmentB
stats:
  Gene:
    Normalised:
      pvals: pvals.csv
      qvals: qvals.csv

You can then generate the object with a command like eselist <- eselistFromYAML('my.yaml'). This is how the zhangneurons dataset was generated; see vignette('zhangneurons') for details, and for the component input files themselves.
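Since the YAML file mostly points at files written elsewhere in a pipeline, it can be worth checking that those files exist before building the object. The following is only a rough sketch in base R: it uses a simple pattern match rather than a real YAML parser, it only looks at keys named file (so entries such as read_attrition, pvals and qvals are not covered), and the variable names are made up for illustration.

```r
# A few lines in the style of the YAML example above (hypothetical content)
yaml_lines <- c(
  "experiments:",
  "  Gene:",
  "    coldata:",
  "      file: my.experiment.csv",
  "      id: External",
  "    annotation:",
  "      file: my.annotation.csv",
  "      id: gene_id"
)

# Pull out the value of every 'file:' key (pattern match, not a YAML parser)
file_pattern <- "^[[:space:]]*file:[[:space:]]*"
paths <- sub(file_pattern, "", grep(file_pattern, yaml_lines, value = TRUE))

# Report any referenced file that is not present in the working directory
missing_files <- paths[!file.exists(paths)]
if (length(missing_files) > 0) {
  message("Missing input files: ", paste(missing_files, collapse = ", "))
}
```

For a real file you would read the lines with readLines('my.yaml') first.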

Building an application from scratch

To demonstrate this, let's break down zhangneurons into simple datatypes and put it back together again.

Assays

# Assays is a list of matrices
library(zhangneurons)
data(zhangneurons, envir = environment())
myassays <- as.list(SummarizedExperiment::assays(zhangneurons[[1]]))
head(myassays[[1]])

colData

colData is your sample information, defining groups etc.

mycoldata <- data.frame(SummarizedExperiment::colData(zhangneurons[[1]]))
head(mycoldata)
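When assembling these pieces from your own files rather than from zhangneurons, the sample sheet needs to describe every column of the assay matrices. Here is a minimal sketch of that check in base R, using made-up toy data (toy_assays and toy_coldata are hypothetical names, not part of shinyngs):

```r
# Hypothetical toy data: two assays sharing features (rows) and samples (columns)
toy_assays <- list(
  Raw = matrix(
    c(10, 0, 5, 20, 3, 8),
    nrow = 3,
    dimnames = list(c("gene1", "gene2", "gene3"), c("s1", "s2"))
  )
)
toy_assays$Normalised <- log2(toy_assays$Raw + 1)

# Sample sheet, deliberately in a different order from the matrix columns
toy_coldata <- data.frame(
  Group = c("treated", "control"),
  row.names = c("s2", "s1"),
  stringsAsFactors = FALSE
)

# Every assay column must have a row in the sample sheet
samples <- colnames(toy_assays$Raw)
missing_samples <- setdiff(samples, rownames(toy_coldata))
if (length(missing_samples) > 0) {
  stop("No sample information for: ", paste(missing_samples, collapse = ", "))
}

# Reorder the sample sheet to follow the assay columns
toy_coldata <- toy_coldata[samples, , drop = FALSE]
```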

Annotation

Annotation is important to shinyngs. You need a data frame with rows corresponding to those in the assays.

myannotation <- SummarizedExperiment::mcols(zhangneurons[[1]])
head(myannotation)
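If your annotation comes from a separate source, its rows will rarely line up with the assay rows out of the box. The sketch below shows one way to align them with match() in base R, on made-up data. Falling back to the feature ID where no name is available is an assumption made here for illustration, not something shinyngs requires.

```r
# Hypothetical feature IDs as they appear in the assay rows
assay_ids <- c("gene3", "gene1", "gene2")

# Hypothetical annotation table: different order, one extra gene, and one
# assay gene ('gene2') missing altogether
toy_annotation <- data.frame(
  gene_id = c("gene1", "gene3", "gene4"),
  gene_name = c("Abc1", "Xyz3", "Foo4"),
  stringsAsFactors = FALSE
)

# match() gives, for each assay row, the matching annotation row (or NA)
aligned <- toy_annotation[match(assay_ids, toy_annotation$gene_id), ]

# Keep the ID for unannotated features and fall back to it as a label
aligned$gene_id <- assay_ids
unlabelled <- is.na(aligned$gene_name)
aligned$gene_name[unlabelled] <- assay_ids[unlabelled]
rownames(aligned) <- assay_ids
```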

Making an ExploratorySummarizedExperiment

Now we can put these things together to create an ExploratorySummarizedExperiment:

myese <- ExploratorySummarizedExperiment(
  assays = SimpleList(
    myassays
  ),
  colData = DataFrame(mycoldata),
  annotation = myannotation, # named argument: use '=', not '<-'
  idfield = 'gene_id',
  labelfield = 'gene_name'
)
print(myese)

Note the extra fields that mostly tell shinyngs about annotation to help with labelling etc.

Making an ExploratorySummarizedExperimentList

ExploratorySummarizedExperimentLists are basically a list of ExploratorySummarizedExperiments, with additional metadata slots.

myesel <- ExploratorySummarizedExperimentList(
  eses = list(expression = myese),
  title = "My title",
  author = "My Authors",
  description = 'Look what I gone done'
)

You can use this object to make an app straight away:

app <- prepareApp("rnaseq", myesel)
shiny::shinyApp(app$ui, app$server)

... but it's of limited usefulness because the sample groupings are not highlighted. We need to specify group_vars for that to happen, picking column names from the colData:

myesel@group_vars <- c('Group', 'Tissue')

... then if we re-make the app you should see group highlighting.

app <- prepareApp("rnaseq", myesel)
shiny::shinyApp(app$ui, app$server)

... for example, in the PCA plot


Specifying contrasts for differential outputs

But where are the extra plots for looking at differential expression? For those, we need to supply contrasts. Contrasts are supplied as a list of character vectors, each describing the variable in colData upon which the contrast is based, and the two values of that variable to use in the comparison. We'll just copy the ones from the original zhangneurons:

zhangneurons@contrasts
myesel@contrasts <- zhangneurons@contrasts
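If you are writing contrasts by hand rather than copying them, the structure described above is easy to build and to sanity-check in base R. This is a sketch on made-up sample data; contrast_is_valid() is a hypothetical helper written for this example, not a shinyngs function.

```r
# Hypothetical sample sheet
toy_coldata <- data.frame(
  Group = c("control", "control", "TreatmentA", "TreatmentB"),
  row.names = paste0("s", 1:4),
  stringsAsFactors = FALSE
)

# Each contrast: the colData variable, then the two values to compare
toy_contrasts <- list(
  c("Group", "control", "TreatmentA"),
  c("Group", "control", "TreatmentB")
)

# Check that a contrast names a real column and values present in it
contrast_is_valid <- function(contrast, coldata) {
  length(contrast) == 3 &&
    contrast[1] %in% colnames(coldata) &&
    all(contrast[2:3] %in% coldata[[contrast[1]]])
}

valid <- vapply(toy_contrasts, contrast_is_valid, logical(1), coldata = toy_coldata)
```

A list like toy_contrasts could then be assigned to the contrasts slot in the same way as the copied one above.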

Run the app again and you should see tables of differential expression, and scatter plots between pairs of conditions.

app <- prepareApp("rnaseq", myesel)
shiny::shinyApp(app$ui, app$server)

But without information on the significance of the fold changes, we can't make things like volcano plots. For those we need to populate the contrast_stats slot. contrast_stats is a list of lists of matrices in the ExploratorySummarizedExperiment objects, with list names matching one or more of the names in assays, second-level names being 'pvals' and 'qvals', and the columns of each matrix corresponding to the contrasts in the contrasts slot of the containing ExploratorySummarizedExperimentList:

head(zhangneurons[[1]]@contrast_stats[[1]]$pvals, n = 10)

Again, we'll just copy those data from zhangneurons for demonstration purposes:

myesel[[1]]@contrast_stats <- zhangneurons[[1]]@contrast_stats
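For your own data you would build that nested list yourself. The sketch below does so in base R with random numbers standing in for real test results. The assay name 'Normalised' and the use of Benjamini-Hochberg adjustment for the q values are assumptions made for illustration; use whatever your own analysis produced.

```r
set.seed(1)
features <- paste0("gene", 1:5)
n_contrasts <- 2 # must match the number of contrasts on the containing list

# Hypothetical raw p values: one row per feature, one column per contrast
pvals <- matrix(
  runif(length(features) * n_contrasts),
  nrow = length(features),
  dimnames = list(features, NULL)
)

# Adjusted values, computed separately for each contrast (column)
qvals <- apply(pvals, 2, p.adjust, method = "BH")
dimnames(qvals) <- dimnames(pvals)

# First level keyed by assay name, second level by 'pvals' and 'qvals'
toy_contrast_stats <- list(
  Normalised = list(pvals = pvals, qvals = qvals)
)
```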

Now the RNA-seq app is more or less complete, and you should see volcano plots under 'Differential':

app <- prepareApp("rnaseq", myesel)
shiny::shinyApp(app$ui, app$server)

Gene sets

Many displays are more useful if they can be limited to biologically meaningful sets of genes. The gene_sets slot is designed to allow that. Gene sets are stored as lists of character vectors of gene identifiers, each list keyed by the name of the metadata column to which they pertain.

Adding gene sets to enable gene set filtering

The constructor for ExploratorySummarizedExperimentList assumes that gene sets are represented by the ID type specified in the gene_set_id_type slot, and that they are specified as a list of GeneSetCollections. You might generate such a list as follows:

genesets_files = list(
  'KEGG' =  "/path/to/MSigDB/c2.cp.kegg.v5.0.entrez.gmt",
  'MSigDB canonical pathway' = "/path/to/MSigDB/c2.cp.v5.0.entrez.gmt",
  'GO biological process' = "/path/to/MSigDB/c5.bp.v5.0.entrez.gmt",
  'GO cellular component' = "/path/to/MSigDB/c5.cc.v5.0.entrez.gmt",
  'GO molecular function' = "/path/to/MSigDB/c5.mf.v5.0.entrez.gmt",
  'MSigDB hallmark'= "/path/to/MSigDB/h.all.v5.0.entrez.gmt"
)

gene_sets <- lapply(genesets_files, GSEABase::getGmt)

Then provide them during object creation:

myesel <- ExploratorySummarizedExperimentList(
  eses = list(expression = myese),
  title = "My title",
  author = "My Authors",
  description = 'Look what I gone done',
  gene_sets = gene_sets
)

These are then converted internally to a list of lists of character vectors of gene IDs. The top level is keyed by the type of gene ID to be used for labelling (stored in labelfield on ExploratorySummarizedExperiments), the next level by the type of gene set.

For the zhangneurons example, gene sets are stored by gene_name:

names(zhangneurons@gene_sets)

Four types of gene set are used. For example, GO Biological Processes (GOBP):

names(zhangneurons@gene_sets$gene_name$GOBP)[1:10]

We can find the list of GO lactate transport genes, keyed by gene symbol:

zhangneurons@gene_sets$gene_name$GOBP$GO_LACTATE_TRANSPORT

Of course if you want to avoid the constructor, you can replicate that data structure and set the @gene_sets directly.
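As a rough illustration of that data structure, here is a base R sketch that turns lines in GMT layout (set name, description, then gene identifiers, tab-separated) into the same shape of nested list. The set names, URLs and gene names are invented, and the sketch assumes the identifiers in the file are already of the labelling type.

```r
# Two hypothetical lines in GMT layout
gmt_lines <- c(
  "TOY_SET_A\thttp://example.org/a\tGeneA\tGeneB\tGeneC",
  "TOY_SET_B\thttp://example.org/b\tGeneC\tGeneD"
)

# Split each line into fields; drop name and description to leave the genes
fields <- strsplit(gmt_lines, "\t", fixed = TRUE)
gobp <- lapply(fields, function(x) x[-(1:2)])
names(gobp) <- vapply(fields, function(x) x[1], character(1))

# Top level: ID type used for labelling; second level: type of gene set
toy_gene_sets <- list(gene_name = list(GOBP = gobp))

toy_gene_sets$gene_name$GOBP$TOY_SET_B
```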

Gene set analysis

Gene set analyses can be stored as a list of tables in the @gene_set_analyses slot of an ExploratorySummarizedExperiment, supplied via the gene_set_analyses argument to its constructor. The list is keyed at three levels representing the assay, the gene set type and contrast involved. Illustrated with zhangneurons again:

names(zhangneurons$gene@gene_set_analyses)
names(zhangneurons$gene@gene_set_analyses$`Filtered normalised`)
names(zhangneurons$gene@gene_set_analyses$`Filtered normalised`$GOBP)
head(zhangneurons$gene@gene_set_analyses$`Filtered normalised`$GOBP$`MO-no-yes`)

This data structure is a bit cumbersome, and I'm thinking of ways of better representing such data and the associated contrasts.
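In the meantime, one way to get an overview of a three-level list like this is to flatten it into a single table. The following is just a sketch in base R on invented results; the column names gene_set and p_value, and the contrast names, are assumptions and will differ from whatever your gene set tool produces.

```r
# Hypothetical results, keyed by assay, gene set type and contrast
toy_gsa <- list(
  `Filtered normalised` = list(
    GOBP = list(
      `contrast-1` = data.frame(
        gene_set = c("TOY_SET_A", "TOY_SET_B"),
        p_value = c(0.01, 0.2),
        stringsAsFactors = FALSE
      ),
      `contrast-2` = data.frame(
        gene_set = c("TOY_SET_A", "TOY_SET_B"),
        p_value = c(0.5, 0.03),
        stringsAsFactors = FALSE
      )
    )
  )
)

# Walk the three levels and label each table with its keys
rows <- list()
for (assay in names(toy_gsa)) {
  for (set_type in names(toy_gsa[[assay]])) {
    for (contrast in names(toy_gsa[[assay]][[set_type]])) {
      rows[[length(rows) + 1]] <- data.frame(
        assay = assay,
        set_type = set_type,
        contrast = contrast,
        toy_gsa[[assay]][[set_type]][[contrast]],
        stringsAsFactors = FALSE
      )
    }
  }
}
flat_gsa <- do.call(rbind, rows)
```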

Other options

Further options are available - for example supplying url_roots in the ExploratorySummarizedExperimentList will add link-outs where appropriate, and the description slot is handy for providing details of analysis to the user.

Included modules

shinyngs is built on a number of components written using Shiny's module framework, many of which are used multiple times in complex applications such as the one described above for RNA-seq.

Included modules are currently:

So for example heatmap uses selectmatrix to provide the UI controls to subselect the supplied matrices as well as the code which reads the output of those controls to actually derive the subsetted matrix. Shiny modules make this recycling of code much, much simpler than it would be otherwise.

Many of these can be called individually, for example to make an app for dendrograms only:

app <- prepareApp('dendro', eselist)
shiny::shinyApp(ui = app$ui, server = app$server)

Technical information

For technical information on package layout and functions, consult the package documentation:

?shinyngs

Running on a shiny server

Just use the command sets above with shinyApp() in a file called app.R in a directory of its own on your Shiny server. For example, if you've created an ExploratorySummarizedExperimentList and saved it to a file called 'data.rds':

library(shinyngs)

mydata <- readRDS("data.rds")

app <- prepareApp("rnaseq", mydata)
shiny::shinyApp(app$ui, app$server)
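Preparing that directory can itself be scripted. The sketch below uses base R only and writes into a temporary directory. The plain list is a stand-in for a real ExploratorySummarizedExperimentList, and the directory name myapp is made up; on a real server you would write to wherever your Shiny server looks for apps.

```r
# Hypothetical stand-in for a real ExploratorySummarizedExperimentList
mydata <- list(title = "My title")

# A directory of its own for the app (temporary location for this sketch)
app_dir <- file.path(tempdir(), "myapp")
dir.create(app_dir, showWarnings = FALSE, recursive = TRUE)

# Save the data alongside the app script
saveRDS(mydata, file.path(app_dir, "data.rds"))

# Write the app.R shown above
writeLines(
  c(
    'library(shinyngs)',
    '',
    'mydata <- readRDS("data.rds")',
    '',
    'app <- prepareApp("rnaseq", mydata)',
    'shiny::shinyApp(app$ui, app$server)'
  ),
  file.path(app_dir, "app.R")
)

list.files(app_dir)
```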


pinin4fjords/shinyngs documentation built on Feb. 28, 2024, 10:19 a.m.