r Githubpkg("pinin4fjords/shinyngs")
is a package designed to facilitate downstream analysis of RNA-seq and similar matrix data with various exploratory plots. It's a work in progress, with new features added on a regular basis. Individual components (heatmaps, pca etc) can function independently and will be useful outside of the RNA-seq context.
It's not always trivial to quickly assess the results of next-generation sequencing experiment. r Githubpkg("pinin4fjords/shinyngs")
is designed to help fix that by providing a way of instantly producing a visual tool for data mining at the end of an analysis pipeline.
SummarizedExperiment
format, called ExploratorySummarizedExperiment
ExploratorySummarizedExperiment
s is supplied (useful in situations where the features are different beween matrices - e.g. from transcript- and gene- level analyses), a selection field will be provided.shinyngs
relies heavily on SumamrizedExperiment
. Formerly found in the GenomicRanges
package, it now has its own package on Bioconductor: http://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html. This requires a recent version of R.
Graphical enhancements are provided by shinyBS
and shinyjs
library(devtools) install_github('pinin4fjords/shinyngs')
The data structures used by Shinyngs build on SummarizedExperiment
. One SummarizedExperiment
can have multiple 'assays', essentially matrices with samples in columns and 'features' (transcripts or genes) in rows, representing different results for the same features and samples. This is handy to compare results before and after processing, for example. ExploratorySummarizedExperiment
extends SummarizedExperiment
to include slots relating to annotation, and associated results of 'tests', providing p values and q values.
ExploratorySummarizedExperimentList
is a container for one or more ExploratorySummarizedExperiment
objects, and is intented to describe an overall study, i.e. one or more experiments the same set of samples, but different sets of features in each experiment. The ExploratorySummarizedExperimentListList
therefore is used to supply study-wide things such as contrasts, gene sets, url roots for creating links etc.
To see how to quickly build an RNA-seq app from a simple SummarizedExperiment, we can use the example data in the airway
package. We just convert the RangedSummarizedExperiment to an ExploratorySummarizedExperiment, and add it to a list of such objects, which represent a study.
library(shinyngs) data(airway, package = 'airway') ese <- as(airway, 'ExploratorySummarizedExperiment') eselist <- ExploratorySummarizedExperimentList(ese)
Then we build and run the app. For example, a basic app just for heatmaps:
app <- prepareApp('heatmap', eselist) shiny::shinyApp(ui = app$ui, server = app$server)
Note the use of prepareApp
to generate the proper ui
and server
, which are then passed to Shiny.
We can build a more comprehensive app with multiple panels aimed at RNA-seq:
app <- prepareApp('rnaseq', eselist) shiny::shinyApp(ui = app$ui, server = app$server)
Airway provides some info about the dataset, which we can add in to the object before we build the app:
data(airway, package = 'airway') expinfo <- metadata(airway)[[1]] eselist <- ExploratorySummarizedExperimentList( ese, title = expinfo@title, author = expinfo@name, description = abstract(expinfo) ) app <- prepareApp('rnaseq', eselist) shiny::shinyApp(ui = app$ui, server = app$server)
All this app knows about is gene IDs, however, which aren't all that informative for gene expression plots etc. We can add row metadata to fix that:
# Use Biomart to retrieve some annotation, and add it to the object library(biomaRt) attributes <- c( 'ensembl_gene_id', # The sort of ID your results are keyed by 'entrezgene', # Will be used mostly for gene set based stuff 'external_gene_name' # Used to annotate gene names on the plot ) mart <- useMart(biomart = 'ENSEMBL_MART_ENSEMBL', dataset = 'hsapiens_gene_ensembl', host='www.ensembl.org') annotation <- getBM(attributes = attributes, mart = mart) annotation <- annotation[order(annotation$entrezgene),] mcols(ese) <- annotation[match(rownames(ese), annotation$ensembl_gene_id),] # Tell shinyngs what the ids are, and what field to use as a label ese@idfield <- 'ensembl_gene_id' ese@labelfield <- 'external_gene_name' # Re-build the app eselist <- ExploratorySummarizedExperimentList( ese, title = expinfo@title, author = expinfo@name, description = abstract(expinfo) ) app <- prepareApp('rnaseq', eselist) shiny::shinyApp(ui = app$ui, server = app$server)
zhangneurons
Example datasetairway
is fine, but it contains no information on differential expression. shinyngs
provides extra slots for differential analyses, among other things.
An example ExploratorySummarizedExperimentList
based on the Zhang et al study of neurons and glia (http://www.jneurosci.org/content/34/36/11929.long) is included in the zhangneurons
package, and this can be used to demonstrate available features. The dataset includes transcript- and gene- level quantification estimates (as ExporatorySummarizedExperiment
s within an ExploratorySummarizedExperimentList
, and three levels of processing (raw, filtered, normalised) in the assays
slots of each.
Note: this data was generated using Salmon (https://combine-lab.github.io/salmon/) for quantification, and results may therefore be slightly different to the authors' online tool (which did not use Salmon).
Install the data package:
library(devtools) install_github('pinin4fjords/zhangneurons')
... and load the data.
library(shinyngs) data("zhangneurons")
The data can then be used to build an application:
app <- prepareApp("rnaseq", zhangneurons) shiny::shinyApp(app$ui, app$server)
This example generates the full application designed for RNA-seq analysis. Remember that individual components can be created too:
app <- prepareApp("heatmap", zhangneurons) shiny::shinyApp(app$ui, app$server)
An alternative and simple way to create an application is to describe your experiment using a YAML file, and pass the YAML file to Shinyngs. This has advantages where a pipeline produces many outputs outside of R which then have to be read and compiled.
The eselistFromYAML()
function is provided to help construct an ExploratorySummarizedExperiment object. You might make a file like:
title: My RNA seq experiment author: Joe Blogs report: report.md group_vars: - Group - Replicate default_groupvar: Group experiments: Gene: coldata: file: my.experiment.csv id: External annotation: file: my.annotation.csv id: gene_id entrez: ~ label: gene_id expression_matrices: Raw: file: raw_counts.csv measure: counts Filtered: file: filtered_counts.csv measure: Counts per million Normalised: file: normalised_counts.csv measure: Counts per million read_reports: read_attrition: read_attrition.csv contrasts: comparisons: 0: - Group control TreatmentA 1: - Group control TreatmentB stats: Gene: Normalised: pvals: pvals.csv qvals: qvals.csv
You can then generate the object with a command like eselist <- eselistFromYAML('my.yaml')
. This is how the zhangneurons
dataset was generated- see vignette(zhangneurons)
for details, and for the component input files themselves.
To demonstrate this, let's break down zhangneurons
into simple datatypes and put it back together again.
# Assays is a list of matrices library(zhangneurons) data(zhangneurons, envir = environment()) myassays <- as.list(SummarizedExperiment::assays(zhangneurons[[1]])) head(myassays[[1]])
colData is your sample information defining groups etc
mycoldata <- data.frame(SummarizedExperiment::colData(zhangneurons[[1]])) head(mycoldata)
Annotation is important to `shinyngs'. You need a data frame with rows corresonding to those in the assays
myannotation <- SummarizedExperiment::mcols(zhangneurons[[1]]) head(myannotation)
ExploratorySummarizedExperiment
Now we can put these things together to create an 'ExploratorySummarizedExperiment:
myese <- ExploratorySummarizedExperiment( assays = SimpleList( myassays ), colData = DataFrame(mycoldata), annotation <- myannotation, idfield = 'gene_id', labelfield = "gene_name" ) print(myese)
Note the extra fields that mostly tell shinyngs
about annotation to help with labelling etc.
ExploratorySummarizedExperimentList
ExploratorySummarizedExperimentList
s are basically a list of ExploratorySummarizedExperiment
s, with additional metadata slots.
myesel <- ExploratorySummarizedExperimentList( eses = list(expression = myese), title = "My title", author = "My Authors", description = 'Look what I gone done' )
You can use this object to make an app straight away:
app <- prepareApp("rnaseq", myesel) shiny::shinyApp(app$ui, app$server)
... but it's of limited usefulness because the sample groupings are not highlighted. We need to specify group_vars
for that to happen, picking column names from the colData
:
myesel@group_vars <- c('Group', 'Tissue')
.. then if we re-make the app you should see group highlighting.
app <- prepareApp("rnaseq", myesel) shiny::shinyApp(app$ui, app$server)
... for example, in the PCA plot
But where are the extra plots for looking at differential expression? For those, we need to supply contrasts. Contrasts are supplied as a list of character vectors describing the variable in colData
upon the contrast is based, and the two values of that variable to use in the comparison. We'll just copy the one over from the original zhangneurons
:
zhangneurons@contrasts myesel@contrasts <- zhangneurons@contrasts
Run the app again and you should see tables of differential expression, and scatter plots between pairs of conditions.
app <- prepareApp("rnaseq", myesel) shiny::shinyApp(app$ui, app$server)
But without information on the significance of the fold changes, we can't make things like volcano plots. For those we need to populate the contrast_stats
slot. contrast_stats is a list of lists of matrices in the ExploratorySummarizedExperiment
objects, with list names matching one or more of the names in assays
, second-level names being 'pvals' and 'qvals' and the columns of each matrix corresponding the the contrasts
slot of the containing ExploratorySummarizedExperimentList
:
head(zhangneurons[[1]]@contrast_stats[[1]]$pvals, n = 10)
Again, we'll just copy those data from zhangneurons
for demonstration purposes:
myesel[[1]]@contrast_stats <- zhangneurons[[1]]@contrast_stats
Now the RNA-seq app is more or less complete, and you should see volcano plots under 'Differential':
app <- prepareApp("rnaseq", myesel) shiny::shinyApp(app$ui, app$server)
Many displays are more useful if they can be limited to biologically meaningful sets of genes. The gene_sets
slot is designed to allow that. Gene sets are stored as lists of character vectors of gene identifiers, each list keyed by the name of the metadata column to which they pertain.
The constructor for ExploratorySummarizedExperimentList
assumes that gene sets are represented by the ID type specified in the gene_set_id_type_slot
, and that they are specified as a list of GeneSetCollection
s. You might generate such a list as follows:
genesets_files = list( 'KEGG' = "/path/to/MSigDB/c2.cp.kegg.v5.0.entrez.gmt", 'MSigDB canonical pathway' = "/path/to/MSigDB/c2.cp.v5.0.entrez.gmt", 'GO biological process' = "/path/to/MSigDB/c5.bp.v5.0.entrez.gmt", 'GO cellular component' = "/path/to/MSigDB/c5.cc.v5.0.entrez.gmt", 'GO molecular function' = "/path/to/MSigDB/c5.mf.v5.0.entrez.gmt", 'MSigDB hallmark'= "/path/to/MSigDB/h.all.v5.0.entrez.gmt" ) gene_sets <- lapply(genesets_files, GSEABase::getGmt)
Then provide them during object creation:
myesel <- ExploratorySummarizedExperimentList( eses = list(expression = myese), title = "My title", author = "My Authors", description = 'Look what I gone done', gene_sets = gene_sets )
These are then converted internally to a list of lists of character vectors of gene IDs. The top level is keyed by the type of gene ID to be used for labelling (stored in labelfield' on
ExploratorySummarisedExperiments`, the next level by the type of gene set.
For the zhangneurons
example, gene sets are stored by gene_name
:
names(zhangneurons@gene_sets)
4 types of gene set are used. For example, GO Biological Processes (GOBP):
names(zhangneurons@gene_sets$gene_name$GOBP)[1:10]
We can find the list of GO lactate transport genes, keyed by gene symbol:
zhangneurons@gene_sets$gene_name$GOBP$GO_LACTATE_TRANSPORT
Of course if you want to avoid the constructor, you can replicate that data structure and set the @gene_sets
directly.
Gene set analyses can be stored as a list of tables in the @gene_set_analyses
slot of an ExploratorySummarizedExperiment
, supplied via the gene_set_analyses
argument to its constructor. The list is keyed at three levels representing the assay, the gene set type and contrast involved. Illustrated with zhangneurons
again:
names(zhangneurons$gene@gene_set_analyses) names(zhangneurons$gene@gene_set_analyses$`Filtered normalised`) names(zhangneurons$gene@gene_set_analyses$`Filtered normalised`$GOBP) head(zhangneurons$gene@gene_set_analyses$`Filtered normalised`$GOBP$`MO-no-yes`)
This data struture is a bit cumbersome, and I'm thinking of ways of better representing such data and the associated contrasts.
Further options are available - for example supplying url_roots
in the ExploratorySummarizedExperimentList
will add link-outs where appropriate, and the description
slot is handy for providing details of analysis to the user.
shinyngs
is build on a number of components built using Shiny's module framework, many of which are used multiple times in complex applications such as the one described above for RNA-seq.
Included modules are currently:
heatmap
- provides controls and a display for making heat maps based on user criteria.pca
- provides controls and display for an interactive PCA plot.boxplot
- provides controls and display for an interactive boxplot.dendro
- a clustering of samples in dendrogram plotted with ggdendro
}.gene
- a bar plot showing gene expression and a table with fold changes etc (where appropriate)simpletable
- a simple display using datatables (via the DT
package) to show a table and a download button. More complex table displays (with further controls, for example) can build on this module.assaydatatable
- shows the assaydata()
content of the selected experiment.selectmatrix
- provides controls and output for subsetting the profided assay data prior to plotting. Called by many of the plotting modules.sampleselect
- provides a UI element for selecting the columns of the matrix based on sample name or group. Called by the selectmatrix
module.geneselect
- provides a UI element for selecing the rows of a matrix based on criteria such as variance. Called by the selectmatrix
module.genesets
- provides UI element for selecting gene sets. Called by the geneselect
module when a user chooses to filter by gene set. plotdownload
- provides download button to non-Plotly plots (Plotly-driven plots have their own export button)So for example heatmap
uses selectmatrix
to provide the UI controls to subselect the supplied matrices as well as the code which reads the output of those controls to actually derive the subsetted matrix. Shiny modules make this recycling of code much, much simpler than it would be otherwise.
Many of these can be called individually, for example to make an app for dendrograms only:
app <- prepareApp('dendro', eselist) shiny::shinyApp(ui = app$ui, server = app$server)
For technical information on package layout and functions, consult the package documentation:
?shinyngs
Just use the commands sets above with shinyApp()
in a file called app.R in a directory of its own on your Shiny server. For example, If you're created an ExploratorySummarizedExperiment
and saved it to a file called 'data.rds':
library(shinyngs) mydata <- readRDS("data.rds") app <- prepareApp("rnaseq", mydata) shiny::shinyApp(app$ui, app$server)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.