SBGNview: Data Analysis, Integration and Visualization on Massive SBGN Pathway Collections

library(knitr)

Overview

SBGNview is a tool set for pathway based data visalization, integration and analysis. SBGNview is similar and complementary to the widely used Pathview[@luo2013pathview], with the following key features:

Citation

Please cite the following papers when using this open-source package. This will help the project and our team:

Luo W, Brouwer C. Pathview: an R/Biocondutor package for pathway-based data integration and visualization. Bioinformatics, 2013, 29(14):1830-1831, doi: 10.1093/bioinformatics/btt285

Installation and quick start

Please see the Quick Start tutorial for installation instructions and quick start examples.

Getting started

SBGNview is the main function of the package. It takes two major types of inputs: user data (gene and/or compound data) and SBGN pathway data (pathway IDs or SBGN-ML files). SBGNview parses the SBGN pathway data, extracts glyph (node) and arc (edge) data from SBGN-ML file, maps and integrates the user data to the glyphs and renders the SBGN graph in SVG formatwith mapped data as pseudo-colors. Currently it maps gene/protein omics data to "macromolecule" glyphs and maps compound omics data to "simple chemical" glyphs. The SBGNview function returns a SBGNview object, which contains data to recreate the rendered SBGN graph or further modified it. Please see the documentation of function SBGNview for more details.

Load the SBGNview package, together with pathview and other dependencies

library(SBGNview)

Accessing help and documentation

Use ls("package:SBGNview") to list all available functions in SBGNview. To learn more about how to use the functions and access their manuals, use help() or ?. For example, ?”SBGNview” or help("+.SBGNview").

ls("package:SBGNview")

SBGN pathway file (SBGN-ML)

SBGN pathways are defined in a special XML format (SBGN-ML file). SBGN-ML files describe the pathway elements (molecules and their reactions) and the graph layout (positions and appearances of the pathway elements). There are two major data types ement in SBGN-ML files: 1. node data (in tag "glyph"), such as node location, width, hight and node class (macromolecule, simple chemical etc.). 2. edge data (in tag "arc"), such as arc class, start node and end node. For more details, see: https://github.com/sbgn/sbgn/wiki/SBGN_ML

Two tiers of support and two scenarios with SBGN based pathways

SBGNview provides two tiers of support for SBGN based pathway data, which corresponds to the two common SBGNview scenarios:

Note the core collection of pathway databases with tier 1 support includes thousands of pathways and species, which should cover most of the users needs in pathway analysis and data visualization. We are taking the first the default route, i.e. to use the SBGNview's core collection.

Load SBGNview's SBGN-ML pathway data collection

Because a large amount of pathway data is loaded into R, this step can take up to several second.

data("sbgn.xmls")

Information about the SBGN-ML pathway data collection

We can check the information of collected pathways using pathways.info and number of pathways collected using pathways.stats.

data("pathways.info", "pathways.stats")
head(pathways.info)
pathways.stats

Search for pathways by keywords {#searchPathways}

SBGNview can search for pathways by keyword and automatically download SBGN-ML files.

pathways <- findPathways(c("bile acid","bile salt"))
head(pathways)
pathways.local.file <- downloadSbgnFile(pathways$pathway.id[1:3])
pathways.local.file

By default findPathways searches for keywords in pathway names. It can also search by different ID types

pathways <- findPathways(c("tp53","Trp53"),keyword.type = "SYMBOL")
head(pathways)
pathways <- findPathways(c("K04451","K10136"),keyword.type = "KO")
head(pathways)

Different layout for the same pathway

Researchers may have different tastes for a "good looking" layout. We have created different layouts for each pathway. User can download them from pre-generated SBGN-ML files and try.

Custom SBGN-ML files

USers can also define their own pathways and create custom SBGN-ML files from scratch. Several tools like Newt editor can let the user draw a pathway diagram and save it as SBGN-ML file. The tools may also able to generate a primitive pathway layout. But these layouts are usually not desirable for data visualization with too many node-node overlaps and edge-node crossings. Therefore, we recommend the users to stick to SBGNview core collection.

Basic visualization

SBGNview can visualize a range of omics data, including both gene (or transcript, protein, enzyme) data and compound (or metabolite, chemical, small molecules) data.

Visualize gene data

Gene/protein related data will be mapped to "macromolecule" nodes or glyphs on SBGN maps. Most of online databases or resources have their unique glyph ID types and it is likely different from the ID type in the gene data. We often need to map the omics IDs to the node IDs in the SBGN-ML file. In the quick start example ("Adrenaline and noradrenaline biosynthesis" pathway), the SBGN-ML file uses pathwayCommons IDs for gene/protein nodes, whereas the omics dataset uses Entrez gene IDs. The function SBGNview can automatically map common ID types such as ENTREZ, UniProt etc. to nodes in our pre-generated SBGN-ML files as shown in the quick start example. We can also do it manually using function changeDataId, which is called by SBGNview internally to do ID mapping. Supported ID type pairs can be found in data(mapped.ids). changeDataId uses pre-generated mapping tables or pathview to do the mapping. If the input-output ID type pair is not in data(mapped.ids) or can't be mapped by SBGNview automatically, user needs to provide the mapping table explicitly using the "id.mapping.table" argument.

Firt, load the preprocessed demo gene data.

data("gse16873.d")
head(gse16873.d)

Let's change the IDs in the gene data manually as an demo.

# select a subset of data for later use
gse16873 <- gse16873.d[,1:3]
# gene data to demonstrate ID convertion
gene.data <- gse16873
head(gene.data[,1:2])
gene.data <- changeDataId(data.input.id = gene.data,
                          input.type = "entrez",
                          output.type = "pathwayCommons",
                          mol.type = "gene",
                          sum.method = "sum")

When multiple genes are mapped to the same glyph ID, we use the sum of their values (sum.method = "sum").

head(gene.data[,1:2])

Now we run SBGNview, the main function to map and render gene data on SBGN map.

SBGNview.obj <- SBGNview(gene.data = gene.data,
                         input.sbgn = "P00001",
                         output.file = "test_output",
                         gene.id.type = "pathwayCommons",
                         output.formats =  c("png", "pdf", "ps"))
SBGNview.obj

SBGNview will always generate a .svg file. Other formats can be added also. In this example, three additional files (pdf, ps, png) will be created in the same folder.

```rVisualization of gene expression data."} include_graphics("test_output_P00001.svg")

## Visualize compound data
Chemical compound data will be mapped to "simple chemical" nodes on a SBGN map.

Here we simulate a compound dataset.
```r
cpd.data <- sim.mol.data(mol.type = "cpd", 
                         id.type = "KEGG COMPOUND accession", 
                         nmol = 50000, 
                         nexp = 2)
head(cpd.data)

Here for demo purpose, we change the KEGG compound IDs to pathwayCommons compound IDs. Although SBGNview can do this automatically, see [Visualize both gene data and compound data]{#genecpddata}.

cpd.data1 <- changeDataId(data.input.id = cpd.data,
                         input.type = "kegg",
                         output.type = "pathwayCommons",
                         mol.type = "cpd",
                         sum.method = "sum")
head(cpd.data1)

Now we run SBGNview, the main function to map and render compound data on SBGN map.

SBGNview.obj <- SBGNview(cpd.data = cpd.data1,
                         input.sbgn = "P00001",
                         output.file = "test_output.cpd",
                         cpd.id.type = "pathwayCommons",
                         output.formats =  c("png", "pdf", "ps"))
SBGNview.obj

```rVisualization of compound abundance data."} include_graphics("test_output.cpd_P00001.svg")

## Visualize both gene data and compound data {#genecpddata}

Now we can visualize both gene and compound data. In this example, we use the original gene expression data with "ENTREZ" IDs to show SBGNview's automatic ID mapping ability.
```r
SBGNview.obj <- SBGNview(gene.data = gse16873,
                         cpd.data = cpd.data,
                         input.sbgn = "P00001",
                         output.file = "test_output.gene.compound",
                         gene.id.type = "entrez",
                         cpd.id.type = "kegg",
                         output.formats =  c("svg"))
SBGNview.obj

```rVisualization of both gene expression and compound abundance data."} include_graphics("test_output.gene.compound_P00001.svg")

# Work with *SBGNview* object {#sbgnviewObj}
**SBGNview** operates in a way similar to **ggplot2**:
The main function *SBGNview* returns a S3 class *SBGNview* object (similar to the *ggplot* object "p" returned by function *ggplot* in **ggplot2**), which contains information needed to render SBGN graph, and output file name. Make sure the input SBGN-ML files are available in the folder. The `input.sbgn` parameters takes in names of local SBGN files or pathway IDs and `sbgn.dir` specifies the path to local files. Printing this object will render the graph and write output image files. *SBGNview* object can be further modified by several functions to highlight nodes/edges/paths (e.g. *SBGNview.obj* + *highlightNodes()*, similar to *p* + *geom_boxplot()* in **ggplot2**).

* These operations will generate image files
    + The following command `SBGNview(...) + highlightNodes(...)` or `SBGNview.obj + highlightNodes(...)` will return a *SBGNview* object to R console. The returned object is executed as a top-level R expression, thus will be implicitly printed using the `print.SBGNview` function in SBGNview package. For more details, please see the documentation for `print.SBGNview` function.

    + Running `SBGNview.obj` in the R console will implicitly print the object while `print(SBGNview.obj)` will explicitly print the object using `print.SBGNview` function.

* These commands will NOT generate plot files:
    + `SBGNview.obj = SBGNview(...) + highlightNodes(...)`. The assign operation "=" will make the returned object invisible therefore it will not be printed.

## Structure of *SBGNview* object
The *SBGNview* object is a S3 object defined as a list structure containing 'data' and 'output.file' and 'output.formats' elements. The object is created by the SBGNview S3 class defined within the package. The 'data' element of the object is a list containing information for all input SBGN pathways. Each element in 'data' is a list holding information of a single SBGN pathway. The information includes parsed glyphs, parsed arcs and other parameters to control graph features passed into *SBGNview* function. The 'output.file' element of the object is a string specified for the 'output.file' parameter in *SBGNview* function. The output file name can be modified using *outputFile*. The output formats can be updated by assigning a character vector of supported file formats by *SBGNview* function.  The code chucks below demonstrate how you can access glyphs and arcs information of a *SBGNview* object as well as changing output file name and output formats. 

## Extract glyphs and arcs infromation from *SBGNview* object
```r
pathway.1.data <- SBGNview.obj$data[[1]]
names(pathway.1.data)
glyphs <- pathway.1.data$glyphs.list
arcs <- pathway.1.data$arcs.list
str(glyphs[[1]])
str(arcs[[1]])
global.pars <- pathway.1.data$global.parameters.list
names(global.pars)

Modify SBGNview object directly

For each pathway in SBGNview object, the 'global.parameters.list' element seen above contains the default graphical parameters controlling the plotting of allelements (glyphs and arcs) of the pathway. We can modify these default parameters manually when needed. Most graphical parameters or features of individual glyphs or arcs can be controlled manually by directly modifying the SBGNview object too. For instance, we can modify different slots of glyphs[[1]] or arcs[[1]] in the code chunk above. Please make sure to read the documentation of SBGNview S3 class in the Value section fo SBGNview function.

Change output file name and format of SBGNview objects

We can change the name of the output file using built-in function outputFile:

outputFile(SBGNview.obj)
outputFile(SBGNview.obj) <- "test.change.output.file"
outputFile(SBGNview.obj)
SBGNview.obj
outputFile(SBGNview.obj) <- "test.print"
outputFile(SBGNview.obj)
print(SBGNview.obj)

We can update or change the output file formats of a SBGNview object. Supported output file formats are any combination of "png", "pdf", and "ps". If an empty vector of output file formats are provided, only the SVG file format will be written to current working directory.

print(SBGNview.obj$output.formats)
SBGNview.obj$output.formats <- c("png", "pdf")

Updating & modifying graphs

It is useful to highlight interesting nodes, edges or paths in a pathway map. One way is to modify the SBGNview object directly as we've seen above. The more systematic way is to use the build-in highlight functions. In other words, the SBGNview object can be further modified by concatenating it with modification or highlight functions using binary operator + (see quick start for example).

Hightlight nodes

Highlight all nodes

# Here we set argument "node.set" to select all nodes to be highlighted
highlight.all.nodes.sbgn.obj <-  SBGNview.obj + highlightNodes(node.set = "all",
                                                               stroke.width = 4, 
                                                               stroke.color = "green")
outputFile(highlight.all.nodes.sbgn.obj) <- "highlight.all.nodes"
print(highlight.all.nodes.sbgn.obj)

```rHighlight all nodes."} include_graphics("highlight.all.nodes_P00001.svg")

### Highlight nodes by class
```r
# Here we set argument "select.glyph.class" to select and highlight macromolecule nodes
highlight.macromolecule.sbgn.obj <-  SBGNview.obj + 
                                        highlightNodes(select.glyph.class = "macromolecule",
                                                       stroke.width = 4, 
                                                       stroke.color = "green")
outputFile(highlight.macromolecule.sbgn.obj) <- "highlight.macromolecule"
print(highlight.macromolecule.sbgn.obj)

```rHighlight macromolecule nodes."} include_graphics("highlight.macromolecule_P00001.svg")

### Show node IDs instead of node labels
```r
# Here we set argument "show.glyph.id" to display node ID instead of the original label
highlight.all.nodes.sbgn.obj <-  SBGNview.obj + highlightNodes(node.set = "(+-)-epinephrine", 
                                                               stroke.width = 4, 
                                                               stroke.color = "green",
                                                               show.glyph.id = TRUE,
                                                               label.font.size = 10)
outputFile(highlight.all.nodes.sbgn.obj) = "highlight.all.id.nodes"
print(highlight.all.nodes.sbgn.obj)

```rHighlight nodes using node IDs."} include_graphics("highlight.all.id.nodes_P00001.svg")

## Adjust node labels.

### Label position, font size, color, change labels
The function *highlightNodes* also can be used to adjust labels. In this example, we move the label horizontally and vertically, change their color and font size.
```r
# Node labels can be customized by a named vector. The names of the vector are the IDs assigned to argument "node.set". Values of the vector are the new labels for display.  
my.labels <- c("Tyr","epinephrine")
names(my.labels) <- c("tyrosine", "(+-)-epinephrine")
SBGNview.obj.adjust.label <-  SBGNview.obj + 
                                  highlightNodes(node.set = c("tyrosine", "(+-)-epinephrine"),
                                                 stroke.width = 4,
                                                 stroke.color = "green",
                                                 label.x.shift = 0,
                                                 # Labels are moved up a little bit
                                                 label.y.shift = -20,
                                                 label.color = "red",
                                                 label.font.size = 30,
                                                 label.spliting.string = "",
                                                 labels = my.labels)
outputFile(SBGNview.obj.adjust.label) <- "adjust.label"
print(SBGNview.obj.adjust.label)

```rModify node labels."} include_graphics("adjust.label_P00001.svg")

### Label text wrapping into multiple lines
Some nodes may have long labels thus overlap with surrounding graph elements. In this case we can set the parameter *label.spliting.string* to "any" so the label will be wrapped in multiple lines that fit the width of the node.
```r
SBGNview.obj.change.label.wrapping <-  SBGNview.obj + 
                                        highlightNodes(node.set = c("tyrosine", "(+-)-epinephrine"),
                                                       stroke.width = 4,
                                                       stroke.color = "green",
                                                       show.glyph.id = TRUE,
                                                       label.x.shift = 10,
                                                       label.y.shift = 20,
                                                       label.color = "red",
                                                       label.font.size = 10,
                                                       label.spliting.string = "any")
outputFile(SBGNview.obj.change.label.wrapping) = "change.label.wrapping"
print(SBGNview.obj.change.label.wrapping)

```rChange how labels are wrapped."} include_graphics("change.label.wrapping_P00001.svg")

## When one input ID maps to multiple nodes {#findNode}
In the example above, we saw that one input ID (e.g. "(+-)-epinephrine") can be mapped to multiple nodes in the graph. If we just want to focus on several particular ones, we can use function *highlightNodes* to find the node IDs, which is unique to each node:
```r
# When "label.spliting.string" is set to a string that is not in the label (including an empty string ""), the label will not be wrapped into multiple lines.    
test.show.glyph.id <- SBGNview.obj + highlightNodes(node.set = c("tyrosine", "(+-)-epinephrine"), 
                                                    stroke.width = 4,
                                                    stroke.color = "green", 
                                                    show.glyph.id = TRUE,
                                                    label.x.shift = 10,
                                                    label.y.shift = 20,
                                                    label.color = "red",
                                                    label.font.size = 10,
                                                    label.spliting.string = "")
outputFile(test.show.glyph.id) <- "test.show.glyph.id"
print(test.show.glyph.id)

```rShow node IDs of mapped nodes."} include_graphics("test.show.glyph.id_P00001.svg")

We can find the mapping between input IDs and node IDs:
```r
input.pathways <- findPathways("Adrenaline and noradrenaline biosynthesis")
input.pathway.ids <- input.pathways$pathway.id
mapping <- changeIds(input.ids = c("tyrosine", "(+-)-epinephrine"),
                     input.type = "compound.name",
                     output.type = "pathwayCommons",
                     mol.type = "cpd",
                     limit.to.pathways = input.pathway.ids[1])
mapping

Highlight arcs and paths

In addition to nodes and their attributes, we may also highlight arcs and paths and modify their attributes. The Quick Start vignette has one example on highlighting nodes, arcs and paths together.

Here, we can pick two nodes to highlight and find a shortest path between them.

outputFile(SBGNview.obj) <- "highlight.by.node.id"
SBGNview.obj + highlightNodes(node.set = c("tyrosine", "(+-)-epinephrine"),
                              stroke.width = 4, 
                              stroke.color = "red") + 
               highlightPath(from.node = "SmallMolecule_96737c854fd379b17cb3b7715570b733",
                             to.node = "SmallMolecule_7753c3822ee83d806156d21648c931e6",
                             node.set.id.type = "pathwayCommons",
                             from.node.color = "green",
                             to.node.color = "blue",
                             shortest.paths.cols = c("purple"),
                             input.node.stroke.width = 6,
                             path.node.stroke.width = 3,
                             path.node.color = "purple",
                             path.stroke.width = 5,
                             tip.size = 10)

```rHighlight nodes and shortest path using node IDs."} include_graphics("highlight.by.node.id_P00001.svg")

## Different layouts for the same pathway{#tryDifferentLayout}
If the default layout is not ideal, users have two options:

* Modify the layout manually using tools like [newt editor](http://newteditor.org/)

* Download our [pre-generated SBGN-ML files](https://github.com/datapplab/SBGNhub/tree/master/data/SBGN.with.stamp) SBGN-ML files with a different layout. Currently there limited pathways with multiple layout. More will be added in the future.

A new layout for P00001 ("Adrenaline and noradrenaline biosynthesis") pathway ID is included as part of the *SBGNview* package. 
```r
new.layout.file.path <- paste(path.package("SBGNview"), "extdata", sep = "/")
SBGNview(gene.data = gse16873, 
         gene.id.type = "entrez",
         input.sbgn = "P00001.new.layout.sbgn",
         sbgn.dir =  new.layout.file.path,
         sbgn.gene.id.type = "pathwayCommons",
         output.file = "test.different.layout",
         output.formats =  c("svg"))

```rGraph with different layout."} include_graphics("test.different.layout_P00001.new.layout.sbgn.svg")

## SVG Editing Tool
SBGNview provides extensive graphical control options for adjusting glyph/arc attributes such as line color/width and text size/color/wrapping/positioning. The SBGNview main function outputs SVG file format by default and other (png, pdf, ps) specified file formats. There might be cases where you would like to adjust specific graphical details more easily. In such cases, [Inkscape](https://inkscape.org/) can be used to modify the generated SVG file and save/export the SVG to other file formats provided by Inkscape.

# ID mapping
SBGNview can automatically map common ID types to SBGN-ML glyphs in our [SBGNview pathway collection](https://github.com/datapplab/SBGNhub/tree/master/data/SBGN.with.stamp).  Supported ID types can be accessed as follow:
```r
data("mapped.ids")

Map between two types of IDs

Besides changeDataId which changes ID for omics data, SBGNview provides functions to map between different types IDs:

mapping <- changeIds(input.ids = c("tyrosine", "(+-)-epinephrine"),
                     input.type = "compound.name",
                     output.type = "pathwayCommons",
                     mol.type = "cpd",
                     limit.to.pathways = "P00001")
head(mapping)
mapping <- changeIds(input.ids = c("tyrosine", "(+-)-epinephrine"),
                     input.type = "compound.name",
                     output.type = "chebi",
                     mol.type = "cpd")
head(mapping)

Extract molecule list from pathways {#extractList}

mol.list <- sbgn.gsets(database = "metacrop",
                       id.type = "ENZYME",
                       species = "ath",
                       output.pathway.name = FALSE,
                       truncate.name.length = 50)
mol.list[[1]]
mol.list <- sbgn.gsets(database = "pathwayCommons",
                       id.type = "ENTREZID",
                       species = "hsa")
mol.list[[1]]
mol.list <- sbgn.gsets(database = "pathwayCommons",
                       id.type = "ENTREZID",
                       species = "mmu")
mol.list[[1]]
mol.list <- sbgn.gsets(database = "MetaCyc",
                       id.type = "KO",
                       species = "eco")
mol.list[[3]]
mol.list <- sbgn.gsets(database = "pathwayCommons",
                       id.type = "chebi",
                       mol.type = "cpd")
mol.list[[2]]

Example using selected database

Use Reactome pathway database

is.reactome <- pathways.info[,"sub.database"]== "reactome"
reactome.ids <- pathways.info[is.reactome ,"pathway.id"]
SBGNview.obj <- SBGNview(gene.data = gse16873,
                         gene.id.type = "entrez",
                         input.sbgn =  reactome.ids[1:2],
                         output.file = "demo.reactome",
                         output.formats =  c("svg"))
SBGNview.obj

Test SBGN reference cards

reference.cards <- c("AF_Reference_Card.sbgn", "PD_Reference_Card.sbgn", "ER_Reference_Card.sbgn") 
downloadSbgnFile(reference.cards)
SBGNview.obj <- SBGNview(gene.data = c(),
                         input.sbgn = reference.cards,
                         sbgn.gene.id.type ="glyph",
                         output.file = "./test.refcards",
                         output.formats = c("pdf"),
                         font.size = 1,
                         logic.node.font.scale = 10,
                         status.node.font.scale = 10)
SBGNview.obj

FAQs

Color key

Turn off color key

SBGNview(key.pos = "none")

Session Info

sessionInfo()

References



Try the SBGNview package in your browser

Any scripts or data that you put into this service are public.

SBGNview documentation built on March 3, 2021, 2 a.m.