SBGNview: Overlay omics data on SBGN pathway diagram and output image...

Description Usage Arguments Details Value Examples

Description

This is the main function to map, integrate and render omics data on pathway graphs. Two inputs are needed: 1. A pathway file in SBGN-ML format and 2. gene and/or compound omics data. The function generates image file of a pathway graph with the omics data mapped to the glyphs and rendered as pseudo-colors. It is similar to pathview except the pathways are rendered with SBGN notation. In addition, users can control more graph properties including node/edge attributes. We collected SBGN-ML files from several pathway databases: Reactome, MetaCyc, MetaCrop, PANTHER and SMPDB. Given a vector of patway IDs, SBGNview can automatically download and use these SBGN-ML files. To map omics data to glyphs, user just needs to specify omics data ID types. When using user customized SBGN-ML files, users need to provide a mapping file from omics data's molecule IDs to SBGN-ML file's glyph IDs.

Usage

1
2
3
4
5
6
7
8
SBGNview(gene.data = NULL, cpd.data = NULL, simulate.data = FALSE,
  input.sbgn = NULL, sbgn.dir = ".", output.file = "./output.svg",
  node.sum = "sum", gene.id.type = NA, cpd.id.type = NA,
  sbgn.id.attr = "id", sbgn.gene.id.type = NULL,
  sbgn.cpd.id.type = NA, id.mapping.gene = NULL,
  id.mapping.cpd = NULL, org = "hsa", output.formats = c("svg"),
  show.pathway.name = FALSE,
  SBGNview.data.folder = "./SBGNview.tmp.data", ...)

Arguments

gene.data

A matrix, vector or SummarizedExperiment object. The same as "gene.data" argument in package "pathview", it is either a vector (single measurement) or a matrix-like data (multiple measurements). If the data is a vector, the entries should be numeric and names of entries should be gene IDs. Matrix data structure has genes as rows and samples as columns. Row names should be gene IDs. Here gene ID is a generic concept, including multiple types: gene, transcript or protein. Default gene.data=NULL.

cpd.data

A matrix, vector or SummarizedExperiment object. The same as "gene.data", excpet named with compound IDs. Default cpd.data=NULL.

simulate.data

Logical. SBGNview can simulate a dataset. If set to true, SBGNview will simulate a gene data set and a compound dataset and user input "gene.data" and "cpd.data" are ignored.

input.sbgn

A vector. Can be either names of local SBGN files or pathway IDs of our pre-collected pathways.

sbgn.dir

A character string. The path to the folder that holds SBGN-ML files. If "input.sbgn" is a vector of pathway IDs in data "pathways.info", the SBGN-ML files will be downloaded into this folder.

output.file

A character string. Path to the output image files. Because we often work with multiple pathways, each pathway will have its own image files. Each string in "input.sbgn" will be added to the end of "output.file". Depending on the image format specified by the "output.formats" parameter, extentions will be added to the end (e.g. .pdf, .png etc.).

node.sum

A character string. Sometimes multiple omics genes/compounds are mapped to one SBGN glyph. Therefore multiple values will be mapped to one measurement/slice on the glyph. In this situation ,we may need to derive a single value for the slice on the glyph. This function can be any R function that takes a numeric vector as input and output a single numeric value (e.g. "sum","max","min","mean"). It can also be a User Defined Function (UDF).

gene.id.type

A character string. The type of gene ID in "gene.data". This parameter is used for ID mapping. It should be one of the IDs in data "mapped.ids". For details, run: data("mapped.ids")

cpd.id.type

A character string. The type of compound ID in "cpd.data". For details, run: data("mapped.ids")

sbgn.id.attr

A character string. This tells SBGNview where to find the ID of a glyph in SBGN-ML file for ID mapping. This ID is used to map omics data to the glyph. It is normally the name of an attribute in the "glyph" element . For example : <glyph class="macromolecule" id="p53"> </glyph>. We can specify: sbgn.id.attr = "id"; sbgn.gene.id.type = "SYMBOL". For our pre-generated SBGN-ML files, the ID attribute will be determined automatically thus can be omitted.

Accepted values:

1. Any attribute name in element "glyph" For example : <glyph class="macromolecule" id="p53" protein="P04637"> </glyph>. We can specify: sbgn.id.attr = "protein"; sbgn.gene.id.type = "UNIPROT", then "P04637" will be the glyph ID.

2. The string "label", this will make SBGNview use the glyph label as glyph ID. For example: <glyph id="glyph14" class="simple chemical"> <label text="L-alanine"/> </glyph>. We can specify: sbgn.id.attr = "label"; sbgn.gene.id.type = "CompoundName", then "L-alanine" will be used as glyph ID.

sbgn.gene.id.type

A character string. The type of gene ID in SBGN-ML file (recorded in the "id" attribute in XML tag "glyph" of "macromolecule"s. See parameter "sbgn.id.attr" for more details). This parameter is used for ID mapping,i.e. either use our pre-generated mapping tables or find corresponding columns in user defined mapping tables in "id.mapping.gene". For our pre-generated SBGN-ML files, this will be determined automatically according to the pathway IDs thus can be omitted. For user defined SBGN-ML file, this parameter should be one of the column names of the matrix "id.mapping.gene".

sbgn.cpd.id.type

A character string. See "sbgn.gene.id.type". The compond IDs are recorded in the "id" attribute in XML tag "glyph" of "simple chemical"s. See "sbgn.id.attr" for more details.

id.mapping.gene

A matrix. Mapping table between gene.id.type and sbgn.gene.id.type. This table is needed if the ID pair of gene.id.type and sbgn.gene.id.type is NOT included in data "mapped.ids" or not mappable by package "pathview". This matrix should have two columns for gene.id.type and sbgn.gene.id.type, respectively. Column names should be the values of parameters "sbgn.gene.id.type" and "gene.id.type". See example section for an example.

id.mapping.cpd

A matrix. See id.mapping.gene.

org

A character string. The species of the gene omics data. It is used for species specific gene ID mapping. Currently only supports three letters KEGG code (e.g. hsa, mmu, ath). For a complete list of KEGG codes, see this page:
https://www.genome.jp/kegg/catalog/org_list.html

output.formats

A vector. It specifies the formats of output image files. The vector should be a subset of c("pdf" , "ps", "png"). By default the function will always output a svg file. SBGNview uses rsvg to convert svg file to other formats. If other "output.formats" is set but "rsvg" package is not installed, an error will occur. See this page for how to install "rsvg":
https://github.com/jeroen/rsvg

show.pathway.name

Logical. If set to TRUE and "input.sbgn" are pre-collected pathway IDs, the pathway name will be added to the output file.

SBGNview.data.folder

A character string. The path to a folder that will hold download ID mapping files and pathway information data files. The data can be reused once downloaded.

...

Other parameters passed to function render.sbgn, including:

key.pos
-Default: "topright" One of "bottomleft" , "bottomright", "topright", "topleft" location of color panel: lower left, lower right, upper right, upper left

color.panel.scale
-Default: 1.5 It controls the relative size of color scheme panel.

color.panel.n.grid
-Default: 101 How many colors does the color scheme show.

col.gene.low
-Default: "green"

col.gene.high
-Default: "red"

col.gene.mid
-Default: "gray"

col.cpd.low
-Default: "blue"

col.cpd.high
-Default: "yellow"

col.cpd.mid
-Default: "gray"

min.gene.value
-Default: -1 Color panel's min value for gene. Values smaller than this will have the same color as min.value.

max.gene.value
-Default: 1

mid.gene.value
- Default: 0

min.cpd.value
-Default: -1 Color panel's min value for compound. Values smaller than this will have the same color as min.value.

max.cpd.value
-Default: 1

mid.cpd.value
-Default: 0

multimer.margin
-Default: 5 For multimers, they are represented by two partly overlapped shapes (rectangle or ellipse). This parameter controls how much the two shapes overlap.

compartment.opacity
-Default: 0.9 How transparent the compartments are.

auxiliary.opacity
-Default: 1 Opacity of auxiliary glyphs.

if.plot.annotation.nodes
-Default: F Some sbgn files have "annotation" glyphs. By default we don't plot them.

inhibition.edge.end.shift
-Default: 4 The tip of "inhibition" arcs is a line segment. Sometimes it overlaps with target glyph's border. We can shift it to prevent the overlap.

edge.tip.size
-Default: 4

if.use.number.for.long.label
-Default: T If the label is too long, we can create a shorter name for it. e.g. "macromolecule_1"

if.write.shorter.label.mapping
-Default: T If if.use.number.for.long.label is "T", we can write the mapping between shorter name and the original label to a text file.

label.spliting.string
-Default: c(" ","-",";","/","_") When we split text into multiple lines, these characters will be used to split label(where a new line can be created). If label.spliting.string is set to anything not in the label (e.g. "", NULL, NA etc.), the label won't be split (i.e. all text in the same line). If label.spliting.string is set to "any", the label will be at any character when the lenth of current line reaches the width determined by parameter "text.length.factor

complex.compartment.label.margin
-Default: 8 Move the label text vertically for compartment and complex.

font.size
-Default: 6 Affects font size of all types of glyphs.

font.size.scale.gene
-Default: 3 Only affect font size of "macromolecule" glyphs.

font.size.scale.cpd
-Default: 3 Only affects font size of "simple chemical" glyphs.

logic.node.font.scale
-Default: 6 Controls the size of logic glyphs(and, or, not etc.).

text.length.factor
-Default: 2 How wide the wrapped text should be. If text is longer than the width controled by this parameter, the text is split into a new line, but only at characters in "label.spliting.string".

node.width.adjust.factor
-Default: 3 Change font size according to the width of its glyph. If the glyph is too large (e.g. a compartment), its label may look too small. Then we can enlarge the label in proportion to width of the glyph. It affects all types of glyphs.

if.scale.complex.font.size
-Default: T If we want to scale complex font size according to its width.

if.scale.compartment.font.size
-Default: T If scale compartment font size according to its width.

node.width.adjust.factor.compartment
-Default: 1 How much the font size should change in proportion to the width of compartment.

node.width.adjust.factor.complex
-Default: 4 How much the font size should change in proportion to the width of complex.

Details

1. About SBGNview()

This function extracts glyph (node) and arc (edge) data from a SBGN-ML file and creates a SBGN graph from the extracted data (draws shapes etc. in SVG format). Then it maps omics data to the glyphs and renders data as colors. Currently it maps gene/protein omics data to "macromolecule" glyphs and maps compound omics data to "simple chemical" glyphs.

2. About SBGN-ML files and curved arcs encoding

SBGNview can parse SBGN-ML files with standard SBGN PD syntax. For arcs, SBGNview can handle both straight lines (standard SBGN PD syntax) and spline curves (a syntax add by us). Current SBGN-ML syntax supports straight lines. The coordinates of line start or end points are stored in element "arc". But straight lines often produce node-edge or edge-edge crossings. Therefore, we generated SBGN-ML files with pre-routed spline edges.

To store the routed splines, we added an XHTML element called "edge.spline.info", which has children elements called "arc.spline" . Each "arc.spline" element has three types of children elements: "source.arc", "target.arc" and "spline". "source.arc" and "target.arc" will be rendered as straight line arcs controlled by attributes "start.x","start.y", "end.x", "end.y" (line ends' coordinates) and "class" (type of the straight line arc). These two arcs ensure the notation of the spline arc comply with its class. "spline" will be rendered as splines connecting "source.arc" and "target.arc". Each "spline" is defined by coordinates of four points: s (starting point of spline), c1 (the first control point of spline), c2 (the second control point of spline) and e (ending point of spline). In case of complicated routing, there could be multiple "splines" in an "arc.spline".

The function first checks if the SBGN-ML file has spline arcs (XHTML element "edge.spline.info") and use it if found. When there are no spline arcs, it will use straight line arcs (XHTML element "arc"). Please check out examples in our SBGN-ML file collection.

3. About ID mapping

SBGNview can automatically map several ID types to glyphs of pathwayCommons, MetaCyc and MetaCrop. For user defined SBGN-ML file, users need to provide information about how to map their omics data to glyphs.

3.1 How SBGNview finds glyph IDs in SBGN-ML file: Glyph IDs are recorded in attribute "id" in XHTML element "glyph". But for ID mapping, user can use other attributes by changing parameter "sbgn.id.attr".

3.2 For our SBGN-ML file collection., SBGNview can do ID mapping automatically. It uses extracted mapping between 1) UNIPROT/uniref and "macromolecule" glyph IDs and 2) ChEBI and "simple chemical" glyph IDs from biopax files in pathwayCommons and MetaCyc (other files were used for MetaCyc). For other ID types, we used pathview (gene/protein) and UniChem (compound) to map to UNIPROT and ChEBI, respectively, then map them to glyph IDs. Please see the README.txt file for more details. For MetaCrop, we used pathview for ID mapping.

4. Two common scenarios of using SBGNview

4.1 Using our pre-generated SBGN-ML files.

Supported pathways can be found using data("pathways.info"). This is a collection of SBGN-ML files for these databases: MetaCyc, MetaCrop and three databases collected by pathwayCommons (panther, Reactome and smpdb). For each file, the glyph layout is based on fdp and optimized to eliminate glyph-glyph overlaps. The arcs are splines that are routed to eliminate arc-glyph crossings.

To use these data, SBGNview needs the following parameters:

-gene.id.type and/or cpd.id.type (at least one should be provided)

SBGNview can map omics data to SBGN-ML glyphs automatically. Supported ID types can be found in data("mapped.ids")

Input SBGN-ML files can be obtained by using function "download.sbgn.file". It uses a pathway ID to download the SBGN-ML file. Available pathway IDs can be found in data("pathways.info"), in column "pathway.id". database should one of the following "MetaCrop", "MetaCyc" or "pathwayCommons".

The pathway database information for each pathway ID can be found in column "database".

4.2 Using SBGN-ML files from other sources.

In this scenario, database can be any character, but should NOT be set to any of the following "MetaCrop", "MetaCyc" or "pathwayCommons". Because these three values will overwrite other parameters such as "sbgn.gene.id.type" and "sbgn.cpd.id.type"

4.2.1 Input omics data have the SAME ID type as the glyph ID type in SBGN-ML file:

In this scenario, SBGNview needs the following information to map omics data to SBGN-ML glyphs:

-ID type of input omics data (gene.id.type and/or cpd.id.type)

-ID type of glyphs of input SBGN-ML file (sbgn.gene.id.type and/or sbgn.cpd.id.type).

These ID types can be any characters, but gene.id.type must be the same as sbgn.gene.id.type, and cpd.id.type must be the same as sbgn.cpd.id.type.

Users can use the function "change.data.id" to change the omics IDs to the glyph IDs in SBGN-ML file. This is recommended when the users want to overlay omics data on a large number of pathways. Because we only need to do ID mapping once, instead of letting SBGNview to do ID mapping for every pathway.

4.2.2 Input omics data have DIFFERENT ID type as the glyph ID type in SBGN-ML file:

In this scenario, SBGNview needs the following information to map omics data to SBGN-ML glyphs:

-ID type of input omics data (gene.id.type and/or cpd.id.type)

-ID type of glyphs of input SBGN-ML file (sbgn.gene.id.type and/or sbgn.cpd.id.type).

-A mapping table between input omics IDs and SBGN-ML glyph IDs (id.mapping.gene and/or id.mapping.cpd).

For user's convinience, pathview can generate such tables for several ID types (functions "geneannot.map" and "cpdidmap"). But column names need to be changed to the values of "input.type" and "output.type", as described in the "parameters" section.

Value

A SBGNview object.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
 ### use simulated data. Please see vignettes for more examples
 data(pathways.info)
 data("sbgn.xmls")
SBGNview.obj = SBGNview(
              simulate.data = TRUE
              ,sbgn.dir = "./"
              ,input.sbgn = "P00001"
              
              ,output.file = "./test.local.file" 
              ,output.formats = c("pdf")
              
              ,min.gene.value = -1
              ,max.gene.value = 1
            )
 SBGNview.obj           
 

chemokine/OmicsSBGN documentation built on June 27, 2019, 7:52 p.m.