SBGNview: Overlay omics data on SBGN pathway diagram and output image...


SBGNview    R Documentation

Overlay omics data on SBGN pathway diagram and output image files.

Description

This is the main function to map, integrate and render omics data on pathway graphs. Two inputs are needed: 1. a pathway file in SBGN-ML format and 2. gene and/or compound omics data. The function generates an image file of the pathway graph with the omics data mapped to the glyphs and rendered as pseudo-colors. If no gene or compound omics data is supplied, the function outputs the SVG image file (and any other selected file formats) of the parsed input file, which is useful for viewing the pathway graph without overlaid omics data. This function is similar to Pathview, except that the pathways are rendered in SBGN notation and users can control more graph properties, including node and edge attributes. We collected SBGN-ML files from several pathway databases: Reactome, MetaCyc, MetaCrop, PANTHER and SMPDB. Given a vector of pathway IDs, SBGNview can automatically download and use these SBGN-ML files. To map omics data to glyphs, the user only needs to specify the omics data ID types. When using custom SBGN-ML files, users need to provide a mapping file from the molecule IDs of the omics data to the glyph IDs of the SBGN-ML file.

Usage

SBGNview(
  gene.data = NULL,
  cpd.data = NULL,
  simulate.data = FALSE,
  input.sbgn = NULL,
  sbgn.dir = "./",
  output.file = "./output.svg",
  node.sum = "sum",
  gene.id.type = NA,
  cpd.id.type = NA,
  sbgn.id.attr = "id",
  sbgn.gene.id.type = NULL,
  sbgn.cpd.id.type = NA,
  id.mapping.gene = NULL,
  id.mapping.cpd = NULL,
  org = "hsa",
  output.formats = c("svg"),
  pathway.name = NULL,
  show.pathway.name = FALSE,
  SBGNview.data.folder = "./SBGNview.tmp.data",
  ...
)

Arguments

gene.data

A matrix, vector or SummarizedExperiment object. The same as the 'gene.data' argument in package 'pathview': it is either a vector (single measurement) or matrix-like data (multiple measurements). If the data is a vector, the entries should be numeric and the names of the entries should be gene IDs. A matrix has genes as rows and samples as columns, and its row names should be gene IDs. Here gene ID is a generic concept that covers multiple types: gene, transcript or protein. Default gene.data=NULL.
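The two accepted shapes can be sketched in base R as follows. This is only an illustration: the gene IDs and values are made up, and the object names 'gene.vector' and 'gene.matrix' are not part of SBGNview.

```r
# Hypothetical gene IDs and made-up values, for illustration only.

# Vector form: numeric entries, names are gene IDs (single measurement).
gene.vector <- c("1000" = 0.8, "1001" = -1.2, "1002" = 0.1)

# Matrix form: genes as rows, samples as columns, gene IDs as row names.
gene.matrix <- matrix(
  c(0.8, -1.2, 0.1,    # sample1
    0.5, -0.9, 0.3),   # sample2
  nrow = 3,
  dimnames = list(c("1000", "1001", "1002"), c("sample1", "sample2"))
)
```

Either object could then be passed as 'gene.data', together with a matching 'gene.id.type'.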

cpd.data

A matrix, vector or SummarizedExperiment object. The same as 'gene.data', except named with compound IDs. Default cpd.data=NULL.

simulate.data

Logical. SBGNview can simulate a dataset. If set to TRUE, SBGNview will simulate a gene dataset and a compound dataset, and the user inputs 'gene.data' and 'cpd.data' are ignored.

input.sbgn

A character vector. Can be either names of local SBGN files or pathway IDs of our pre-collected pathways. To see the pre-collected pathway IDs, run 'data(pathways.info)'.

sbgn.dir

A character string. Default: "./". The path to the folder that holds SBGN-ML files. If 'input.sbgn' is a vector of pathway IDs in data 'pathways.info', the SBGN-ML files will be downloaded into this folder.

output.file

A character string. Default: "./output.svg". Path to the output image files. Because we often work with multiple pathways, each pathway will have its own image files. Each string in 'input.sbgn' will be added to the end of 'output.file'. Depending on the image formats specified by the 'output.formats' parameter, extensions will be added to the end (e.g. .pdf, .png etc.).

node.sum

A character string. Default: "sum". Sometimes multiple omics genes/compounds are mapped to one SBGN glyph, so multiple values are mapped to one measurement/slice on the glyph. In this situation, a single value has to be derived for the slice on the glyph. This parameter names the function used to do so. It can be any R function that takes a numeric vector as input and outputs a single numeric value (e.g. 'sum', 'max', 'min', 'mean'). It can also be a User Defined Function (UDF).
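The following base R sketch shows what such a summary function does when several molecules share one glyph. It is not SBGNview's internal code: the gene names, glyph names and the helper 'max.abs' are hypothetical, and how a UDF is handed to 'node.sum' is not shown here.

```r
# Hypothetical data: three genes, two of which map to the same glyph.
gene.values <- c(geneA = 1.5, geneB = -2.0, geneC = 0.4)
gene.to.glyph <- c(geneA = "glyph1", geneB = "glyph1", geneC = "glyph2")

# A user-defined summary: keep the value with the largest absolute magnitude.
# It takes a numeric vector and returns a single numeric value.
max.abs <- function(x) unname(x[which.max(abs(x))])

# Collapse the values per glyph, once with a built-in and once with the UDF.
glyph.of.gene <- gene.to.glyph[names(gene.values)]
by.sum <- tapply(gene.values, glyph.of.gene, sum)
by.max.abs <- tapply(gene.values, glyph.of.gene, max.abs)
```

With 'sum', opposite-signed values on the same glyph partly cancel each other, while a magnitude-based summary such as 'max.abs' keeps the strongest signal. Which behaviour is appropriate depends on the data.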

gene.id.type

A character string. The type of gene ID in 'gene.data'. This parameter is used for ID mapping. It should be one of the IDs in data 'mapped.ids'. For details, run: data('mapped.ids')

cpd.id.type

A character string. The type of compound ID in 'cpd.data'. For details, run: data('mapped.ids')

sbgn.id.attr

A character string. This tells SBGNview where to find the ID of a glyph in the SBGN-ML file for ID mapping. This ID is used to map omics data to the glyph. It is normally the name of an attribute in the 'glyph' element. For example: <glyph class='macromolecule' id='p53'> </glyph>. We can specify: sbgn.id.attr = 'id'; sbgn.gene.id.type = 'SYMBOL'. For our pre-generated SBGN-ML files, the ID attribute is determined automatically and can therefore be omitted. Accepted values:
1. Any attribute name in element 'glyph'. For example: <glyph class='macromolecule' id='p53' protein='P04637'> </glyph>. We can specify: sbgn.id.attr = 'protein'; sbgn.gene.id.type = 'UNIPROT', then 'P04637' will be the glyph ID.
2. The string 'label', which makes SBGNview use the glyph label as glyph ID. For example: <glyph id='glyph14' class='simple chemical'> <label text='L-alanine'/> </glyph>. We can specify: sbgn.id.attr = 'label'; sbgn.cpd.id.type = 'compound.name', then 'L-alanine' will be used as glyph ID.

sbgn.gene.id.type

A character string. The ID type of "macromolecule" glyphs in the SBGN-ML file (see parameter 'sbgn.id.attr' for more details). This parameter is used for ID mapping, i.e. to either use our pre-generated mapping tables or find the corresponding columns in the user defined mapping table 'id.mapping.gene'. For our pre-generated SBGN-ML files, this is determined automatically from the pathway IDs and can therefore be omitted. For a user defined SBGN-ML file, this parameter should be one of the column names of the matrix 'id.mapping.gene'.

sbgn.cpd.id.type

A character string. Similar to 'sbgn.gene.id.type'. The corresponding glyphs are "simple chemical" glyphs.

id.mapping.gene

A matrix. Mapping table between gene.id.type and sbgn.gene.id.type. This table is needed if the ID pair of gene.id.type and sbgn.gene.id.type is NOT included in data 'mapped.ids' and is not mappable by package 'pathview'. This matrix should have two columns, one for gene.id.type and one for sbgn.gene.id.type. The column names should be the values of parameters 'sbgn.gene.id.type' and 'gene.id.type'. See the example section for an example.
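A minimal sketch of such a table in base R is given below. The ID type names ('ENTREZID', 'my.glyph.id') and the glyph IDs are hypothetical placeholders. The only point illustrated is that the column names must equal the values passed as 'gene.id.type' and 'sbgn.gene.id.type'.

```r
# Hypothetical ID types; use the same strings in the SBGNview() call.
gene.id.type <- "ENTREZID"
sbgn.gene.id.type <- "my.glyph.id"

# Two-column character matrix: one row per (omics ID, glyph ID) pair.
# The glyph IDs below are made up for illustration.
id.mapping.gene <- matrix(
  c("7157", "glyph_p53",
    "672",  "glyph_brca1"),
  ncol = 2,
  byrow = TRUE,
  dimnames = list(NULL, c(gene.id.type, sbgn.gene.id.type))
)
```

A compound mapping table for 'id.mapping.cpd' would follow the same layout, with column names taken from 'cpd.id.type' and 'sbgn.cpd.id.type'.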

id.mapping.cpd

A matrix. See id.mapping.gene.

org

A character string. Default: "hsa". The species of the gene omics data. It is used for species-specific gene ID mapping. Currently only three-letter KEGG codes are supported (e.g. hsa, mmu, ath). For a complete list of KEGG codes, see this page:
KEGG Organisms: Complete Genomes

output.formats

A character vector. It specifies the formats of the output image files. The vector should be a subset of c('pdf', 'ps', 'png'). The function always outputs an SVG file, whatever this parameter is set to. SBGNview uses the 'rsvg' package to convert the SVG file to other formats. If other formats are requested in 'output.formats' but the 'rsvg' package is not installed, an error will occur. See this page for how to install 'rsvg'.

pathway.name

A character string. Change/update the pathway name displayed on the output graph. If 'input.sbgn' is a pathway ID in data(pathways.info), the pathway name and database associated with the pathway ID will be displayed on the output graph. If 'input.sbgn' is an SBGN-ML file that is not part of our pre-generated SBGN-ML files, nothing will be displayed for the pathway name unless it is set using this argument.

show.pathway.name

Logical. Default: FALSE. If set to TRUE and 'input.sbgn' contains pre-collected pathway IDs, the pathway name will be added to the output file name.

SBGNview.data.folder

A character string. Default: "./SBGNview.tmp.data". The path to a folder that will hold temp data files.

...

Other parameters passed to function 'renderSbgn'.

Details

1. About SBGNview()

This function extracts glyph (node) and arc (edge) data from an SBGN-ML file and creates an SBGN graph from the extracted data (draws shapes etc. in SVG format). Then it maps omics data to the glyphs and renders the data as colors. Currently it maps gene/protein omics data to 'macromolecule' glyphs and compound omics data to 'simple chemical' glyphs.

2. About SBGN-ML files and curved arcs encoding

SBGNview can parse SBGN-ML files with standard SBGN PD syntax. For arcs, SBGNview can handle both straight lines (standard SBGN PD syntax) and spline curves (a syntax added by us). The current SBGN-ML syntax supports straight lines, whose start and end point coordinates are stored in element 'arc'. However, straight lines often produce node-edge or edge-edge crossings. Therefore, we generated SBGN-ML files with pre-routed spline edges.

To store the routed splines, we added an XML element called 'edge.spline.info', which has child elements called 'arc.spline'. Each 'arc.spline' element has three types of child elements: 'source.arc', 'target.arc' and 'spline'. 'source.arc' and 'target.arc' are rendered as straight line arcs controlled by the attributes 'start.x', 'start.y', 'end.x', 'end.y' (coordinates of the line ends) and 'class' (type of the straight line arc). These two arcs ensure that the notation of the spline arc complies with its class. 'spline' is rendered as a spline connecting 'source.arc' and 'target.arc'. Each 'spline' is defined by the coordinates of four points: s (starting point of the spline), c1 (the first control point), c2 (the second control point) and e (ending point of the spline). In case of complicated routing, there can be multiple 'spline' elements in an 'arc.spline'.
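To give an idea of how four such points define a curve, the sketch below evaluates a cubic Bezier curve in base R. It assumes that s, c1, c2 and e are interpreted as the control points of a cubic Bezier segment. The text above only states that a spline is defined by these four points, so this interpretation, the function name 'cubic.bezier' and the coordinates are assumptions made for illustration.

```r
# Evaluate a cubic Bezier curve at parameters t in [0, 1].
# s, c1, c2, e: numeric vectors of length 2 holding (x, y) coordinates.
# Returns a matrix with one row per value of t and columns x and y.
cubic.bezier <- function(s, c1, c2, e, t) {
  pts <- outer((1 - t)^3, s) +
    outer(3 * (1 - t)^2 * t, c1) +
    outer(3 * (1 - t) * t^2, c2) +
    outer(t^3, e)
  colnames(pts) <- c("x", "y")
  pts
}

# Made-up coordinates: start at (0, 0), end at (10, 0), bulging upwards.
pts <- cubic.bezier(
  s = c(0, 0), c1 = c(0, 10), c2 = c(10, 10), e = c(10, 0),
  t = seq(0, 1, by = 0.25)
)
```

The curve starts at s and ends at e. The control points c1 and c2 pull the curve towards them without lying on it, which is what allows an arc to be routed around glyphs.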

The function first checks whether the SBGN-ML file has spline arcs (XML element 'edge.spline.info') and uses them if found. When there are no spline arcs, it uses straight line arcs (XML element 'arc'). Please check out the examples in our SBGN-ML file collection.

3. About ID mapping

SBGNview can automatically map several ID types to glyphs of pathwayCommons, MetaCyc and MetaCrop. For user defined SBGN-ML files, users need to provide information about how to map their omics data to glyphs.

3.1 How SBGNview finds glyph IDs in an SBGN-ML file: Glyph IDs are recorded in attribute 'id' of the XML element 'glyph'. For ID mapping, however, users can use other attributes by changing parameter 'sbgn.id.attr'.

3.2 For our SBGN-ML file collection, SBGNview can do ID mapping automatically. It uses mappings extracted from BioPAX files in pathwayCommons and MetaCyc between 1) UNIPROT/uniref and 'macromolecule' glyph IDs and 2) ChEBI and 'simple chemical' glyph IDs. For other ID types, we used pathview (gene/protein) and UniChem (compound) to map to UNIPROT and ChEBI, respectively, and then mapped these to glyph IDs. For MetaCrop, we used pathview for ID mapping.

4. Two common scenarios of using SBGNview

4.1 Using our pre-generated SBGN-ML files.

Supported pathways can be found using data('pathways.info'). This is a collection of SBGN-ML files for these databases: MetaCyc, MetaCrop and three databases collected by pathwayCommons (PANTHER Pathway, Reactome and SMPDB). For each SBGN-ML file, the glyph layout is based on fdp and optimized to eliminate glyph-glyph overlaps. The arcs are splines that are routed to eliminate arc-glyph crossings.

To use these data, SBGNview needs the following parameters:

-gene.id.type and/or cpd.id.type (at least one should be provided)

SBGNview can map omics data to SBGN-ML glyphs automatically. Supported ID types can be found in data('mapped.ids').

Input SBGN-ML files can be obtained by using function 'downloadSbgnFile'.

4.2 Using SBGN-ML files from other sources.

4.2.1 Input omics data have the SAME ID type as the glyph ID type in SBGN-ML file:

In this scenario, SBGNview needs the following information to map omics data to SBGN-ML glyphs:

-ID type of input omics data (gene.id.type and/or cpd.id.type)

-ID type of glyphs of input SBGN-ML file (sbgn.gene.id.type and/or sbgn.cpd.id.type).

These ID types can be any character strings, but gene.id.type must be the same as sbgn.gene.id.type, and cpd.id.type must be the same as sbgn.cpd.id.type.

Users can use the function 'changeDataId' to change the omics IDs to the glyph IDs in SBGN-ML file.

4.2.2 Input omics data have a DIFFERENT ID type from the glyph ID type in SBGN-ML file:

In this scenario, SBGNview needs the following information to map omics data to SBGN-ML glyphs:

-ID type of input omics data (gene.id.type and/or cpd.id.type)

-ID type of glyphs of input SBGN-ML file (sbgn.gene.id.type and/or sbgn.cpd.id.type).

-A mapping table between input omics IDs and SBGN-ML glyph IDs (id.mapping.gene and/or id.mapping.cpd).

For users' convenience, pathview can generate such tables for several ID types (functions 'geneannot.map' and 'cpdidmap'). However, the column names need to be changed to the values of 'gene.id.type' and 'sbgn.gene.id.type'.

Value

SBGNview object (S3 class object) which is a list containing three elements:

1. data:

2. output.file: A string of the path to the output file. It is the string set by parameter 'output.file' in function SBGNview.

3. output.formats: A character vector specifying the formats of output image files. The vector should be a subset of c('pdf' , 'ps', 'png'). By default the function will always output a svg file.

This S3 class of objects contains information generated by the SBGNview function. The SBGNview object can be modified by using the "+" binary operator in conjunction with the highlightArcs, highlightNodes, and highlightPath functions. This mechanism for layer-by-layer graph modification is similar to that of ggplot2, a widely used R package for data visualization. Defining this class as a formal (S4) class would produce an S4 SBGNview object, which doesn't work with the binary operator and doesn't allow layer-by-layer graph modification. Therefore, we decided to go with the S3 implementation of the SBGNview class, since it works well with the binary operator and makes this functionality more intuitive and user friendly.
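The layering mechanism can be illustrated with a toy S3 class in base R. This is not SBGNview's implementation: the class names 'toy.view' and 'toy.layer' and the functions 'new.toy.view' and 'toy.highlight' are hypothetical and only show how an S3 method for "+" lets modifications be chained.

```r
# Toy constructor: an S3 object that records pending modifications.
new.toy.view <- function(output.file) {
  structure(list(output.file = output.file, layers = list()),
            class = "toy.view")
}

# Toy layer constructor, standing in for a highlight function.
toy.highlight <- function(node) {
  structure(list(node = node), class = "toy.layer")
}

# S3 method for "+": append the layer and return the modified object,
# so that further layers can be added to the result.
"+.toy.view" <- function(e1, e2) {
  e1$layers <- c(e1$layers, list(e2))
  e1
}

view <- new.toy.view("demo") + toy.highlight("p53") + toy.highlight("MDM2")
```

Each "+" returns an object of the same class, so any number of layers can be chained in one expression.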

Examples

### Use simulated data. Please see vignettes for more examples.
### Run `browseVignettes(package = "SBGNview")` 

# load demo dataset, SBGN pathway data collection and info, which may take a few seconds
# we use a cancer microarray dataset 'gse16873.d' from package 'pathview'
data("gse16873.d", "pathways.info", "sbgn.xmls")

# search for pathways with user defined keywords
input.pathways <- findPathways("Adrenaline and noradrenaline biosynthesis")

# render SBGN pathway graph and output image files
SBGNview.obj <- SBGNview(gene.data = gse16873.d[,1:3],
                         gene.id.type = "entrez",
                         input.sbgn = input.pathways$pathway.id,
                         output.file = "quick.start",
                         output.formats = c("png", "pdf"),
                         min.gene.value = -1,
                         max.gene.value = 1)
SBGNview.obj


datapplab/SBGNview documentation built on June 20, 2022, 9:55 p.m.