cisPath: Visualization of the shortest paths of functional interaction...

Description

This method is used to identify and visualize the shortest functional paths between proteins in the protein–protein interaction (PPI) network.

Usage

cisPath(infoFile, outputDir, proteinName=NULL, targetProteins=NULL, swissProtID=FALSE, sprotFile="", tremblFile="", 
        nodeColors=c("#1F77B4", "#FF7F0E", "#D62728", "#9467BD", "#8C564B", "#E377C2"), leafColor="#2CA02C", byStep=FALSE)
## S4 method for signature 'character,character'
cisPath(infoFile, outputDir, proteinName=NULL, targetProteins=NULL, swissProtID=FALSE, sprotFile="", tremblFile="",  
        nodeColors=c("#1F77B4", "#FF7F0E", "#D62728", "#9467BD", "#8C564B", "#E377C2"), leafColor="#2CA02C", byStep=FALSE)

Arguments

infoFile

File that contains PPI data (character(1)).
Please see the file PPI_Info.txt as an example.

outputDir

Output directory (character(1)).

proteinName

Gene name or Swiss-Prot accession number of the source protein (character(1)).
If NULL, the shortest functional paths are identified in the output web page instead.

targetProteins

Gene names or Swiss-Prot accession numbers of the target proteins (character vector).
If NULL, all other proteins are treated as potential targets.

swissProtID

A logical value. Set to TRUE if targetProteins contains Swiss-Prot accession numbers.
Set to FALSE if targetProteins contains gene names.

sprotFile

Input: File downloaded from the UniProt database (UniProtKB/Swiss-Prot) (character(1)).

tremblFile

Input: File downloaded from the UniProt database (UniProtKB/TrEMBL) (character(1)).

nodeColors

Colors for the main nodes in the graph (character vector). If there are fewer values than main nodes, the values are recycled in the standard manner.
Form: "#RRGGBB", where each of the pairs RR, GG, BB consists of two hexadecimal digits giving a value in the range 00 to FF.

leafColor

Color for the leaf nodes in the graph (character(1)).
Form: "#RRGGBB", where each of the pairs RR, GG, BB consists of two hexadecimal digits giving a value in the range 00 to FF.

byStep

A logical value. To identify the paths with the fewest steps (instead of the minimal cost), set byStep to TRUE.
In this situation, every edge cost is set to 1.
Note: To view more possible paths between two proteins, we recommend setting this value to TRUE.
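
The color format and recycling rule described for nodeColors and leafColor can be checked before calling cisPath. The following is a minimal sketch in base R. The helper is_hex_color is hypothetical and not part of the package, and the use of rep_len only illustrates standard R recycling rather than how cisPath applies it internally.

```r
# Hypothetical helper (not part of cisPath): TRUE for strings of the form "#RRGGBB".
is_hex_color <- function(x) grepl("^#[0-9A-Fa-f]{6}$", x)

node_colors <- c("#1F77B4", "#FF7F0E", "#D62728")
leaf_color  <- "#2CA02C"
stopifnot(all(is_hex_color(node_colors)), is_hex_color(leaf_color))

# Standard R recycling: with 5 main nodes and 3 colors,
# the colors repeat from the start of the vector.
n_main_nodes <- 5
recycled <- rep_len(node_colors, n_main_nodes)
print(recycled)
```

A value such as "red" or "#12345" would fail this check, so it is worth validating user-supplied colors up front.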

Details

The input PPI data file infoFile should follow the same format as the output files of the methods formatSTRINGPPI, formatPINAPPI, or formatiRefIndex. See the files STRINGPPI.txt or PINAPPI.txt as examples. The first four fields contain the Swiss-Prot accession numbers and gene names of the two interacting proteins. The PubMedID field should be set to NA if unavailable. The evidence field may give a brief description of the evidence. The edgeValue field should be assigned a value no less than 1; this value is treated as the cost when identifying the shortest paths. If no method is available to estimate this value, set it to 1.
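
The field rules above (PubMedID set to NA when unavailable, edgeValue no less than 1, default 1) can be applied to a table before it is written out. The sketch below uses a toy data frame. The column names, their order, and all identifiers are placeholders invented for illustration, so please consult PPI_Info.txt for the actual layout.

```r
# Toy table mimicking the fields described above.
# Column names and identifiers are assumptions, not the package's specification.
ppi <- data.frame(
  ID1       = c("ACC001", "ACC001"),
  Name1     = c("GENE1", "GENE1"),
  ID2       = c("ACC002", "ACC003"),
  Name2     = c("GENE2", "GENE3"),
  PubMedID  = c("12345678", ""),
  evidence  = c("toy evidence", "toy evidence"),
  edgeValue = c(2.5, NA),
  stringsAsFactors = FALSE
)

# Unavailable PubMed IDs are recorded as NA.
ppi$PubMedID[is.na(ppi$PubMedID) | ppi$PubMedID == ""] <- NA

# If no cost estimate is available, fall back to 1.
ppi$edgeValue[is.na(ppi$edgeValue)] <- 1

# Every edge cost must be no less than 1.
stopifnot(all(ppi$edgeValue >= 1))
print(ppi)
```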

The shortest functional paths between the proteins are calculated using Dijkstra's algorithm. The results are shown in an HTML file, and users can easily query them using a browser. Each shortest path is displayed as a force-directed graph (http://bl.ocks.org/4062045) with JavaScript library D3 (www.d3js.org). The HTML file follows HTML 4.01 Strict and CSS version 3 standards to maintain consistency across different browsers. Chrome, Firefox, Safari, and IE9 will all properly display the PPI view. Please contact us if the paths do not display correctly.
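
To illustrate how edge costs and the byStep option affect the result, here is a minimal sketch of Dijkstra's algorithm in base R on a toy undirected network. It is not the implementation used by cisPath. The functions dijkstra and path_to, the argument by_step, and the node names and costs are all invented for this example. Because edgeValue is required to be at least 1, all costs are positive, which is the condition Dijkstra's algorithm relies on.

```r
# Toy undirected network; names and costs are invented for illustration.
edges <- data.frame(
  from = c("A", "B", "C", "A"),
  to   = c("B", "C", "D", "D"),
  cost = c(1, 1, 1, 5),
  stringsAsFactors = FALSE
)

# Illustrative Dijkstra; by_step = TRUE mimics byStep=TRUE (every edge costs 1).
dijkstra <- function(edges, source, by_step = FALSE) {
  if (by_step) edges$cost <- 1
  nodes <- unique(c(edges$from, edges$to))
  dist <- setNames(rep(Inf, length(nodes)), nodes)
  prev <- setNames(rep(NA_character_, length(nodes)), nodes)
  dist[source] <- 0
  unvisited <- nodes
  while (length(unvisited) > 0) {
    u <- unvisited[which.min(dist[unvisited])]  # closest unvisited node
    if (is.infinite(dist[[u]])) break           # the rest is unreachable
    unvisited <- setdiff(unvisited, u)
    for (i in which(edges$from == u | edges$to == u)) {
      v <- if (edges$from[i] == u) edges$to[i] else edges$from[i]
      alt <- dist[[u]] + edges$cost[i]
      if (alt < dist[[v]]) {                    # found a cheaper route to v
        dist[v] <- alt
        prev[v] <- u
      }
    }
  }
  list(dist = dist, prev = prev)
}

# Walk back through the predecessors to recover one shortest path.
path_to <- function(res, target) {
  path <- target
  while (!is.na(res$prev[[path[1]]])) path <- c(res$prev[[path[1]]], path)
  path
}

by_cost <- dijkstra(edges, "A")
by_step <- dijkstra(edges, "A", by_step = TRUE)
print(path_to(by_cost, "D"))  # route with minimal total cost
print(path_to(by_step, "D"))  # route with the fewest edges
```

In this toy network the minimal-cost route from A to D passes through B and C (total cost 3), whereas counting steps selects the direct but expensive edge A-D.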

As an example, we have generated PPI data for several species from the PINA database (http://cbg.garvan.unsw.edu.au/pina/), the STRING database (http://string-db.org/), and the iRefIndex database (http://www.irefindex.org/wiki/). Users can download these files from http://www.isb.pku.edu.cn/cisPath/. If you make use of these files, please cite PINA, STRING, and iRefIndex accordingly. Users can edit the PPI data generated from these databases, or combine them with their private data to construct more complete PPI networks. This package includes only a small portion of the available PPI data as an example. An ID mapping file, generated from the UniProt (http://www.uniprot.org/) database, is also provided in this package.

With this method, a protein must be chosen as the source protein, but target proteins need not be chosen. If target proteins are not provided, this method identifies the shortest paths between the source protein and all other relevant proteins. The user can then query the results upon finding an interesting “target” protein. Although the output in such cases is large, we strongly suggest that users try selecting a source protein without specifying targets.

A protein often has several names, and some of these names may not be included in the input file infoFile. We therefore suggest that users check the output file targetIDs.txt to verify that the input protein names are valid. To avoid invalid target protein names, the unique Swiss-Prot accession numbers may be used as input instead. Swiss-Prot accession numbers can be looked up in the UniProt (http://www.uniprot.org/) database. We strongly suggest that users provide the files downloaded from the UniProt database (sprotFile and tremblFile).

All species: ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.dat.gz
Taxonomic divisions: ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/

uniprot_sprot_archaea.dat.gz and uniprot_trembl_archaea.dat.gz contain all archaea entries.
uniprot_sprot_bacteria.dat.gz and uniprot_trembl_bacteria.dat.gz contain all bacteria entries.
uniprot_sprot_fungi.dat.gz and uniprot_trembl_fungi.dat.gz contain all fungi entries.
uniprot_sprot_human.dat.gz and uniprot_trembl_human.dat.gz contain all human entries.
uniprot_sprot_invertebrates.dat.gz and uniprot_trembl_invertebrates.dat.gz contain all invertebrate entries.
uniprot_sprot_mammals.dat.gz and uniprot_trembl_mammals.dat.gz contain all mammalian entries except human and rodent entries.
uniprot_sprot_plants.dat.gz and uniprot_trembl_plants.dat.gz contain all plant entries.
uniprot_sprot_rodents.dat.gz and uniprot_trembl_rodents.dat.gz contain all rodent entries.
uniprot_sprot_vertebrates.dat.gz and uniprot_trembl_vertebrates.dat.gz contain all vertebrate entries except mammals.
uniprot_sprot_viruses.dat.gz and uniprot_trembl_viruses.dat.gz contain all virus entries.
We suggest you take a look at the README file before you download these files.

If you make use of these files, please cite the UniProt database.
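
The taxonomic division file names listed above follow a regular pattern, so the download URLs can be assembled programmatically. The sketch below is one way to do this. The helper uniprot_division_urls is hypothetical and not part of cisPath, and the download calls are left commented out because the files are large.

```r
# Hypothetical helper: build the Swiss-Prot and TrEMBL URLs for one division.
# The base URL and the file-name pattern are taken from the listing above.
uniprot_division_urls <- function(division) {
  base <- "ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/"
  c(sprot  = paste0(base, "uniprot_sprot_",  division, ".dat.gz"),
    trembl = paste0(base, "uniprot_trembl_", division, ".dat.gz"))
}

urls <- uniprot_division_urls("human")
print(urls)

# Uncomment to download into a directory of your choice:
# destDir    <- tempdir()
# sprotFile  <- file.path(destDir, basename(urls[["sprot"]]))
# tremblFile <- file.path(destDir, basename(urls[["trembl"]]))
# download.file(urls[["sprot"]],  sprotFile,  mode = "wb")
# download.file(urls[["trembl"]], tremblFile, mode = "wb")
```

Whether the downloaded files must be decompressed before being passed as sprotFile and tremblFile is not stated here, so check the README file and the package documentation first.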

Value

A list will be returned, and each element will contain the shortest paths from the source protein to a target protein.

The output directory contains the shortest paths from a source protein to the target proteins. Users can search for the paths easily using a browser. The file validInputProteins.txt contains the proteins that are valid as input to the HTML file. Please take a look at the output file targetIDs.txt to check whether the input protein names to this method are valid.

References

Cowley, M.J. et al. (2012) PINA v2.0: mining interactome modules. Nucleic Acids Res, 40, D862-865.

Wu, J. et al. (2009) Integrated network analysis platform for protein-protein interactions. Nat Methods, 6, 75-77.

Razick, S. et al. (2008) iRefIndex: A consolidated protein interaction database with provenance. BMC Bioinformatics, 9, 405.

Aranda, B. et al. (2011) PSICQUIC and PSISCORE: accessing and scoring molecular interactions. Nat Methods, 8, 528-529.

Szklarczyk, D. et al. (2011) The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res, 39, D561-D568.

Franceschini, A. et al. (2013) STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res, 41, D808-D815.

UniProt Consortium. (2012) Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res, 40, D71-D75.

See Also

formatSTRINGPPI, formatPINAPPI, formatSIFfile, formatiRefIndex, combinePPI, networkView, easyEditor.

Examples

    # examples
    infoFile <- system.file("extdata", "PPI_Info.txt", package="cisPath")
    
    # source protein: TP53
    # Identify the shortest functional paths from TP53 to all other relevant proteins
    outputDir <- file.path(tempdir(), "TP53_example1")
    results <- cisPath(infoFile, outputDir, "TP53", byStep=TRUE)
    
    # Identify the shortest paths from TP53 to proteins MAGI1 and GH1
    outputDir <- file.path(tempdir(), "TP53_example2")
    results <- cisPath(infoFile, outputDir, "TP53", targetProteins=c("MAGI1", "GH1"), byStep=TRUE)
    results
    
    # Identify the shortest paths from TP53 to proteins Q96QZ7 and P01241 (with the Swiss-Prot accession numbers)
    outputDir <- file.path(tempdir(), "TP53_example3")
    results <- cisPath(infoFile, outputDir, "TP53", targetProteins=c("Q96QZ7", "P01241"), swissProtID=TRUE, byStep=FALSE)
    
    # Identify the shortest functional paths in the web page
    outputDir <- file.path(tempdir(), "cisPath_example")
    results <- cisPath(infoFile, outputDir)

## Not run: 
    # example of downloading PPI data from our website
    
    # Change to your own output directory
    outputDir <- file.path(getwd(), "TP53")
    # Create the output directory
    dir.create(outputDir, showWarnings=FALSE, recursive=TRUE)
    
    # infoFile: path where the PPI data file will be saved.
    infoFile <- file.path(outputDir, "PPIdata.txt")
    
    # Download PPI data from our website.
    # Each call writes to infoFile, so download only the species you need
    # (human here, because the source protein is TP53).
    download.file("http://www.isb.pku.edu.cn/cispath/data/Homo_sapiens_PPI.txt", infoFile)
    # Files for other species (each would overwrite infoFile):
    # download.file("http://www.isb.pku.edu.cn/cispath/data/Caenorhabditis_elegans_PPI.txt", infoFile)
    # download.file("http://www.isb.pku.edu.cn/cispath/data/Drosophila_melanogaster_PPI.txt", infoFile)
    # download.file("http://www.isb.pku.edu.cn/cispath/data/Mus_musculus_PPI.txt", infoFile)
    # download.file("http://www.isb.pku.edu.cn/cispath/data/Rattus_norvegicus_PPI.txt", infoFile)
    # download.file("http://www.isb.pku.edu.cn/cispath/data/Saccharomyces_cerevisiae_PPI.txt", infoFile)
    
    results <- cisPath(infoFile, outputDir, "TP53")
    outputDir <- file.path(getwd(), "cisPathWeb")
    results <- cisPath(infoFile, outputDir)
    
## End(Not run)

cisPath documentation built on Nov. 8, 2020, 7:15 p.m.