score_candidate_genes_from_PPI: The full WPPI workflow
In AnaGalhoz37/wppi: Weighting protein-protein interactions

score_candidate_genes_from_PPI

R Documentation

Description

The wppi package prioritizes genes according to their potential relevance in a disease or other experimental or physiological condition. For this it uses a protein-protein interaction (PPI) network and functional annotations. The interactions in the neighborhood of the genes of interest are weighted according to the number of common neighbors of the interacting partners and the similarity of their functional annotations. The PPI network is obtained from the OmniPath (https://omnipathdb.org/) resource, and functionality is deduced from the Gene Ontology (GO, http://geneontology.org/) and Human Phenotype Ontology (HPO, https://hpo.jax.org/app/) databases. To score the candidate genes, a Random Walk with Restart algorithm is applied to the weighted network.
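
The common-neighbor part of the weighting described above can be illustrated on a toy network. The sketch below uses base R only and is not the package's implementation: the protein names and the edge list are hypothetical, and the way wppi combines neighbor counts with annotation similarity is not shown.

```r
# Toy undirected PPI network; protein names are hypothetical placeholders.
proteins <- c("A", "B", "C", "D")
adj <- matrix(0, 4, 4, dimnames = list(proteins, proteins))
edges <- rbind(
    c("A", "B"), c("A", "C"), c("B", "C"), c("B", "D"), c("C", "D")
)
# Fill the symmetric 0/1 adjacency matrix from the edge list.
for (i in seq_len(nrow(edges))) {
    adj[edges[i, 1], edges[i, 2]] <- 1
    adj[edges[i, 2], edges[i, 1]] <- 1
}

# Entry [i, j] of adj %*% adj counts the proteins adjacent to both i and j.
shared <- adj %*% adj

# Keep the counts only for pairs that actually interact.
common_neighbors <- shared * adj
common_neighbors
```

In this toy network the interaction between B and C has two common neighbors (A and D), so it would receive a larger topological weight than the other interactions, which have one each.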

Usage

score_candidate_genes_from_PPI(
  genes_interest,
  HPO_interest = NULL,
  percentage_output_genes = 100,
  graph_order = 1,
  GO_annot = TRUE,
  GO_slim = NULL,
  GO_aspects = c("C", "F", "P"),
  GO_organism = "human",
  HPO_annot = TRUE,
  restart_prob_rw = 0.4,
  threshold_rw = 1e-05,
  databases = NULL,
  ...
)

Arguments

`genes_interest`	Character vector of gene symbols with genes known to be related to the investigated disease or condition.
`HPO_interest`	Character vector with Human Phenotype Ontology (HPO) annotations of interest from which to construct the functionality (for a list of available annotations see the 'Name' column in the data frame provided by `wppi_hpo_data`). If not specified, all the annotations available in the HPO database will be used.
`percentage_output_genes`	Positive integer between 0 and 100 specifying the percentage (%) of the total candidate genes in the network returned in the output. If not specified, the scores of all candidate genes are returned.
`graph_order`	Integer larger than zero: the neighborhood range counted as steps from the genes of interest. The genes within this range, called candidate genes, together with the given genes of interest define the Protein-Protein Interaction (PPI) network used in the analysis. If not specified, the first order neighbors are used.
`GO_annot`	Logical: use the Gene Ontology (GO) annotation database to weight the PPI network. The default is to use it.
`GO_slim`	Character: use a GO subset (slim). If `NULL`, the full GO is used. The most often used slim is called "generic". For a list of available slims see `OmnipathR::go_annot_slim`.
`GO_aspects`	Character vector with the single letter codes of the gene ontology aspects to use. By default all three aspects are used. The aspects are "C": cellular component, "F": molecular function and "P": biological process.
`GO_organism`	Character: name of the organism for GO annotations.
`HPO_annot`	Logical: use the Human Phenotype Ontology (HPO) annotation database to weight the PPI network. The default is to use it.
`restart_prob_rw`	Numeric: between 0 and 1, defines the restart probability parameter used in the Random Walk with Restart algorithm. The default value is 0.4.
`threshold_rw`	Numeric: the threshold parameter in the Random Walk with Restart algorithm. When the difference between the probabilities of successive iterations is smaller than the threshold, the algorithm stops. The default is 1e-5.
`databases`	Database knowledge as produced by `wppi_data`.
`...`	Passed to `OmnipathR::import_post_translational_interactions`. With these options you can customize the network retrieved from OmniPath.
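
To make the roles of `restart_prob_rw` and `threshold_rw` more concrete, here is a minimal base-R sketch of a Random Walk with Restart on a toy weighted network. The function name `random_walk_restart`, the toy weights, the column normalization and the L1 convergence check are assumptions chosen for illustration; the package's own implementation may differ in these details.

```r
# Hypothetical helper, not part of wppi.
random_walk_restart <- function(weights, seeds,
                                restart_prob = 0.4, threshold = 1e-5) {
    # Column-normalize so each column is a probability distribution
    # (assumes every node has at least one weighted interaction).
    transition <- sweep(weights, 2, colSums(weights), "/")
    # Start with all probability spread evenly over the seed genes.
    p0 <- as.numeric(rownames(weights) %in% seeds)
    p0 <- p0 / sum(p0)
    p <- p0
    repeat {
        # Walk one step with probability 1 - restart_prob,
        # or jump back to the seeds with probability restart_prob.
        p_next <- (1 - restart_prob) * as.vector(transition %*% p) +
            restart_prob * p0
        converged <- sum(abs(p_next - p)) < threshold
        p <- p_next
        if (converged) break
    }
    setNames(p, rownames(weights))
}

# Toy symmetric weight matrix; names and weights are placeholders.
nodes <- c("A", "B", "C", "D")
weights <- matrix(
    c(0, 1, 1, 0,
      1, 0, 2, 1,
      1, 2, 0, 1,
      0, 1, 1, 0),
    nrow = 4, byrow = TRUE, dimnames = list(nodes, nodes)
)
scores <- random_walk_restart(weights, seeds = "A")
sort(scores, decreasing = TRUE)
```

A larger restart probability keeps more of the probability mass close to the seed genes, while a smaller threshold makes the iteration run longer before it is considered converged.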

Details

If you use a GO subset (slim), building it for the first time might take around 20 minutes. The result is saved to the cache, so subsequent loads are quick. Gene Ontology annotations are available for a few other organisms apart from human. The currently supported organisms are "chicken", "cow", "dog", "human", "pig" and "uniprot_all". If you disable HPO_annot you can use wppi to score PPI networks of organisms other than human.

Value

Data frame with the ranked candidate genes based on the functional score inferred from given ontology terms, PPI and Random Walk with Restart parameters.
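
As an illustration of how a ranked result relates to `percentage_output_genes`, the sketch below keeps the top share of a hypothetical score table. The gene names and scores are made up, and rounding up with `ceiling` is an assumption; the package may round differently.

```r
# Hypothetical scores; gene names are placeholders, not real results.
scores <- data.frame(
    gene_symbol = paste0("GENE", 1:20),
    score = seq(0.40, 0.02, length.out = 20)
)
percentage_output_genes <- 10

# Rank by decreasing score, then keep the requested percentage of rows.
ranked <- scores[order(scores$score, decreasing = TRUE), ]
n_keep <- ceiling(nrow(ranked) * percentage_output_genes / 100)
top_genes <- head(ranked, n_keep)
top_genes
```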

Examples

library(wppi)

# example gene set
genes_interest <-
    c("ERCC8", "AKT3", "NOL3", "GFI1B", "CDC25A", "TPX2", "SHE")
# example HPO annotations set
hpo <- wppi_hpo_data()
HPO_interest <- unique(
    dplyr::filter(hpo, grepl("Diabetes", .data$Name))$Name
)
# Score 1st-order candidate genes
new_genes_diabetes <-
    score_candidate_genes_from_PPI(
        genes_interest = genes_interest,
        HPO_interest = HPO_interest,
        percentage_output_genes = 10,
        graph_order = 1)
new_genes_diabetes
# # A tibble: 30 x 3
#    score gene_symbol uniprot
#    <dbl> <chr>       <chr>
#  1 0.247 KNL1        Q8NG31
#  2 0.247 HTRA2       O43464
#  3 0.247 KAT6A       Q92794
#  4 0.247 BABAM1      Q9NWV8
#  5 0.247 SKI         P12755
# # … with 25 more rows

AnaGalhoz37/wppi documentation built on Nov. 8, 2022, 7:47 a.m.