score_candidate_genes_from_PPI: The full WPPI workflow

View source: R/WPPI_app.R

Description

The wppi package prioritizes genes according to their potential relevance in a disease or in another experimental or physiological condition. To do so, it combines a protein-protein interaction (PPI) network with functional annotations. PPIs in the neighborhood of the genes of interest are weighted according to the number of common neighbors of the interacting partners and the similarity of their functional annotations. The PPI network is obtained from the OmniPath (https://omnipathdb.org/) resource, and functional annotations are taken from the Gene Ontology (GO, http://geneontology.org/) and Human Phenotype Ontology (HPO, https://hpo.jax.org/app/) databases. To score the candidate genes, a Random Walk with Restart algorithm is applied to the weighted network.
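The two ingredients of the edge weighting, shared neighbors and annotation similarity, can be illustrated on a toy network. The block below is a minimal base-R sketch, not the wppi implementation: the five-protein network, the annotation sets, the use of the Jaccard index as the similarity measure, and the formula combining both terms into a weight are all assumptions made for illustration. For the package's own weighting step, see weighted_adj under See Also.

```r
# Toy undirected network of five hypothetical proteins (not OmniPath data).
genes <- c("A", "B", "C", "D", "E")
adj <- matrix(0, 5, 5, dimnames = list(genes, genes))
edges <- rbind(c("A", "B"), c("A", "C"), c("B", "C"),
               c("B", "D"), c("C", "D"), c("D", "E"))
adj[edges] <- 1          # character-matrix indexing via dimnames
adj[edges[, 2:1]] <- 1   # make the network symmetric

# Hypothetical functional annotation sets (e.g. GO or HPO term IDs).
annot <- list(A = c("T1", "T2"), B = "T2", C = c("T2", "T3"),
              D = "T3", E = character(0))

# Jaccard index of two annotation sets (0 when both are empty).
jaccard <- function(x, y) {
  u <- union(x, y)
  if (length(u) == 0) return(0)
  length(intersect(x, y)) / length(u)
}

# (adj %*% adj)[i, j] counts the neighbors shared by proteins i and j.
common <- adj %*% adj

# Assumed weighting: more shared neighbors and more similar annotations
# give a larger weight; non-interacting pairs keep weight 0.
weighted <- adj
for (i in genes) {
  for (j in genes) {
    if (adj[i, j] == 1) {
      weighted[i, j] <- (common[i, j] + 1) * (1 + jaccard(annot[[i]], annot[[j]]))
    }
  }
}
weighted
```

Multiplying the adjacency matrix by itself counts the shared neighbors of every pair at once, so no explicit loop over neighbors is needed for that term.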

Usage

score_candidate_genes_from_PPI(
  genes_interest,
  HPO_interest = NULL,
  percentage_output_genes = 100,
  graph_order = 1,
  GO_annot = TRUE,
  GO_slim = NULL,
  GO_aspects = c("C", "F", "P"),
  GO_organism = "human",
  HPO_annot = TRUE,
  restart_prob_rw = 0.4,
  threshold_rw = 1e-05,
  databases = NULL,
  ...
)

Arguments

genes_interest

Character vector of symbols of genes known to be related to the investigated disease or condition.

HPO_interest

Character vector of Human Phenotype Ontology (HPO) annotations of interest, used to define the functional annotations (for a list of available annotations, see the 'Name' column of the data frame returned by wppi_hpo_data). If not specified, all annotations available in the HPO database are used.

percentage_output_genes

Positive integer (at most 100): the percentage (%) of the total candidate genes in the network to return in the output. If not specified, the scores of all candidate genes are returned.
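Returning a percentage of the ranked candidates amounts to sorting by score and keeping the top fraction. The helper below is hypothetical and not part of wppi; rounding the number of kept genes up with ceiling() is an assumption rather than documented wppi behavior.

```r
# Hypothetical helper, not part of wppi: keep the top `percentage` % of
# candidate genes ranked by score (rounding up is an assumption).
top_percentage <- function(scores, percentage = 100) {
  stopifnot(is.numeric(percentage), percentage > 0, percentage <= 100)
  ranked <- scores[order(scores$score, decreasing = TRUE), , drop = FALSE]
  n_keep <- ceiling(nrow(ranked) * percentage / 100)
  ranked[seq_len(n_keep), , drop = FALSE]
}

# Made-up scores for five hypothetical candidate genes.
candidates <- data.frame(
  gene_symbol = c("G1", "G2", "G3", "G4", "G5"),
  score = c(0.10, 0.40, 0.25, 0.05, 0.20),
  stringsAsFactors = FALSE
)
top_percentage(candidates, 40)  # the two best-scoring genes
```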

graph_order

Integer larger than zero: the size of the neighborhood, counted as the number of steps from the genes of interest. The genes within this neighborhood, called candidate genes, together with the genes of interest define the protein-protein interaction (PPI) network used in the analysis. If not specified, first-order neighbors are used.
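The neighborhood of a given order can be collected by expanding the set of reached genes one step at a time. The sketch below assumes a symmetric 0/1 adjacency matrix with gene symbols as dimnames; the neighborhood() helper and the path network are hypothetical and only illustrate what graph_order controls.

```r
# Hypothetical helper, not part of wppi: genes reachable from `seeds`
# in at most `order` steps, excluding the seeds themselves.
neighborhood <- function(adj, seeds, order = 1) {
  stopifnot(order >= 1, all(seeds %in% rownames(adj)))
  reached <- setNames(rownames(adj) %in% seeds, rownames(adj))
  for (step in seq_len(order)) {
    # A gene is reached if it is adjacent to any already reached gene.
    frontier <- colSums(adj[reached, , drop = FALSE]) > 0
    reached <- reached | frontier
  }
  setdiff(names(reached)[reached], seeds)
}

# Toy path network A - B - C - D - E.
genes <- c("A", "B", "C", "D", "E")
adj <- matrix(0, 5, 5, dimnames = list(genes, genes))
for (k in 1:4) adj[k, k + 1] <- adj[k + 1, k] <- 1

neighborhood(adj, "A", order = 1)  # "B"
neighborhood(adj, "A", order = 2)  # "B" "C"
```

In this picture, the candidate genes are the returned neighbors, and the network analyzed is formed by them together with the seed genes.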

GO_annot

Logical: use the Gene Ontology (GO) annotation database to weight the PPI network. The default is to use it.

GO_slim

Character: use a GO subset (slim). If NULL, the full GO is used. The most often used slim is called "generic". For a list of available slims see OmnipathR::go_annot_slim.

GO_aspects

Character vector of the single-letter codes of the Gene Ontology aspects to use. By default, all three aspects are used: "C" (cellular component), "F" (molecular function) and "P" (biological process).

GO_organism

Character: name of the organism for GO annotations.

HPO_annot

Logical: use the Human Phenotype Ontology (HPO) annotation database to weight the PPI network. The default is to use it.

restart_prob_rw

Numeric between 0 and 1: the restart probability of the Random Walk with Restart algorithm. The default value is 0.4.

threshold_rw

Numeric: the convergence threshold of the Random Walk with Restart algorithm. The algorithm stops when the difference between the probability vectors of two consecutive iterations falls below this threshold. The default is 1e-5.
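A minimal base-R sketch of a Random Walk with Restart, showing how restart_prob_rw and threshold_rw interact. It follows a common formulation (column-normalized transition matrix, uniform restart vector over the genes of interest, L1 change between iterations as the stopping criterion), which may differ in detail from wppi's random_walk; the random_walk_restart() name, the max_iter safeguard and the toy network are additions of this sketch.

```r
# Minimal Random Walk with Restart sketch (not wppi's random_walk).
random_walk_restart <- function(adj, seeds, restart_prob = 0.4,
                                threshold = 1e-5, max_iter = 1000) {
  stopifnot(restart_prob >= 0, restart_prob <= 1, all(colSums(adj) > 0))
  # Column-normalize: column j holds the transition probabilities out of gene j.
  W <- sweep(adj, 2, colSums(adj), "/")
  # Restart vector: uniform probability over the seed genes.
  p0 <- as.numeric(rownames(adj) %in% seeds)
  p0 <- p0 / sum(p0)
  p <- p0
  for (iter in seq_len(max_iter)) {
    p_new <- (1 - restart_prob) * as.vector(W %*% p) + restart_prob * p0
    delta <- sum(abs(p_new - p))  # L1 change between iterations
    p <- p_new
    if (delta < threshold) break
  }
  setNames(p, rownames(adj))
}

# Toy network: triangle A-B-C with D attached to C.
genes <- c("A", "B", "C", "D")
adj <- matrix(0, 4, 4, dimnames = list(genes, genes))
adj["A", "B"] <- adj["B", "A"] <- 1
adj["A", "C"] <- adj["C", "A"] <- 1
adj["B", "C"] <- adj["C", "B"] <- 1
adj["C", "D"] <- adj["D", "C"] <- 1

p <- random_walk_restart(adj, seeds = "A", restart_prob = 0.4, threshold = 1e-5)
sort(p, decreasing = TRUE)
```

Each iteration mixes one step of the walk, weighted by 1 - restart_prob, with a jump back to the seed genes, weighted by restart_prob. A larger restart probability therefore keeps the scores concentrated around the genes of interest, while a smaller threshold trades run time for a more precisely converged score vector.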

databases

Database knowledge as produced by wppi_data.

...

Passed to OmnipathR::import_post_translational_interactions. With these options you can customize the network retrieved from OmniPath.

Details

If you use a GO subset (slim), building it the first time may take around 20 minutes. The result is saved to the cache, so subsequent loads are fast. Gene Ontology annotations are available for a few organisms other than human; the currently supported organisms are "chicken", "cow", "dog", "human", "pig" and "uniprot_all". If you disable HPO_annot, you can use wppi to score PPI networks of organisms other than human.

Value

Data frame of the candidate genes, ranked by the score computed from the given ontology terms, the PPI network and the Random Walk with Restart parameters.

See Also

  • wppi_data

  • weighted_adj

  • random_walk

  • prioritization_genes

Examples

library(wppi)
# example gene set
genes_interest <-
    c("ERCC8", "AKT3", "NOL3", "GFI1B", "CDC25A", "TPX2", "SHE")
# example HPO annotations set
hpo <- wppi_hpo_data()
HPO_interest <- unique(
    dplyr::filter(hpo, grepl("Diabetes", .data$Name))$Name
)
# Score 1st-order candidate genes
new_genes_diabetes <-
    score_candidate_genes_from_PPI(
        genes_interest = genes_interest,
        HPO_interest = HPO_interest,
        percentage_output_genes = 10,
        graph_order = 1)
new_genes_diabetes
# # A tibble: 30 x 3
#    score gene_symbol uniprot
#    <dbl> <chr>       <chr>
#  1 0.247 KNL1        Q8NG31
#  2 0.247 HTRA2       O43464
#  3 0.247 KAT6A       Q92794
#  4 0.247 BABAM1      Q9NWV8
#  5 0.247 SKI         P12755
# # … with 25 more rows


AnaGalhoz37/wppi documentation built on Nov. 8, 2022, 7:47 a.m.