prioritise_targets: Prioritise target genes

View source: R/prioritise_targets.R

prioritise_targets    R Documentation

Prioritise target genes

Description

Prioritise target genes using the following multi-step filtering procedure:

  1. Disease-level: keep_deaths: Keep only diseases with a certain age of death.

  2. Disease-level: severity_threshold_max:

    Keep only diseases annotated as a certain degree of severity or greater
     (filters on maximum severity per disease).
  3. Phenotype-level: prune_ancestors:

    Remove redundant ancestral phenotypes when at least one of their
     descendants is already present.
  4. Phenotype-level: keep_descendants:

    Keep only phenotypes belonging to a certain branch of the HPO,
     as defined by an ancestor term.
  5. Phenotype-level: keep_ont_levels: Keep only phenotypes at certain absolute ontology levels within the HPO.

  6. Phenotype-level: pheno_ndiseases_threshold: Keep only phenotypes associated with no more than a maximum number of diseases.

  7. Phenotype-level: keep_tiers: Keep only phenotypes with high severity Tiers.

  8. Phenotype-level: severity_threshold: Keep only phenotypes with mean Severity equal to or below the threshold.

  9. Phenotype-level: gpt_filters:

    Keep only phenotypes with certain GPT annotations in specific
     severity metrics.
  10. Phenotype-level: severity_score_gpt_threshold: Keep only phenotypes with a minimum GPT severity score.

  11. Phenotype-level: info_content_threshold:

    Keep only phenotypes with a minimum information content score
     (computed from the HPO).
  12. Symptom-level: pheno_frequency_threshold:

    Keep only phenotypes with mean frequency equal to or above the threshold
     (i.e. how frequently a phenotype is associated with any diseases in
     which it occurs).
  13. Symptom-level: keep_onsets: Keep only symptoms with a certain age of onset.

  14. Symptom-level: symptom_p_threshold: Uncorrected p-value threshold to filter cell type-symptom associations by.

  15. Symptom-level: symptom_intersection_threshold:

    Minimum proportion of genes overlapping between a symptom gene list
     (phenotype-associated genes in the context of a particular disease)
     and the phenotype-cell type association driver genes.
  16. Cell type-level: q_threshold:

    Keep only cell type-phenotype association results at q<=0.05.
  17. Cell type-level: effect_threshold: Keep only cell type-phenotype association results at effect size>=1.

  18. Cell type-level: keep_celltypes: Keep only terminally differentiated cell types.

  19. Gene-level: keep_chr: Remove genes on non-standard chromosomes.

  20. Gene-level: evidence_score_threshold:

    Remove genes that are below an aggregate phenotype-gene
     evidence score threshold.
  21. Gene-level: gene_size: Keep only genes within a given size range (e.g. <4.3kb in length).

  22. Gene-level: add_driver_genes:

    Keep only genes that are driving the association with a given phenotype
     (inferred by the intersection of phenotype-associated genes and genes with
     high-specificity quantiles in the target cell type).
  23. Gene-level: keep_biotypes: Keep only genes belonging to certain biotypes.

  24. Gene-level: gene_frequency_threshold:

    Keep only genes at or above a certain mean frequency threshold
     (i.e. how frequently a gene is associated with a given phenotype
     when observed within a disease).
  25. Gene-level: keep_specificity_quantiles:

    Keep only genes in top specificity quantiles
     from the cell type dataset (CTD).
  26. Gene-level: keep_mean_exp_quantiles:

    Keep only genes in top mean expression quantiles
     from the cell type dataset (CTD).
  27. Gene-level: symptom_gene_overlap:

    Ensure that genes nominated at the phenotype-level also
     appear in the genes overlapping at the cell type-specific symptom-level.
  28. All levels: sort_cols:

    Sort candidate targets by one or more columns
     (e.g. "severity_score_gpt", "q").
    
  29. All levels: top_n:

    Only return the top N targets per variable group
     (specified with the "group_vars" argument).
     For example, setting "group_vars" to "hpo_id" and "top_n" to 1 would
     only return one target (row) per phenotype ID after sorting.

Usage

prioritise_targets(
  results = load_example_results(),
  ctd_list = load_example_ctd(c("ctd_DescartesHuman.rds", "ctd_HumanCellLandscape.rds"),
    multi_dataset = TRUE),
  phenotype_to_genes = HPOExplorer::load_phenotype_to_genes(),
  hpo = HPOExplorer::get_hpo(),
  keep_deaths = HPOExplorer::list_deaths(exclude = c("Miscarriage", "Stillbirth",
    "Prenatal death"), include_na = TRUE),
  keep_descendants = c("Phenotypic abnormality"),
  keep_ont_levels = NULL,
  pheno_ndiseases_threshold = NULL,
  gpt_filters = NULL,
  severity_score_gpt_threshold = 20,
  keep_tiers = NULL,
  severity_threshold_max = NULL,
  info_content_threshold = 8,
  run_prune_ancestors = TRUE,
  severity_threshold = NULL,
  pheno_frequency_threshold = NULL,
  keep_onsets = HPOExplorer::list_onsets(include_na = TRUE),
  effect_var = "logFC",
  q_threshold = 0.05,
  effect_threshold = 1,
  symptom_intersection_threshold = 0.25,
  keep_celltypes = NULL,
  evidence_score_threshold = 15,
  keep_chr = c(seq(22), "X", "Y"),
  gene_size = list(min = 0, max = Inf),
  gene_frequency_threshold = NULL,
  keep_biotypes = NULL,
  keep_specificity_quantiles = seq(30, 40),
  keep_mean_exp_quantiles = seq(30, 40),
  sort_cols = c(severity_score_gpt = -1, q = 1, logFC = -1, specificity = -1, mean_exp =
    -1, pheno_freq_mean = -1, gene_freq_mean = -1, width = 1),
  top_n = NULL,
  group_vars = c("hpo_id"),
  return_report = TRUE,
  verbose = TRUE
)

Arguments

results

The cell type-phenotype enrichment results generated by gen_results and merged together with merge_results.

ctd_list

A named list of CellTypeDataset objects each created with generate_celltype_data.

phenotype_to_genes

Output of load_phenotype_to_genes mapping phenotypes to gene annotations.

hpo

Human Phenotype Ontology object, loaded from get_ontology.

keep_deaths

The age of death associated with each HPO ID to keep. If >1 age of death is associated with the term, only the earliest age is considered. See add_death for details.

keep_descendants

Terms whose descendants should be kept (including the terms themselves). Set to NULL to skip this filtering step.

keep_ont_levels

Only keep phenotypes at certain absolute ontology levels. See add_ont_lvl for details.

pheno_ndiseases_threshold

Filter phenotypes by the maximum number of diseases they are associated with.

gpt_filters

A named list of filters to apply to the GPT annotations.

severity_score_gpt_threshold

The minimum GPT severity score that a phenotype can have across any disease.

keep_tiers

Tiers from hpo_tiers to keep. Include NA if you wish to retain phenotypes that do not have any Tier assignment.

severity_threshold_max

The max severity score that a phenotype can have across any disease.

info_content_threshold

Minimum phenotype information content threshold.

run_prune_ancestors

Prune redundant ancestral terms if any of their descendants are present. Passes to prune_ancestors.
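The idea behind ancestor pruning can be illustrated with a minimal base R sketch on a toy ontology. The parents map, the term names, and the helpers get_ancestors() and prune_ancestors_sketch() are hypothetical and are not part of MultiEWCE or HPOExplorer; the actual prune_ancestors function operates on the HPO object and may differ in detail.

```r
# Toy ontology: each term maps to its direct parent (NA for the root).
# These term names are invented for illustration only.
parents <- c(root = NA, A = "root", B = "A", C = "A", D = "root")

# Collect all ancestors of a term by walking up the parent map.
get_ancestors <- function(term, parents) {
  out <- character(0)
  p <- parents[[term]]
  while (!is.na(p)) {
    out <- c(out, p)
    p <- parents[[p]]
  }
  out
}

# Drop any term that is an ancestor of another term in the set.
prune_ancestors_sketch <- function(terms, parents) {
  all_anc <- unique(unlist(lapply(terms, get_ancestors, parents = parents)))
  setdiff(terms, all_anc)
}

# "A" is an ancestor of "B", so it is treated as redundant and removed.
kept <- prune_ancestors_sketch(c("A", "B", "D"), parents)
print(kept)
```

Here "D" is retained because none of its descendants are in the input set.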

severity_threshold

Only keep phenotypes with a mean severity score (averaged across multiple associated diseases) below the set threshold. The severity score ranges from 1-4 where 1 is the MOST severe. Include NA if you wish to retain phenotypes that do not have any severity score.

pheno_frequency_threshold

Only keep phenotypes with frequency above the set threshold. Frequency ranges from 0-100 where 100 is a phenotype that occurs 100% of the time in all associated diseases. Include NA if you wish to retain phenotypes that do not have any frequency data. See add_pheno_frequency for details.

keep_onsets

The age of onset associated with each HPO ID to keep. If >1 age of onset is associated with the term, only the earliest age is considered. See add_onset for details.

effect_var

Name of the effect size column in the results.

q_threshold

The q value threshold to subset the results by.

effect_threshold

The minimum fold change in specific expression to subset the results by.
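As a rough illustration of how effect_var, q_threshold and effect_threshold interact, the following base R sketch filters a toy results table. The table contents and the placeholder phenotype IDs are invented, and the internal implementation of prioritise_targets may differ; the inclusive comparisons follow the "q<=0.05" and "effect size>=1" wording in the Description.

```r
# Toy results table; column names mirror the documented defaults
# (q, logFC), but all values and IDs are invented.
results <- data.frame(
  hpo_id = c("HP:1", "HP:2", "HP:3", "HP:4"),
  q      = c(0.01, 0.20, 0.04, 0.05),
  logFC  = c(2.0, 3.0, 0.5, 1.0)
)

effect_var <- "logFC"
q_threshold <- 0.05
effect_threshold <- 1

# Keep rows that are significant AND have a large enough effect size.
keep <- results$q <= q_threshold & results[[effect_var]] >= effect_threshold
filtered <- results[keep, ]
print(filtered$hpo_id)
```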

symptom_intersection_threshold

Minimum proportion of genes overlapping between a symptom gene list (phenotype-associated genes in the context of a particular disease) and the phenotype-cell type association driver genes.
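A minimal sketch of the overlap proportion, using invented gene names. It assumes the proportion is taken relative to the size of the symptom gene list; the documentation does not state the denominator, so treat that choice as an assumption rather than the package's definition.

```r
# Invented gene lists for illustration only.
symptom_genes <- c("G1", "G2", "G3", "G4")   # genes for one phenotype in one disease
driver_genes  <- c("G2", "G5", "G6")         # phenotype-cell type driver genes

symptom_intersection_threshold <- 0.25

# Assumption: the denominator is the symptom gene list.
overlap <- intersect(symptom_genes, driver_genes)
prop <- length(overlap) / length(symptom_genes)

# The symptom passes when the overlap proportion meets the minimum.
passes <- prop >= symptom_intersection_threshold
print(c(proportion = prop, passes = passes))
```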

keep_celltypes

Cell types to keep.

evidence_score_threshold

The minimum threshold of mean evidence scores of each gene-phenotype association to keep.

keep_chr

Chromosomes to keep.

gene_size

Min/max gene size (important for therapeutics design).

gene_frequency_threshold

Only keep genes with frequency above the set threshold. Frequency ranges from 0-100 where 100 is a gene that occurs 100% of the time in a given phenotype. Include NA if you wish to retain genes that do not have any frequency data. See add_gene_frequency for details.

keep_biotypes

Which gene biotypes to keep. (e.g. "protein_coding", "processed_transcript", "snRNA", "lincRNA", "snoRNA", "IG_C_gene")

keep_specificity_quantiles

Which cell type specificity quantiles to keep (max quantile is 40).

keep_mean_exp_quantiles

Which cell type mean expression quantiles to keep (max quantile is 40).
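To show what keeping quantiles 30 to 40 out of 40 means in practice, here is a base R sketch that bins simulated values into 40 equal-sized rank-based bins. The binning rule and the simulated specificity values are assumptions made for illustration; the quantiles stored in a real CellTypeDataset may be computed differently.

```r
set.seed(1)
specificity <- runif(200)   # simulated per-gene values for one cell type
n_bins <- 40

# Assumption: equal-sized bins based on rank, numbered 1 (lowest) to 40 (highest).
r <- rank(specificity, ties.method = "first")
bins <- ceiling(r * n_bins / length(specificity))

# Documented default: keep the top 11 of 40 bins.
keep_specificity_quantiles <- seq(30, 40)
kept <- specificity[bins %in% keep_specificity_quantiles]

print(length(kept))
```

Under this binning, the default seq(30, 40) retains roughly the top 27.5% of genes by value.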

sort_cols

How to sort the rows using setorderv. names(sort_cols) will be supplied to the cols= argument and values will be supplied to the order= argument.

top_n

Top N genes to keep when grouping by group_vars.

group_vars

Columns to group by when selecting top_n genes.
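The interplay of sort_cols, top_n and group_vars can be sketched with data.table on a toy table. The gene names, placeholder IDs and values are invented, and selecting rows with head(.SD, top_n) is an assumed way to take the top rows per group, not necessarily how prioritise_targets does it internally.

```r
library(data.table)

# Toy candidate targets; all values are invented.
targets <- data.table(
  hpo_id = c("HP:A", "HP:A", "HP:B", "HP:B"),
  gene_symbol = c("G1", "G2", "G3", "G4"),
  severity_score_gpt = c(30, 30, 25, 40),
  q = c(0.04, 0.01, 0.02, 0.03)
)

# Names give the columns, values give the direction (-1 descending, 1 ascending).
sort_cols <- c(severity_score_gpt = -1, q = 1)
setorderv(targets, cols = names(sort_cols), order = unname(sort_cols))

# Keep the first top_n rows within each group after sorting.
top_n <- 1
group_vars <- "hpo_id"
top_targets <- targets[, head(.SD, top_n), by = group_vars]
print(top_targets)
```

With top_n set to 1 and group_vars set to "hpo_id", one row per phenotype ID survives, matching the behaviour described for step 29.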

return_report

If TRUE, will return a named list containing a report that shows the number of phenotypes/celltypes/genes remaining after each filtering step.

verbose

Print messages.

Details

Term key:

  • Disease:

    A disease defined in the database
    OMIM, DECIPHER and/or Orphanet.
  • Phenotype: A clinical feature associated with one or more diseases.

  • Symptom:

    A phenotype within the context of a particular disease.
    Within a given phenotype, there may be multiple symptoms with
     partially overlapping genetic mechanisms.
  • Association:

    A cell type-specific enrichment test result conducted
    at the disease-level, phenotype-level, or symptom-level.

Value

A data.table of the prioritised phenotype- and cell type-specific gene targets.

Examples

results <- load_example_results()[q<0.05]
out <- prioritise_targets(results=results)

neurogenomics/MultiEWCE documentation built on Sept. 28, 2024, 2:27 a.m.