oncoEnrichR is an R package for functional interrogation of human genesets in the context of cancer.
The package is intended for exploratory analysis and prioritization of a gene list (referred to as query set below) from high-throughput cancer biology experiments, e.g. genetic screens (siRNA/CRISPR), protein proximity labeling, or transcriptomics (differential expression). The tool queries a number of high-quality data resources to assemble useful gene annotations and analyses in an interactive report. The contents of the final report attempt to shed light on the following questions:
Data harvested from the following resources form the backbone of oncoEnrichR:
install.packages('devtools')
devtools::install_github('sigven/oncoEnrichR')
library(oncoEnrichR)
oncoEnrichR performs its operations through the following procedures/methods:
1. oncoEnrichR::onco_enrich()
Consists of two main processing steps:
1) Takes a query list of human gene/protein identifiers (e.g. UniProt accession, RefSeq/Ensembl transcript identifier etc.) as input and conducts uniform identifier conversion
2) Performs extensive annotation, enrichment and membership analyses of the query set against underlying data sources on cancer-relevant properties of human genes and their interrelationships.
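
Because the first step maps every query identifier, it can help to tidy the input vector before passing it on. The following is a minimal sketch in base R, not part of oncoEnrichR; the input values and the variable names `raw_query` and `clean_query` are hypothetical.

```r
# Hypothetical raw input, e.g. a column pasted from a spreadsheet
raw_query <- c(" MYC", "MAX ", "MYC", "", NA, "TP53")

# Trim surrounding whitespace
clean_query <- trimws(raw_query)

# Drop missing and empty entries
clean_query <- clean_query[!is.na(clean_query) & nzchar(clean_query)]

# Remove duplicated identifiers while keeping the original order
clean_query <- unique(clean_query)

print(clean_query)
```

The cleaned vector can then be supplied as the `query` argument.
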
Technically, the method returns a list object with all contents of the analyses performed. The specific arguments/options and default values are outlined below:
onco_enrich(
  query,
  query_id_type = "symbol",
  ignore_id_err = TRUE,
  project_title = "Project title",
  project_owner = "Project owner",
  project_description = "Project description",
  bgset = NULL,
  bgset_id_type = "symbol",
  bgset_description = "All protein-coding genes",
  p_value_cutoff_enrichment = 0.05,
  p_value_adjustment_method = "BH",
  q_value_cutoff_enrichment = 0.2,
  min_geneset_size = 10,
  max_geneset_size = 500,
  min_subcellcomp_confidence = 1,
  simplify_go = TRUE,
  ppi_add_nodes = 50,
  ppi_score_threshold = 900,
  show_ppi = TRUE,
  show_drugs_in_ppi = TRUE,
  show_disease = TRUE,
  show_top_diseases_only = TRUE,
  show_drug = TRUE,
  show_enrichment = TRUE,
  show_tcga_aberration = TRUE,
  show_tcga_coexpression = TRUE,
  show_subcell_comp = TRUE,
  show_crispr_lof = TRUE,
  show_cell_tissue = TRUE,
  show_prognostic_cancer_assoc = TRUE,
  show_complex = TRUE)
Argument | Description
------------- | ----------------
query | character vector with gene/query identifiers
query_id_type | character indicating type of identifier used for query set (one of "uniprot_acc", "symbol", "entrezgene", "ensembl_gene", "refseq_mrna", "refseq_protein", "ensembl_protein", or "ensembl_mrna")
ignore_id_err | logical indicating if analysis should continue when erroneous/unmatched query identifiers are encountered in query or background gene set
project_title | project title (report title)
project_owner | project owner (e.g. lab/PI)
project_description | brief description of project, how target list was derived
bgset | character vector with gene identifiers, used as reference/background for enrichment/over-representation analysis
bgset_id_type | character indicating type of identifier used for background set (one of "uniprot_acc", "symbol", "entrezgene", "ensembl_gene", "refseq_mrna", "refseq_protein", "ensembl_protein", or "ensembl_mrna")
bgset_description | character with description of background gene set (e.g. 'All lipid-binding proteins (n = 200)')
p_value_cutoff_enrichment | cutoff p-value for enrichment/over-representation analysis
p_value_adjustment_method | p-value adjustment method, one of "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"
q_value_cutoff_enrichment | cutoff q-value for enrichment/over-representation analysis
min_geneset_size | minimal size of geneset annotated by term for testing in enrichment/over-representation analysis
max_geneset_size | maximal size of geneset annotated by term for testing in enrichment/over-representation analysis
min_subcellcomp_confidence | minimum confidence level of subcellular compartment annotations (range from 1 to 6, 6 is strongest)
simplify_go | remove highly similar GO terms in results from GO enrichment/over-representation analysis (recommended)
ppi_add_nodes | number of neighbouring nodes to add to query set when computing the protein-protein interaction network (STRING)
ppi_score_threshold | minimum significance score (0-1000) for protein-protein interactions to be included in network (STRING)
show_ppi | logical indicating if report should contain protein-protein interaction data of query set and their closely interacting partners (STRING), local network communities, and rank of query set based on network centrality
show_drugs_in_ppi | logical indicating if targeted drugs (>= phase 3) should be displayed in protein-protein interaction network (Open Targets Platform)
show_disease | logical indicating if report should contain ranked associations to cancer phenotypes (overall), as well as tumor-type specific rankings (association score >= 0.4, minimum number of sources contributing to association >= 2; Open Targets Platform)
show_top_diseases_only | logical indicating if report should only show top (20) cancer phenotypes/disease associations from Open Targets Platform
show_drug | logical indicating if report should contain cancer drugs targeted towards proteins in the query list (early and late development phase) and tractability/druggability data for all query entries (Open Targets Platform)
show_enrichment | logical indicating if report should perform and list functional enrichment/over-representation analysis of query set (MSigDB, GO, KEGG, REACTOME, WikiPathways, NetPath)
show_tcga_aberration | logical indicating if report should contain TCGA aberration plots (amplifications/deletions, SNVs/InDels (oncoplots))
show_tcga_coexpression | logical indicating if report should list oncogenes/tumor suppressor genes that significantly correlate with entries in query set in terms of expression (across TCGA cohorts)
show_subcell_comp | logical indicating if report should list subcellular compartment annotations (ComPPI)
show_crispr_lof | logical indicating if report should list results from CRISPR/Cas9 loss-of-fitness screens and associated target priority scores (Project Score)
show_cell_tissue | logical indicating if report should list tissue-specific (GTEx) and cell-type-specific (HPA) gene expression patterns in query set
show_prognostic_cancer_assoc | logical indicating if report should list results from significant associations between gene expression and survival (Human Protein Atlas - Pathology Atlas)
show_complex | logical indicating if report should show membership of target proteins in known protein complexes (CORUM)
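
As an illustration of how the arguments combine, the sketch below collects a few non-default settings in a list, checks them against the documented value ranges, and only calls `onco_enrich()` when the package is installed. The gene symbols, the chosen values, and the names `enrich_args` and `valid_adjust` are assumptions made for this example; real query sets are typically much larger.

```r
# Hypothetical query set: a handful of well-known cancer gene symbols
query_genes <- c("MYC", "MAX", "TP53", "KRAS", "EGFR")

# Non-default settings, using only arguments listed in the table above
enrich_args <- list(
  query = query_genes,
  query_id_type = "symbol",
  project_title = "Example project",
  p_value_adjustment_method = "bonferroni",
  ppi_score_threshold = 700,
  min_subcellcomp_confidence = 3,
  show_tcga_coexpression = FALSE
)

# Sanity checks mirroring the documented value ranges
valid_adjust <- c("holm", "hochberg", "hommel", "bonferroni",
                  "BH", "BY", "fdr", "none")
stopifnot(
  is.character(enrich_args$query),
  enrich_args$p_value_adjustment_method %in% valid_adjust,
  enrich_args$ppi_score_threshold >= 0,
  enrich_args$ppi_score_threshold <= 1000,
  enrich_args$min_subcellcomp_confidence >= 1,
  enrich_args$min_subcellcomp_confidence <= 6
)

# Run the analysis only if oncoEnrichR is installed
if (requireNamespace("oncoEnrichR", quietly = TRUE)) {
  report <- do.call(oncoEnrichR::onco_enrich, enrich_args)
}
```

Keeping the settings in a list makes it easy to store them alongside the report for later reference.
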
2. oncoEnrichR::write()
Consists of two main processing steps:
1) Transform the contents of the analyses returned by oncoEnrichR::onco_enrich() into various visualizations and interactive tables
2) Assemble and write the final analysis report to file, either as an HTML report (format = "html") or as an Excel workbook (format = "excel")
A target list of n = 134 high-confidence interacting proteins with the c-MYC oncoprotein was previously identified through a BioID protein proximity assay in standard cell culture and in tumor xenografts (Dingar et al., J Proteomics, 2015). We ran this target list through the oncoEnrichR analysis workflow using the following configurations for the onco_enrich method:
project_title = "cMYC_BioID_screen"
project_owner = "Raught et al."
and produced the following HTML report with results.
Below are R commands provided to reproduce the example output. NOTE: Replace "LOCAL_FOLDER" with a directory on your local computer:
library(oncoEnrichR)
myc_interact_targets <- read.csv(system.file("extdata", "myc_data.csv", package = "oncoEnrichR"), stringsAsFactors = FALSE)
myc_report <- oncoEnrichR::onco_enrich(query = myc_interact_targets$symbol, project_title = "cMYC_BioID_screen", project_owner = "Raught et al.")
oncoEnrichR::write(report = myc_report, file = "LOCAL_FOLDER/myc_report_oncoenrichr.html", format = "html")
oncoEnrichR::write(report = myc_report, file = "LOCAL_FOLDER/myc_report_oncoenrichr.xlsx", format = "excel")
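
One way to handle the "LOCAL_FOLDER" placeholder is to build the output paths programmatically. The sketch below is an assumption-laden example rather than a required workflow: it writes to a sub-folder of the R session's temporary directory, and the names `out_dir` and `report_files` are hypothetical. It only calls `oncoEnrichR::write()` when `myc_report` exists and the package is installed.

```r
# Assumption: reports go into a sub-folder of the session's temporary
# directory; point `out_dir` at any writable location instead
out_dir <- file.path(tempdir(), "oncoenrichr_reports")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# One output file per format used in the example above
report_files <- c(
  html  = file.path(out_dir, "myc_report_oncoenrichr.html"),
  excel = file.path(out_dir, "myc_report_oncoenrichr.xlsx")
)

# `myc_report` is the object returned by onco_enrich() in the example above;
# skip writing if it has not been created in this session
if (exists("myc_report") && requireNamespace("oncoEnrichR", quietly = TRUE)) {
  for (fmt in names(report_files)) {
    oncoEnrichR::write(report = myc_report,
                       file = report_files[[fmt]],
                       format = fmt)
  }
}
```
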
sigven AT ifi.uio.no
oncoEnrichR is supported by the Centre for Cancer Cell Reprogramming at the University of Oslo/Oslo University Hospital, and Elixir Norway (Oslo node).