View source: R/calculate_go_enrichment.R
calculate_go_enrichment | R Documentation |
Analyses enrichment of gene ontology (GO) terms associated with proteins in the fraction of
significant proteins compared to all detected proteins. A two-sided Fisher's exact test is
performed to test significance of enrichment or depletion. GO annotations can be provided to
this function in one of three ways: from UniProt through the go_annotations_uniprot argument,
as a table obtained with fetch_go through the go_data argument, or fetched automatically by
the function when ontology_type and organism_id are provided.
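To illustrate the kind of test described above, the following is a minimal sketch of a two-sided Fisher's exact test for a single GO term, using only base R. All counts and variable names are hypothetical and chosen for illustration; this is not the function's internal code, only an example of the statistical idea.

```r
# Hypothetical counts for a single GO term (illustrative numbers only).
n_detected <- 2000        # all detected proteins
n_significant <- 200      # significant proteins among them
n_term <- 100             # detected proteins annotated with the term
n_term_significant <- 25  # significant proteins annotated with the term

# 2 x 2 contingency table: term annotation (rows) vs. significance (columns).
contingency <- matrix(
  c(
    n_term_significant,
    n_significant - n_term_significant,
    n_term - n_term_significant,
    n_detected - n_significant - (n_term - n_term_significant)
  ),
  nrow = 2,
  dimnames = list(
    annotation = c("has_term", "no_term"),
    significance = c("significant", "not_significant")
  )
)

# Two-sided test, so both enrichment and depletion can be detected.
result <- fisher.test(contingency, alternative = "two.sided")

result$p.value
# An odds ratio above 1 points to enrichment, below 1 to depletion.
result$estimate
```

In this sketch 25 of the 100 annotated proteins are significant, compared with 200 of 2000 overall, so the term would be reported as enriched.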
calculate_go_enrichment(
  data,
  protein_id,
  is_significant,
  group = NULL,
  y_axis_free = TRUE,
  go_annotations_uniprot = NULL,
  ontology_type,
  organism_id = NULL,
  go_data = NULL,
  plot = TRUE,
  label = TRUE,
  plot_cutoff = "adj_pval top10"
)
data |
a data frame that contains at least the input variables. |
protein_id |
a character column in the |
is_significant |
a logical column in the |
group |
a character column in the |
y_axis_free |
a logical value that specifies if the y-axis of the plot should be "free"
for each facet if a grouping variable is provided. Default is |
go_annotations_uniprot |
recommended, a character column in the |
ontology_type |
optional, character value specifying the type of ontology that should
be used. Possible values are molecular function (MF), biological process (BP), cellular component
(CC). This argument is not required if GO annotations are provided from UniProt in
|
organism_id |
optional, character value specifying an NCBI taxonomy identifier of an
organism (TaxId). Possible inputs include only: "9606" (Human), "559292" (Yeast) and "83333"
(E. coli). Is only necessary if GO data is not provided either by |
go_data |
optional, a data frame that can be obtained with |
plot |
a logical argument indicating whether the result should be plotted or returned as a table. |
label |
a logical argument indicating whether labels should be added to the plot. Default is TRUE. |
plot_cutoff |
a character value indicating if the plot should contain the top 10 most
significant GO terms (p-value or adjusted p-value), or if a significance cutoff should be used
to determine the number of GO terms in the plot. This information should be provided with the
type first, followed by the threshold, separated by a space. Examples are
|
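The plot_cutoff format described above (type first, then the threshold, separated by a space) can be illustrated with a small sketch. The helper parse_plot_cutoff below is hypothetical and not part of the package; it only shows how such a string splits into its two parts, using the default value from the usage section and the value from the example code.

```r
# Hypothetical helper, not part of the package: splits a plot_cutoff
# string into its type and threshold parts.
parse_plot_cutoff <- function(plot_cutoff) {
  parts <- strsplit(plot_cutoff, " ", fixed = TRUE)[[1]]
  # Exactly two parts are expected: the type and the threshold.
  stopifnot(length(parts) == 2)
  list(type = parts[1], threshold = parts[2])
}

# Default value shown in the usage section.
parse_plot_cutoff("adj_pval top10")

# Value used in the example code of this page.
parse_plot_cutoff("pval 0.01")
```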
A bar plot displaying negative log10 adjusted p-values for the top 10 enriched or
depleted gene ontology terms. Alternatively, plot cutoffs can be chosen individually with the
plot_cutoff argument. Bars are colored according to the direction of the enrichment. If
plot = FALSE, a data frame is returned. P-values are adjusted with Benjamini-Hochberg.
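As a rough illustration of the adjustment and the plotted quantity mentioned above, the sketch below applies the Benjamini-Hochberg correction with base R's p.adjust and converts the result to negative log10 values. The p-values and term names are hypothetical, and this is not taken from the function's source.

```r
# Hypothetical raw p-values, one per GO term (illustrative only).
pval <- c(term_a = 0.0002, term_b = 0.01, term_c = 0.03, term_d = 0.5)

# Benjamini-Hochberg adjustment with base R.
adj_pval <- p.adjust(pval, method = "BH")

# Negative log10 of the adjusted p-values, the quantity shown by the bars.
neg_log10_adj_pval <- -log10(adj_pval)

adj_pval
neg_log10_adj_pval
```

Smaller adjusted p-values give larger negative log10 values, so the most significant terms have the longest bars.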
# Load libraries
library(protti)
library(dplyr)
library(stringr)

# Create example data
# Contains artificial de-enrichment for ribosomes.
uniprot_go_data <- fetch_uniprot_proteome(
  organism_id = 83333,
  columns = c(
    "accession",
    "go_f"
  )
)

# Only run the example if the fetch did not return a character value
if (!is(uniprot_go_data, "character")) {
  data <- uniprot_go_data %>%
    # Mark the first 1000 proteins as significant
    mutate(significant = c(
      rep(TRUE, 1000),
      rep(FALSE, n() - 1000)
    )) %>%
    # Set all ribosome-annotated proteins to not significant
    mutate(significant = ifelse(
      str_detect(
        go_f,
        pattern = "ribosome"
      ),
      FALSE,
      significant
    ))

  # Plot gene ontology enrichment
  calculate_go_enrichment(
    data,
    protein_id = accession,
    go_annotations_uniprot = go_f,
    is_significant = significant,
    plot = TRUE,
    plot_cutoff = "pval 0.01"
  )

  # Calculate gene ontology enrichment
  go_enrichment <- calculate_go_enrichment(
    data,
    protein_id = accession,
    go_annotations_uniprot = go_f,
    is_significant = significant,
    plot = FALSE
  )

  head(go_enrichment, n = 10)
}