View source: R/calculate_go_enrichment.R
calculate_go_enrichment | R Documentation |
Analyses the enrichment of gene ontology (GO) terms associated with proteins in the fraction of
significant proteins compared to all detected proteins. A two-sided Fisher's exact test is
performed to test the significance of enrichment or depletion. GO annotations can be provided to
this function in one of three ways: from UniProt through the go_annotations_uniprot argument, as a
table obtained with fetch_go through the go_data argument, or fetched automatically by the
function when ontology_type and organism_id are provided.
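The test described above can be sketched for a single GO term with base R. This is a minimal illustration, not the package's internal code: all counts are hypothetical, and the variable names (n_detected, n_sig_in_term, and so on) are made up for this example.

```r
# Hypothetical counts for a single GO term (not taken from real data).
n_detected <- 4000     # all detected proteins
n_significant <- 1000  # significant proteins
n_in_term <- 60        # detected proteins annotated with the term
n_sig_in_term <- 2     # significant proteins annotated with the term

# 2 x 2 contingency table: significance (rows) by term membership (columns).
# matrix() fills column by column.
contingency <- matrix(
  c(
    n_sig_in_term,
    n_in_term - n_sig_in_term,
    n_significant - n_sig_in_term,
    (n_detected - n_significant) - (n_in_term - n_sig_in_term)
  ),
  nrow = 2,
  dimnames = list(
    significance = c("significant", "not_significant"),
    annotation = c("in_term", "not_in_term")
  )
)

# Two-sided test: sensitive to enrichment as well as depletion.
result <- stats::fisher.test(contingency, alternative = "two.sided")
result$p.value
result$estimate # odds ratio; values below 1 point towards depletion
```

With these counts about 15 of the 60 annotated proteins would be expected among the significant ones if there were no association, so observing only 2 points towards depletion.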
calculate_go_enrichment(
  data,
  protein_id,
  is_significant,
  group = NULL,
  y_axis_free = TRUE,
  facet_n_col = 2,
  go_annotations_uniprot = NULL,
  ontology_type,
  organism_id = NULL,
  go_data = NULL,
  plot = TRUE,
  plot_style = "barplot",
  plot_title = "Gene ontology enrichment of significant proteins",
  barplot_fill_colour = c("#56B4E9", "#E76145"),
  heatmap_fill_colour = protti::mako_colours,
  heatmap_fill_colour_rev = TRUE,
  label = TRUE,
  enrichment_type = "all",
  replace_long_name = TRUE,
  label_move_frac = 0.2,
  min_n_detected_proteins_in_process = 1,
  plot_cutoff = "adj_pval top10"
)
data |
a data frame that contains at least the input variables. |
protein_id |
a character column in the |
is_significant |
a logical column in the |
group |
optional, character column in the |
y_axis_free |
a logical value that specifies if the y-axis of the plot should be "free"
for each facet if a grouping variable is provided. Default is |
facet_n_col |
a numeric value that specifies the number of columns the faceted plot should have if a column name is provided to group. The default is 2. |
go_annotations_uniprot |
recommended, a character column in the |
ontology_type |
optional, character value specifying the type of ontology that should
be used. Possible values are molecular function (MF), biological process (BP), cellular component
(CC). This argument is not required if GO annotations are provided from UniProt in
|
organism_id |
optional, character value specifying an NCBI taxonomy identifier of an
organism (TaxId). Possible inputs include only: "9606" (Human), "559292" (Yeast) and "83333"
(E. coli). Is only necessary if GO data is not provided either by |
go_data |
optional, a data frame that can be obtained with |
plot |
a logical argument indicating whether the result should be plotted or returned as a table. |
plot_style |
a character argument that specifies the plot style. Can be either "barplot" (default)
or "heatmap". The "heatmap" plot is especially useful for the comparison of multiple groups. We recommend,
however, that you use it only with |
plot_title |
a character value that specifies the title of the plot. The default is "Gene ontology enrichment of significant proteins". |
barplot_fill_colour |
a vector that contains two colours that should be used as the fill colours for
deenriched and enriched GO terms, respectively. If |
heatmap_fill_colour |
a vector that contains colours that should be used to create the gradient in the
heatmap plot. Default is |
heatmap_fill_colour_rev |
a logical value that specifies if the provided colours in |
label |
a logical argument indicating whether labels should be added to the plot. Default is TRUE. |
enrichment_type |
a character argument that is either "all", "enriched" or "deenriched". This determines if the enrichment analysis should be performed in order to check for both enrichment and deenrichment or only one of the two. This affects the statistics performed and therefore also the displayed plot. |
replace_long_name |
a logical argument that specifies if GO term names above 50 characters should
be replaced by the GO ID instead for the plot. This ensures that the plotting area doesn't become
too small due to the long name. The default is |
label_move_frac |
a numeric argument between 0 and 1 that specifies which labels should be moved outside of the bar. The default is 0.2, which means that the labels of all bars that have a size of 20% or less of the largest bar are moved to the right of the bar. This prevents labels from overlapping with the bar boundaries. |
min_n_detected_proteins_in_process |
a numeric argument that specifies the minimum number of
detected proteins required for a GO term to be displayed in the plot. The default is 1, meaning
no filtering of the plotted data is performed. This argument does not affect any computations or
the returned data if |
plot_cutoff |
a character value indicating if the plot should contain the top n (e.g. top10) most
significant GO terms (by p-value or adjusted p-value), or if a significance cutoff should be used
to determine the number of GO terms in the plot. This information should be provided with the
type first followed by the threshold separated by a space. Examples are
|
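The plot_cutoff format described above (type first, then the threshold, separated by a space) can be illustrated with a small parser. This is only a sketch: parse_plot_cutoff is a hypothetical helper written for this example and is not part of protti, and the assumption that the type is either "pval" or "adj_pval" is taken from the values used elsewhere on this page.

```r
# Hypothetical helper, not part of protti: split a plot_cutoff string
# into its type ("pval" or "adj_pval") and its threshold part.
parse_plot_cutoff <- function(plot_cutoff) {
  parts <- strsplit(plot_cutoff, " ", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 2, parts[1] %in% c("pval", "adj_pval"))
  # A threshold such as "top10" selects the n most significant terms,
  # a plain number such as "0.01" is a significance cutoff.
  is_top_n <- startsWith(parts[2], "top")
  list(
    type = parts[1],
    mode = if (is_top_n) "top_n" else "threshold",
    value = as.numeric(sub("^top", "", parts[2]))
  )
}

parse_plot_cutoff("adj_pval top10")
parse_plot_cutoff("pval 0.01")
```

The first call corresponds to the default value of plot_cutoff, the second to the value used in the examples.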
A bar plot or heatmap (depending on plot_style). By default the bar plot displays negative log10
adjusted p-values for the top 10 enriched or deenriched gene ontology terms. Alternatively, plot cutoffs
can be chosen individually with the plot_cutoff argument. Bars are coloured according to the direction
of the enrichment (enriched or deenriched). If a heatmap is returned, terms are organised on the y-axis, while
the colour of each tile represents the negative log10 adjusted p-value (default). If a group column
is provided, the x-axis contains all groups. If plot = FALSE, a data frame is returned. P-values are
adjusted with the Benjamini-Hochberg procedure.
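The adjustment and the plotted quantity can be reproduced with base R. The following is a minimal sketch with hypothetical p-values. It shows the general Benjamini-Hochberg procedure via stats::p.adjust and does not claim to mirror the function's internal code.

```r
# Hypothetical raw p-values for five GO terms.
pval <- c(0.001, 0.01, 0.02, 0.04, 0.5)

# Benjamini-Hochberg adjustment across all tested terms.
adj_pval <- stats::p.adjust(pval, method = "BH")

# Negative log10 of the adjusted p-value, the quantity shown in the plots
# by default.
neg_log10_adj_pval <- -log10(adj_pval)

data.frame(pval, adj_pval, neg_log10_adj_pval)
```

Adjusted p-values are never smaller than the raw ones, so a term can pass a "pval" cutoff and still fail the same cutoff on "adj_pval".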
# Load libraries
library(dplyr)
library(stringr)

# Create example data
# Contains artificial de-enrichment for ribosomes.
uniprot_go_data <- fetch_uniprot_proteome(
  organism_id = 83333,
  columns = c(
    "accession",
    "go_f"
  )
)

# Only continue if the download returned data and not an error message
if (!is(uniprot_go_data, "character")) {
  data <- uniprot_go_data %>%
    mutate(significant = c(
      rep(TRUE, 1000),
      rep(FALSE, n() - 1000)
    )) %>%
    mutate(significant = ifelse(
      str_detect(
        go_f,
        pattern = "ribosome"
      ),
      FALSE,
      significant
    )) %>%
    # floor() and ceiling() keep the total equal to n() for odd row counts
    mutate(group = c(
      rep("A", 500),
      rep("B", 500),
      rep("A", floor((n() - 1000) / 2)),
      rep("B", ceiling((n() - 1000) / 2))
    ))

  # Plot gene ontology enrichment
  calculate_go_enrichment(
    data,
    protein_id = accession,
    go_annotations_uniprot = go_f,
    is_significant = significant,
    plot = TRUE,
    plot_cutoff = "pval 0.01"
  )

  # Plot gene ontology enrichment with group
  calculate_go_enrichment(
    data,
    protein_id = accession,
    go_annotations_uniprot = go_f,
    is_significant = significant,
    group = group,
    facet_n_col = 1,
    plot = TRUE,
    plot_cutoff = "pval 0.01"
  )

  # Plot gene ontology enrichment with group in a heatmap plot
  calculate_go_enrichment(
    data,
    protein_id = accession,
    group = group,
    go_annotations_uniprot = go_f,
    is_significant = significant,
    min_n_detected_proteins_in_process = 15,
    plot = TRUE,
    label = TRUE,
    plot_style = "heatmap",
    enrichment_type = "enriched",
    plot_cutoff = "pval 0.01"
  )

  # Calculate gene ontology enrichment
  go_enrichment <- calculate_go_enrichment(
    data,
    protein_id = accession,
    go_annotations_uniprot = go_f,
    is_significant = significant,
    plot = FALSE
  )

  head(go_enrichment, n = 10)
}