knitr::opts_chunk$set(echo = T, error = TRUE, eval = F)

Introduction

This tutorial walks through how to use autoGO to perform Functional Enrichment Analysis, starting either from gene lists or directly from raw counts via Differential Expression Analysis.

The autoGO package is organized into independent functions, so the user can decide which steps of the analysis to perform and which visualizations to produce. Running the whole workflow is nevertheless recommended: Differential Expression Analysis; Volcano Plot; Functional Enrichment Analysis on databases chosen by the user; and visualization of the enrichment results with Barplot, Lollipop and Heatmap.

Installation

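# Install the development version from GitHub (requires devtools)
library(devtools)
install_github("mpallocc/auto-go", ref = "develop")
# Alternatively, install with install.packages():
# install.packages("autoGO")
library(autoGO)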

Loading data

Load the example data used throughout this tutorial.

data(counts, groups, comparisons)

Differential Gene Expression

deseq_analysis(counts,
               groups,
               comparisons,
               padj_threshold = 0.05,
               log2FC_threshold = 0,
               pre_filtering = TRUE,
               save_excel = FALSE,
               outfolder = "./results",
               del_csv = ",")

Parameters:

groups <- data.frame(sample=c("sample_1", "sample_2", "sample_3", "sample_4", "sample_5", "sample_6"),
                     group=c("CTRL", "CTRL", "TREAT_A", "TREAT_A", "TREAT_B", "TREAT_B"))

comparisons <- data.frame(treatment = c("TREAT_A", "TREAT_B", "TREAT_A"),
                          control = c("CTRL", "CTRL", "TREAT_B"))
knitr::kable(list(groups, comparisons), booktabs = TRUE, valign = 't', caption = "Groups and comparisons example tables")
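
Before running the analysis, it can help to check that the two tables are consistent with each other and with the count matrix. The following is a minimal sketch in base R and is not part of autoGO: `check_design` is a hypothetical helper, and it assumes that the sample names in `groups$sample` appear as column names of the counts table.

```r
# Hypothetical helper (not an autoGO function): checks that the design tables
# agree with each other and with the sample columns of the counts table.
check_design <- function(count_columns, groups, comparisons) {
  problems <- character(0)

  # Every sample listed in `groups` should be a column of the counts table
  # (assumption: counts has one column per sample).
  missing_samples <- setdiff(as.character(groups$sample), count_columns)
  if (length(missing_samples) > 0) {
    problems <- c(problems, paste("samples not in counts:",
                                  paste(missing_samples, collapse = ", ")))
  }

  # Every condition named in `comparisons` should exist in `groups$group`.
  used <- unique(c(as.character(comparisons$treatment),
                   as.character(comparisons$control)))
  unknown <- setdiff(used, as.character(groups$group))
  if (length(unknown) > 0) {
    problems <- c(problems, paste("groups not defined:",
                                  paste(unknown, collapse = ", ")))
  }
  problems
}

groups <- data.frame(
  sample = paste0("sample_", 1:6),
  group  = c("CTRL", "CTRL", "TREAT_A", "TREAT_A", "TREAT_B", "TREAT_B")
)
comparisons <- data.frame(
  treatment = c("TREAT_A", "TREAT_B", "TREAT_A"),
  control   = c("CTRL", "CTRL", "TREAT_B")
)

# An empty result means no inconsistency was found.
issues <- check_design(paste0("sample_", 1:6), groups, comparisons)
```

With a real dataset, `colnames(counts)` would take the place of the first argument, assuming the counts table has one column per sample.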

Filtering DESeq2 results

This function applies new thresholds to the Differential Expression Analysis results, working directly from the complete output table "*_allres.tsv" without repeating the whole analysis.

filtering_DE(padj_threshold = 0.05,
             log2FC_threshold = 1,
             outfolder = "./results",
             save_excel = FALSE)
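
To make the effect of the two thresholds concrete, here is a minimal sketch in base R on a toy table with made-up values. The column names `log2FoldChange` and `padj` follow the DESeq2 convention; that the "*_allres.tsv" table uses them, and the exact comparison operators, are assumptions of this sketch rather than a description of what `filtering_DE` does internally.

```r
# Toy results table with made-up values; column names follow the DESeq2
# convention (log2FoldChange, padj). This layout is an assumption.
res <- data.frame(
  genes          = c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"),
  log2FoldChange = c(2.5, -1.8, 0.4, -0.2, 1.2),
  padj           = c(0.001, 0.01, 0.03, 0.60, NA),
  stringsAsFactors = FALSE
)

padj_threshold   <- 0.05
log2FC_threshold <- 1

# Genes with a missing adjusted p-value are never called significant.
significant <- !is.na(res$padj) & res$padj <= padj_threshold

# Strict inequalities are used here so that a gene cannot be both
# up- and down-regulated when the fold-change threshold is 0.
up_genes   <- res$genes[significant & res$log2FoldChange >  log2FC_threshold]
down_genes <- res$genes[significant & res$log2FoldChange < -log2FC_threshold]
```

In this toy table only GENE_A passes as up-regulated and only GENE_B as down-regulated; GENE_C is significant but below the fold-change threshold, and GENE_E has no adjusted p-value.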

Volcano Plot

This function produces a volcano plot for a specific comparison from the differential analysis results.

filename <- "./results/H460.2D_vs_H460.3D.2p/DE_H460.2D_vs_H460.3D.2p_allres.tsv"
volcanoplot(DE_results = filename,
            my_comparison = "H460.2D_vs_H460.3D.2p",
            log2FC_thresh = 0,
            padj_thresh = 0.05,
            highlight_genes = c("TFPI", "PROS1"),
            del_csv = ",",
            outfolder = "./results")

filename <- "./results/H460.2D_vs_H460.3D.2p/DE_H460.2D_vs_H460.3D.2p_allres.tsv"
volcanoplot(DE_results = filename,
            my_comparison = "H460.2D_vs_H460.3D.2p",
            log2FC_thresh = 1,
            padj_thresh = 0.05,
            highlight_genes = NULL,
            del_csv = ",",
            outfolder = "./results")
knitr::include_graphics("imgs/volcanoplot_highlighted.png")
knitr::include_graphics("imgs/volcano.png")

Parameters:

If the Differential Analysis involved several conditions, there will be many comparisons. In that case the following structure produces the volcano plots for all the comparisons of the analysis:

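library(readr)  # provides read_tsv() and cols()

# Collect every complete results table under ./results
all_path_res <- list.files(path = "./results", pattern = "_allres\\.tsv$", recursive = TRUE, full.names = TRUE)
res_lists <- lapply(all_path_res, function (x) read_tsv(x, col_types = cols()))
# Name each table after the folder that contains it, i.e. the comparison
names(res_lists) <- basename(dirname(all_path_res))

invisible(lapply(names(res_lists), function (i) volcanoplot(res_lists[[i]], my_comparison = i)))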
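
The comparison name for each table can be recovered from the folder that contains it. The sketch below shows this with base R on two example paths; the first is the one used earlier in this tutorial, the second is hypothetical, and the assumption is that every "*_allres.tsv" file sits in a folder named after its comparison.

```r
# Example paths shaped like the ones used in this tutorial.
# The second path is hypothetical and only illustrates the pattern.
all_path_res <- c(
  "./results/H460.2D_vs_H460.3D.2p/DE_H460.2D_vs_H460.3D.2p_allres.tsv",
  "./results/TREAT_A_vs_CTRL/DE_TREAT_A_vs_CTRL_allres.tsv"
)

# dirname() drops the file name, basename() keeps the last folder.
comparison_names <- basename(dirname(all_path_res))
```

This approach does not depend on how the path starts (for example a leading "./"), which a pattern-based substitution has to handle explicitly.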

Choosing the database

This function searches all the Enrichr databases on which the Enrichment Analysis can be performed. In this way the user can select only the databases of interest for the downstream analysis.

choose_database(db_search = "KEGG")
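
The idea of searching database names by keyword can be illustrated with base R alone. This is only a sketch of the concept, using the three database names that appear in this tutorial; how `choose_database` retrieves and matches the names internally is not shown here.

```r
# Database names taken from the examples in this tutorial.
available_dbs <- c("GO_Molecular_Function_2021",
                   "GO_Biological_Process_2021",
                   "KEGG_2021_Human")

# Case-insensitive search, returning the matching names rather than indices.
kegg_dbs <- grep("kegg", available_dbs, ignore.case = TRUE, value = TRUE)

# Anchoring the pattern restricts the match to names starting with "GO_".
go_dbs <- grep("^GO_", available_dbs, value = TRUE)
```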

Parameters:

Reading gene lists

This function loads into a single variable all the gene lists to be enriched, so that the enrichment does not have to be launched separately for each list. It must be called whenever more than one gene list is passed. If the previous steps of autoGO have been performed (from_autoGO = TRUE), only gene_lists_path needs to be passed: all the other information is taken from the path of the gene lists.

gene_lists_path <- "./results"
gene_lists <- read_gene_lists(gene_lists_path = gene_lists_path,
               log2FC_threshold = 0,
               padj_threshold = 0.05,
               which_list = "down_genes",
               from_autoGO = TRUE,
               files_format = NULL)
names(gene_lists)
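
For gene lists that were not produced by autoGO, the underlying idea is simply a named list of character vectors, one per file. The sketch below builds such a list with base R from two small files written to a temporary folder. The file names, the ".txt" extension and the one-gene-per-line layout are assumptions made for illustration, and the gene names other than TFPI and PROS1 are placeholders.

```r
# Create two small example files in a temporary folder (hypothetical names).
list_dir <- file.path(tempdir(), "gene_lists_example")
dir.create(list_dir, showWarnings = FALSE)
writeLines(c("TFPI", "PROS1"), file.path(list_dir, "list_one.txt"))
writeLines(c("GENE_A", "GENE_B", "GENE_C"), file.path(list_dir, "list_two.txt"))

# Read every .txt file into a character vector, one gene per line.
files <- list.files(list_dir, pattern = "\\.txt$", full.names = TRUE)
my_lists <- lapply(files, readLines)

# Name each list after its file, without folder or extension.
names(my_lists) <- tools::file_path_sans_ext(basename(files))
```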

Parameters:

Enrichment analysis

Each desired gene list is enriched on all the databases chosen by the user (see choose_database). The function will produce the enrichment table for each chosen database.

autoGO(list_of_genes = gene_lists,
      dbs = c("GO_Molecular_Function_2021", "GO_Biological_Process_2021", "KEGG_2021_Human"),
      my_comparison = NULL,
      ensembl = FALSE,
      excel = FALSE,
      outfolder = "./results")
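
As background, over-representation of a term in a gene list is commonly assessed with a one-sided Fisher's exact test on a 2x2 table. The sketch below illustrates that idea in base R with made-up numbers; it is not a description of how autoGO or Enrichr computes its statistics.

```r
universe_size <- 20000  # genes considered in total (made-up)
term_size     <- 100    # genes annotated to the term (made-up)
list_size     <- 200    # genes in the input list (made-up)
overlap       <- 10     # genes in both the list and the term (made-up)

# 2x2 table: rows split genes by term membership, columns by list membership.
contingency <- matrix(
  c(overlap,             term_size - overlap,
    list_size - overlap, universe_size - list_size - term_size + overlap),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("in_term", "not_in_term"), c("in_list", "not_in_list"))
)

# Overlap expected by chance if list and term were independent.
expected_overlap <- list_size * term_size / universe_size

# One-sided test: is the overlap larger than expected by chance?
ft <- fisher.test(contingency, alternative = "greater")
p_value <- ft$p.value
```

Here the expected overlap is 1 gene while 10 are observed, so the test reports a very small p-value.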

Parameters:

The Enrichment Analysis can also be performed on a single gene list. Using the read_gene_lists function is still recommended.

autoGO(list_of_genes = gene_lists[[1]],
      dbs = c("GO_Molecular_Function_2021", "GO_Biological_Process_2021", "KEGG_2021_Human"),
      my_comparison = "my_comparison_2025",
      ensembl = FALSE,
      excel = FALSE,
      outfolder = "./results")

Reading enrichment tables

This function loads into a single variable all the enrichment tables to be plotted, so that they do not have to be loaded several times. It must be called whenever more than one enrichment table is passed. If the previous steps of autoGO have been performed (from_autoGO = TRUE), only enrich_table_path needs to be passed: all the other information is taken from the path of the enrichment tables.

enrich_table_path <- "./results"
enrich_tables <- read_enrich_tables(
                  enrich_table_path = enrich_table_path,
                  log2FC_threshold = 0,
                  padj_threshold = 0.05,
                  which_list = "down_genes",
                  from_autoGO = TRUE,
                  files_format = NULL)
names(enrich_tables)

Parameters:

Visualization:

autoGO produces several kinds of plots of the enrichment results in an automated way: the barplot and the lollipop plot show the most enriched terms of each list, while the heatmap shows the most enriched terms across all the enriched lists.

Barplot

It produces a barplot of the 15 most enriched terms for each desired enrichment table. If the previous steps of autoGO have been performed (from_autoGO = TRUE), title, outfolder and outfile need not be passed: all the information is taken from the path of the gene lists.

barplotGO(enrich_tables = enrich_tables,
          title = NULL,
          outfolder = NULL,
          outfile = NULL,
          from_autoGO = TRUE)
knitr::include_graphics("imgs/barplot.png")
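
Selecting the most enriched terms amounts to ranking the table by adjusted p-value and keeping the first rows. The sketch below shows this in base R on a toy table with made-up values; the column names `Term` and `Adjusted.P.value` are an assumption about the table layout, and the ranking criterion is the sketch's own choice, not necessarily the one used by `barplotGO`.

```r
# Toy enrichment table with made-up terms and values. The column names
# (Term, Adjusted.P.value) are an assumption about the table layout.
toy_table <- data.frame(
  Term             = paste("term", 1:20),
  Adjusted.P.value = 10^(-(1:20) / 2),
  stringsAsFactors = FALSE
)

top_n <- 15

# Smallest adjusted p-values first, then keep the first top_n rows.
ranked    <- toy_table[order(toy_table$Adjusted.P.value), ]
top_terms <- head(ranked, top_n)

# A common bar height for this kind of plot is -log10 of the adjusted p-value.
top_terms$score <- -log10(top_terms$Adjusted.P.value)
```

A quick look at the result is possible with base graphics, for example `barplot(top_terms$score, names.arg = top_terms$Term, horiz = TRUE)`.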

A barplot can also be produced from a single enrichment table. Using the read_enrich_tables function is still recommended.

enrich_table <- enrich_tables[[1]]
barplotGO(enrich_tables = enrich_table,
          title = c("Title of my barplot", "and subtitle"),
          outfolder = "./results/my_comparison_2025/enrichment_plots",
          outfile = "barplot_myDB.png",
          from_autoGO = FALSE)

Parameters:

Lollipop

It produces a lollipop plot of the 20 most enriched terms for each desired enrichment table. If the previous steps of autoGO have been performed (from_autoGO = TRUE), title and outfolder need not be passed: all the information is taken from the path of the gene lists.

lolliGO(enrich_tables = enrich_tables,
        title = NULL,
        outfolder = NULL,
        outfile = NULL,
        from_autoGO = TRUE)
knitr::include_graphics("imgs/lolligo.png")

A lollipop plot can also be produced from a single enrichment table. Using the read_enrich_tables function is still recommended.

enrich_table <- enrich_tables[[1]]
lolliGO(enrich_tables = enrich_table,
        title = c("Title of my lollipop", "and subtitle"),
        outfolder = "./results/my_comparison_2025/enrichment_plots",
        outfile = "lolli_myDB.png",
        from_autoGO = FALSE)

Parameters:

HeatmapGO

If the enrichment analysis has been performed on many comparisons (or gene lists), it is useful to look at the enrichment results across all these gene lists together.

The function automatically reads all the enrichment tables of the chosen database; for this reason the previous steps of autoGO must have been performed. It produces a heatmap of the terms that are significant in at least a given number of comparisons (or enriched lists). The user sets this minimum with min_term_per_row: for example, with 5 comparisons, min_term_per_row = 5 keeps only the terms that are significant in all of them.

heatmapGO(db = "GO_Biological_Process_2021",
          outfolder = "./results",
          log2FC_threshold = 0,
          padj_threshold = 0.05,
          min_term_per_row = 3,
          which_list = "down_genes")
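
The effect of min_term_per_row can be illustrated on a small matrix of adjusted p-values. The sketch below uses base R and made-up numbers, with terms as rows and comparisons as columns; the matrix layout and the names are illustrative assumptions, not the internal data structure of `heatmapGO`.

```r
# Made-up adjusted p-values: rows are terms, columns are comparisons.
padj <- matrix(
  c(0.01, 0.02, 0.03,
    0.01, 0.20, 0.04,
    0.50, 0.60, 0.01),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("term_1", "term_2", "term_3"),
                  c("comp_A", "comp_B", "comp_C"))
)

padj_threshold   <- 0.05
min_term_per_row <- 2

# Count, for each term, the comparisons in which it is significant.
n_significant <- rowSums(padj <= padj_threshold)

# Keep only the terms significant in at least min_term_per_row comparisons.
kept <- padj[n_significant >= min_term_per_row, , drop = FALSE]
```

In this example term_1 is significant in three comparisons and term_2 in two, so both are kept, while term_3 is significant in only one and is dropped.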

Parameters:

knitr::include_graphics("imgs/heatmap.png")

To speed things up, define a vector containing the databases that were used and produce the heatmap for each of them:

dbs <- c("GO_Molecular_Function_2021", "GO_Biological_Process_2021", "KEGG_2021_Human")
lapply(dbs, function (i) heatmapGO(db = i, outfolder = "./results", which_list = "down_genes", min_term_per_row = 3))

Single-sample Gene Set Enrichment Analysis

A single-sample Gene Set Enrichment Analysis can be performed with the function ssgsea_wrapper. This kind of analysis is recommended when only a few samples are available. The analysis produces an enrichment score for each sample, together with the associated visualizations.

ssgsea_wrapper(norm_data = "results/deseq_vst_data.txt",
               gene_id_type = c("gene_symbol"),
               write_enrich_tables = TRUE,
               group = NULL,
               outfolder = "./results/ssgsea",
               full_names = TRUE,
               tpm_norm = FALSE,
               categories = c("C1", "H"))
knitr::include_graphics("imgs/distrib_C1.png")
knitr::include_graphics("imgs/heatmap_C1.png")
knitr::include_graphics("imgs/distrib_H.png")
knitr::include_graphics("imgs/heatmap_H.png")

Parameters:



mpallocc/auto-go documentation built on Feb. 25, 2025, 8:11 p.m.