The goal of {richcleaner} is to clean up enriched data generated by
gsea. Just a couple of
handy functions that we can share in Marchionni lab.
library(devtools)
install_github("c1au6i0/richcleaner")
Organize the folders in contrasts, each containing the folder for each
of the gene set databases used in gsea. Put the whole output folder of
gsea inside the corresponding folder. Recommanded folder name for
contrasts uses _ to separate group1 from group2 (group1_group2).
#> root_folder
#> ├── contrast1
#> │ ├── BP
#> │ │ └── gsea_folder
#> │ │ ├── gsea_report_for_na_neg_1605121516086.tsv
#> │ │ ├── gsea_report_for_na_pos_1605121516086.tsv
#> │ │ └── many_other_files.ext
#> │ ├── CS
#> │ │ └── gsea_folder
#> │ │ ├── gsea_report_for_na_neg_1605121516086.tsv
#> │ │ ├── gsea_report_for_na_pos_1605121516086.tsv
#> │ │ └── many_other_files.ext
#> │ └── Reactome
#> │ └── gsea_folder
#> │ ├── gsea_report_for_na_neg_1605121516086.tsv
#> │ ├── gsea_report_for_na_pos_1605121516086.tsv
#> │ └── many_other_files.ext
#> └── contrast2
#> ├── BP
#> │ └── gsea_folder
#> │ ├── gsea_report_for_na_neg_1605121516086.tsv
#> │ ├── gsea_report_for_na_pos_1605121516086.tsv
#> │ └── many_other_files.ext
#> ├── CS
#> │ └── gsea_folder
#> │ ├── gsea_report_for_na_neg_1605121516086.tsv
#> │ ├── gsea_report_for_na_pos_1605121516086.tsv
#> │ └── many_other_files.ext
#> └── Reactome
#> └── gsea_folder
#> ├── gsea_report_for_na_neg_1605121516086.tsv
#> ├── gsea_report_for_na_pos_1605121516086.tsv
#> └── many_other_files.ext
Use the function rich_aggregate to look inside that tree folder for
reports files and to aggregate them in a single long dataframe.
Running the function with no path argument starts an interactive
{svDialogs} modal dialog box to choose the folder.
library(richcleaner)
rich_results <- rich_aggregate()
names(rich_results)
#> [1] "contrast" "gs" "description" "size" "es"
#> [6] "nes" "nom_p_val" "fdr_q_val" "fwer_p_val" "rank_at_max"
#> [11] "leading_edge"
Some packages like {pheatmaps} takes as an input a matrix, so it is
convenient to select the results of a particular gene set database and
to pivot the dataframe in a wider format with rich_wider.
rich_results_wider <- rich_wider(rich_results, fdr_threshold = 0.001, gs = "GOBPs", value = "n_logp_sign")
The argument fdr_threshold is used for filtering out gene sets that
have an false discovery rate more than a certain number. We can pivot
wider the dataframe setting value to one of the numerical columns of
the dataframe.
The function rich_pheatmap is just a wrapper than runs rich_wider
and pheatmap::pheatmap on the result.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.