In c1au6i0/richcleaner: Clean results generated by GSEA

richcleaner

The goal of {richcleaner} is to clean up enrichment results generated by GSEA. It is just a couple of handy functions that we can share in the Marchionni lab.

library(devtools)

install_github("c1au6i0/richcleaner")

Organize the folders by contrast, with each contrast folder containing one folder per gene set database used in GSEA. Put the whole GSEA output folder inside the corresponding database folder. The recommended contrast folder name uses _ to separate group1 from group2 (group1_group2).

#> root_folder
#> ├── contrast1
#> │   ├── BP
#> │   │   └── gsea_folder
#> │   │       ├── gsea_report_for_na_neg_1605121516086.tsv
#> │   │       ├── gsea_report_for_na_pos_1605121516086.tsv
#> │   │       └── many_other_files.ext
#> │   ├── CS
#> │   │   └── gsea_folder
#> │   │       ├── gsea_report_for_na_neg_1605121516086.tsv
#> │   │       ├── gsea_report_for_na_pos_1605121516086.tsv
#> │   │       └── many_other_files.ext
#> │   └── Reactome
#> │       └── gsea_folder
#> │           ├── gsea_report_for_na_neg_1605121516086.tsv
#> │           ├── gsea_report_for_na_pos_1605121516086.tsv
#> │           └── many_other_files.ext
#> └── contrast2
#>     ├── BP
#>     │   └── gsea_folder
#>     │       ├── gsea_report_for_na_neg_1605121516086.tsv
#>     │       ├── gsea_report_for_na_pos_1605121516086.tsv
#>     │       └── many_other_files.ext
#>     ├── CS
#>     │   └── gsea_folder
#>     │       ├── gsea_report_for_na_neg_1605121516086.tsv
#>     │       ├── gsea_report_for_na_pos_1605121516086.tsv
#>     │       └── many_other_files.ext
#>     └── Reactome
#>         └── gsea_folder
#>             ├── gsea_report_for_na_neg_1605121516086.tsv
#>             ├── gsea_report_for_na_pos_1605121516086.tsv
#>             └── many_other_files.ext
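As a minimal sketch, the empty skeleton of such a tree can be created with base R before copying the GSEA output folders into it. The contrast names and the temporary root used here are hypothetical placeholders; only the database folder names (BP, CS, Reactome) come from the tree above.

```r
# Sketch: create the recommended folder skeleton under a temporary root.
root_folder <- file.path(tempdir(), "root_folder")
contrasts <- c("group1_group2", "group1_group3")  # hypothetical contrast names
databases <- c("BP", "CS", "Reactome")            # database folders shown in the tree

# One folder per contrast/database combination.
layout <- expand.grid(contrast = contrasts, db = databases,
                      stringsAsFactors = FALSE)
target_dirs <- file.path(root_folder, layout$contrast, layout$db)

for (d in target_dirs) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# Each GSEA output folder is then copied whole into its database folder.
list.dirs(root_folder, recursive = TRUE, full.names = FALSE)
```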

Use the function rich_aggregate to search that folder tree for report files and aggregate them into a single long dataframe. Running the function with no path argument opens an interactive {svDialogs} dialog box to choose the folder.

library(richcleaner)
rich_results <- rich_aggregate()
names(rich_results)

#>  [1] "contrast"     "gs"           "description"  "size"         "es"          
#>  [6] "nes"          "nom_p_val"    "fdr_q_val"    "fwer_p_val"   "rank_at_max" 
#> [11] "leading_edge"
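To illustrate the idea behind the aggregation step, here is a rough base R sketch. It is not the implementation of rich_aggregate: the helper aggregate_reports_sketch is hypothetical, it assumes the root/contrast/database/gsea_folder layout described above, and it leaves the column names of the reports as they are instead of cleaning them. The toy files and their values are made up.

```r
# Hypothetical helper, not part of {richcleaner}: collect GSEA report files
# from a root/contrast/database/gsea_folder tree into one long data frame.
aggregate_reports_sketch <- function(root) {
  files <- list.files(root, pattern = "^gsea_report_for_.*\\.tsv$",
                      recursive = TRUE, full.names = TRUE)
  tables <- lapply(files, function(f) {
    report <- read.delim(f, stringsAsFactors = FALSE, check.names = FALSE)
    db_dir <- dirname(dirname(f))  # .../contrast/database
    # Prepend contrast and gene set database, both taken from folder names.
    data.frame(contrast = rep(basename(dirname(db_dir)), nrow(report)),
               gs = rep(basename(db_dir), nrow(report)),
               report,
               check.names = FALSE, stringsAsFactors = FALSE)
  })
  do.call(rbind, tables)
}

# Toy tree with two made-up report files, just to exercise the helper.
root <- file.path(tempdir(), "rich_sketch_root")
gsea_dir <- file.path(root, "group1_group2", "BP", "gsea_folder")
dir.create(gsea_dir, recursive = TRUE, showWarnings = FALSE)
toy <- data.frame(NAME = c("SET_A", "SET_B"), NES = c(1.8, -2.1))
for (side in c("pos", "neg")) {
  write.table(toy,
              file.path(gsea_dir, paste0("gsea_report_for_na_", side, "_1.tsv")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

toy_results <- aggregate_reports_sketch(root)
names(toy_results)
```

Deriving contrast and gs from the folder names is why the folder layout matters: the report files themselves do not need to carry that information.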

Some packages, like {pheatmap}, take a matrix as input, so it is convenient to select the results for a particular gene set database and pivot the dataframe into a wider format with rich_wider.

rich_results_wider <- rich_wider(rich_results, fdr_threshold = 0.001, gs = "GOBPs", value = "n_logp_sign")

The argument fdr_threshold filters out gene sets whose false discovery rate is above the given value. The dataframe can be pivoted wider by setting value to one of its numerical columns.
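The filter-then-pivot idea can be sketched in base R on a toy dataframe. This is only an illustration, not the code of rich_wider: the values are made up, the description column is assumed to hold the gene set label, nes is used as the value column, and the sketch drops individual rows above the threshold, which may differ from how the package treats a gene set across contrasts.

```r
# Toy long-format results using column names from names(rich_results);
# all values are made up for illustration.
toy_long <- data.frame(
  contrast    = c("c1", "c1", "c2", "c2"),
  gs          = "BP",
  description = c("SET_A", "SET_B", "SET_A", "SET_B"),
  nes         = c(1.8, -2.1, 1.2, -1.9),
  fdr_q_val   = c(0.0001, 0.0005, 0.2, 0.0002),
  stringsAsFactors = FALSE
)

# Keep one gene set database and the rows at or below the FDR threshold.
fdr_threshold <- 0.001
kept <- toy_long[toy_long$gs == "BP" & toy_long$fdr_q_val <= fdr_threshold, ]

# Pivot wider: one row per gene set, one column per contrast.
# Combinations removed by the filter show up as NA.
wide <- tapply(kept$nes, list(kept$description, kept$contrast), FUN = mean)
wide
```

The result is a numeric matrix with gene sets as row names and contrasts as column names, which is the shape heatmap functions generally expect.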

The function rich_pheatmap is just a wrapper that runs rich_wider and then pheatmap::pheatmap on the result.

c1au6i0/richcleaner documentation built on Dec. 31, 2020, 9:01 p.m.
