gsea: Gene-set enrichment analysis
In surh/HMVAR: Human Microbiome Variant Analysis in R


Performs gene-set enrichment analysis on all annotation terms for a set of genes.

gsea(dat, test = "wilcoxon", alternative = "greater", min_size = 3)

`dat`	A data.frame or tibble. It must contain one row per gene and columns 'gene_id', 'terms', and 'score'. Column 'terms' must be of type character and each entry must be a comma-separated character string of all the terms that annotate the corresponding gene.
`test`	Which test to perform. Either 'wilcoxon' or 'ks' for the Wilcoxon Rank Sum and Kolmogorov-Smirnov tests, respectively. The tests use R's base wilcox.test and ks.test, respectively.
`alternative`	The alternative hypothesis to test. Either 'greater', 'less' or 'two.sided'. It corresponds to the 'alternative' argument of wilcox.test or ks.test. Typically, if scores are p-values, one wishes to test the hypothesis that the p-values of the genes annotated with a term are 'less' than expected, while if scores are some other type of value (like fold-change abundance), one is testing that those values are 'greater'. Keep in mind that the Kolmogorov-Smirnov test is a test of the maximum difference in cumulative distribution values. Therefore, the alternative 'greater' in this case corresponds to cases where the score is stochastically smaller than the rest.
`min_size`	The minimum number of genes in the group for the test to be performed. If the number of genes annotated with a term is less than 'min_size', the test is not performed for that term.
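The direction of 'alternative' is easy to get backwards when switching between the two tests. The following is a minimal base R sketch, not part of HMVAR, that illustrates the point using simulated data. The vectors `x` and `y` are hypothetical stand-ins for the scores of genes inside and outside a term.

```r
# Simulated scores: x is shifted towards smaller values than y.
set.seed(1)
x <- rnorm(200, mean = -1)  # hypothetical scores of genes in a term
y <- rnorm(200, mean = 0)   # hypothetical scores of the remaining genes

# Kolmogorov-Smirnov: 'greater' means the CDF of x lies above the CDF
# of y, which happens when x tends to take SMALLER values than y.
ks_greater <- ks.test(x, y, alternative = "greater")$p.value
ks_less    <- ks.test(x, y, alternative = "less")$p.value

# Wilcoxon Rank Sum: 'less' means x is shifted towards smaller values.
w_less    <- wilcox.test(x, y, alternative = "less")$p.value
w_greater <- wilcox.test(x, y, alternative = "greater")$p.value

# Collect the p-values side by side for comparison.
data.frame(test = c("ks", "ks", "wilcoxon", "wilcoxon"),
           alternative = c("greater", "less", "less", "greater"),
           p.value = c(ks_greater, ks_less, w_less, w_greater))
```

In this sketch the small p-values come from 'greater' for the Kolmogorov-Smirnov test but from 'less' for the Wilcoxon test, even though both detect the same shift towards smaller scores.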

A tibble with elements: term (the annotation term ID), size (the number of elements in both 'genes' and 'scores'), statistic (the statistic calculated, depends on the test), and p.value (the p-value of the test). The tibble is sorted by increasing p-value.

# Make some fake data
dat <- tibble::tibble(gene_id = paste('gene', 1:10, sep = ''),
                      terms = c('term1,term2,term3',
                                NA,
                                'term2,term3,term4',
                                'term3',
                                'term4,term5',
                                'term6',
                                'term6',
                                'term6,term2',
                                'term6,term7',
                                'term6,term2'),
                      score = 1:10)
dat

# Test
gsea(dat, min_size = 2)
gsea(dat, min_size = 3, test = 'ks', alternative = 'less')
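
# The following is a rough base R sketch of what the test for a single term
# might look like, assuming that genes annotated with the term are compared
# against all remaining genes. That comparison is an assumption and has not
# been checked against the HMVAR source. The names `term_list` and `in_term`
# are hypothetical helpers.

```r
# Same scores and terms as in the fake data above.
scores <- 1:10
terms <- c('term1,term2,term3', NA, 'term2,term3,term4', 'term3',
           'term4,term5', 'term6', 'term6', 'term6,term2',
           'term6,term7', 'term6,term2')

# Split each comma-separated string into a vector of terms.
# Genes without annotation (NA) yield a single NA entry.
term_list <- strsplit(terms, ",", fixed = TRUE)

# Flag the genes annotated with 'term6'.
in_term <- vapply(term_list, function(x) "term6" %in% x, logical(1))

# Assumed comparison: scores in the term versus all other scores.
res <- wilcox.test(scores[in_term], scores[!in_term],
                   alternative = "greater")
res$statistic
res$p.value
```

# Under this assumption, the five 'term6' genes hold the five largest scores,
# so the Wilcoxon statistic is W = 25 with an exact one-sided p-value of 1/252.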