gsea: Gene-set enrichment analysis


View source: R/annots.r

Description

Performs gene-set enrichment analysis on all annotation terms for a set of genes.

Usage

gsea(dat, test = "wilcoxon", alternative = "greater", min_size = 3)

Arguments

dat

A data.frame or tibble. It must contain one row per gene and columns 'gene_id', 'terms', and 'score'. Column 'terms' must be of type character and each entry must be a comma-separated character string of all the terms that annotate the corresponding gene.
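To make the expected layout of 'terms' concrete, here is a minimal sketch using only base R. The three-gene table and the variable names term_list and term_sizes are made up for illustration, and the splitting shown is an assumption about how such a column can be parsed, not necessarily what gsea does internally.

```r
# Hypothetical three-gene input; gene2 has no annotation (NA)
dat <- data.frame(
  gene_id = c("gene1", "gene2", "gene3"),
  terms = c("term1,term2", NA, "term2"),
  score = c(0.5, 1.2, 2.0),
  stringsAsFactors = FALSE
)

# Split each comma-separated string into a character vector of terms
term_list <- strsplit(dat$terms, split = ",", fixed = TRUE)
names(term_list) <- dat$gene_id

# Count how many genes carry each term; table() drops NA by default
term_sizes <- table(unlist(term_list))
term_sizes
```

Note that the terms are separated by a bare comma with no surrounding spaces, as in the Examples section.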

test

Which test to perform. Either 'wilcoxon' or 'ks', for the Wilcoxon rank-sum and Kolmogorov-Smirnov tests respectively. The tests use R's built-in wilcox.test and ks.test (package stats).

alternative

The alternative hypothesis to test. Either 'greater', 'less' or 'two.sided'. It corresponds to option 'alternative' in wilcox.test or ks.test. Typically, if scores are p-values, one wishes to test the hypothesis that p-values within 'genes' are 'less' than expected; if scores are some other type of value (like fold-change abundance), one is usually testing that those values are 'greater'. Keep in mind that the Kolmogorov-Smirnov test is a test of the maximum difference between cumulative distribution functions. Therefore, with test = 'ks', alternative 'greater' corresponds to cases where the scores are stochastically smaller than the rest.
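The difference in direction between the two tests can be checked directly with base R. The sketch below uses simulated scores (in_set and out_set are hypothetical names, not arguments of gsea) where the group of interest is shifted towards smaller values; it calls wilcox.test and ks.test themselves rather than gsea.

```r
set.seed(1)

# Simulated scores: the hypothetical gene set is shifted towards lower values
in_set  <- rnorm(50, mean = -1)
out_set <- rnorm(200, mean = 0)

# Wilcoxon: 'less' is the alternative that matches "in_set tends to be smaller"
p_wilcox_less <- wilcox.test(in_set, out_set, alternative = "less")$p.value

# Kolmogorov-Smirnov: 'greater' means the CDF of in_set lies above that of
# out_set, which is also the case when in_set tends to be smaller
p_ks_greater <- ks.test(in_set, out_set, alternative = "greater")$p.value

# The opposite KS alternative should not be significant for these data
p_ks_less <- ks.test(in_set, out_set, alternative = "less")$p.value

c(wilcox_less = p_wilcox_less, ks_greater = p_ks_greater, ks_less = p_ks_less)
```

So for the same biological question, 'less' with the Wilcoxon test and 'greater' with the Kolmogorov-Smirnov test point in the same direction.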

min_size

The minimum number of genes in the group for the test to be performed. If fewer than 'min_size' genes of the group appear in 'scores', the test is not performed for that group.

Value

A tibble with elements: term (the annotation term ID), size (the number of elements in both 'genes' and 'scores'), statistic (the statistic calculated, depends on the test), and p.value (the p-value of the test). The tibble is sorted by increasing p-value.
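As a rough illustration of what a result of this shape contains, the sketch below builds a similar table with base R only. It is not the package's implementation: it assumes that each term's genes are compared against all remaining genes with wilcox.test, and the names term_list, all_terms and res are invented for the example.

```r
# Same fake data as in the Examples section, as a plain data.frame
dat <- data.frame(
  gene_id = paste0("gene", 1:10),
  terms = c("term1,term2,term3", NA, "term2,term3,term4", "term3",
            "term4,term5", "term6", "term6", "term6,term2",
            "term6,term7", "term6,term2"),
  score = 1:10,
  stringsAsFactors = FALSE
)

min_size <- 3
term_list <- strsplit(dat$terms, ",", fixed = TRUE)
all_terms <- sort(unique(na.omit(unlist(term_list))))

# One row per term that annotates at least min_size genes
res <- do.call(rbind, lapply(all_terms, function(tm) {
  in_term <- vapply(term_list, function(x) tm %in% x, logical(1))
  if (sum(in_term) < min_size) return(NULL)  # too few genes: skip the test
  # Assumption: genes in the term versus all other genes
  tst <- wilcox.test(dat$score[in_term], dat$score[!in_term],
                     alternative = "greater")
  data.frame(term = tm, size = sum(in_term),
             statistic = unname(tst$statistic), p.value = tst$p.value,
             stringsAsFactors = FALSE)
}))

# Sort by increasing p-value, as described for the returned tibble
res <- res[order(res$p.value), ]
res
```

Only term2, term3 and term6 annotate at least three genes here, so only those three terms get a row.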

Examples

# Make some fake data
dat <- tibble::tibble(gene_id = paste('gene', 1:10, sep = ''),
                      terms = c('term1,term2,term3',
                                NA,
                                'term2,term3,term4',
                                'term3',
                                'term4,term5',
                                'term6',
                                'term6',
                                'term6,term2',
                                'term6,term7',
                                'term6,term2'),
                      score = 1:10)
dat

# Test
gsea(dat, min_size = 2)
gsea(dat, min_size = 3, test = 'ks', alternative = 'less')

surh/HMVAR documentation built on Aug. 18, 2021, 1:21 a.m.