runGSAhyper: Gene set analysis with Fisher's exact test

runGSAhyper R Documentation

Gene set analysis with Fisher's exact test

Description

Performs gene set analysis (GSA) based on a list of significant genes and a gene set collection, using Fisher's exact test, returning the gene set p-values.

Usage

runGSAhyper(
  genes,
  pvalues,
  pcutoff,
  universe,
  gsc,
  gsSizeLim = c(1, Inf),
  adjMethod = "fdr"
)

Arguments

genes

a vector of all genes in your experiment, or a small list of significant genes.

pvalues

a vector (or object to be coerced into one) of p-values for genes, or a binary vector with 0 for significant genes and 1 for non-significant genes. Defaults to rep(0,length(genes)), i.e. genes is treated as a vector of genes of interest.

pcutoff

p-value cutoff for significant genes. Defaults to 0 if pvalues is binary, and to 0.05 if the p-values are spread over [0,1].

universe

a vector of genes that represent the universe. Defaults to genes if pvalues are not all 0. If pvalues are all 0, defaults to all unique genes in gsc.

gsc

a gene set collection given as an object of class GSC as returned by the loadGSC function.

gsSizeLim

a vector of length two, giving the minimum and maximum gene set size (number of member genes) to be kept for the analysis. Defaults to c(1,Inf).

adjMethod

the method for adjusting for multiple testing. Can be any of the methods supported by p.adjust, i.e. "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr" or "none".
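Since adjMethod is passed on to the methods of base R's p.adjust, the effect of a choice can be previewed directly. The following is a minimal sketch on made-up gene set p-values (the vector p is hypothetical and not part of piano); in base R, "fdr" is an alias for the Benjamini-Hochberg method "BH".

```r
# Hypothetical gene set p-values, for illustration only
p <- c(setA = 0.001, setB = 0.02, setC = 0.03, setD = 0.5)

# "fdr" and "BH" name the same adjustment in p.adjust
p_fdr <- p.adjust(p, method = "fdr")
p_bh  <- p.adjust(p, method = "BH")
identical(p_fdr, p_bh)

# "none" leaves the p-values unchanged
p_none <- p.adjust(p, method = "none")
```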

Details

The statistical test performed is a one-tailed Fisher's exact test on the contingency table with columns "In gene set" and "Not in gene set" and rows "Significant" and "Non-significant" (this is equivalent to a hypergeometric test).

Command run for gene set i:

fisher.test(res$contingencyTable[[i]], alternative="greater"),

where res is the object returned by runGSAhyper, which includes the contingencyTable element.
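The equivalence with the hypergeometric test can be checked in base R. This is a sketch on a made-up 2x2 table (the counts are hypothetical, chosen only for illustration), laid out with the rows and columns described above; it does not call piano.

```r
# Hypothetical counts: rows are significance, columns are set membership
tab <- matrix(
  c(12, 38,    # column "In gene set":     significant, non-significant
    88, 862),  # column "Not in gene set": significant, non-significant
  nrow = 2,
  dimnames = list(
    c("Significant", "Non-significant"),
    c("In gene set", "Not in gene set")
  )
)

# One-tailed Fisher's exact test, as in the command above
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# Same probability from the hypergeometric distribution:
# P(X >= 12) when drawing the 100 significant genes from
# 50 genes in the set and 950 genes outside it
p_hyper <- phyper(
  tab[1, 1] - 1,      # q: one less than the observed overlap
  sum(tab[, 1]),      # m: genes in the set
  sum(tab[, 2]),      # n: genes not in the set
  sum(tab[1, ]),      # k: significant genes drawn
  lower.tail = FALSE
)

all.equal(p_fisher, p_hyper)
```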

The main difference between runGSA and runGSAhyper is that runGSA uses the gene-level statistics (numerical values for each gene) to calculate the gene set p-values, whereas runGSAhyper only uses the group membership of each gene (in/not in gene set, significant/non-significant). This means that for runGSAhyper the user has to choose a p-value cutoff for determining significant genes, and once the cutoff is applied all significant genes are treated as equally significant (i.e. the actual p-values are not used). The advantage of runGSAhyper is that it can find enriched gene sets when you only have a list of interesting genes, without any statistics.
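To illustrate how gene-level p-values collapse into group membership, here is a minimal base R sketch. The gene names, p-values, and gene set are all hypothetical, and the inclusive comparison (<=) is an assumption inferred from the documented default of pcutoff = 0 for binary input; it is not taken from the piano source code.

```r
# Hypothetical gene-level p-values and one hypothetical gene set
pvals <- c(g1 = 0.0004, g2 = 0.03, g3 = 0.2,
           g4 = 0.0009, g5 = 0.6,  g6 = 0.0001)
gene_set <- c("g1", "g4", "g5")
pcutoff <- 0.001

# After thresholding, each gene carries only two binary labels;
# the sizes of the original p-values no longer matter
significant <- pvals <= pcutoff        # assumed inclusive comparison
in_set <- names(pvals) %in% gene_set

# Cross-tabulate the two memberships into the 2x2 contingency table
tab <- table(
  factor(significant, levels = c(TRUE, FALSE),
         labels = c("Significant", "Non-significant")),
  factor(in_set, levels = c(TRUE, FALSE),
         labels = c("In gene set", "Not in gene set"))
)
tab
```

Replacing 0.0004 with any other value at or below the cutoff would give the same table, which is what "equally significant" means here.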

Value

A list-like object containing the following elements:

pvalues

a vector of gene set p-values

p.adj

a vector of gene set p-values, adjusted for multiple testing

resTab

a full result table

contingencyTable

a list of the contingency tables used for each gene set

gsc

the input gene set collection

Author(s)

Leif Varemo piano.rpkg@gmail.com and Intawat Nookaew piano.rpkg@gmail.com

See Also

piano, loadGSC, runGSA, fisher.test, phyper, networkPlot

Examples


   # Load the piano package and the example input data (dummy p-values and gene set collection):
   library(piano)
   data("gsa_input")
   
   # Load gene set collection:
   gsc <- loadGSC(gsa_input$gsc)
   
   # Randomly select 100 genes of interest (as an example):
   genes <- sample(unique(gsa_input$gsc[,1]),100)
      
   # Run gene set analysis using Fisher's exact test:
   res <- runGSAhyper(genes, gsc=gsc)
   
   # If you have p-values for the genes and want to make a cutoff for significance:
   genes <- names(gsa_input$pvals) # All gene names
   p <- gsa_input$pvals # p-values for all genes
   res <- runGSAhyper(genes, p, pcutoff=0.001, gsc=gsc)
   
   # If the first 20 genes are the interesting/significant ones, they can be selected
   # with a binary vector (0 = significant, 1 = non-significant):
   significant <- c(rep(0,20),rep(1,length(genes)-20))
   res <- runGSAhyper(genes, significant, gsc=gsc)
   
   


varemo/piano documentation built on Sept. 19, 2022, 12:01 p.m.