README.md
In ZhenWei10/golite: Fast and simple GOEA

golite

Install the package with:

library(devtools)
install_github("ZhenWei10/golite")

The necessary input of goea should be the gene set gene ID, background gene ID, and the orgDb object.

library(golite)
library(magrittr)
library(org.Hs.eg.db)

Use randomly sampled gene IDs for gene set and background.

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb = TxDb.Hsapiens.UCSC.hg19.knownGene
all_eids_hg19 <- names(genes(txdb))

set.seed(1)
eids_bg <- sample(all_eids_hg19, 3500)
eids_set <- sample(eids_bg,300)

Run GO enrichment analysis given gene set and background.

goea(gene_set = eids_set,
     back_ground = eids_bg,
     orgDb = org.Hs.eg.db,
     interpret_term = T) %>% head(.,10) %>% knitr::kable(.,"markdown")

## Loading required package: GO.db

term definition freq_gs freq_bg p adj_BH OR GO:0009967 positive regulation of signal transduction 29 191 0.0012078 0.5996853 1.77 GO:0010647 positive regulation of cell communication 32 225 0.0020695 0.5996853 1.66 GO:0023056 positive regulation of signaling 32 225 0.0020695 0.5996853 1.66 GO:0006672 ceramide metabolic process 3 4 0.0023288 0.5996853 8.75 GO:2001235 positive regulation of apoptotic signaling pathway 7 24 0.0030061 0.5996853 3.40 GO:0045859 regulation of protein kinase activity 17 101 0.0045006 0.5996853 1.96 GO:0071900 regulation of protein serine/threonine kinase activity 12 61 0.0045852 0.5996853 2.30 GO:0043408 regulation of MAPK cascade 15 85 0.0047575 0.5996853 2.06 GO:1902531 regulation of intracellular signal transduction 34 256 0.0047870 0.5996853 1.55 GO:0060538 skeletal muscle organ development 6 20 0.0051416 0.5996853 3.50

The function can be vectorized, i.e. the input can be a list of multiple gene sets.

eids_sets <- lapply(1:10,function(x) sample(eids_bg,300)) 

goea(gene_set = eids_sets,
     back_ground = eids_bg,
     orgDb = org.Hs.eg.db) %>% summary

##       Length Class      Mode
##  [1,] 6      data.frame list
##  [2,] 6      data.frame list
##  [3,] 6      data.frame list
##  [4,] 6      data.frame list
##  [5,] 6      data.frame list
##  [6,] 6      data.frame list
##  [7,] 6      data.frame list
##  [8,] 6      data.frame list
##  [9,] 6      data.frame list
## [10,] 6      data.frame list

GO slim is a subset of GO terms that can be defined at here.

goea(gene_set = eids_set,
     back_ground = eids_bg,
     orgDb = org.Hs.eg.db,
     interpret_term = T,
     GO_Slim = T)  %>% head(.,10) %>% knitr::kable(.,"markdown")

| term | definition | freq_gs| freq_bg| p| adj_BH| OR| |:-------------|:-------------------------------------|---------:|---------:|----------:|----------:|-----:| | | lipid metabolic process | 25| 187| 0.0141012| 0.5162342| 1.56| | | cell death | 39| 327| 0.0156435| 0.5162342| 1.39| | | anatomical structure development | 83| 848| 0.0683518| 0.9268498| 1.14| | | growth | 17| 140| 0.0852517| 0.9268498| 1.42| | | signal transduction | 77| 795| 0.0988005| 0.9268498| 1.13| | | developmental maturation | 6| 40| 0.1221631| 0.9268498| 1.75| | | sulfur compound metabolic process | 7| 51| 0.1415087| 0.9268498| 1.60| | | pigmentation | 2| 8| 0.1453356| 0.9268498| 2.92| | | cell wall organization or biogenesis | 1| 2| 0.1641360| 0.9268498| 5.83| | | cell cycle | 26| 250| 0.1651823| 0.9268498| 1.21|

you could set EASE_score = TRUE to get a more conservative p value.

For more information of EASE, please see here.

goea(gene_set = eids_sets,
     back_ground = eids_bg, 
     orgDb = org.Hs.eg.db,
     GO_Slim = T,
     EASE_Score = F) %>% lapply(.,function(x)x$p) %>% unlist %>% hist(main = "normal hypergeometric")

goea(gene_set = eids_sets,
         back_ground = eids_bg,
     orgDb = org.Hs.eg.db,
     GO_Slim = T,
     EASE_Score = T) %>% lapply(.,function(x)x$p) %>% unlist %>% hist(main = "EASE score")

with any questions, please contact zhen.wei@xjtlu.edu.cn.

sessionInfo()

## R version 3.4.2 (2017-09-28)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] zh_CN.UTF-8/zh_CN.UTF-8/zh_CN.UTF-8/C/zh_CN.UTF-8/zh_CN.UTF-8
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] GO.db_3.5.0                            
##  [2] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
##  [3] GenomicFeatures_1.30.3                 
##  [4] GenomicRanges_1.30.3                   
##  [5] GenomeInfoDb_1.14.0                    
##  [6] org.Hs.eg.db_3.5.0                     
##  [7] AnnotationDbi_1.40.0                   
##  [8] IRanges_2.12.0                         
##  [9] S4Vectors_0.16.0                       
## [10] Biobase_2.38.0                         
## [11] BiocGenerics_0.24.0                    
## [12] magrittr_1.5                           
## [13] golite_1.0                             
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.16               highr_0.6                 
##  [3] compiler_3.4.2             XVector_0.18.0            
##  [5] prettyunits_1.0.2          bitops_1.0-6              
##  [7] tools_3.4.2                zlibbioc_1.24.0           
##  [9] progress_1.1.2             biomaRt_2.34.2            
## [11] digest_0.6.15              bit_1.1-12                
## [13] lattice_0.20-35            RSQLite_2.0               
## [15] evaluate_0.10.1            memoise_1.1.0             
## [17] pkgconfig_2.0.1            Matrix_1.2-12             
## [19] DelayedArray_0.4.1         DBI_0.8                   
## [21] yaml_2.1.18                GenomeInfoDbData_1.0.0    
## [23] rtracklayer_1.38.3         httr_1.3.1                
## [25] stringr_1.3.0              knitr_1.20                
## [27] Biostrings_2.46.0          grid_3.4.2                
## [29] rprojroot_1.3-2            bit64_0.9-7               
## [31] R6_2.2.2                   BiocParallel_1.12.0       
## [33] XML_3.98-1.10              RMySQL_0.10.14            
## [35] rmarkdown_1.9              blob_1.1.1                
## [37] matrixStats_0.53.1         GenomicAlignments_1.14.2  
## [39] Rsamtools_1.30.0           backports_1.1.2           
## [41] htmltools_0.3.6            SummarizedExperiment_1.8.1
## [43] assertthat_0.2.0           stringi_1.1.7             
## [45] RCurl_1.95-4.10

ZhenWei10/golite documentation built on May 30, 2019, 6:15 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com