Install the package with:
library(devtools)
install_github("ZhenWei10/golite")
The necessary input of goea should be the gene set gene ID, background gene ID, and the orgDb object.
library(golite)
library(magrittr)
library(org.Hs.eg.db)
Use randomly sampled gene IDs for gene set and background.
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb = TxDb.Hsapiens.UCSC.hg19.knownGene
all_eids_hg19 <- names(genes(txdb))
set.seed(1)
eids_bg <- sample(all_eids_hg19, 3500)
eids_set <- sample(eids_bg,300)
Run GO enrichment analysis given gene set and background.
goea(gene_set = eids_set,
back_ground = eids_bg,
orgDb = org.Hs.eg.db,
interpret_term = T) %>% head(.,10) %>% knitr::kable(.,"markdown")
## Loading required package: GO.db
term
definition
freq_gs
freq_bg
p
adj_BH
OR
GO:0009967
positive regulation of signal transduction
29
191
0.0012078
0.5996853
1.77
GO:0010647
positive regulation of cell communication
32
225
0.0020695
0.5996853
1.66
GO:0023056
positive regulation of signaling
32
225
0.0020695
0.5996853
1.66
GO:0006672
ceramide metabolic process
3
4
0.0023288
0.5996853
8.75
GO:2001235
positive regulation of apoptotic signaling pathway
7
24
0.0030061
0.5996853
3.40
GO:0045859
regulation of protein kinase activity
17
101
0.0045006
0.5996853
1.96
GO:0071900
regulation of protein serine/threonine kinase activity
12
61
0.0045852
0.5996853
2.30
GO:0043408
regulation of MAPK cascade
15
85
0.0047575
0.5996853
2.06
GO:1902531
regulation of intracellular signal transduction
34
256
0.0047870
0.5996853
1.55
GO:0060538
skeletal muscle organ development
6
20
0.0051416
0.5996853
3.50
The function can be vectorized, i.e. the input can be a list
of multiple gene sets.
eids_sets <- lapply(1:10,function(x) sample(eids_bg,300))
goea(gene_set = eids_sets,
back_ground = eids_bg,
orgDb = org.Hs.eg.db) %>% summary
## Length Class Mode
## [1,] 6 data.frame list
## [2,] 6 data.frame list
## [3,] 6 data.frame list
## [4,] 6 data.frame list
## [5,] 6 data.frame list
## [6,] 6 data.frame list
## [7,] 6 data.frame list
## [8,] 6 data.frame list
## [9,] 6 data.frame list
## [10,] 6 data.frame list
GO slim is a subset of GO terms that can be defined at here.
goea(gene_set = eids_set,
back_ground = eids_bg,
orgDb = org.Hs.eg.db,
interpret_term = T,
GO_Slim = T) %>% head(.,10) %>% knitr::kable(.,"markdown")
| term | definition | freq_gs| freq_bg| p| adj_BH| OR| |:-------------|:-------------------------------------|---------:|---------:|----------:|----------:|-----:| | | lipid metabolic process | 25| 187| 0.0141012| 0.5162342| 1.56| | | cell death | 39| 327| 0.0156435| 0.5162342| 1.39| | | anatomical structure development | 83| 848| 0.0683518| 0.9268498| 1.14| | | growth | 17| 140| 0.0852517| 0.9268498| 1.42| | | signal transduction | 77| 795| 0.0988005| 0.9268498| 1.13| | | developmental maturation | 6| 40| 0.1221631| 0.9268498| 1.75| | | sulfur compound metabolic process | 7| 51| 0.1415087| 0.9268498| 1.60| | | pigmentation | 2| 8| 0.1453356| 0.9268498| 2.92| | | cell wall organization or biogenesis | 1| 2| 0.1641360| 0.9268498| 5.83| | | cell cycle | 26| 250| 0.1651823| 0.9268498| 1.21|
you could set EASE_score = TRUE
to get a more conservative p value.
For more information of EASE, please see here.
goea(gene_set = eids_sets,
back_ground = eids_bg,
orgDb = org.Hs.eg.db,
GO_Slim = T,
EASE_Score = F) %>% lapply(.,function(x)x$p) %>% unlist %>% hist(main = "normal hypergeometric")
goea(gene_set = eids_sets,
back_ground = eids_bg,
orgDb = org.Hs.eg.db,
GO_Slim = T,
EASE_Score = T) %>% lapply(.,function(x)x$p) %>% unlist %>% hist(main = "EASE score")
with any questions, please contact zhen.wei@xjtlu.edu.cn.
sessionInfo()
## R version 3.4.2 (2017-09-28)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
##
## locale:
## [1] zh_CN.UTF-8/zh_CN.UTF-8/zh_CN.UTF-8/C/zh_CN.UTF-8/zh_CN.UTF-8
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] GO.db_3.5.0
## [2] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
## [3] GenomicFeatures_1.30.3
## [4] GenomicRanges_1.30.3
## [5] GenomeInfoDb_1.14.0
## [6] org.Hs.eg.db_3.5.0
## [7] AnnotationDbi_1.40.0
## [8] IRanges_2.12.0
## [9] S4Vectors_0.16.0
## [10] Biobase_2.38.0
## [11] BiocGenerics_0.24.0
## [12] magrittr_1.5
## [13] golite_1.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.16 highr_0.6
## [3] compiler_3.4.2 XVector_0.18.0
## [5] prettyunits_1.0.2 bitops_1.0-6
## [7] tools_3.4.2 zlibbioc_1.24.0
## [9] progress_1.1.2 biomaRt_2.34.2
## [11] digest_0.6.15 bit_1.1-12
## [13] lattice_0.20-35 RSQLite_2.0
## [15] evaluate_0.10.1 memoise_1.1.0
## [17] pkgconfig_2.0.1 Matrix_1.2-12
## [19] DelayedArray_0.4.1 DBI_0.8
## [21] yaml_2.1.18 GenomeInfoDbData_1.0.0
## [23] rtracklayer_1.38.3 httr_1.3.1
## [25] stringr_1.3.0 knitr_1.20
## [27] Biostrings_2.46.0 grid_3.4.2
## [29] rprojroot_1.3-2 bit64_0.9-7
## [31] R6_2.2.2 BiocParallel_1.12.0
## [33] XML_3.98-1.10 RMySQL_0.10.14
## [35] rmarkdown_1.9 blob_1.1.1
## [37] matrixStats_0.53.1 GenomicAlignments_1.14.2
## [39] Rsamtools_1.30.0 backports_1.1.2
## [41] htmltools_0.3.6 SummarizedExperiment_1.8.1
## [43] assertthat_0.2.0 stringi_1.1.7
## [45] RCurl_1.95-4.10
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.