README.md

genekitr: Gene Analysis Toolkit in R

This tool annotates genes with alias, symbol, full name, function also related papers.

目的就是:与基因id相关的操作(如转换、可视化交集等)、分析(如富集分析),都可以加进来(后期考虑加入网页版)

Table of Contents

Installation

You can also install devel version of genekitr from github with:

# install.packages("remotes")
remotes::install_github("GangLiLab/genekitr")

If you want to build vignette in local, please add two options:

remotes::install_github("GangLiLab/genekitr", build_vignettes = TRUE, dependencies = TRUE)

Features

信息获取 (Search)

数据整理与转换(Tidy & Trans)

数据分析(Analyse)

可视化(Visualize)

导出结果 (Export)

Plans

信息获取 (Search)

同时也发现一个很有趣的事情:标准命名人类的HGNC和小鼠的MGI都是以Ensembl数据库中的alias为准,而genecards用的是ncbi gene数据库的alias,为啥呢?其实看它们的创建国就知道了:GeneCards,是由以色列威兹曼研究院和美国Lifemap 生物科技有限公司;HGNC是EBI和剑桥大学联合;MGI是Jackson Lab,位于美国,但它比较倾向于Ensembl

不过这两个我都加入了genInfo中,分别是ensembl_aliasncbi_alias

数据整理与转换(Tidy & Trans)
数据分析(Analyse)
可视化(Visualize)
导出结果 (Export)

DEBUG

Let's mining data!

Example gene ID

mm_id =c('Ticam2
Arhgap33os
Insl3
Myo15
Gal3st2b
Bloc1s1') 
mm_id=str_split(mm_id,"\n")[[1]]

Method 1: All things about gene ID

# in this example, BCC7 is the alias of TP53; SXHFJG is a fake name
hg_id = c("MCM10",  "CDC20",  "S100A9", "FOXM1",  "KIF23",  "MMP1",   "CDC45",  "BCC7" ,  "SXHFJG", "TP53"  )
genInfo(hg_id, org = 'human')

Next, let's test mouse id:

# here, some mouse id are not standard name, like histone genes (H1-0, H1-1...)
# but we can still get matched standard symbol name
# Besides, we add two fakegenes to test the robustness
mm_id = c("Gtpbp4", "Gtpbp8", "Gtse1" , "Gys1","H1-0", "H1-1" ,"H1-2" ,"H1-3" ,"Fakegene1","H1-4",
          "H1-5" ,"H1-6","H2ac21","H2ax","H2az2","Fakegene2")
genInfo(mm_id, org = 'mouse')

Method 2: Search pubmed

test2=genPubmed(mm_id, keywords = 'stem cell', field = 'tiab')
# Search example: Ticam2 [TIAB] AND stem cell [TIAB] 

# or use much specific keyword
genPubmed(mm_id, keywords = 'stem cell AND epithelial', field = 'tiab')

Method 3: GSEA

R # 加载示例数据 data(geneList, package="DOSE") # 获得msigdb的gene set msigdb <- getMsigdb(org='human', category='C3',subcategory = 'TFT:GTRD') # 直接进行gsea egmt <- genGSEA(genelist = geneList,geneset = msigdb) # 如果是extrez id,可以用下面的函数将id变成symbol egmt2 <- DOSE::setReadable(egmt, OrgDb = org.Hs.eg.db, keyType = 'ENTREZID')

# 加载示例数据
data(geneList, package="DOSE")
# 直接进行gsea
genGSEA(genelist = geneList,org = 'human', category='C3',subcategory = 'TFT:GTRD',use_symbol = F)

Method 4: GO

data(geneList, package="DOSE")
id = names(geneList)[1:100]
ego = genGO(id, org = 'human',ont = 'mf',pvalueCutoff = 0.05,qvalueCutoff = 0.1 ,use_symbol = T)

不知道物种名称?别怕!

biocOrg_name()
# full_name short_name
# 1   anopheles         ag
# 2      bovine         bt
# 3        worm         ce
# 4      canine         cf
# 5         fly         dm
# 6   zebrafish         dr
# 7    ecolik12      eck12
# 8  ecolisakai    ecSakai
# 9     chicken         gg
# 10      human         hs
# 11      mouse         mm
# 12     rhesus        mmu
# 13      chipm         pt
# 14        rat         rn
# 15        pig         ss
# 16    xenopus         xl

Method 5: Transform gene id

Get 1-vs-1 matched id, even input gene is alias, duplicated or even WRONG! User just need to get the output (character) to do other analysis (exp. enrichment analysis)

library(AnnoGenes)
data(geneList, package = 'DOSE')
id = names(geneList)[1:5]
id
# "4312"  "8318"  "10874" "55143" "55388"

## ANY organism alias name and ANY trans_to argument name (exp. "ens", "ensem", "ensembl" are ok to function)
transId(id, trans_to = 'symbol',org='hs')
transId(id, trans_to = 'uni',org='human')
transId(id, trans_to = 'ens',org='hg')

If there are some ID could not transform to another type (like "type ERROR ID", "entrez ID has NO matched id"), the output will show as NA, while still keep the same order with the input

# the id "23215326", "344263475" and "45" are fake, while others are real
fake_id = c(names(geneList)[1:5], '23215326','1','2','344263475','45')

# the gene order of output and input is identical!
transId(fake_id, trans_to = 'sym',org='human')
# 70% genes are mapped from entrezid to symbol
# [1] "MMP1"  "CDC45" "NMU"   "CDCA8" "MCM10" NA      "A1BG"  "A2M"   NA      NA 

However, if user provides wrong orgnism, the function will report error...

## try to trans human id to symbol, but choose wrong org (mouse)
transId(id, trans_to = 'sym',org='mouse')
# Error in .gentype(id, org) : Wrong organism! 

Compare genekitr::transId and clusterProfiler::bitr

保留原始id顺序,而没有去除NA,就是为了方便用户看到哪些ID没有转换成功; 如果用户后面需要去掉NA,那么直接使用na.omit()即可

fake_id = c(names(geneList)[1:5], '23215326','1','2','344263475','45')
res1 = transId(fake_id, trans_to = 'sym',org='human')
res2 = clusterProfiler::bitr(fake_id, fromType = 'ENTREZID',
                      toType = 'SYMBOL', OrgDb = org.Hs.eg.db)

Change another organism:

library(org.Dm.eg.db)
id = toTable(org.Dm.egSYMBOL) %>% dplyr::pull(1) %>% sample(20)
transId(id, trans_to = 'ens',org='fly')
transId(id, trans_to = 'symbol',org='fly')

不管是参数的简洁,还是结果的准确性,都胜过其他两大R包clusterProfiler::bitrbiomaRt::getBM

Method 6: KEGG

ids = names(geneList)[1:100]
gkeg <- genKEGG(ids, org = 'human') 
# org可以是common name(如human、mouse),也可以是hg、hs等常见的称呼

# 默认支持readable 参数,结果以symbol name展示
keg_raw <- genKEGG(test, org = 'hs', use_symbol = F)
keg_readable <- genKEGG(test, org = 'hs', use_symbol = T)
# 差别就是:

换个物种试试~理论上,拿任意物种的symbol、entrez、ensembl基因,给函数投食即可。不需要再提前进行id转换了

# 小鼠基因为例
head(id)
[1] "Adora1"    "Insl3"    "AF067061"      "Alpk1"         "Arhgap20"      "B020004J07Rik" "Bmp6"
keg <- genKEGG(mm_id, org = 'mouse', use_symbol = T, pvalueCutoff = 1, qvalueCutoff = 1, maxGSSize = 3000)

Let's plot!

P1: Enrichment dotplot => plotEnrichDot

# First, feed any dataframe result to enrichDat 
test = as.enrichDat(test)
ego = as.enrichDat(ego)

# Second, easy plot
p1 = plotEnrichDot(test,legend_by = 'qvalue'))
p2 = plotEnrichDot(ego)

# Third, if you want to change more on plot...
# test dataframe was from GeneOntology panther web result
p3 = plotEnrichDot(test, xlab_type =  'FoldEnrich', legend_by = 'qvalue',
              show_item = 15, main_text_size = 14,legend_text_size = 10,
              low_color = 'red', high_color = 'blue',
              xleft = 0, font_type = 'Arial', remove_grid = T,
              wrap_width = 30,border_thick = 3 )

# ego dataframe was from clusterP result
p4=plotEnrichDot(ego, xlab_type =  'GeneRatio', legend_by = 'p.adjust',
                 show_item = 10, main_text_size = 14,legend_text_size = 10,
                 low_color = 'orange', high_color = 'green',
                 xleft = 0, font_type = 'Times New Roman', remove_grid = F,
                 wrap_width = NULL ,border_thick = 1)

library(patchwork)
wrap_plots(list(p1,p2,p3,p4))+ plot_layout(ncol = 2) + 
  plot_annotation(tag_levels = 'a')

P2: Venn plot => plotVenn

library(AnnoGenes)
library(dplyr)
library(patchwork)

set1 <- paste(rep("gene" , 100) , sample(c(1:1000) , 100 , replace=F) , sep="")
set2 <- paste(rep("gene" , 100) , sample(c(1:1000) , 100 , replace=F) , sep="")
set3 <- paste(rep("gene" , 100) , sample(c(1:1000) , 100 , replace=F) , sep="")
set4 <- paste(rep("gene" , 100) , sample(c(1:1000) , 100 , replace=F) , sep="")
set5 <- paste(rep("gene" , 100) , sample(c(1:1000) , 100 , replace=F) , sep="")

two_gene_list = list(gset1 = set1, gset2 = set2)
sm_gene_list = list(gset1 = set1, gset2 = set2, gset3 = set3)
la_gene_list = list(gset1 = set1, gset2 = set2, gset3 = set3, gset4 = set4, gset5 = set5 )

p1 = plotVenn(two_gene_list, alpha_degree = 1, border_thick = 0)

p2= plotVenn(sm_gene_list,alpha_degree = .3, border_thick = 1)
p3 = plotVenn(sm_gene_list,text_size = 2,alpha_degree = 1,
              remove_grid = T, color = ggsci::pal_lancet()(3))

p4 = plotVenn(la_gene_list,use_venn = F,
              text_size = 10, border_thick = 2,remove_grid = T)

(p1+p2+p3)/p4

Export

library(openxlsx)
expoSheet(dat_list =  list(mtcars,ToothGrowth,iris), name_list = list('mtcars','tooth','IRIS'),
  filename = 'test.xlsx')

References

Things about GSEA



Try the genekitr package in your browser

Any scripts or data that you put into this service are public.

genekitr documentation built on Sept. 11, 2021, 9:09 a.m.