# set global chunk options
library(knitr)
library(dplyr)
opts_chunk$set(size='tiny')

Purpose

To show different ways to convert from one type of gene/protein ID to another. The example task is to get UniProt accessions and entry names from gene symbols. Bioconductor offers multiple ways to do this. The example gene list:

genes <- c("TH","GFAP","CLU","SNCA","APP","MAPT")

UniProt.ws package

There is no key that directly corresponds to gene symbols. However, the GENECARDS ID works well for this purpose.

library(UniProt.ws)
up <- UniProt.ws(taxId=9606)
select(up, keys=genes, columns=c("ENTRY-NAME","UNIPROTKB"), keytype="GENECARDS")

org.Hs.eg.db package

The OrgDb packages are centered around Entrez IDs. Thus, gene symbols first need to be converted to Entrez IDs and then to UniProt IDs. The only UniProt IDs available through OrgDb are accessions. An important note is that OrgDb contains both reviewed and unreviewed accessions. This can cause trouble once in a while. For example, it is not obvious which accession to pick for the APP gene (at least in the 2017_05 UniProt release).

library(org.Hs.eg.db)
entrez_ids <- org.Hs.egSYMBOL2EG[genes] # AnnDbBimap object
# selecting first Entrez ID in case there are multiple
entrez_ids <- sapply(as.list(entrez_ids), '[', 1)
# selecting first accession in case there are multiple.
# This is sometimes necessary, but it is a dangerous step, as the first one
# may not be the primary ID.
uniprot_ids <- sapply(as.list(org.Hs.egUNIPROT[entrez_ids]), '[', 1)
data.frame(genes, uniprot_ids)
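
The risk of the "take the first one" idiom can be illustrated with a minimal base R sketch. The mapping below is hypothetical (the gene names and `ACC_*` values are placeholders, not real accessions); it only shows how `sapply(..., '[', 1)` silently discards the remaining candidates, and how `lengths()` can be used to flag the ambiguous cases.

```r
# Hypothetical one-to-many mapping; the IDs are placeholders, not real accessions.
mapping <- list(
  geneA = c("ACC_1", "ACC_2", "ACC_3"),
  geneB = "ACC_4",
  geneC = c("ACC_5", "ACC_6")
)

# Same idiom as above: keep only the first element of each entry.
first_only <- sapply(mapping, `[`, 1)

# Count how many candidates each gene had, to flag ambiguous picks.
n_candidates <- lengths(mapping)

result <- data.frame(
  gene         = names(mapping),
  first_id     = unname(first_only),
  n_candidates = unname(n_candidates),
  ambiguous    = unname(n_candidates > 1),
  stringsAsFactors = FALSE
)
print(result)
```

Rows with `ambiguous == TRUE` are the ones worth checking by hand before trusting the selected ID.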

Alternative

entrez_ids <- org.Hs.egSYMBOL2EG[genes] %>% as.data.frame
uniprot_ids <- org.Hs.egUNIPROT[entrez_ids$gene_id] %>% as.data.frame
# join on the shared Entrez ID column; all symbol/accession pairs are kept
inner_join(entrez_ids, uniprot_ids, by = "gene_id")
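
Unlike the first-element approach, a join keeps every pairing, so a gene with several accessions appears on several rows. A small sketch with hypothetical tables (placeholder IDs, and base R `merge` standing in for `inner_join` to keep the example dependency-free) shows the effect:

```r
# Hypothetical tables; gene_id and accession values are placeholders.
symbol_to_gene <- data.frame(
  gene_id = c("1", "2"),
  symbol  = c("geneA", "geneB"),
  stringsAsFactors = FALSE
)
gene_to_acc <- data.frame(
  gene_id   = c("1", "1", "2"),
  accession = c("ACC_1", "ACC_2", "ACC_3"),
  stringsAsFactors = FALSE
)

# Inner join on the shared key, stated explicitly.
joined <- merge(symbol_to_gene, gene_to_acc, by = "gene_id")

# One row per pairing: geneA appears twice because it has two accessions.
print(joined)
print(table(joined$symbol))
```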

clusterProfiler package

clusterProfiler has a convenience function, bitr, that uses OrgDb at the backend. Thus the problem of carrying through unreviewed IDs is inherited from OrgDb.

library(clusterProfiler)
bitr(genes, fromType="SYMBOL", toType="UNIPROT", OrgDb="org.Hs.eg.db")
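
Since the lookup ultimately goes through OrgDb, a similar table can presumably be requested directly with the `select()` interface from AnnotationDbi. The sketch below is an assumption-laden illustration rather than a description of what `bitr` does internally: it assumes org.Hs.eg.db is installed (the `requireNamespace` guard is added here so the code degrades gracefully otherwise) and calls `select` with its namespace prefix to avoid the clash with `dplyr::select`.

```r
genes <- c("TH", "GFAP", "CLU", "SNCA", "APP", "MAPT")

res <- NULL
# Guarded so the sketch does nothing when the annotation packages are absent.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  # Namespace-qualified to avoid the clash with dplyr::select.
  res <- AnnotationDbi::select(
    org.Hs.eg.db::org.Hs.eg.db,
    keys    = genes,
    keytype = "SYMBOL",
    columns = c("ENTREZID", "UNIPROT")
  )
  # Number of accessions returned per symbol; more than one signals ambiguity.
  print(table(res$SYMBOL))
}
```

The result is a one-to-many table, so the caveat about unreviewed accessions applies here as well.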

biomaRt approach

This one may or may not work. There are some persistent troubles due to the BioMart site migration.
When it works, it is really great, though.

library(biomaRt)
# listMarts()
ensembl <- useMart("ensembl")
# listDatasets(ensembl)
hsa <- useDataset( "hsapiens_gene_ensembl", mart=ensembl)
# listAttributes(hsa)
# listFilters(hsa)
getBM(attributes= c("hgnc_symbol","uniprotswissprot"), 
              filters=c("hgnc_symbol"), 
              values=genes, 
              mart=hsa)
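
Because the BioMart calls can fail intermittently, one option is to wrap them in a small retry helper. The sketch below is generic base R; `with_retry` and `flaky` are hypothetical names introduced for illustration, and `flaky` is only a stand-in for a remote call so the example runs without network access.

```r
# Hypothetical helper: retry a function that may fail intermittently.
with_retry <- function(f, attempts = 3, wait = 0) {
  last_error <- NULL
  for (i in seq_len(attempts)) {
    # Capture an error as a value instead of letting it propagate.
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    last_error <- result
    message("Attempt ", i, " failed: ", conditionMessage(last_error))
    if (i < attempts) Sys.sleep(wait)
  }
  stop("All ", attempts, " attempts failed: ", conditionMessage(last_error))
}

# Stand-in for a flaky remote call: fails twice, then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("service unavailable")
  "ok"
}

out <- with_retry(flaky, attempts = 5)
print(out)
```

In practice the wrapped function would be the remote step itself, for example `with_retry(function() useMart("ensembl"), wait = 5)`; the number of attempts and the wait time are arbitrary choices.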


vladpetyuk/vp.misc documentation built on June 25, 2021, 6:35 a.m.