addGeneIDs: Add common IDs to annotated peaks such as gene symbol, entrez...

View source: R/addGeneIDs.R

addGeneIDs    R Documentation

Add common IDs to annotated peaks, such as gene symbol, Entrez ID, Ensembl gene ID and RefSeq ID.

Description

Add common IDs to annotated peaks, such as gene symbol, Entrez ID, Ensembl gene ID and RefSeq ID, by leveraging an organism annotation dataset. For example, org.Hs.eg.db is the dataset from the org.Hs.eg.db package for human, while org.Mm.eg.db is the dataset from the org.Mm.eg.db package for mouse.

Usage

addGeneIDs(
  annotatedPeak,
  orgAnn,
  IDs2Add = c("symbol"),
  feature_id_type = "ensembl_gene_id",
  silence = TRUE,
  mart
)

Arguments

annotatedPeak

a GRanges object of annotated peaks, or a vector of feature IDs.

orgAnn

organism annotation dataset such as org.Hs.eg.db.

IDs2Add

a vector of annotation identifiers to be added

feature_id_type

type of the input feature IDs to be annotated; default is "ensembl_gene_id".

silence

TRUE or FALSE. If TRUE, feature IDs that cannot be mapped to Entrez IDs are not reported.

mart

a mart object; see useMart in the biomaRt package for details.

Details

Either orgAnn or mart should be supplied.

  • If orgAnn is given, parameter feature_id_type should be one of ensembl_gene_id, entrez_id, gene_symbol, gene_alias or refseq_id, and parameter IDs2Add can be any combination of identifiers such as "accnum", "ensembl", "ensemblprot", "ensembltrans", "entrez_id", "enzyme", "genename", "pfam", "pmid", "prosite", "refseq", "symbol", "unigene" and "uniprot". Some IDs are specific to an organism, such as "omim" for org.Hs.eg.db and "mgi" for org.Mm.eg.db.

    The identifiers are defined as follows:

    • accnum: GenBank accession numbers

    • ensembl: Ensembl gene accession numbers

    • ensemblprot: Ensembl protein accession numbers

    • ensembltrans: Ensembl transcript accession numbers

    • entrez_id: Entrez Gene identifiers

    • enzyme: EC numbers

    • genename: gene name

    • pfam: Pfam identifiers

    • pmid: PubMed identifiers

    • prosite: PROSITE identifiers

    • refseq: RefSeq identifiers

    • symbol: gene symbols (abbreviations)

    • unigene: UniGene cluster identifiers

    • uniprot: UniProt accession numbers

    • omim: OMIM (Online Mendelian Inheritance in Man) identifiers

    • mgi: Jackson Laboratory MGI gene accession numbers

  • If mart is used instead of orgAnn, refer to getBM in the biomaRt package for valid feature_id_type and IDs2Add values. Parameter feature_id_type should be a valid filter name listed by listFilters(mart), such as ensembl_gene_id, and parameter IDs2Add should be one or more valid attribute names listed by listAttributes(mart), such as external_gene_id, entrezgene, wikigene_name, or mirbase_transcript_name. Attribute names can change between Ensembl releases (recent releases use external_gene_name and entrezgene_id), so check listAttributes(mart) for the release in use.
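With orgAnn, the annotation step amounts to a lookup from the feature_id_type column to each identifier named in IDs2Add. The base R sketch below illustrates that idea only and is not the ChIPpeakAnno implementation: toy_ann is a hypothetical two-gene table standing in for org.Hs.eg.db, map_ids is a hypothetical helper, and collapsing multiple matches with ";" and returning NA for unmapped IDs are assumptions of the sketch.

```r
# Hypothetical stand-in for an organism annotation dataset such as
# org.Hs.eg.db: one row per (Ensembl gene ID, Entrez ID, symbol) triple.
toy_ann <- data.frame(
  ensembl_gene_id = c("ENSG00000141510", "ENSG00000012048"),
  entrez_id       = c("7157", "672"),
  symbol          = c("TP53", "BRCA1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map IDs of type `from` to every identifier in `to`.
# Assumptions of this sketch: multiple matches are collapsed with ";",
# and IDs with no match get NA.
map_ids <- function(ids, ann, from, to) {
  out <- data.frame(ids, stringsAsFactors = FALSE)
  names(out) <- from
  for (col in to) {
    out[[col]] <- vapply(ids, function(id) {
      hits <- unique(ann[[col]][ann[[from]] == id])
      if (length(hits) == 0) NA_character_ else paste(hits, collapse = ";")
    }, character(1), USE.NAMES = FALSE)
  }
  out
}

# The third ID is deliberately absent from toy_ann, so it maps to NA.
res <- map_ids(c("ENSG00000141510", "ENSG00000012048", "ENSG_UNKNOWN"),
               toy_ann, from = "ensembl_gene_id",
               to = c("symbol", "entrez_id"))
print(res)
```

In the real function the same role is played by the organism dataset (orgAnn) or by a biomaRt query (mart), and the set of valid from/to names is the one listed above.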

Value

A GRanges object if the input is a GRanges, or a data frame if the input is a vector.
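As a rough illustration of this return-type rule, the base R sketch below dispatches on its input: a plain vector of IDs yields a new data frame, while a table of peaks keeps all of its rows and gains an extra column. A data frame stands in for a GRanges object (which would require the GenomicRanges package), and add_ids_sketch, lookup and peaks are hypothetical names used only for illustration.

```r
# Hypothetical lookup from Ensembl gene ID to gene symbol.
lookup <- c(ENSG00000141510 = "TP53", ENSG00000012048 = "BRCA1")

# Sketch of the Value rule: vector in -> data frame out; table in -> same
# table with a new column. A data.frame stands in for a GRanges object here.
add_ids_sketch <- function(x, lookup, id_col = "feature") {
  if (is.atomic(x) && is.null(dim(x))) {
    # Vector input: build a new data frame keyed by the feature IDs.
    return(data.frame(feature = x, symbol = unname(lookup[x]),
                      stringsAsFactors = FALSE))
  }
  # Table input: keep every row and append the symbol column
  # (NA where the feature ID is not in the lookup).
  x$symbol <- unname(lookup[x[[id_col]]])
  x
}

# Hypothetical peak table whose "feature" column holds Ensembl gene IDs.
peaks <- data.frame(peak = c("peak1", "peak2"),
                    feature = c("ENSG00000141510", "ENSG_NOT_IN_LOOKUP"),
                    stringsAsFactors = FALSE)

print(add_ids_sketch(c("ENSG00000012048", "ENSG00000141510"), lookup))
print(add_ids_sketch(peaks, lookup))
```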

Author(s)

Jianhong Ou, Lihua Julie Zhu

References

http://www.bioconductor.org/packages/release/data/annotation/

See Also

getBM, AnnotationDb

Examples

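library(ChIPpeakAnno)
data(annotatedPeak)
library(org.Hs.eg.db)
## GRanges input: add gene symbols and OMIM IDs to the first six peaks
addGeneIDs(annotatedPeak[1:6,],orgAnn="org.Hs.eg.db",
           IDs2Add=c("symbol","omim"))
## Vector input: returns a data frame instead of a GRanges
##addGeneIDs(annotatedPeak$feature[1:6],orgAnn="org.Hs.eg.db",
##           IDs2Add=c("symbol","genename"))
if(interactive()){
  ## useMart() comes from biomaRt, which must be attached explicitly
  library(biomaRt)
  mart <- useMart("ENSEMBL_MART_ENSEMBL",host="www.ensembl.org",
                  dataset="hsapiens_gene_ensembl")
  ##mart <- useMart(biomart="ensembl",dataset="hsapiens_gene_ensembl")
  ## recent Ensembl releases use "entrezgene_id" rather than "entrezgene"
  addGeneIDs(annotatedPeak[1:6,], mart=mart,
             IDs2Add=c("hgnc_symbol","entrezgene_id"))
}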

jianhong/ChIPpeakAnno documentation built on Nov. 1, 2024, 8:55 a.m.