idMap: Mapping between gene ID types


idMap    R Documentation

Mapping between gene ID types

Description

Maps between common gene ID types, such as ENSEMBL and ENTREZ, for gene expression datasets, gene sets, and gene regulatory networks.

Usage

idMap(
  obj,
  org = NA,
  from = "ENSEMBL",
  to = "ENTREZID",
  multi.to = "first",
  multi.from = "first"
)

idTypes(org)

Arguments

obj

The object for which gene IDs should be mapped. Supported options include:

  • Gene expression dataset. An object of class SummarizedExperiment. The names are expected to be of the gene ID type given in argument from.

  • Gene sets. Either a list of gene sets (character vectors of gene IDs) or a GeneSetCollection storing all gene sets.

  • Gene regulatory network. A 3-column character matrix: 1st column = IDs of regulating genes; 2nd column = IDs of regulated genes; 3rd column = regulation effect. Use '+' for activation and '-' for inhibition.
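As a minimal sketch of the network format described above, the following base R snippet builds a toy 3-column character matrix and checks its shape. The edges are made up for illustration, and `isValidGrn` is a hypothetical helper that is not part of the package.

```r
# Toy gene regulatory network: regulator, target, effect.
# The edges are illustrative only and make no biological claim.
grn <- cbind(
    FROM = c("ENSG00000141510", "ENSG00000141510"),
    TO   = c("ENSG00000012048", "ENSG00000139618"),
    TYPE = c("+", "-")
)

# Hypothetical sanity check (not a package function): a character
# matrix with exactly 3 columns whose 3rd column holds only '+' or '-'.
isValidGrn <- function(x) {
    is.matrix(x) && is.character(x) && ncol(x) == 3 &&
        all(x[, 3] %in% c("+", "-"))
}

isValidGrn(grn)
```

A check of this kind can catch a numeric or factor column before the matrix is passed to idMap.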

org

Character. Organism as KEGG three-letter code, e.g. ‘hsa’ for ‘Homo sapiens’. See references.

from

Character. Gene ID type to map from. Corresponds to the gene ID type of argument obj. Defaults to ENSEMBL.

to

Character. Gene ID type to map to. Corresponds to the gene ID type that argument obj should be updated with. If obj is an expression dataset of class SummarizedExperiment, to can also be the name of a column in the rowData slot, which specifies a user-defined mapping in which conflicts have been manually resolved. Defaults to ENTREZID.

multi.to

How to resolve 1:many mappings, i.e. multiple to.IDs for a single from.ID? This is passed on to the multiVals argument of mapIds and can thus take one of several pre-defined values or a user-defined function. Note, however, that a single to.ID must be returned for each from.ID. Default is "first", which returns the first to.ID mapped onto the respective from.ID.
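A user-defined resolver for 1:many mappings could look like the sketch below. It assumes, following the mapIds convention for multiVals functions, that the function receives the character vector of candidate to.IDs for one from.ID and returns a single value. The name `smallestId` and the toy IDs are hypothetical.

```r
# Hypothetical resolver: keep the numerically smallest candidate ID.
# Assumes 'x' is the character vector of to.IDs for a single from.ID.
smallestId <- function(x) {
    x[which.min(as.integer(x))]
}

# Toy 1:many mapping with made-up numeric IDs (illustration only).
candidates <- list(geneA = c("1000", "25"), geneB = "7")

# Apply the resolver per from.ID; exactly one to.ID is kept for each.
resolved <- vapply(candidates, smallestId, character(1))
resolved

# With the package loaded, the resolver would be passed like this
# (not run here):
# idMap(se, org = "hsa", multi.to = smallestId)
```

Comparing the IDs as integers rather than as strings matters here: a lexical comparison would rank "1000" before "25".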

multi.from

How to resolve many:1 mappings, i.e. multiple from.IDs mapping to the same to.ID? Only applicable if obj is an expression dataset of class SummarizedExperiment. Pre-defined options include:

  • 'first' (Default): returns the first from.ID for each to.ID with multiple from.IDs,

  • 'minp': selects the from.ID with minimum p-value (according to the rowData column PVAL of obj),

  • 'maxfc': selects the from.ID with maximum absolute log2 fold change (according to the rowData column FC of obj).

A user-defined function can also be supplied for custom behavior. It is applied in each case where there are multiple from.IDs for a single to.ID, and takes the arguments ids and obj. The argument ids holds the multiple from.IDs from which a single ID should be chosen, e.g. using information available in argument obj. See the examples for a case where ids are selected based on a user-defined rowData column.
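The three pre-defined rules can be illustrated on a toy table in base R. This is a sketch of the selection logic as documented above, not the package's implementation; the helper `pickFrom` and all values are hypothetical.

```r
# Toy table: id1, id2 and id3 all map to the same to.ID "g1".
tab <- data.frame(
    from = c("id1", "id2", "id3", "id4"),
    to   = c("g1", "g1", "g1", "g2"),
    PVAL = c(0.20, 0.01, 0.15, 0.30),
    FC   = c(0.5, 1.0, -2.5, 0.3),
    stringsAsFactors = FALSE
)

# Hypothetical helper mimicking the documented rules for one to.ID:
# 'first' keeps the first row, 'minp' the smallest p-value,
# 'maxfc' the largest absolute log2 fold change.
pickFrom <- function(d, rule = c("first", "minp", "maxfc")) {
    rule <- match.arg(rule)
    i <- switch(rule,
        first = 1L,
        minp  = which.min(d$PVAL),
        maxfc = which.max(abs(d$FC))
    )
    d$from[i]
}

# Resolve each to.ID separately.
groups <- split(tab, tab$to)
vapply(groups, pickFrom, character(1), rule = "first")  # g1: id1, g2: id4
vapply(groups, pickFrom, character(1), rule = "minp")   # g1: id2, g2: id4
vapply(groups, pickFrom, character(1), rule = "maxfc")  # g1: id3, g2: id4
```

Each rule picks a different from.ID for "g1", while "g2" has a single candidate and is unaffected.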

Details

The function 'idTypes' lists the valid values that the arguments 'from' and 'to' can take. These correspond to the names of the available gene ID types for the mapping.
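One way to use this is to validate 'from' and 'to' before mapping. The sketch below assumes that EnrichmentBrowser and the organism annotation package (taken here to be org.Hs.eg.db for 'hsa') are installed; `checkIdTypes` is a hypothetical helper, not a package function.

```r
# Hypothetical helper: stop early if 'from' or 'to' is not among the
# valid gene ID types.
checkIdTypes <- function(from, to, valid) {
    bad <- setdiff(c(from, to), valid)
    if (length(bad) > 0)
        stop("Unsupported gene ID type(s): ", paste(bad, collapse = ", "))
    invisible(TRUE)
}

# Query the available ID types only if the required packages are present
# (the dependency on org.Hs.eg.db is an assumption of this sketch).
if (requireNamespace("EnrichmentBrowser", quietly = TRUE) &&
    requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
    valid <- EnrichmentBrowser::idTypes("hsa")
    checkIdTypes("ENSEMBL", "SYMBOL", valid)
}
```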

Value

idTypes: A character vector listing the available gene ID types for the mapping.

idMap: An object of the same class as the input argument obj, i.e. a SummarizedExperiment if provided an expression dataset, a list of character vectors or a GeneSetCollection if provided gene sets, and a character matrix if provided a gene regulatory network.

Author(s)

Ludwig Geistlinger

References

KEGG organism codes: http://www.genome.jp/kegg/catalog/org_list.html

See Also

SummarizedExperiment, mapIds, keytypes

Examples


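    # load the package (also attaches SummarizedExperiment, which provides rowData)
    library(EnrichmentBrowser)

    # (1) ID mapping for gene expression datasets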
    # create an expression dataset with 3 genes and 3 samples
    se <- makeExampleData("SE", nfeat = 3, nsmpl = 3)
    names(se) <- paste0("ENSG00000000", c("003", "005", "419"))
    idMap(se, org = "hsa")

    # user-defined mapping
    rowData(se)$MYID <- c("g1", "g1", "g2")
    idMap(se, to = "MYID")    

    # data-driven resolving of many:1 mappings
    
    ## e.g. select from.ID with lowest p-value
    pcol <- configEBrowser("PVAL.COL")
    rowData(se)[[pcol]] <- c(0.001, 0.32, 0.15)
    idMap(se, to = "MYID", multi.from = "minp") 
   
    ## ... or using a customized function
    maxScore <- function(ids, se)
    {
         scores <- rowData(se)[ids, "SCORE"]
         ind <- which.max(scores)
         return(ids[ind])
    }
    rowData(se)$SCORE <- c(125.7, 33.4, 58.6)
    idMap(se, to = "MYID", multi.from = maxScore) 
           
    # (2) ID mapping for gene sets 
    # create two gene sets containing 3 genes each 
    s2 <- paste0("ENSG00000", c("012048", "139618", "141510"))
    gs <- list(s1 = names(se), s2 = s2)
    idMap(gs, org = "hsa", from = "ENSEMBL", to = "SYMBOL")    

    # (3) ID mapping for gene regulatory networks
    grn <- cbind(FROM = gs$s1, TO = gs$s2, TYPE = rep("+", 3))
    idMap(grn, org = "hsa", from = "ENSEMBL", to = "ENTREZID")  


lgeistlinger/EnrichmentBrowser documentation built on May 9, 2024, 7:22 p.m.