mapUniProt: Mapping identifiers with the UniProt API

View source: R/mapUniProt.R

allFromKeys    R Documentation

Description

These functions are the main workhorses for mapping identifiers from one database to another. They use the current UniProt API (documented at https://www.uniprot.org/help/api).

Usage

allFromKeys()

allToKeys(fromName = "UniProtKB_AC-ID")

returnFields()

mapUniProt(
  from = "UniProtKB_AC-ID",
  to = "UniRef90",
  columns = character(0L),
  query,
  verbose = FALSE,
  debug = FALSE,
  paginate = TRUE,
  pageSize = 500L
)

queryUniProt(
  query = character(0L),
  fields = c("accession", "id"),
  collapse = c("OR", "AND"),
  n = Inf,
  pageSize = 25L
)

Arguments

fromName

character(1) The 'from' key to use as the basis for mapping to other keys (default "UniProtKB_AC-ID").

from

character(1) The identifier type to map from, by default "UniProtKB_AC-ID", short for UniProt accession identifiers. See a list of all 'from' type identifiers with allFromKeys.

to

character(1) The identifier type to map to, by default "UniRef90". It can be any one of the values returned by allToKeys for the corresponding fromName argument.

columns, fields

character() Additional information to be retrieved from the UniProt service. See the full list of possible return fields at https://www.uniprot.org/help/return_fields. Example fields include "accession", "id", "gene_names", "xref_pdb", "xref_hgnc", and "sequence".

query

character() or named list() For ID mapping (mapUniProt), typically a character vector of 'from' identifiers; it can also be a named list that carries the identifiers in an ids element (see examples). For the uniprotkb/search endpoint (queryUniProt), query can be a character vector of colon-separated key-value pairs (e.g., "organism_id:9606") or a named list of available query fields. See https://www.uniprot.org/help/query-fields for a list of query fields.

verbose

logical(1) Whether the operations should provide verbose updates (default FALSE).

debug

logical(1) Whether to display the URL API endpoints, for advanced debugging (default FALSE).

paginate

logical(1) Whether to use the pagination API (i.e., "results" vs "stream") in the request responses. For performance, it is set to TRUE by default.

pageSize

integer(1) Number of records per page (default 500L for mapUniProt, 25L for queryUniProt). It corresponds to the size parameter in the API request.

collapse

character(1) A string indicating either "OR" or "AND" for combining query clauses (case-insensitive).

n

numeric(1) Maximum number of rows to return (default Inf).

Details

Note that mapUniProt is used internally by the select method but is also exported for API queries that need finer control. Provide values from the name column in returnFields as the columns input to either mapUniProt or the select method.
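
Because the columns input must match values in the name column of returnFields, it can help to check a requested set before sending a request. The following is a minimal sketch and not part of UniProt.ws: checkColumns is a hypothetical helper, and the small data.frame only stands in for the result of returnFields() so that the block runs without network access.

```r
## Hypothetical helper (not part of UniProt.ws): keep only the requested
## columns that appear in the 'name' column of a returnFields()-like table,
## and warn about the rest.
checkColumns <- function(columns, fields) {
    unknown <- setdiff(columns, fields[["name"]])
    if (length(unknown))
        warning("unknown return fields: ", paste(unknown, collapse = ", "))
    intersect(columns, fields[["name"]])
}

## Stand-in table for illustration; in practice use: fields <- returnFields()
fields <- data.frame(name = c("accession", "id", "gene_names", "sequence"))

valid <- suppressWarnings(
    checkColumns(c("accession", "id", "not_a_field"), fields)
)
print(valid)
```

The returned vector could then be passed as the columns argument of mapUniProt.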

When using from = "Gene_Name", you may restrict the search results to a specific organism by including a taxId element (e.g., taxId = 9606) in the query given as a named list. See examples below.
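
For queryUniProt, the query and collapse arguments together describe a set of key-value clauses joined by "OR" or "AND". The sketch below illustrates that idea only; buildQuery is a hypothetical helper, and the assumption that clauses are joined into a single space-separated search string is not taken from the package source.

```r
## Hypothetical helper (not part of UniProt.ws): turn a character vector or
## named list of clauses into one search string joined by OR / AND.
buildQuery <- function(query, collapse = c("OR", "AND")) {
    ## accept lower-case input, mirroring the case-insensitive argument
    collapse <- match.arg(toupper(collapse), c("OR", "AND"))
    if (is.list(query)) {
        ## list(organism_id = 9606) becomes "organism_id:9606"
        query <- unlist(
            Map(function(key, value) paste0(key, ":", value),
                names(query), query),
            use.names = FALSE
        )
    }
    paste(query, collapse = paste0(" ", collapse, " "))
}

fromCharacter <- buildQuery(
    c("accession:A5YMT3", "organism_id:9606"), collapse = "AND"
)
fromList <- buildQuery(
    list(organism_id = 9606, gene_exact = "A2M"), collapse = "or"
)
print(fromCharacter)
print(fromList)
```

Both input forms yield the same kind of string, which is why the character and list examples for queryUniProt can be used interchangeably.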

Value

  • mapUniProt: A data.frame of returned results

  • allToKeys: A sorted character vector of possible "To" keytypes based on the given "From" type

  • allFromKeys: A sorted character vector of possible "From" keytypes

  • returnFields: A data.frame of entries for the columns input in mapUniProt; see 'name' column

Author(s)

M. Ramos

Examples


mapUniProt(
    from = "UniProtKB_AC-ID",
    to = "RefSeq_Protein",
    query = c("P13368", "Q9UM73", "P97793", "Q17192")
)

mapUniProt(
    from = "GeneID", to = "UniProtKB", query = c("1", "2", "3", "9", "10")
)

mapUniProt(
    from = "UniProtKB_AC-ID",
    to = "UniProtKB",
    columns = c("accession", "id"),
    query = list(organism_id = 10090, ids = c('Q7TPG8', 'P63318'))
)

## restrict 'from = Gene_Name' result to taxId 9606
mapUniProt(
    from = "Gene_Name",
    to = "UniProtKB-Swiss-Prot",
    columns = c("accession", "id"),
    query = list(taxId = 9606, ids = 'TP53')
)

mapUniProt(
    from = "UniProtKB_AC-ID", to = "UniProtKB",
    columns = c("accession", "id", "xref_pdb", "xref_hgnc", "sequence"),
    query = c("P31946", "P62258")
)

## query as character
queryUniProt(
    query = c("accession:A5YMT3", "organism_id:9606"),
    fields = c("accession", "id", "reviewed"),
    collapse = "AND"
)

## query as list
queryUniProt(
    query = list(organism_id = 9606, gene_exact = "A2M"),
    fields = c(
        "id", "accession", "gene_primary",
        "organism_name", "protein_name", "reviewed"
    ),
    collapse = "OR", n = 3, pageSize = 3
)

allToKeys(fromName = "UniRef100")

head(allFromKeys())

head(returnFields())


Bioconductor/UniProt.ws documentation built on June 14, 2025, 5:45 p.m.