translate_ids_multi: Translate gene, protein and small molecule identifiers from...

translate_ids_multi R Documentation

Translate gene, protein and small molecule identifiers from multiple columns

Description

Especially when translating network interactions, where two ID columns exist (source and target), it is convenient to call the same ID translation on multiple columns. The translate_ids function can already translate to multiple ID types in one call, but only from a single source column. This function accepts multiple source columns, and multiple target IDs are supported here too. The source columns can be listed explicitly, or they might share a common stem. In the latter case the first element of ... is used as the stem, and the column names are created by adding the suffixes. The suffixes are also used to name the target columns. If no suffixes are provided, the names of the source columns are added to the names of the target columns. ID types can be defined the same way as for translate_ids. The only limitation is that source columns provided as stem + suffixes must all be of the same ID type.

Usage

translate_ids_multi(
  d,
  ...,
  suffixes = NULL,
  suffix_sep = "_",
  uploadlists = FALSE,
  ensembl = FALSE,
  hmdb = FALSE,
  chalmers = FALSE,
  entity_type = NULL,
  keep_untranslated = TRUE,
  organism = 9606,
  reviewed = TRUE
)

Arguments

d

A data frame.

...

At least two arguments, with or without names. These arguments describe identifier columns, either the ones we translate from (source) or the ones we translate to (target). Columns that exist in the data frame are used as source columns; all the rest are considered target columns. Alternatively, the source columns can be defined as a stem and a vector of suffixes, plus a separator between the stem and the suffix. In this case, the source columns are the ones that exist in the data frame once the suffixes are added to the stem. The values of all these arguments must be valid identifier types as described at translate_ids. If an ID type is provided only for the first source column, the rest of the source columns are assumed to have the same ID type. For the target identifiers, new columns are created with the desired names, with the suffixes added. If no suffixes are provided, the names of the source columns are used instead.
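
The stem-plus-suffix naming described above can be sketched in base R. This only illustrates how the source column names are composed; it is not the package's internal code. The data frame d and the stem "id" are hypothetical.

```r
# Hypothetical data frame with two ID columns sharing the stem "id"
d <- data.frame(
  id_a = c("P00533", "P01133"),
  id_b = c("P04626", "P21860"),
  stringsAsFactors = FALSE
)

stem <- "id"
suffixes <- c("a", "b")
suffix_sep <- "_"

# Compose the candidate source column names: stem + separator + suffix
source_cols <- paste(stem, suffixes, sep = suffix_sep)

# Keep only the names that actually exist in the data frame
source_cols <- intersect(source_cols, names(d))
print(source_cols)
```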

uploadlists

Force using the 'uploadlists' service from UniProt. By default the plain query interface is used (implemented in uniprot_full_id_mapping_table in this package). If any of the provided ID types is only available in the uploadlists service, it will be automatically selected. The plain query interface is preferred because in the long term, with caching, it requires less download and data storage.

ensembl

Logical: use data from Ensembl BioMart instead of UniProt.

hmdb

Logical: use HMDB ID translation data.

chalmers

Logical: use ID translation data from Chalmers Sysbio GEM.

entity_type

Character: "gene" is the short symbol for proteins and genes, "smol" for small molecules. Several other synonyms are also accepted.

keep_untranslated

In case the output is a data frame, keep the records where the source identifier could not be translated. In these records the target identifier will be NA.
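
The effect of this option resembles the difference between a left join and an inner join. The base R sketch below illustrates that analogy with made-up identifiers and a hypothetical translation table id_table. It is an assumption about the behaviour for illustration, not the package's implementation.

```r
# Hypothetical input and translation table (made-up identifiers)
d <- data.frame(uniprot = c("A1", "B2", "C3"), stringsAsFactors = FALSE)
id_table <- data.frame(
  uniprot = c("A1", "B2"),
  genesymbol = c("GENE1", "GENE2"),
  stringsAsFactors = FALSE
)

# Analogous to keep_untranslated = TRUE:
# rows without a translation stay, with NA in the target column
kept <- merge(d, id_table, by = "uniprot", all.x = TRUE)

# Analogous to keep_untranslated = FALSE:
# rows without a translation are dropped
dropped <- merge(d, id_table, by = "uniprot")

print(kept)
print(dropped)
```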

organism

Character or integer, name or NCBI Taxonomy ID of the organism (by default 9606 for human). Matters only if uploadlists is FALSE.

reviewed

Translate only reviewed (TRUE), only unreviewed (FALSE) or both (NULL) UniProt records. Matters only if uploadlists is FALSE.

Value

A data frame with all source columns translated to all target identifiers. The number of new columns is the product of the number of source columns and the number of target identifiers. The target columns are distinguished by the suffixes added to their names.
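
As a rough illustration of the column count, the base R sketch below enumerates one new column per pair of target identifier and source column. The order "target name, separator, source name" is an assumption based on the description of the default naming; the actual names produced by the function may differ.

```r
source_cols <- c("source", "target")   # columns already in the data frame
target_ids <- c("ensp", "genesymbol")  # requested target ID types
suffix_sep <- "_"

# One new column per (target ID, source column) pair
combos <- expand.grid(
  target = target_ids,
  source = source_cols,
  stringsAsFactors = FALSE
)

# Assumed naming: target name + separator + source column name
new_cols <- paste(combos$target, combos$source, sep = suffix_sep)
print(new_cols)
```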

See Also

translate_ids

Examples

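# Download OmniPath interactions; the result has `source` and `target` columns
ia <- import_omnipath_interactions()
# Translate both columns from UniProt to Ensembl protein IDs, using Ensembl BioMart
translate_ids_multi(ia, source = uniprot, target, ensp, ensembl = TRUE)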


saezlab/OmnipathR documentation built on June 17, 2024, 2:24 a.m.