match_id_: Match concatenated IDs in vector

View source: R/match_id.R

match_id_    R Documentation

Match concatenated IDs in vector

Description

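Match IDs in a vector to a reference table, then take other columns from that reference table and output them as a list of vectors. Each element of the vector may contain several concatenated IDs (separated by ";" by default), which are extracted with regex before matching. I mainly use this for matching metadata to UniProt accessions, but it will work for any type of string ID, e.g. Ensembl IDs.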
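The matching described above can be sketched in base R. The helper match_ids_sketch() below is hypothetical and written for illustration only; it is not the csdmisc implementation. Its behaviour for elements with no match (returning NA) and its naming of each output vector after the input elements are assumptions.

```r
# Hypothetical helper for illustration; NOT the csdmisc implementation.
match_ids_sketch <- function(to_match, ref, match, new,
                             regex = "[^;]+", collapse = ";") {
  # Split each element of to_match into its individual IDs
  ids <- regmatches(to_match, gregexpr(regex, to_match))
  # One output vector per requested column in `new`
  out <- lapply(new, function(col) {
    values <- as.character(ref[[col]])  # coerce non-character columns
    vapply(ids, function(x) {
      # base::match() is called explicitly because `match` is also an argument
      hits <- values[base::match(x, ref[[match]])]
      hits <- hits[!is.na(hits)]
      # Assumption: elements with no matched ID give NA
      if (length(hits) == 0) NA_character_ else paste(hits, collapse = collapse)
    }, character(1))
  })
  # Name each vector's elements after the input, and each vector after its column
  out <- lapply(out, function(v) setNames(v, to_match))
  setNames(out, new)
}

ref_df <- data.frame(
  accession = c("AAA111", "BBB222", "CCC333", "DDD444"),
  name = c("protein a", "protein b", "protein c", "protein d"),
  value = c(11, 22, 33, 44)
)
my_vec <- c("AAA111", "CCC333;BBB222", "EEE555")

res <- match_ids_sketch(my_vec, ref_df, "accession", c("name", "value"))
res$name
```

In this sketch the matched values for "CCC333;BBB222" are joined in the order the IDs appear in the input, giving "protein c;protein b".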

Usage

match_id_(
  to_match,
  ref,
  match,
  new,
  regex = "[^;]+",
  collapse = ";",
  simplify = FALSE,
  verbose = FALSE
)

Arguments

to_match

character vector. IDs to be matched.

ref

data.frame. A reference data.frame with the IDs to use for matching and the new columns to output as a list.

match

string. Name of the column in ref to use for matching.

new

character vector. Name(s) of the column(s) in ref to output. Columns that are not of class character will be coerced to character; a warning is shown only if verbose = TRUE.

regex

string. Regular expression used to extract the individual IDs from each element of to_match. The default, "[^;]+", extracts each run of characters between semicolons.
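To show what a pattern like the default extracts, the base-R snippet below applies it with regmatches() and gregexpr(). Whether match_id_ uses these exact calls internally is an assumption; the pipe-separated case is a hypothetical input showing how a different regex would split other separators.

```r
my_vec <- c("AAA111", "CCC333;BBB222", "EEE555")

# gregexpr() locates every run of non-semicolon characters;
# regmatches() returns them as a list, one character vector per element
ids <- regmatches(my_vec, gregexpr("[^;]+", my_vec))
ids

# Hypothetical pipe-separated input: inside [ ], "|" is a literal character
pipe_vec <- "AAA111|BBB222"
pipe_ids <- regmatches(pipe_vec, gregexpr("[^|]+", pipe_vec))
pipe_ids
```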

collapse

string. Separator used to collapse the results when an element of to_match contains multiple matched IDs.

simplify

logical. Should the output list be unlisted? Default is FALSE.

verbose

logical. Should a warning be shown if any new columns are coerced to character? Default is FALSE.

Value

Returns a list of named vectors or, if simplify = TRUE, a single named vector.

See Also

match_id(), which takes a data.frame as input instead.

Examples

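library(csdmisc)

# Reference table: IDs in "accession", metadata in "name" and "value"
ref_df <- data.frame(
  accession = c("AAA111", "BBB222", "CCC333", "DDD444"),
  name = c("protein a", "protein b", "protein c", "protein d"),
  value = c(11, 22, 33, 44)
)

# The second element holds two concatenated IDs; "EEE555" is not in ref_df
my_vec <- c("AAA111", "CCC333;BBB222", "EEE555")

# The numeric "value" column is coerced to character
# (no warning, since verbose = FALSE by default)
my_list <- match_id_(
  to_match = my_vec,
  ref = ref_df,
  match = "accession",
  new = c("name", "value")
)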

csdaw/csdmisc documentation built on April 26, 2022, 5:39 a.m.