sharedSubstr: Group based on shared substrings
In HelenLindsay/AbNames: Standardize Antibody Names

Description

Takes a data.frame with a column of words (`x`) and a corresponding column of IDs (`id`), and adds a numeric column (`new_col`) indicating groupings in which each member of a group shares a word with at least one other member of the group.
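Read literally, this rule chains: if entry A shares a word with B, and B shares a different word with C, then A, B and C all fall into one group. The groups are therefore the connected components of a graph that links entries through shared words. The following is a minimal base-R sketch of that idea using iterative minimum-label propagation. It is not the package's implementation. `group_shared_words` is a hypothetical name, and it assumes one row per (ID, word) pair, with rows sharing an ID always placed in the same group.

```r
# Hypothetical sketch: group rows whose IDs are linked, directly or through
# a chain of other IDs, by shared words. Not AbNames' actual code.
group_shared_words <- function(ids, words) {
  stopifnot(length(ids) == length(words))
  # Start with one integer label per distinct ID
  grp <- match(ids, unique(ids))
  repeat {
    old <- grp
    # Rows sharing a word all take the smallest label among them
    grp <- ave(grp, words, FUN = min)
    # Rows of the same ID all take the smallest label of that ID
    grp <- ave(grp, ids, FUN = min)
    # Labels only ever decrease, so this loop terminates
    if (identical(grp, old)) break
  }
  # Renumber groups 1, 2, ... in order of first appearance
  match(grp, unique(grp))
}

# Entry 1 is "CD274 (B7-H1)", split into two words; entries 2 and 3 use one
# name each, entry 4 is unrelated
group_shared_words(
  ids   = c(1, 1, 2, 3, 4),
  words = c("CD274", "B7-H1", "CD274", "B7-H1", "CD3")
)
```

Propagating the minimum label is a simple choice that needs no packages. A graph library's connected-components routine would give the same partition.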

Usage

sharedSubstr(df, x = "value", id = "ID", new_col = "AB_group")

Arguments

`df`	A query data.frame, e.g. created by makeQueryTable
`x`	Name of column to check for shared substrings (character(1), default: "value")
`id`	Name of ID column uniquely identifying rows (character(1), default: "ID")
`new_col`	Name of column to be added to df (character(1), default: "AB_group")

Details

The intended use case is an antibody that has alternative names and is sometimes given both at once, e.g. CD274 (B7-H1). In that case, all entries matching CD274 should be grouped with all entries matching B7-H1.
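For a name like CD274 (B7-H1) to link the two aliases, it presumably has to be split into separate words first, so that one entry contributes one row per word. The sketch below shows one way to build such a long table. `names_to_words` is hypothetical, and both the separator choice (whitespace and parentheses, with hyphens kept so that "B7-H1" stays whole) and the `ID`/`value` layout are assumptions. They are not necessarily what `makeQueryTable` produces.

```r
# Hypothetical helper: one row per (ID, word) pair. Separators are an
# assumption: whitespace and parentheses split, hyphens are kept.
names_to_words <- function(ab_names) {
  tokens <- strsplit(ab_names, "[[:space:]()]+")
  # Drop empty strings left by leading separators
  tokens <- lapply(tokens, function(tk) tk[nzchar(tk)])
  data.frame(
    ID = rep(seq_along(ab_names), lengths(tokens)),
    value = unlist(tokens),
    stringsAsFactors = FALSE
  )
}

ab <- c("CD274 (B7-H1)", "CD274", "B7-H1", "CD3")
words <- names_to_words(ab)
# words$value is CD274, B7-H1, CD274, B7-H1, CD3 with IDs 1, 1, 2, 3, 4
```

In this layout, entry 1 shares CD274 with entry 2 and B7-H1 with entry 3, which is what allows those three entries to end up in one group.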

Although this function can be useful for matching antibody names, in our experience manual checking of the results is required.

HelenLindsay/AbNames documentation built on June 6, 2023, 1:18 p.m.