sharedSubstr: Group based on shared substrings

View source: R/sharedSubstr.R

Description

Takes a data.frame with a column of words (x) and a column of corresponding ids (id), and adds a numeric column (new_col) indicating groupings, where each member of a group shares a word with at least one other member of the group.
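
As a rough illustration of the grouping rule, the base R sketch below treats each id as a node, merges ids that share an identical word, and numbers the resulting connected components using a union-find structure. It is not the AbNames implementation. The helper name group_shared_words is hypothetical, and the sketch assumes exact whole-word matches rather than whatever substring matching sharedSubstr performs internally.

```r
# Hypothetical helper, a sketch only (not the AbNames implementation).
# Assumes words match exactly; ids that share any word are merged, transitively.
group_shared_words <- function(words, ids) {
  stopifnot(length(words) == length(ids))
  uid <- unique(ids)
  id_idx <- match(ids, uid)   # position of each element's id within uid
  parent <- seq_along(uid)    # union-find forest: every id starts as its own root

  # Follow parent links up to the root of an id's tree
  find <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  # For every distinct word, merge all ids that use it into a single tree
  for (members in split(id_idx, words)) {
    root <- find(members[1])
    for (m in members[-1]) {
      r <- find(m)
      if (r != root) parent[r] <- root
    }
  }

  # Renumber the roots as consecutive group numbers, then map back to elements
  roots <- vapply(seq_along(uid), find, integer(1))
  groups <- match(roots, unique(roots))
  groups[id_idx]
}

words <- c("CD274", "B7-H1", "B7-H1", "PD-L1", "CD8A")
ids   <- c(1, 1, 2, 2, 3)
group_shared_words(words, ids)
#> [1] 1 1 1 1 2
```

Here ids 1 and 2 are linked through "B7-H1", so all four of their words fall into group 1, while id 3 forms group 2. Because merging is transitive, ids connected only through a chain of shared words also end up in the same group.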

Usage

sharedSubstr(df, x = "value", id = "ID", new_col = "AB_group")

Arguments

df

A query data.frame, e.g. one created by makeQueryTable

x

Name of column to check for shared substrings (character(1), default: "value")

id

Name of ID column uniquely identifying rows (character(1), default: "ID")

new_col

Name of column to be added to df (character(1), default: "AB_group")

Details

The intention is that if an antibody has alternative names and is sometimes labelled with both, e.g. CD274 (B7-H1), all entries matching CD274 should be grouped with all entries matching B7-H1.
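
The CD274 (B7-H1) case can be mimicked on a small query table that uses the default column names value and ID. The sketch assumes one row per alternative name, so an entry recorded as "CD274 (B7-H1)" appears as two rows with the same ID. The function add_shared_group is hypothetical and is not part of AbNames. It uses iterative minimum-label propagation: every row is repeatedly given the smallest label among rows with the same value, and then among rows with the same ID, until the labels stop changing. The sketch also assumes exact matching of whole values.

```r
# Hypothetical sketch (not the AbNames implementation) using minimum-label
# propagation; assumes one row per alternative name and exact value matches.
add_shared_group <- function(df, x = "value", id = "ID", new_col = "AB_group") {
  vals <- df[[x]]
  ids <- df[[id]]
  label <- match(ids, unique(ids))        # initial label: one per distinct id
  repeat {
    old <- label
    label <- ave(label, vals, FUN = min)  # rows sharing a value take the smallest label
    label <- ave(label, ids, FUN = min)   # rows sharing an id take the smallest label
    if (identical(label, old)) break      # labels only decrease, so this terminates
  }
  df[[new_col]] <- match(label, unique(label))  # consecutive group numbers
  df
}

# Entry 1 was recorded as "CD274 (B7-H1)" and is assumed to be split into two rows
df <- data.frame(
  ID    = c(1, 1, 2, 3, 4),
  value = c("CD274", "B7-H1", "CD274", "B7-H1", "CD8A")
)
add_shared_group(df)
```

Entries 2 ("CD274" only) and 3 ("B7-H1" only) never share a name directly, but both are linked through entry 1, so IDs 1, 2 and 3 all receive AB_group 1, while the unrelated CD8A entry receives 2.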

Although this function can be useful for matching antibody names, in our experience manual checking of the results is required.


HelenLindsay/AbNames documentation built on June 6, 2023, 1:18 p.m.