separateSubunits: Separate multi-subunit protein names

View source: R/separateSubunits.R

separateSubunits    R Documentation

Separate multi-subunit protein names

Description

Separate the names of antibodies against multi-subunit proteins, e.g. CD235ab or CD66ace, into one subunit per row.

Two subunit patterns are considered. In the first, subunits are lower case letters that follow the gene name directly, e.g. CD66ace is composed of subunits CD66a, CD66c and CD66e. In the second, subunits are upper case letters and are separated from the gene name with a "-", e.g. HLA-A/C/E is composed of subunits HLA-A, HLA-C and HLA-E.

Both patterns require at least 2 capital letters or numbers followed by at least 2 possible subunits. There may be a separator between the gene name and the subunit group and/or between the individual subunit letters. At present, the separators allowed between groups are "-", "." and a space, and the separators allowed between subunits are "/" and ".".
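To make the two patterns concrete, here is a minimal base R sketch of how such names might be matched and expanded. It is not the AbNames implementation: the helper name split_subunits_sketch is hypothetical, and so is the cap of at most 4 subunits per name. That cap is only an assumption, chosen so that "alpha/beta" is too long to match, as noted in the Examples section.

```r
# Hypothetical sketch of the two subunit patterns; NOT the AbNames code.
# Assumption: a name may contain at most 4 subunits (2 to 4 letters).
split_subunits_sketch <- function(x) {
  # Pattern 1: gene name (>= 2 capitals/digits), optional group separator,
  # then 2-4 lower case subunit letters, optionally separated by "/" or "."
  lower <- "^([A-Z0-9]{2,})([-. ]?)([a-z](?:[/.]?[a-z]){1,3})$"
  # Pattern 2: gene name, required group separator,
  # then 2-4 upper case subunit letters, optionally separated by "/" or "."
  upper <- "^([A-Z0-9]{2,})([-. ])([A-Z](?:[/.]?[A-Z]){1,3})$"
  for (pattern in c(lower, upper)) {
    m <- regmatches(x, regexec(pattern, x, perl = TRUE))[[1]]
    if (length(m) == 4) {
      # m[2] = gene name, m[3] = group separator, m[4] = subunit group
      letters_only <- gsub("[/.]", "", m[4])
      subunits <- strsplit(letters_only, "")[[1]]
      # Rebuild one full name per subunit, keeping the group separator
      return(paste0(m[2], m[3], subunits))
    }
  }
  x  # no pattern matched: return the name unchanged
}
```

The sketch works on one name at a time, so a vector of names can be processed with lapply(), e.g. lapply(c("CD235a/b", "HLA-ABC"), split_subunits_sketch). Like the documented function, it does not check whether the guessed subunits are real proteins, so HLA-DR is split as well.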

Subunits should be converted from Greek symbols before applying this function.

At present, user-supplied regex patterns are not supported.

Usage

separateSubunits(df, ab = "Antigen", new_col = "subunit")

Arguments

df

A data.frame or tibble

ab

(character(1), default "Antigen") Name of the column containing antibody names

new_col

(character(1), default "subunit") Name of the new column containing guesses for single-subunit names

Value

df, with a new column (named by new_col, "subunit" by default) containing potential individual subunits. Original rows of df are replicated once per subunit, i.e. the returned data.frame is in long format.
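The long format can be pictured with a small base R illustration. The subunit guesses below are typed in by hand and the variable names are hypothetical; no AbNames function is called. Each row of df is repeated once per guess, and the guesses fill the new column.

```r
# Hypothetical illustration of the long-format return value; the subunit
# guesses are written by hand rather than produced by separateSubunits().
df <- data.frame(ID = c("A", "B"),
                 Antigen = c("CD235ab", "HLA-DR"),
                 stringsAsFactors = FALSE)
guesses <- list(c("CD235a", "CD235b"), c("HLA-D", "HLA-R"))

# Repeat each original row once per subunit guess
long <- df[rep(seq_len(nrow(df)), lengths(guesses)), , drop = FALSE]
long$subunit <- unlist(guesses)  # "subunit" mirrors the default new_col name
rownames(long) <- NULL
long
```

The ID and Antigen values are copied unchanged into every replicated row, so results can still be joined back to the original table by ID.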

Author(s)

Helen Lindsay

Examples

df <- data.frame(ID = LETTERS[1:5],
                Antigen = c("CD235a/b", "CD235ab",
                            "HLA-ABC", "HLA-DR", "TCR alpha/beta"))

#Note that in this example, the TCR is not split as "alpha/beta" is too long
#to match the splitting pattern.  Also note that HLA-DR is split - this
#function doesn't check whether the results are real protein subunits.
separateSubunits(df)

HelenLindsay/AbNames documentation built on June 6, 2023, 1:18 p.m.