dot-separateSubunits: Separate multi-subunit protein names into one name per row

.separateSubunits    R Documentation

Separate multi-subunit protein names into one name per row

Description

Splits a column at a provided pattern, separates the second segment into individual letters, and joins the segments using a provided joining pattern.

Usage

.separateSubunits(df, ab, new_col, pattern, join_pattern, t1, t2)

Arguments

df

A data.frame or tibble

ab

The name of the column containing names to split

new_col

The name of the new column containing split names

pattern

The regex pattern to use for splitting

join_pattern

A sprintf format string for joining t1 (start) and t2 (end)

t1

Name of the first temporary column

t2

Name of the second temporary column
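As a rough illustration of how a sprintf-style join_pattern combines a start segment with each separated letter, the following base R sketch may help. The stems and letters are made-up inputs for illustration, and the code does not call .separateSubunits itself.

```r
# Illustrative inputs (assumptions, not output of .separateSubunits)
start <- "CD235"
end_letters <- c("a", "b")

# "%s%s" places the start and end segments side by side;
# sprintf recycles the single start value across both letters
joined <- sprintf("%s%s", start, end_letters)
joined
# "CD235a" "CD235b"

# A different format string inserts a separator between the segments
hyphenated <- sprintf("%s-%s", "HLA", c("A", "B", "C"))
hyphenated
# "HLA-A" "HLA-B" "HLA-C"
```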

Value

df with an additional column, named by new_col, holding the results of applying the regular expression pattern to column ab to identify potential subunits. The result is in long format, with one subunit per row.
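The long-format result can be pictured with a minimal base R sketch. This is not the package's implementation: the helper split_one, the two-group pattern, and the choice to return NA for names that do not match are all assumptions made for illustration.

```r
df <- data.frame(Antigen = c("CD235a/b", "TCR g/d", "CD3"),
                 stringsAsFactors = FALSE)

# Assumed pattern: group 1 is the stem, group 2 is the subunit segment
pattern <- "^([A-Z0-9]{2,})[- ]?([a-z/]{2,6})$"

# Hypothetical helper: split one name into stem + one letter per subunit
split_one <- function(x) {
  parts <- regmatches(x, regexec(pattern, x))[[1]]
  # No match: return NA (an assumption of this sketch)
  if (length(parts) == 0) return(NA_character_)
  # Drop separators, then break the segment into single letters
  letters_only <- strsplit(gsub("[^a-z]", "", parts[3]), "")[[1]]
  sprintf("%s%s", parts[2], letters_only)
}

subunits <- lapply(df$Antigen, split_one)

# Repeat each original name once per subunit to get one subunit per row
long <- data.frame(
  Antigen = rep(df$Antigen, lengths(subunits)),
  Subunit = unlist(subunits),
  stringsAsFactors = FALSE
)
long
```

Here "CD235a/b" and "TCR g/d" each expand to two rows, while "CD3" stays on a single row with a missing Subunit.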

Examples

df <- data.frame(Antigen = c("CD235a/b", "HLA-A,B,C", "TCR g/d"))

# This is the first pattern used by separateSubunits
p1 <- "^[A-Z0-9]{2,}[-\\. ]?([a-z\\/\\.]{2,6})$"

.separateSubunits(df, "Antigen", "Subunit", p1, "%s%s", "TEMP", "TEMP2")

HelenLindsay/AbNames documentation built on June 6, 2023, 1:18 p.m.