In akin: Functional Utilities for Data Processing

locateMod

R Documentation

Locate And Extract Modifications Or Monoisotopic Masses From A Modified Peptide

Description

Finds and tabulates amino acid modification sites, and extracts the modifications or monoisotopic masses, from a string representation of a modified peptide.

Usage

locateMod(string, wrap = "]", inbracket = ")", except = NULL, rmve = NULL)

Arguments

`string`	character, length 1. Modified or unmodified peptide, or NULL
`wrap`	character, length 1. The closing (right-hand) bracket, one of ']', ')' or '}', of the pair that wraps each modification, as in protein mass spectrometry representations of modified peptides. Default, ']'
`inbracket`	character, length 1. Same as above for brackets used inside modification wrappings. Default, ')'
`except`	character, length >= 1. Default, NULL. Punctuation marks or character classes that appear alongside modifications and must be kept in the output: '-', '+', ',', ';', ':', '=', '.', `[:digit:]`, `[:alpha:]`, '\w+', ' '
`rmve`	character, length 1. Default, NULL. Regular expression. Digits or extra characters that need to be removed from the output (see Examples)
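
Because `rmve` is a regular expression, it can help to preview what a candidate pattern deletes before passing it to locateMod. The sketch below uses base R `gsub` as a stand-in; it assumes `rmve` behaves like an ordinary regex removal and does not reproduce locateMod's internals. The sample labels are copied from the Examples.

```r
# Sample modification labels copied from the Examples section.
mods <- c("Prop_A", "Prop (C)", "Met +44")

# Candidate 'rmve' pattern: parenthesised text, '_' plus a capital letter, or any digit.
pattern <- "(\\(.*\\)|_[A-Z]|[0-9])"

# Preview what the pattern deletes (gsub is an assumption, used only for illustration).
# Spaces and '+' are not matched by this pattern, so they remain.
cleaned <- gsub(pattern, "", mods)
print(cleaned)
```

Characters the pattern does not cover, such as a trailing space or '+', would need to be added to the pattern or handled separately.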

Details

Although the function can handle most situations, it is recommended that the wrapping bracket type remain consistent throughout the string and that the inbracket type differ from the wrapping type. No extra characters are removed from the result when except = rmve = NULL.

This utility covers most data representation styles for modified peptides. However, clean results are not guaranteed. The letter casing of the modified peptide and of the modifications should match the Examples: upper case for the peptide and mixed case for the modifications.
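
To make the idea of locating modification sites concrete, here is a minimal base R sketch. It is a hypothetical illustration, not locateMod's implementation, and it assumes square-bracket wrapping with no nested square brackets. All variable names are invented for the example.

```r
# Hypothetical illustration in base R; this is not locateMod()'s implementation.
string <- "TAAC[+57.021464]PPC[+57.021464]PAPPAPS[+162.052824]VFLTLMISR"

# Match each bracketed group: '[', any run of characters other than ']', then ']'.
hits <- gregexpr("\\[[^]]*\\]", string)
mods <- regmatches(string, hits)[[1]]

# Unmodified peptide: drop every bracketed group.
unmodified <- gsub("\\[[^]]*\\]", "", string)

# The modified residue sits just before the opening bracket. Its position in the
# unmodified peptide is the bracket start minus one, minus the total width of
# all bracketed groups that precede it.
start <- as.integer(hits[[1]])
width <- attr(hits[[1]], "match.length")
position <- start - 1L - c(0L, cumsum(width)[-length(width)])
residue <- substring(unmodified, position, position)

sites <- data.frame(Site = paste0(residue, position),
                    Modification = gsub("^\\[|\\]$", "", mods))
print(sites)
```

Several consecutive brackets after the same residue resolve to the same position under this arithmetic, which mirrors the idea of listing each modification found at a site.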

Value

A 'data.table' class data frame containing the unmodified peptide, the modified peptide, the modification site (i.e. the amino acid code letter and its location inside the peptide) and the associated modification(s). When monoisotopic masses are extracted, the mass values populate the "Modification" column as character strings. Multiple modifications (identical or not) found at the same site are listed as many times as they appear at that site. Unmodified, endogenous peptides are listed with no other information. Empty strings are listed as such with a warning.
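
Since the masses are returned as character strings, arithmetic requires a conversion first. The sketch below is a minimal example under the assumption that the column holds clean decimal strings; the vector is typed by hand from the values in the Examples and is not actual function output.

```r
# Hand-typed stand-in for a cleaned "Modification" column (an assumption,
# based on the mass values shown in the Examples; not real locateMod() output).
masses_chr <- c("57.021464", "57.021464", "162.052824")

# Convert character strings to numeric before doing any arithmetic.
masses <- as.numeric(masses_chr)

# Total mass shift across all modification sites.
total_shift <- sum(masses)
print(total_shift)
```

If a pattern has stripped the decimal point, the conversion still succeeds but yields values that are wrong by orders of magnitude, so it is worth checking the strings before converting.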

Examples


if (interactive()) {

# Completely made-up modified peptides:

# 1. Modifications

# 1.1 Default brackets
string = 'K[Prop_A][Met][Prop (C)]PSSABCELR[Prop][Prop][Prop]FQC[Carba (C)]GQQ[Met +44]TARP'

a = locateMod(string)
print(a)                                                              # with extra characters
b = locateMod(string, except = '\\w+', rmve = '(\\(.*\\)|_[A-Z]|[0-9])')
print(b)                                                              # without extra characters

# In this example, argument "rmve" includes a pattern for the default in-brackets, '(' and ')'

# 1.2 Alternative bracketing

string = 'K{Prop_A}{Met}{Prop [A]}PSSABCELR{Prop +15}{Prop}{Prop}FQC{Carba [C]}GQQ{Met +44}TARP'

c = locateMod(string, '}', ']')
print(c)
d = locateMod(string, '}', ']', except = '\\w+', rmve = '(\\[.*\\]|_[A-Z]|[0-9])')
print(d)

# In this example, argument "rmve" includes a pattern for the alternative in-brackets, '[' and ']'

# 2. Empty string

empty = locateMod(""); print(empty)

# 3. Monoisotopic masses

string = 'TAAC[+57.021464]PPC[+57.021464]PAPPAPS[+162.052824]VFLTLMISR'
e = locateMod(string)
print(e)                                                             # with extra characters
f = locateMod(string, rmve = '[[:punct:]]')$Modification
print(f)                                                             # incorrect values: '[[:punct:]]' also matches the decimal point
g = locateMod(string, rmve = '\\+')$Modification
print(g)                                                             # correct: only the '+' sign is removed
class(g)                                                             # character
}

akin documentation built on May 19, 2026, 5:07 p.m.