| locateMod | R Documentation |
Finds and tabulates amino acid modification sites, and extracts modifications or monoisotopic masses from data representations of modified peptides.
locateMod(string, wrap = "]", inbracket = ")", except = NULL, rmve = NULL)
string |
character, length 1. Modified or unmodified peptide, or NULL |
wrap |
character, length 1. The closing (right-hand) side of any of the bracket types ']', ')', '}' that wrap the modifications, such as in protein mass spectrometry data representation of modified peptides. Default, ']' |
inbracket |
character, length 1. Same as above for brackets used inside modification wrappings. Default, ')' |
except |
character, length >= 1. Default, NULL. Punctuation marks or characters that appear alongside modifications and need to remain in the output: '-', '+', ',', ';', ':', '=', '.' |
rmve |
character, length 1. Default, NULL. Regular expression. Digits or extra characters that need to be removed from the output (see Examples) |
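To make the roles of the wrapping brackets, the in-brackets and a removal pattern concrete, the following base R sketch pulls the wrapped modifications out of a made-up peptide and then strips the in-bracket part. It is an illustration only and does not reproduce the internals of locateMod; the variable names and regular expressions are assumptions chosen for this example.

```r
# Illustrative sketch in base R; NOT the implementation of locateMod().
peptide <- "FQC[Carba (C)]GQQ[Met +44]TARP"

# Wrapping brackets '[' ... ']' enclose each modification.
wrapped <- regmatches(peptide, gregexpr("\\[[^]]*\\]", peptide))[[1]]

# Drop the outer brackets to keep only the modification text.
mods <- substring(wrapped, 2, nchar(wrapped) - 1)

# Remove in-bracket content '( ... )', plus signs and digits,
# similar in spirit to what a pattern passed as 'rmve' might do.
clean <- trimws(gsub("\\(.*\\)|[+0-9]", "", mods))

print(mods)
print(clean)
```

Keeping the wrapping and in-bracket types different, as in this string, lets a simple pattern tell the two levels apart.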
Although the function can handle most situations, it is recommended that the wrapping bracket type
remain consistent throughout and that the inbracket type differ from the wrapping type.
No extra characters are removed from the result when except = rmve = NULL.
This utility covers most data representation styles for modified peptides. However, clean results are not guaranteed. The letter casing of the modified peptide and of the modifications should match that shown in Examples: upper case for the peptide and mixed case for the modifications.
A 'data.table' class data frame containing the unmodified peptide, the modified peptide, the modification site (i.e. the amino acid code letter and its location inside the peptide) and the associated modification(s). When monoisotopic masses are extracted, the mass values populate the "Modification" column as character values. Multiple modifications (identical or not) found at the same site are listed as many times as they appear at that site. Unmodified, endogenous peptides are listed with no other information. Empty strings are listed as such, with a warning.
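As a rough picture of how modification sites can be tabulated, the base R sketch below builds a small table for a made-up peptide, listing a residue once for each modification attached to it. It is a hedged illustration, not locateMod's code: it uses a plain data.frame rather than a data.table, and the column names "Peptide" and "Site" are assumptions (only "Modification" is named in this documentation).

```r
# Illustrative sketch in base R; column names other than "Modification"
# are assumptions, and this is NOT how locateMod() is implemented.
peptide <- "K[Prop][Met]PSSC[Carba]GQQ"

m      <- gregexpr("\\[[^]]*\\]", peptide)[[1]]
starts <- as.integer(m)              # start of each bracketed modification
lens   <- attr(m, "match.length")    # length of each bracketed modification

unmodified <- gsub("\\[[^]]*\\]", "", peptide)

# Position of the modified residue in the unmodified sequence:
# bracket start, minus one, minus the total length of all earlier brackets.
pos <- starts - 1L - c(0L, cumsum(lens)[-length(lens)])

sites <- data.frame(
  Peptide      = unmodified,
  Site         = paste0(substring(unmodified, pos, pos), pos),
  Modification = substring(peptide, starts + 1L, starts + lens - 2L),
  stringsAsFactors = FALSE
)
print(sites)
```

The two modifications on the first lysine produce two rows with the same site, which mirrors the one-row-per-occurrence behaviour described above.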
regex
if (interactive()) {
# Completely made-up modified peptides:
# 1. Modifications
# 1.1 Default brackets
string = 'K[Prop_A][Met][Prop (C)]PSSABCELR[Prop][Prop][Prop]FQC[Carba (C)]GQQ[Met +44]TARP'
a = locateMod(string)
print(a) # with extra-characters
b = locateMod(string, except = '\\w+', rmve = '(\\(.*\\)|_[A-Z]|[0-9])')
print(b) # without extra-characters
# In this example argument "rmve" contains the default in-brackets
# 1.2 Alternative bracketing
string = 'K{Prop_A}{Met}{Prop [A]}PSSABCELR{Prop +15}{Prop}{Prop}FQC{Carba [C]}GQQ{Met +44}TARP'
c = locateMod(string, '}', ']')
print(c)
d = locateMod(string, '}', ']', except = '\\w+', rmve = '(\\[.*\\]|_[A-Z]|[0-9])')
print(d)
# In this example argument "rmve" contains the alternative in-brackets
# 2. Empty string
empty = locateMod(""); print(empty)
# 3. Monoisotopic masses
string = 'TAAC[+57.021464]PPC[+57.021464]PAPPAPS[+162.052824]VFLTLMISR'
e = locateMod(string)
print(e) # with extra-characters
f = locateMod(string, rmve = '[[:punct:]]')$Modification
print(f) # incorrect values: '[[:punct:]]' also matches the decimal point
g = locateMod(string, rmve = '\\+')$Modification
print(g) # correct!
class(g) # character
}
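Because extracted monoisotopic masses are returned as character values, a conversion step is needed before doing arithmetic with them. The base R sketch below uses a hand-written vector of made-up mass strings (an assumption standing in for the bracket contents) to show why removing all punctuation gives wrong values and how the cleaned strings can be converted.

```r
# Illustrative sketch in base R with made-up bracket contents.
raw <- c("+57.021464", "+57.021464", "+162.052824")

# Stripping all punctuation also removes the decimal point.
wrong <- gsub("[[:punct:]]", "", raw)

# Stripping only the plus sign keeps the decimal point.
right <- gsub("\\+", "", raw)

# The cleaned values are still character; convert explicitly
# when numeric masses are needed.
masses <- as.numeric(right)

print(wrong)
print(masses)
```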