cleanAnnotation: Function for extracting peptide sequences from multimapped or...
In KinSwingR: KinSwingR: network-based kinase activity prediction

Description Usage Arguments Value Examples

View source: R/clean_seqs.r

This function extracts unique peptide:annotation combinations from complex annotated data and formats for further analysis using KinSwingR. For instance, example input annotation may be: "A0A096MIX2|Ddx17|494|RSRYRTTSSANNPN". This function will extract the peptide sequence into a second column and associate it all annotations. See vignette for more details.

cleanAnnotation(input_data = NULL, annotation_delimiter = "|",
  multi_protein_delimiter = ":", multi_site_delimiter = ";",
  seq_number = 4, replace = FALSE, replace_search = "X",
  replace_with = "_", verbose = FALSE)

`input_data`	A data.frame of phosphopeptide data. Must contain 4 columns and the following format must be adhered to. Column 1 - Annotation, Column 2 - centered peptide sequence, Column 3 - Fold Change [-ve to +ve], Column 4 - p-value [0-1]. This will extract the peptide sequences from Column1 and replace all values in Column2 to be used in scoreSequences(). Where peptide sequences have not been extracted from the annotation, leave Column2 as NA's.
`annotation_delimiter`	The character used to delimit annotations. Default="\|"
`multi_protein_delimiter`	The character used to delimit multi-protein assignments. Default=":". E.g. Ddx17:Ddx2
`multi_site_delimiter`	The character used to delimit multi-site assignments. Default=";". E.g. 494;492
`seq_number`	The annotation frame that contains the sequence after delimitation. E.g. The sequence "RSRYRTTSSANNPN" is contained in the 4th annotation frame of the following annotation: "A0A096MIX2\|Ddx17\|494\|RSRYRTTSSANNPN" and would therefore set seq_number=4. Default=4
`replace`	Replace a letter that describes sequences outside of the protein after centering on the phosphosite (e.g X in XXXMERSTRELCLNF). Use in combination with replace_search and replace_with to replace amino acids. Options are "TRUE" or "FALSE". Default="FALSE".
`replace_search`	Amino Acid to search for when replacing sequences. Default="X"
`replace_with`	Amino Acid to replace with when replacing sequences. Default="_"
`verbose`	Print progress to screen. Default=FALSE

A data.table with the peptides extracted from the annotation column

## Extract peptide sequences from annotation data:
data(example_phosphoproteome)

## A0A096MJ61|NA|89|PRRVRNLSAVLAART
## The following will extract all the uniquely annotated peptide
## sequences from the "annotation" column and place these in the
## "peptide" column. Where multi-mapped peptide sequences are input,
## these are placed on a new line.
##
## Here, sequences with a "X" and also replaced with a "_". This is ensure 
## that PWMs are built correctly.

## Sample data for demonstration:
sample_data <- head(example_phosphoproteome)
annotated_data <- cleanAnnotation(input_data = sample_data,
                                   annotation_delimiter = "|",
                                   multi_protein_delimiter = ":",
                                   multi_site_delimiter = ";",
                                   seq_number = 4,
                                   replace = TRUE,
                                   replace_search = "X",
                                   replace_with = "_")

## Return the annotated data with extracted peptides:
head(annotated_data)