scan_nglc: Detection of motifs for N-glycosylation on asparagine...
In missuse/ragp: Mining for Hydroxyproline rich glycoprotein sequences


Detection is based on PROSITE pattern PS00001. Mean local hydrophilicity (Hopp and Woods, 1981) is used to assess if the asparagines are buried.

scan_nglc(data, ...)

## S3 method for class 'character'
scan_nglc(data, ...)

## S3 method for class 'data.frame'
scan_nglc(data, sequence, id, ...)

## S3 method for class 'list'
scan_nglc(data, ...)

## Default S3 method:
scan_nglc(data = NULL, sequence, id, span = 5L, cutoff = 0, nsp = 15L, ...)

## S3 method for class 'AAStringSet'
scan_nglc(data, ...)

`data`	A data frame with protein amino acid sequences as strings in one column and the corresponding ids in another. Alternatively, a path to a .fasta file with protein sequences, a list with elements of class `SeqFastaAA` resulting from a `read.fasta` call, or an `AAStringSet` object. Should be left blank if vectors are provided to the sequence and id arguments.
`...`	Currently no additional arguments are accepted apart from the ones documented below.
`sequence`	A vector of strings representing protein amino acid sequences, or the appropriate column name if a data.frame is supplied to the data argument. If a .fasta file path or a list with elements of class "SeqFastaAA" is provided to data, this should be left blank.
`id`	A vector of strings representing protein identifiers, or the appropriate column name if a data.frame is supplied to the data argument. If a .fasta file path or a list with elements of class "SeqFastaAA" is provided to data, this should be left blank.
`span`	An integer specifying how many amino acids on each side of the target asparagine residue are used to calculate hydrophilicity. The default is 5: from asparagine position - 5 to asparagine position + 5. Range to consider: 3 - 10. Acceptable values are 0 - 20.
`cutoff`	A numeric value specifying the cutoff for hydrophilicity. Range to consider: -1 - 1. Values lower than -3.4 exclude hydrophilicity as a parameter, while values higher than 3 result in no motifs being found.
`nsp`	An integer, either of length 1 or of length equal to the number of sequences, specifying the number of N-terminal amino acids to exclude.
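
To make the roles of `span` and `cutoff` concrete, here is a minimal sketch of a mean local hydrophilicity calculation on the Hopp and Woods (1981) scale. The helper `mean_hydrophilicity` is hypothetical and not part of ragp. Clipping the window at the sequence ends and comparing with `>` are assumptions, and the internal implementation of `scan_nglc` may differ. The scale ranges from -3.4 (Trp) to 3.0 (Arg, Asp, Glu, Lys), which matches the cutoff limits given above.

```r
# Hopp and Woods (1981) hydrophilicity values per residue
hopp_woods <- c(
  R = 3.0, D = 3.0, E = 3.0, K = 3.0, S = 0.3, N = 0.2, Q = 0.2,
  G = 0.0, P = 0.0, T = -0.4, A = -0.5, H = -0.5, C = -1.0, M = -1.3,
  V = -1.5, I = -1.8, L = -1.8, Y = -2.3, F = -2.5, W = -3.4
)

# Hypothetical helper (not part of ragp): mean hydrophilicity of the window
# [pos - span, pos + span]; clipping at the sequence ends is an assumption.
mean_hydrophilicity <- function(sequence, pos, span = 5L) {
  residues <- strsplit(sequence, "", fixed = TRUE)[[1]]
  from <- max(1L, pos - span)
  to <- min(length(residues), pos + span)
  mean(hopp_woods[residues[from:to]], na.rm = TRUE)
}

# Made-up peptide; the asparagine of interest is at position 6
pep <- "KDESRNGSEDK"
h <- mean_hydrophilicity(pep, pos = 6L, span = 5L)

# Compare against the cutoff (default 0 in scan_nglc)
passes_cutoff <- h > 0
```

A larger `span` averages over more neighbours and smooths the score, while a smaller one makes it more sensitive to the residues immediately flanking the asparagine.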

A data frame with columns:

id: Character, as supplied in the function call.
align_start: Integer, start of motif match.
motif: Character, the motif matched.
hydrophilicity: Numeric, the mean local hydrophilicity around the asparagine.
is.nglc: Logical, whether N-glycosylation is likely based on hydrophilicity.
nsp: Integer, optional column present only when the nsp argument has length equal to the number of input sequences.

For N-glycosylation to happen the protein must enter the endoplasmic reticulum, so please check whether the proteins are likely to contain an N-terminal signal peptide. The motif Asn-Xaa-Ser/Thr (where Xaa is not Pro) on which N-glycosylation occurs is relatively common; however, for N-glycosylation to occur the motif needs to be located on the protein surface. Mean local hydrophilicity (Hopp and Woods, 1981) is used here to evaluate whether the asparagines are located in hydrophilic surroundings, which are more likely to be found on the protein surface.
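
The motif search itself can be sketched with a regular expression. PROSITE pattern PS00001 is written `N-{P}-[ST]-{P}`, which translates to `N[^P][ST][^P]`. The helper `find_sequons` below is hypothetical and not part of ragp. Using a zero-width lookahead to report overlapping matches is an assumption about the desired behaviour, not a description of how `scan_nglc` is implemented.

```r
# Hypothetical helper (not part of ragp): locate N-{P}-[ST]-{P} matches,
# including overlapping ones, via a zero-width lookahead.
find_sequons <- function(sequence) {
  hits <- gregexpr("(?=N[^P][ST][^P])", sequence, perl = TRUE)[[1]]
  starts <- as.integer(hits)
  if (starts[1] == -1L) {
    # No match: return an empty data frame with the same columns
    return(data.frame(align_start = integer(0), motif = character(0),
                      stringsAsFactors = FALSE))
  }
  data.frame(
    align_start = starts,
    motif = substring(sequence, starts, starts + 3L),
    stringsAsFactors = FALSE
  )
}

# Made-up sequence: "NPT" at position 7 is skipped because Xaa is Pro
sequons <- find_sequons("MKNGSANPTLNNTSA")
sequons
```

Each reported start position is a candidate asparagine that would then be scored for local hydrophilicity.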

original R code by Thomas Shafee, modified by Milan Dragićević

https://prosite.expasy.org/PDOC00001

Hopp TP, Woods KR. (1981) Prediction of protein antigenic determinants from amino acid sequences. Proceedings of the National Academy of Sciences of the United States of America, 78(6): 3824-3828.

library(ragp)
data(at_nsp)

nglc_pred <- scan_nglc(data = at_nsp,
                       sequence = sequence,
                       id = Transcript.id)
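
A typical next step is to keep only the motifs flagged in the `is.nglc` column and count them per protein. The sketch below uses a small made-up data frame, `mock_pred`, with the columns documented above in place of `nglc_pred`, so that it runs without ragp installed. All values in it are illustrative only.

```r
# Made-up stand-in for the scan_nglc() output (values are illustrative only)
mock_pred <- data.frame(
  id = c("prot_A", "prot_A", "prot_B"),
  align_start = c(40L, 112L, 77L),
  motif = c("NGSA", "NTTL", "NVSG"),
  hydrophilicity = c(0.85, -0.42, 0.10),
  is.nglc = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep only motifs flagged as likely N-glycosylated
likely <- mock_pred[mock_pred$is.nglc, ]

# Number of likely sites per protein
sites_per_protein <- table(likely$id)
sites_per_protein
```

The same two lines of filtering and tabulation should apply unchanged to `nglc_pred`.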