scan_nglc: Detection of motifs for N-glycosylation on asparagine...


Description

Detection is based on the PROSITE pattern PS00001. Mean local hydrophilicity (Hopp and Woods, 1981) is used to assess whether the asparagines are likely to be buried.
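PROSITE writes PS00001 as N-{P}-[ST]-{P}: Asn, any residue except Pro, Ser or Thr, then any residue except Pro. The base R sketch below shows one way such a pattern can be matched. It is not ragp's implementation, and the helper name find_nglc_motifs is hypothetical. A zero-width lookahead is used so that overlapping motifs are all reported.

```r
# Hypothetical helper, not part of ragp: report every match of the
# PROSITE PS00001 pattern N-{P}-[ST]-{P} in a single protein sequence.
find_nglc_motifs <- function(sequence) {
  sequence <- toupper(sequence)
  # A zero-width lookahead lets overlapping motifs all be reported.
  hits <- gregexpr("(?=N[^P][ST][^P])", sequence, perl = TRUE)[[1]]
  if (hits[1] == -1L) {
    # No match: return an empty data frame with the same columns.
    return(data.frame(align_start = integer(0), motif = character(0),
                      stringsAsFactors = FALSE))
  }
  starts <- as.integer(hits)
  data.frame(align_start = starts,
             motif = substring(sequence, starts, starts + 3L),
             stringsAsFactors = FALSE)
}

find_nglc_motifs("ANGSANPTANVTPNKTL")
```

In this toy sequence, NPTA is rejected because Pro follows the Asn, and NVTP is rejected because Pro follows the Thr. Only NGSA (position 2) and NKTL (position 14) are reported.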

Usage

scan_nglc(data, ...)

## S3 method for class 'character'
scan_nglc(data, ...)

## S3 method for class 'data.frame'
scan_nglc(data, sequence, id, ...)

## S3 method for class 'list'
scan_nglc(data, ...)

## Default S3 method:
scan_nglc(data = NULL, sequence, id, span = 5L, cutoff = 0, nsp = 15L, ...)

## S3 method for class 'AAStringSet'
scan_nglc(data, ...)

Arguments

data

A data frame with protein amino acid sequences as strings in one column and the corresponding ids in another. Alternatively, a path to a .fasta file with protein sequences, a list with elements of class SeqFastaAA resulting from a read.fasta call, or an AAStringSet object. Should be left blank if vectors are provided to the sequence and id arguments.

...

Currently no additional arguments are accepted apart from the ones documented below.

sequence

A vector of strings representing protein amino acid sequences, or the appropriate column name if a data.frame is supplied to the data argument. If a .fasta file path or a list with elements of class "SeqFastaAA" is provided to data, this should be left blank.

id

A vector of strings representing protein identifiers, or the appropriate column name if a data.frame is supplied to the data argument. If a .fasta file path or a list with elements of class "SeqFastaAA" is provided to data, this should be left blank.

span

An integer specifying how many amino acids on each side of the target asparagine residues are used to calculate hydrophilicity. Defaults to 5: from asparagine position - 5 to asparagine position + 5. Range to consider: 3 to 10. Acceptable values are 0 to 20.
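The sketch below illustrates how a mean local hydrophilicity over the window from position - span to position + span can be computed with the Hopp and Woods (1981) scale. It is an illustration under assumptions, not ragp's code. The helper name local_hydrophilicity is hypothetical, and so are two of its choices: clipping the window at the sequence ends and skipping residues that are absent from the scale.

```r
# Hopp and Woods (1981) hydrophilicity scale for the 20 standard amino acids.
hopp_woods <- c(A = -0.5, R = 3.0, N = 0.2, D = 3.0, C = -1.0,
                Q = 0.2, E = 3.0, G = 0.0, H = -0.5, I = -1.8,
                L = -1.8, K = 3.0, M = -1.3, F = -2.5, P = 0.0,
                S = 0.3, T = -0.4, W = -3.4, Y = -2.3, V = -1.5)

# Hypothetical helper: mean hydrophilicity of the residues from pos - span
# to pos + span. Clipping the window at the sequence ends and skipping
# residues absent from the scale (e.g. X) are assumptions of this sketch.
local_hydrophilicity <- function(sequence, pos, span = 5L) {
  residues <- strsplit(toupper(sequence), "")[[1]]
  window <- max(1L, pos - span):min(length(residues), pos + span)
  mean(hopp_woods[residues[window]], na.rm = TRUE)
}

# Window around the Asn at position 2 with the default span of 5.
local_hydrophilicity("ANGSANPTANVTPNKTL", pos = 2L, span = 5L)
```

A larger span smooths the estimate over more residues. A span of 0 reduces it to the value of the asparagine itself (0.2).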

cutoff

A numeric value specifying the cutoff for mean hydrophilicity. Range to consider: -1 to 1. Values lower than -3.4 exclude hydrophilicity as a parameter, while values higher than 3 result in no motifs being found, because the Hopp and Woods scale ranges from -3.4 (Trp) to 3.0 (Arg, Asp, Glu, Lys).
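The two limits follow from the scale itself. A mean over any window cannot fall below -3.4 or rise above 3.0. The sketch below assumes that a motif is flagged when its mean hydrophilicity is strictly greater than the cutoff. The comparison operator and the helper name classify_nglc are assumptions, not taken from ragp.

```r
# Hypothetical illustration: a motif is flagged as likely glycosylated when its
# mean local hydrophilicity exceeds the cutoff (strict ">" is an assumption).
classify_nglc <- function(hydrophilicity, cutoff = 0) {
  hydrophilicity > cutoff
}

h <- c(-0.8, 0.1, 1.2)
classify_nglc(h)                # FALSE TRUE TRUE with the default cutoff of 0
classify_nglc(h, cutoff = -3.5) # TRUE TRUE TRUE: no mean can fall below -3.4
classify_nglc(h, cutoff = 3.1)  # FALSE FALSE FALSE: no mean can exceed 3
```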

nsp

An integer, either of length 1 or of length equal to the number of sequences, specifying the number of N-terminal amino acids to exclude.
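The sketch below shows one way such a filter might be applied to a table of motif matches. A single nsp value is recycled across all sequences. A vector is matched to sequences by id. It assumes motifs are dropped when align_start falls within the first nsp residues. That boundary rule and the helper name drop_n_terminal are hypothetical, not ragp's documented behaviour.

```r
# Hypothetical sketch: drop motifs whose asparagine lies within the first
# nsp residues of its sequence. The exact boundary rule (> vs >=) used by
# ragp is an assumption.
drop_n_terminal <- function(motifs, nsp, ids) {
  # nsp of length 1 applies to all sequences; otherwise one value per id.
  if (length(nsp) == 1L) nsp <- rep(nsp, length(ids))
  stopifnot(length(nsp) == length(ids))
  # Look up the N-terminal limit for the sequence each motif belongs to.
  limit <- nsp[match(motifs$id, ids)]
  motifs[motifs$align_start > limit, , drop = FALSE]
}

m <- data.frame(id = c("a", "a", "b"), align_start = c(10L, 20L, 5L),
                stringsAsFactors = FALSE)
drop_n_terminal(m, 15L, ids = c("a", "b"))
```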

Value

A data frame with columns:

id

Character, as supplied in the function call.

align_start

Integer, start of motif match.

motif

Character, the motif matched.

hydrophilicity

Numeric, the mean local hydrophilicity around the matched asparagine.

is.nglc

Logical, whether N-glycosylation is likely based on hydrophilicity.

nsp

Optional integer column, provided when the nsp argument has length equal to the number of input sequences.

Note

For N-glycosylation to happen, the protein must enter the endoplasmic reticulum. Please check whether the proteins are likely to contain an N-terminal signal peptide. The motif Asn-Xaa-Ser/Thr (where Xaa is not Pro) on which N-glycosylation occurs is relatively common; however, for N-glycosylation to occur, the motif needs to be located on the protein surface. Mean local hydrophilicity (Hopp and Woods, 1981) is used here to evaluate whether the asparagines are located in a hydrophilic surrounding, which is more likely on the protein surface.

Author(s)

Original R code by Thomas Shafee, modified by Milan Dragićević

Source

https://prosite.expasy.org/PDOC00001

References

Hopp TP, Woods KR (1981) Prediction of protein antigenic determinants from amino acid sequences. Proceedings of the National Academy of Sciences of the United States of America, 78(6): 3824-3828.

Examples

library(ragp)
data(at_nsp)

nglc_pred <- scan_nglc(data = at_nsp,
                       sequence = sequence,
                       id = Transcript.id)

missuse/ragp documentation built on Jan. 4, 2022, 10:49 a.m.