get_signalp: Query SignalP web server.


Description

SignalP 4.1 server predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes. The method incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a combination of several artificial neural networks.

Usage

get_signalp(data, ...)

## S3 method for class 'character'
get_signalp(
  data,
  org_type = c("euk", "gram-", "gram+"),
  Dcut_type = c("default", "sensitive", "user"),
  Dcut_noTM = 0.45,
  Dcut_TM = 0.5,
  method = c("best", "notm"),
  minlen = NULL,
  trunc = 70L,
  splitter = 1000L,
  attempts = 2,
  progress = FALSE,
  ...
)

## S3 method for class 'data.frame'
get_signalp(data, sequence, id, ...)

## S3 method for class 'list'
get_signalp(data, ...)

## Default S3 method:
get_signalp(data = NULL, sequence, id, ...)

## S3 method for class 'AAStringSet'
get_signalp(data, ...)

Arguments

data

A data frame with protein amino acid sequences as strings in one column and the corresponding ids in another. Alternatively, a path to a .fasta file with protein sequences, a list with elements of class SeqFastaAA returned by a read.fasta call, or an AAStringSet object. Should be left blank if vectors are provided to the sequence and id arguments.

...

Currently no additional arguments are accepted apart from the ones documented below.

org_type

One of c("euk", "gram-", "gram+"), defaults to "euk". Which model should be used for prediction.

Dcut_type

One of c("default", "sensitive", "user"), defaults to "default". The default cutoff values for SignalP 4 are chosen to optimize the performance measured as Matthews Correlation Coefficient (MCC). This results in a lower sensitivity (true positive rate) than SignalP 3.0 had. Setting this argument to "sensitive" will yield the same sensitivity as SignalP 3.0. This will make the false positive rate slightly higher, but still better than that of SignalP 3.0.

Dcut_noTM

A numeric value, with range 0 - 1, defaults to 0.45. For experimenting with cutoff values.

Dcut_TM

A numeric value, with range 0 - 1, defaults to 0.5. For experimenting with cutoff values.

method

One of c("best", "notm"), defaults to "best". SignalP 4.1 contains two types of neural networks. SignalP-TM has been trained with sequences containing transmembrane segments in the data set, while SignalP-noTM has been trained without those sequences. By default, SignalP 4.1 uses SignalP-TM as a preprocessor to determine whether to use SignalP-TM or SignalP-noTM in the final prediction (if 4 or more positions are predicted to be in a transmembrane state, SignalP-TM is used, otherwise SignalP-noTM). An exception is Gram-positive bacteria, where SignalP-TM is always used. If you are confident that there are no transmembrane segments in your data, you can get slightly better performance by choosing "notm" (the "Input sequences do not include TM regions" option of the web server), which tells SignalP 4.1 to always use SignalP-noTM.
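The selection rule described above can be sketched in a few lines of R. This is only an illustration of the documented behaviour: choose_network and its n_tm_positions argument are hypothetical names, and the real decision is made on the SignalP server, not in ragp.

```r
# Hypothetical helper mirroring the documented selection rule.
# n_tm_positions: number of positions the SignalP-TM preprocessing pass
# predicts to be in a transmembrane state (assumed to be known here).
choose_network <- function(n_tm_positions,
                           org_type = c("euk", "gram-", "gram+"),
                           method = c("best", "notm")) {
  org_type <- match.arg(org_type)
  method <- match.arg(method)
  # "notm" forces SignalP-noTM for every sequence.
  if (method == "notm") return("SignalP-noTM")
  # Gram-positive bacteria always use SignalP-TM.
  if (org_type == "gram+") return("SignalP-TM")
  # Otherwise 4 or more predicted TM positions select SignalP-TM.
  if (n_tm_positions >= 4) "SignalP-TM" else "SignalP-noTM"
}

choose_network(6, "euk")
choose_network(2, "euk")
choose_network(0, "gram+")
choose_network(6, "euk", method = "notm")
```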

minlen

An integer value giving the minimal predicted signal peptide length, by default 10. SignalP 4.0 could, in rare cases, erroneously predict signal peptides shorter than 10 residues. SignalP 4.1 eliminates these errors by imposing a lower limit on the cleavage site position (signal peptide length). The minimum length is 10 by default, but you can adjust it. Signal peptides shorter than 15 residues are very rare. To disable this length restriction completely, set minlen to 0 (zero).

trunc

An integer value giving the N-terminal truncation of the input sequence, by default 70. The predictor truncates each sequence to at most 70 residues before submitting it to the neural networks. If you want to predict extremely long signal peptides, you can try a higher value, or disable truncation completely by setting trunc to 0 (zero).
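As a rough illustration of what N-terminal truncation means, the sketch below keeps the first trunc residues of each sequence. The helper truncate_nterm is hypothetical and not part of ragp; the actual truncation is applied by SignalP itself.

```r
# Hypothetical helper, not part of ragp: keep the first `trunc` residues,
# or the whole sequence when trunc is 0 (truncation disabled).
truncate_nterm <- function(sequence, trunc = 70L) {
  if (trunc == 0L) return(sequence)
  substr(sequence, 1L, trunc)
}

# Toy sequences of 10 and 120 residues.
seqs <- c(short = "MKTAYIAKQR", long = strrep("A", 120))

nchar(truncate_nterm(seqs))             # lengths with the default trunc
nchar(truncate_nterm(seqs, trunc = 0L)) # lengths with truncation disabled
```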

splitter

An integer giving the number of sequences in each .fasta file sent to the server. Default is 1000. Change only in case of a server-side error. Accepted values are in the range 1 to 2000.
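To make the effect of splitter concrete, the following sketch groups sequence indices into consecutive batches. It is an assumption about how such batching could be done, not ragp's internal code, and split_into_batches is a hypothetical name.

```r
# Hypothetical helper, not ragp's code: group sequence indices into
# consecutive batches of at most `splitter` elements.
split_into_batches <- function(n_sequences, splitter = 1000L) {
  stopifnot(splitter >= 1, splitter <= 2000)
  idx <- seq_len(n_sequences)
  split(idx, ceiling(idx / splitter))
}

batches <- split_into_batches(2500)
lengths(batches)  # number of sequences in each batch
```

With 2500 sequences and the default splitter, this yields three batches, the last one smaller than the others.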

attempts

Integer, the number of attempts made if the server is unresponsive, by default 2.
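A retry policy of this kind can be sketched as follows. This is a generic pattern under stated assumptions, not the implementation used by ragp: with_attempts and flaky are hypothetical names, and the server request is simulated by a function that fails once.

```r
# Hypothetical retry wrapper, not ragp's implementation.
# `request` is any zero-argument function that signals an error on failure.
with_attempts <- function(request, attempts = 2) {
  last_error <- NULL
  for (i in seq_len(attempts)) {
    result <- tryCatch(request(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    last_error <- result
  }
  stop("server unresponsive after ", attempts, " attempts: ",
       conditionMessage(last_error))
}

# Simulated request that fails on the first call and succeeds afterwards.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 2) stop("timeout")
  "ok"
}

with_attempts(flaky, attempts = 2)
```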

progress

Logical, whether to show the progress bar, by default FALSE.

sequence

A vector of strings representing protein amino acid sequences, or the appropriate column name if a data.frame is supplied to the data argument. If a .fasta file path or a list with elements of class "SeqFastaAA" is provided to data, this should be left blank.

id

A vector of strings representing protein identifiers, or the appropriate column name if a data.frame is supplied to the data argument. If a .fasta file path or a list with elements of class "SeqFastaAA" is provided to data, this should be left blank.
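The two most common input forms described above might look like the sketch below. The identifiers and sequences are made-up toy values, and the column name protein_id is arbitrary. Because the calls contact the SignalP web server, they are guarded so that they only run in an interactive session with ragp installed.

```r
# Toy identifiers and sequences, invented for illustration.
ids  <- c("prot1", "prot2")
seqs <- c("MKTLLLTLVVVTIVCLDLGYT", "MASNDYTQQATQSYGAYPTQ")

# Form 1: a data frame plus the names of the sequence and id columns.
df <- data.frame(protein_id = ids, sequence = seqs, stringsAsFactors = FALSE)

if (interactive() && requireNamespace("ragp", quietly = TRUE)) {
  # Column names are passed unquoted, as in the package example.
  pred_df <- ragp::get_signalp(data = df, sequence = sequence, id = protein_id)

  # Form 2: bare vectors, with the data argument left unset.
  pred_vec <- ragp::get_signalp(sequence = seqs, id = ids)
}
```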

Value

A data frame with columns:

id

Character, as from input

Cmax

Numeric, C-score (raw cleavage site score). The output from the CS networks, which are trained to distinguish signal peptide cleavage sites from everything else. Note the position numbering of the cleavage site: the C-score is trained to be high at the position immediately after the cleavage site (the first residue in the mature protein).

Cmax.pos

Integer, position of Cmax, that is, the position immediately after the cleavage site (the first residue in the mature protein).

Ymax

Numeric, Y-score (combined cleavage site score). A combination (geometric average) of the C-score and the slope of the S-score, resulting in a better cleavage site prediction than the raw C-score alone. This is because multiple high-peaking C-scores can be found in one sequence, where only one is the true cleavage site. The Y-score distinguishes between C-score peaks by choosing the one where the slope of the S-score is steep.

Ymax.pos

Integer, position of Ymax

Smax

Numeric, S-score (signal peptide score). The output from the SP networks, which are trained to distinguish positions within signal peptides from positions in the mature part of the proteins and from proteins without signal peptides.

Smax.pos

Integer, position of Smax

Smean

Numeric, the average S-score of the possible signal peptide (from position 1 to the position immediately before the maximal Y-score).

Dmean

Numeric, D-score (discrimination score). A weighted average of the mean S and the max. Y scores. This is the score that is used to discriminate signal peptides from non-signal peptides.

is.sp

Character, whether the sequence contains an N-terminal signal peptide (N-sp).

Dmaxcut

Numeric, the cutoff as supplied on input: Dcut_noTM if the SignalP-noTM network was used, Dcut_TM if the SignalP-TM network was used.

Networks.used

Character, which network was used for the prediction: SignalP-noTM or SignalP-TM

is.signalp

Logical, did SignalP predict the presence of a signal peptide

sp.length

Integer, length of the predicted signal peptide.
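The relations between some of these columns can be sketched on a mock result. Everything here is illustrative: the numbers are invented, the weight w is a placeholder rather than the value SignalP uses, and both the decision rule (D-score above the cutoff) and the signal peptide length (one residue before the Ymax position) are assumptions based on the column descriptions above.

```r
# Mock result rows; the numbers are invented for illustration only.
pred <- data.frame(
  id       = c("prot1", "prot2"),
  Ymax     = c(0.80, 0.12),
  Ymax.pos = c(23L, 35L),
  Smean    = c(0.90, 0.10),
  Dmaxcut  = c(0.45, 0.45),
  stringsAsFactors = FALSE
)

w <- 0.5  # placeholder weight, NOT the value used by SignalP

# D-score as a weighted average of the mean S-score and the maximal Y-score.
pred$Dmean <- w * pred$Smean + (1 - w) * pred$Ymax

# Assumed decision rule: a signal peptide is called when D exceeds the cutoff.
pred$is.signalp <- pred$Dmean > pred$Dmaxcut

# Assumed relation: the peptide ends one residue before the Ymax position.
pred$sp.length <- ifelse(pred$is.signalp, pred$Ymax.pos - 1L, NA_integer_)

# Keep only the sequences predicted to carry a signal peptide.
pred[pred$is.signalp, c("id", "Dmean", "sp.length")]
```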

Note

This function creates temporary files in the working directory.

Source

https://services.healthtech.dtu.dk/service.php?SignalP-4.1

References

Petersen TN, Brunak S, von Heijne G, Nielsen H. (2011) SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature Methods 8: 785-786

See Also

get_signalp5 get_phobius get_targetp

Examples

library(ragp)
signalp_pred <- get_signalp(data = at_nsp[1:10,],
                            sequence,
                            Transcript.id)
signalp_pred

missuse/ragp documentation built on Jan. 4, 2022, 10:49 a.m.