get_espritz: Query Espritz web server.

Description

The Espritz web server predicts disordered regions from primary sequence. It uses bidirectional recursive neural networks and can process proteins on a genomic scale with little effort and state-of-the-art accuracy.

Usage

get_espritz(data, ...)

## S3 method for class 'character'
get_espritz(data, ...)

## S3 method for class 'data.frame'
get_espritz(data, sequence, id, ...)

## S3 method for class 'list'
get_espritz(data, ...)

## Default S3 method:
get_espritz(
  data = NULL,
  sequence,
  id,
  model = c("X-Ray", "Disprot", "NMR"),
  FPR = c("best Sw", "5% FPR"),
  simplify = TRUE,
  progress = FALSE,
  ...
)

## S3 method for class 'AAStringSet'
get_espritz(data, ...)

Arguments

data

A data frame with protein amino acid sequences as strings in one column and the corresponding IDs in another. Alternatively, a path to a .fasta file with protein sequences, a list with elements of class SeqFastaAA resulting from a read.fasta call, or an AAStringSet object. Should be left blank if vectors are provided to the sequence and id arguments.

...

Currently no additional arguments are accepted apart from the ones documented below.

sequence

A vector of strings representing protein amino acid sequences, or the appropriate column name if a data.frame is supplied to the data argument. If a .fasta file path or a list with elements of class "SeqFastaAA" is provided to data, this should be left blank.

id

A vector of strings representing protein identifiers, or the appropriate column name if a data.frame is supplied to the data argument. If a .fasta file path or a list with elements of class "SeqFastaAA" is provided to data, this should be left blank.

model

One of c('X-Ray', 'Disprot', 'NMR'), default is 'X-Ray'. Determines the model to be used for prediction. See details.

FPR

One of c('best Sw', '5% FPR'), default is 'best Sw'. Determines the cutoff probability for prediction. 'best Sw' maximizes a weighted score that rewards correct disorder prediction more than correct order prediction.

simplify

A Boolean indicating the type of returned object, defaults to TRUE.

progress

Boolean, whether to show the progress bar; defaults to FALSE.
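The Usage signature lists every allowed value of model and FPR as the default vector. In R this pattern is commonly resolved with match.arg(), which picks the first element when the argument is not supplied, consistent with the documented defaults 'X-Ray' and 'best Sw'. The sketch below only illustrates that convention; resolve_choices is a hypothetical stand-in, and whether get_espritz uses match.arg() internally is an assumption.

```r
# Hypothetical stand-in for the argument handling (not part of ragp);
# the internals of get_espritz() are not shown on this page.
resolve_choices <- function(model = c("X-Ray", "Disprot", "NMR"),
                            FPR = c("best Sw", "5% FPR")) {
  # match.arg() returns the first choice when the argument is missing
  # and raises an error for a value that is not among the choices
  model <- match.arg(model)
  FPR <- match.arg(FPR)
  list(model = model, FPR = FPR)
}

# No arguments supplied: the first element of each vector is used
defaults <- resolve_choices()

# Explicit choices must be one of the listed values
custom <- resolve_choices(model = "NMR", FPR = "5% FPR")
```

Under this convention a misspelled value such as model = "Xray" fails early with an error instead of silently falling back to a default.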

Details

Three models trained on different data sets are available and can be selected via the argument model:

X-Ray - based on missing atoms from the Protein Data Bank (PDB) X-ray solved structures. If this option is chosen, the predictors with short disorder options are executed.

Disprot - contains longer disorder segments compared to X-ray. In particular, DisProt, a manually curated database which is often based on functional attributes of the disordered region, was used for this definition. A residue is defined as disordered if the DisProt curators consider it to be disordered at least once. All other residues are considered ordered. If this option is chosen, the predictors with long disorder options are executed.

NMR - based on NMR mobility. NMR flexibility is calculated using the Mobi server optimized to replicate the ordered-disordered NMR definition used in CASP8.

These models provide quite different predictions. For further details visit http://old.protein.bio.unipd.it/espritz/help_pages/help.html and http://old.protein.bio.unipd.it/espritz/help_pages/methods.html.

Value

If simplify == TRUE: A data frame (one row per disordered region) with columns:

start

Integer, indicating the sequence position of disordered region start.

end

Integer, indicating the sequence position of disordered region end.

id

Character, indicating the protein identifier.
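A result with one row per disordered region can be summarized per protein with base R. The sketch below uses a made-up data frame shaped like the columns described above (not real Espritz output) and assumes that start and end are both inclusive positions.

```r
# Made-up result shaped like the documented simplify = TRUE output;
# the values are for illustration only.
regions <- data.frame(
  start = c(1L, 40L, 5L),
  end = c(12L, 55L, 9L),
  id = c("protein_1", "protein_1", "protein_2"),
  stringsAsFactors = FALSE
)

# Assuming inclusive positions, a region's length is end - start + 1
regions$length <- regions$end - regions$start + 1L

# Total number of disordered residues per protein
disordered_per_protein <- tapply(regions$length, regions$id, sum)
disordered_per_protein
```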

If simplify == FALSE: A data frame (one row per protein) with columns:

id

Character, indicating the protein identifier.

probability

List column of numeric vectors containing the probability of disorder for each residue.

prediction

Character, indicating the prediction for each residue: D - disordered, O - ordered.
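The two return shapes carry related information: a disordered region corresponds to a run of consecutive D residues in the per-residue prediction. The sketch below is not the package's implementation. It uses base R's rle() on a made-up prediction string, the helper name regions_from_prediction is hypothetical, and it assumes the prediction for one protein is a single string with one character per residue.

```r
# Hypothetical helper (not part of ragp): collapse a per-residue
# prediction string ("D" = disordered, "O" = ordered) into regions.
regions_from_prediction <- function(prediction, id) {
  # Split the string into single residues
  residues <- strsplit(prediction, "")[[1]]
  # Run-length encode consecutive identical states
  runs <- rle(residues)
  run_end <- cumsum(runs$lengths)
  run_start <- run_end - runs$lengths + 1L
  # Keep only the runs predicted as disordered
  is_disordered <- runs$values == "D"
  data.frame(
    start = run_start[is_disordered],
    end = run_end[is_disordered],
    id = rep(id, sum(is_disordered)),
    stringsAsFactors = FALSE
  )
}

# Made-up input for illustration only
example_regions <- regions_from_prediction("DDDOOOODDOOD", id = "protein_1")
example_regions
```

For this input the helper reports three regions: positions 1 to 3, 8 to 9, and the single residue at position 12.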

Note

The Espritz web server limits the number of daily queries per IP address. The function will inform the user when the limit has been exceeded.

Source

http://old.protein.bio.unipd.it/espritz/

References

Walsh I, Martin AJM, Di Domenico T, Tosatto SCE (2012) ESpritz: accurate and fast prediction of protein disorder. Bioinformatics 28(4): 503-509

Examples

library(ragp)

espritz_test <- get_espritz(at_nsp[1:10,],
                            sequence,
                            Transcript.id)
espritz_test

missuse/ragp documentation built on Jan. 4, 2022, 10:49 a.m.