get_espritz: Query Espritz web server.
In missuse/ragp: Mining for Hydroxyproline rich glycoprotein sequences


The Espritz web server predicts disordered regions from primary sequence. It uses bidirectional recursive neural networks and can process proteins on a genomic scale with little effort and state-of-the-art accuracy.

get_espritz(data, ...)

## S3 method for class 'character'
get_espritz(data, ...)

## S3 method for class 'data.frame'
get_espritz(data, sequence, id, ...)

## S3 method for class 'list'
get_espritz(data, ...)

## Default S3 method:
get_espritz(
  data = NULL,
  sequence,
  id,
  model = c("X-Ray", "Disprot", "NMR"),
  FPR = c("best Sw", "5% FPR"),
  simplify = TRUE,
  progress = FALSE,
  ...
)

## S3 method for class 'AAStringSet'
get_espritz(data, ...)

`data`	A data frame with protein amino acid sequences as strings in one column and corresponding ids in another. Alternatively, a path to a .fasta file with protein sequences, a list with elements of class `SeqFastaAA` resulting from a `read.fasta` call, or an `AAStringSet` object. Should be left blank if vectors are provided to the sequence and id arguments.
`...`	Currently no additional arguments are accepted apart from the ones documented below.
`sequence`	A vector of strings representing protein amino acid sequences, or the appropriate column name if a data.frame is supplied to the data argument. If a .fasta file path or a list with elements of class "SeqFastaAA" is provided to data, this should be left blank.
`id`	A vector of strings representing protein identifiers, or the appropriate column name if a data.frame is supplied to the data argument. If a .fasta file path or a list with elements of class "SeqFastaAA" is provided to data, this should be left blank.
`model`	One of c('X-Ray', 'Disprot', 'NMR'), default is 'X-Ray'. Determines the model to be used for prediction. See details.
`FPR`	One of c('best Sw', '5% FPR'), default is 'best Sw'. Determines the cutoff probability for prediction. 'best Sw' maximizes a weighted score that rewards correct disorder prediction more than correct order prediction.
`simplify`	A Boolean indicating the type of returned object, defaults to TRUE.
`progress`	A Boolean indicating whether to show the progress bar, defaults to FALSE.
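
The usage lists `model` and `FPR` with a vector of choices as their default, while the descriptions above say the first choice is the default. A minimal sketch of the common R idiom that produces this behaviour follows. `pick_model` is a hypothetical stand-in, and it is an assumption that `get_espritz` resolves its choices this way:

```r
# Hypothetical helper, not part of ragp: resolves a choice argument
# whose default is the full vector of allowed values.
pick_model <- function(model = c("X-Ray", "Disprot", "NMR")) {
  # match.arg() returns the first choice when the argument is missing
  # and raises an error for values outside the allowed set
  match.arg(model)
}

default_model <- pick_model()      # no argument supplied
chosen_model <- pick_model("NMR")  # explicit choice

default_model
chosen_model
```

Under this idiom, leaving `model` or `FPR` unset selects 'X-Ray' and 'best Sw' respectively, which agrees with the documented defaults.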

Three models trained on different data sets are available and can be selected via the model argument:

X-Ray: based on missing atoms from X-ray solved structures in the Protein Data Bank (PDB). If this option is chosen, the predictors with short disorder options are executed.
Disprot: contains longer disorder segments compared to X-Ray. This definition uses DisProt, a manually curated database which is often based on functional attributes of the disordered region. A residue is defined as disordered if the DisProt curators consider it to be disordered at least once; all other residues are considered ordered. If this option is chosen, the predictors with long disorder options are executed.
NMR: based on NMR mobility. NMR flexibility is calculated using the Mobi server, optimized to replicate the ordered-disordered NMR definition used in CASP8.

These models provide quite different predictions. For further details visit http://old.protein.bio.unipd.it/espritz/help_pages/help.html and http://old.protein.bio.unipd.it/espritz/help_pages/methods.html.

If simplify == TRUE: A data frame (one row per disordered region) with columns:

start: Integer, indicating the sequence position of disordered region start.
end: Integer, indicating the sequence position of disordered region end.
id: Character, indicating the protein identifier.
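
A data frame of this shape is easy to summarise with base R. The sketch below uses a made-up `regions` data frame with the three columns listed above (the identifiers and positions are hypothetical, not real predictions) and computes the length of each region and the total number of disordered residues per protein:

```r
# Hypothetical data mimicking the simplify = TRUE output;
# the values are invented for illustration only.
regions <- data.frame(
  start = c(1L, 40L, 12L),
  end = c(10L, 55L, 20L),
  id = c("prot_a", "prot_a", "prot_b"),
  stringsAsFactors = FALSE
)

# Region length, counting both the start and end positions
regions$length <- regions$end - regions$start + 1L

# Total number of disordered residues per protein
total <- aggregate(length ~ id, data = regions, FUN = sum)
total
```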

If simplify == FALSE: A data frame (one row per protein) with columns:

id: Character, indicating the protein identifier.
probability: List column of numeric vectors, vectors contain probabilities of disorder for each residue.
prediction: Character, indicating the prediction: D - disordered, O - ordered for each residue.
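
The two output shapes carry related information: a run of consecutive D residues in the per-residue prediction corresponds to one start/end region. The sketch below shows one way to derive regions from such a string with base R. The prediction string and identifier are made up, and this is not necessarily how ragp computes the simplified output:

```r
# Hypothetical per-residue prediction in the D/O encoding described above
prediction <- "OODDDOOOODDO"

# Run-length encode the residues to find consecutive stretches
runs <- rle(strsplit(prediction, "", fixed = TRUE)[[1]])
ends <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1L

# Keep only the disordered stretches
is_disordered <- runs$values == "D"
regions <- data.frame(
  start = starts[is_disordered],
  end = ends[is_disordered],
  id = "protein_1",  # hypothetical identifier
  stringsAsFactors = FALSE
)
regions
```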

The Espritz web server limits the number of daily queries per IP address. The function will inform the user when the limit has been exceeded.

http://old.protein.bio.unipd.it/espritz/

Walsh I, Martin AJM, Di Domenico T, Tosatto SCE (2012) ESpritz: accurate and fast prediction of protein disorder. Bioinformatics 28(4): 503-509

library(ragp)

espritz_test <- get_espritz(at_nsp[1:10,],
                            sequence,
                            Transcript.id)
espritz_test