predict.plastogram_model | R Documentation |
Predicts protein subchloroplast localization using the PlastoGram algorithm.
## S3 method for class 'plastogram_model'
predict(object, newdata, hmmer_dir = Sys.which("hmmsearch"), ...)
object |
|
newdata |
|
hmmer_dir |
path to the hmmer directory in which |
... |
further arguments passed to or from other methods. |
PlastoGram depends on the HMMER software to predict the signals responsible for targeting proteins to the thylakoid lumen via the Sec and Tat pathways.
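Because the default value of hmmer_dir is Sys.which("hmmsearch"), it can be useful to check whether HMMER is visible to R before calling predict. The following is a minimal sketch using base R only; the variable names are illustrative and not part of PlastoGram.

```r
# Look up the hmmsearch executable on the PATH, as the default argument does.
hmmsearch_path <- Sys.which("hmmsearch")

# Sys.which() returns an empty string when the program cannot be found.
hmmer_available <- unname(nzchar(hmmsearch_path))

if (hmmer_available) {
  message("hmmsearch found at: ", hmmsearch_path)
} else {
  message("hmmsearch not found on PATH; install HMMER or supply hmmer_dir explicitly.")
}
```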
PlastoGram lower-order models identify features characteristic of specific subchloroplast localizations. They include random forest models based on n-grams (short amino acid motifs):
recognizes nuclear-encoded proteins
identifies membrane proteins
differentiates between nuclear-encoded envelope proteins and nuclear-encoded thylakoid membrane proteins. Prediction values over 0.5 indicate envelope, whereas lower values indicate thylakoid membrane
distinguishes between plastid-encoded proteins targeted to the plastid inner membrane and those targeted to the thylakoid membrane. Prediction values over 0.5 indicate inner membrane, whereas lower values indicate thylakoid membrane
differentiates nuclear-encoded proteins targeted to the envelope from nuclear-encoded stromal proteins. Prediction values over 0.5 indicate envelope, whereas lower values indicate stroma
distinguishes nuclear-encoded membrane proteins from all others
and profile HMM models based on the HMMER software:
recognizes proteins targeted to the thylakoid lumen via the Sec pathway
recognizes proteins targeted to the thylakoid lumen via the Tat pathway
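The 0.5 cut-off described for the pairwise random forest models can be illustrated with a small helper. This is a sketch only: label_binary_prediction is a hypothetical function, the values are made up rather than real PlastoGram output, and assigning a value of exactly 0.5 to the lower class is an assumption.

```r
# Hypothetical helper: turn a binary model's prediction value into a label.
# `above` is the class indicated by values over the threshold,
# `below` is the class indicated by all other values.
label_binary_prediction <- function(value, above, below, threshold = 0.5) {
  ifelse(value > threshold, above, below)
}

# Made-up prediction values for three sequences (not real PlastoGram output).
values <- c(0.91, 0.12, 0.57)

labels <- label_binary_prediction(
  values,
  above = "envelope",
  below = "thylakoid membrane"
)
print(labels)
```

The same helper applies to the other pairwise models by swapping the two class labels, for example "envelope" and "stroma".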
object of class plastogram_prediction, a list of three data frames containing prediction results:
Prediction results from eight lower-level models trained to recognize sequence features associated with specific subplastid localizations. Data frame with 9 columns and a number of rows equal to the number of analyzed sequences. The first column contains the sequence name and the following columns store prediction results from all lower-level models. For more information on lower-order models, see the Details section.
Prediction results from the higher-level model trained to determine the final subplastid localization of a given protein based on predictions obtained by lower-level models. Data frame with 10 columns and a number of rows equal to the number of analyzed sequences. The first column (seq_name) indicates the sequence name and the following eight columns contain prediction probabilities for each of the locations considered by the PlastoGram model. The last column (Localization) contains the abbreviation of the predicted location. For more information on the higher-level model, see the Details section.
Summary of PlastoGram predictions. Data frame with 3 columns and number of rows equal to the number of analyzed sequences. The columns contain the following information: name of the analyzed sequence, predicted localization, probability of the predicted localization (assumes values from 0 to 1).
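The relationship between the higher-level data frame and the summary can be sketched with mock data. Everything below is illustrative: only seq_name and Localization are column names taken from this page, while loc_A, loc_B, loc_C and Probability are placeholders, only three of the eight probability columns are shown, and the assumption that the predicted localization is the column with the highest probability is not confirmed here.

```r
# Mock higher-level result: seq_name followed by probability columns.
# Column names loc_A..loc_C are placeholders, not real PlastoGram abbreviations.
higher_level <- data.frame(
  seq_name = c("seq1", "seq2"),
  loc_A = c(0.70, 0.10),
  loc_B = c(0.20, 0.65),
  loc_C = c(0.10, 0.25),
  stringsAsFactors = FALSE
)

prob_cols <- setdiff(names(higher_level), "seq_name")
probs <- as.matrix(higher_level[, prob_cols])

# Assumption: the predicted localization is the highest-probability column.
best <- max.col(probs, ties.method = "first")
higher_level$Localization <- prob_cols[best]

# Three-column summary: sequence name, predicted localization, its probability.
summary_df <- data.frame(
  seq_name = higher_level$seq_name,
  Localization = higher_level$Localization,
  Probability = probs[cbind(seq_len(nrow(probs)), best)],
  stringsAsFactors = FALSE
)
print(summary_df)
```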