predict.plastogram_model: Predict subchloroplast localization
In BioGenies/PlastoGram: Prediction of Subchloroplast localization


predict.plastogram_model

R Documentation

Predict subchloroplast localization

Description

Predicts protein subchloroplast localization using the PlastoGram algorithm.

Usage

## S3 method for class 'plastogram_model'
predict(object, newdata, hmmer_dir = Sys.which("hmmsearch"), ...)

Arguments

`object`	`plastogram_model` object.
`newdata`	`list` of sequences (for example as given by `read_fasta` or `read_txt`).
`hmmer_dir`	path to the HMMER directory in which the `hmmsearch` executable is located. Defaults to `Sys.which("hmmsearch")`.
`...`	further arguments passed to or from other methods.
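
As a minimal sketch of how the arguments above fit together, the wrapper below checks its inputs before delegating to `predict()`. The helper name `predict_localization` is hypothetical and not part of PlastoGram; it assumes the package is loaded so that S3 dispatch reaches the `plastogram_model` method.

```r
# Hypothetical helper (not part of PlastoGram): validate inputs, then call predict().
predict_localization <- function(model, sequences, ...) {
  # The method documented here dispatches on class 'plastogram_model'.
  if (!inherits(model, "plastogram_model")) {
    stop("`model` must be a 'plastogram_model' object.")
  }
  # `newdata` is documented as a list of sequences.
  if (!is.list(sequences) || length(sequences) == 0) {
    stop("`sequences` must be a non-empty list of sequences.")
  }
  # Extra arguments such as `hmmer_dir` are passed through unchanged.
  stats::predict(model, sequences, ...)
}
```

Failing early on a wrong `object` class or an empty `newdata` list gives a clearer error than one raised deep inside the prediction code.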

Details

PlastoGram depends on the HMMER software for prediction of signals responsible for targeting to the thylakoid lumen via Sec and Tat pathways.
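
Because of this dependency, it may help to confirm that `hmmsearch` is reachable before running predictions. The following sketch uses only base R and mirrors the default value of `hmmer_dir`; the messages are illustrative.

```r
# Look up the hmmsearch executable on the PATH (base R only).
# Sys.which() returns an empty string when the program is not found.
hmmsearch_path <- Sys.which("hmmsearch")

if (!nzchar(hmmsearch_path)) {
  message("hmmsearch not found on PATH; install HMMER or set hmmer_dir explicitly.")
} else {
  message("hmmsearch found at: ", hmmsearch_path)
  message("containing directory: ", dirname(hmmsearch_path))
}
```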

PlastoGram lower-order models identify features characteristic of specific subchloroplast localizations. They include random forest models based on n-grams (short amino acid motifs):

Nuclear_model: recognizes nuclear-encoded proteins
Membrane_model: identifies membrane proteins
N_E_vs_N_TM_model: differentiates between nuclear-encoded envelope proteins and nuclear-encoded thylakoid membrane proteins. Prediction values over 0.5 indicate envelope, whereas lower values indicate thylakoid membrane
Plastid_membrane_model: distinguishes plastid-encoded proteins targeted to the plastid inner membrane from those targeted to the thylakoid membrane. Prediction values over 0.5 indicate inner membrane, whereas lower values indicate thylakoid membrane
N_E_vs_N_S_model: differentiates nuclear-encoded proteins targeted to the envelope from nuclear-encoded stromal proteins. Prediction values over 0.5 indicate envelope, whereas lower values indicate stroma
Nuclear_membrane_model: distinguishes nuclear-encoded membrane proteins from all others

and profile HMM models based on the HMMER software:

Sec_model: recognizes proteins targeted to the thylakoid lumen via the Sec pathway
Tat_model: recognizes proteins targeted to the thylakoid lumen via the Tat pathway
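
The 0.5 threshold described for the three pairwise models can be sketched as a small lookup. The helper `interpret_binary_pred` and the `binary_labels` table are hypothetical and not part of PlastoGram; treating a value of exactly 0.5 as the lower class is an assumption, since the documentation does not specify it.

```r
# Labels for the models documented with a 0.5 threshold:
# 'high' applies to values over 0.5, 'low' to all other values.
binary_labels <- list(
  N_E_vs_N_TM_model      = c(high = "envelope", low = "thylakoid membrane"),
  Plastid_membrane_model = c(high = "inner membrane", low = "thylakoid membrane"),
  N_E_vs_N_S_model       = c(high = "envelope", low = "stroma")
)

# Hypothetical helper: map prediction values of one model to class labels.
interpret_binary_pred <- function(model_name, value) {
  labels <- binary_labels[[model_name]]
  if (is.null(labels)) {
    stop("No 0.5-threshold rule documented for: ", model_name)
  }
  # Assumption: a value of exactly 0.5 falls into the 'low' class.
  ifelse(value > 0.5, labels[["high"]], labels[["low"]])
}

interpret_binary_pred("N_E_vs_N_S_model", c(0.91, 0.12))
# "envelope" "stroma"
```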

Value

An object of class `plastogram_prediction`, a list of data frames containing prediction results:

Lower_level_preds: Prediction results from eight lower-level models trained to recognize sequence features associated with specific subplastid localizations. Data frame with 9 columns and a number of rows equal to the number of analyzed sequences. The first column contains the sequence name and the following columns store prediction results from all lower-level models. For more information on lower-level models, see the Details section.
Higher_level_preds: Prediction results from the higher-level model trained to determine the final subplastid localization of a given protein based on predictions obtained by lower-level models. Data frame with 10 columns and a number of rows equal to the number of analyzed sequences. The first column (seq_name) indicates the sequence name and the following eight columns contain prediction probabilities for each of the locations considered by the PlastoGram model. The last column (Localization) contains the abbreviation of the predicted location. For more information on the higher-level model, see the Details section.
OM_IM_preds
Final_results: Summary of PlastoGram predictions. Data frame with 3 columns and a number of rows equal to the number of analyzed sequences. The columns contain, in order: the name of the analyzed sequence, the predicted localization, and the probability of the predicted localization (a value from 0 to 1).
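
To illustrate how a three-column summary like Final_results might be filtered, the sketch below builds a made-up stand-in data frame. The column names, localization labels, probabilities, and the 0.6 cutoff are all illustrative assumptions, not PlastoGram output; columns are accessed by position because only their order is documented.

```r
# Made-up stand-in for Final_results; names and values are illustrative only.
final_results <- data.frame(
  seq_name     = c("protein_1", "protein_2", "protein_3"),
  Localization = c("loc_A", "loc_B", "loc_A"),
  Probability  = c(0.92, 0.41, 0.67),
  stringsAsFactors = FALSE
)

# Keep predictions whose probability (third column) meets a chosen cutoff.
confident <- final_results[final_results[[3]] >= 0.6, ]

# Count the remaining sequences per predicted localization (second column).
table(confident[[2]])
```

With a real prediction object, the same indexing would be applied to its `Final_results` element.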