predict.plastogram_model: Predict subchloroplast localization

View source: R/predict.R

Description

Predicts protein subchloroplast localization using the PlastoGram algorithm.

Usage

## S3 method for class 'plastogram_model'
predict(object, newdata, hmmer_dir = Sys.which("hmmsearch"), ...)

Arguments

object

plastogram_model object.

newdata

list of protein sequences (for example, as returned by read_fasta or read_txt).

hmmer_dir

path to the HMMER directory in which the hmmsearch executable is located.

...

further arguments passed to or from other methods.
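A minimal sketch of preparing input for predict(). It assumes that newdata is a named list in which each element is a character vector of single-letter amino acid codes, which is what we assume read_fasta returns; check this against your own reader. The sequences are made-up toy strings, the names prot_1 and prot_2 are hypothetical, and no trained model object is built here. The snippet only uses base R's Sys.which() to check whether hmmsearch is on the PATH, because the documented default of hmmer_dir relies on that lookup.

```r
# Toy input (made-up sequences, hypothetical names). We ASSUME newdata is a
# named list whose elements are character vectors of single-letter amino acid
# codes, as we expect read_fasta to return; verify with your own reader.
toy_sequences <- c(
  prot_1 = "MASLLKAAPSSTVARSGFLLAQ",
  prot_2 = "MKTAVLLGAAFAVSSLLQA"
)
# lapply() keeps the names of the input vector
seqs <- lapply(toy_sequences, function(s) strsplit(s, "")[[1]])

# Sys.which() returns the full path to an executable found on the PATH,
# or "" when it cannot be found; hmmer_dir defaults to this lookup.
hmmsearch_path <- unname(Sys.which("hmmsearch"))
hmmer_found <- nzchar(hmmsearch_path)

if (!hmmer_found) {
  message("hmmsearch was not found on the PATH; install HMMER before predicting.")
}

# With a trained plastogram_model object `model` (not created in this sketch):
# preds <- predict(model, newdata = seqs)
```

Checking for hmmsearch up front gives a clear message before any prediction is attempted, instead of a failure deep inside the HMM-based steps.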

Details

PlastoGram depends on the HMMER software to detect signals that target proteins to the thylakoid lumen via the Sec and Tat pathways.

PlastoGram lower-order models identify features characteristic of specific subchloroplast localizations. They include random forest models based on n-grams (short amino acid motifs):

Nuclear_model

recognizes nuclear-encoded proteins

Membrane_model

identifies membrane proteins

N_E_vs_N_TM_model

differentiates between nuclear-encoded envelope proteins and nuclear-encoded thylakoid membrane proteins. Prediction values above 0.5 indicate the envelope, whereas lower values indicate the thylakoid membrane.

Plastid_membrane_model

distinguishes plastid-encoded proteins targeted to the inner plastid membrane from those targeted to the thylakoid membrane. Prediction values above 0.5 indicate the inner membrane, whereas lower values indicate the thylakoid membrane.

N_E_vs_N_S_model

differentiates nuclear-encoded proteins targeted to the envelope from nuclear-encoded stromal proteins. Prediction values above 0.5 indicate the envelope, whereas lower values indicate the stroma.

Nuclear_membrane_model

distinguishes nuclear-encoded membrane proteins from all others

and profile hidden Markov models (HMMs) built with the HMMER software:

Sec_model

recognizes proteins targeted to the thylakoid lumen via the Sec pathway

Tat_model

recognizes proteins targeted to the thylakoid lumen via the Tat pathway
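To make the feature types above more concrete, here is a small base-R sketch of two ideas: counting contiguous amino acid n-grams in a sequence, and turning a binary model score into a label with the 0.5 cut-off described for N_E_vs_N_TM_model, Plastid_membrane_model and N_E_vs_N_S_model. The helper names count_ngrams and interpret_binary_score are hypothetical. PlastoGram's actual n-gram features (for example, gapped or position-specific n-grams) and its handling of a score of exactly 0.5 may differ.

```r
# Hypothetical helper: count contiguous n-grams of length n in one sequence.
# This is a simplified illustration, not PlastoGram's feature extraction.
count_ngrams <- function(sequence, n = 2) {
  residues <- strsplit(sequence, "")[[1]]
  if (length(residues) < n) {
    return(integer(0))  # sequence too short to contain any n-gram
  }
  starts <- seq_len(length(residues) - n + 1)
  ngrams <- vapply(starts, function(i) {
    paste(residues[i:(i + n - 1)], collapse = "")
  }, character(1))
  counts <- table(ngrams)
  setNames(as.integer(counts), names(counts))
}

# Hypothetical helper: label a binary model score with the 0.5 cut-off.
# Scores above 0.5 get above_label; all others (including exactly 0.5, a case
# the description does not cover) get below_label. This tie rule is an assumption.
interpret_binary_score <- function(score, above_label, below_label) {
  ifelse(score > 0.5, above_label, below_label)
}

count_ngrams("MKKLLA", n = 2)
interpret_binary_score(c(0.82, 0.31), "envelope", "thylakoid membrane")
```

For "MKKLLA" the 2-grams are MK, KK, KL, LL and LA, each counted once; a random forest would consume such counts as numeric features.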

Value

object of class plastogram_prediction: a list of data frames containing the prediction results:

Lower_level_preds

Prediction results from the eight lower-level models trained to recognize sequence features associated with specific subplastid localizations. Data frame with 9 columns and a number of rows equal to the number of analyzed sequences. The first column contains the sequence name and the following eight columns store the prediction results of the lower-level models. For more information on the lower-order models, see the Details section.

Higher_level_preds

Prediction results from the higher-level model trained to determine the final subplastid localization of a given protein from the predictions of the lower-level models. Data frame with 10 columns and a number of rows equal to the number of analyzed sequences. The first column (seq_name) contains the sequence name and the following eight columns contain prediction probabilities for each of the locations considered by the PlastoGram model. The last column (Localization) contains the abbreviation of the predicted location. For more information on the higher-level model, see the Details section.

OM_IM_preds

Final_results

Summary of PlastoGram predictions. Data frame with 3 columns and a number of rows equal to the number of analyzed sequences. The columns contain the name of the analyzed sequence, the predicted localization, and the probability of the predicted localization (a value between 0 and 1).
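The sketch below shows one way to condense a Higher_level_preds-like data frame into one row per sequence by taking the location with the highest probability. The input is a toy data frame with three made-up location columns (loc_A, loc_B, loc_C) instead of PlastoGram's eight, and summarise_top_location is a hypothetical helper. Whether Final_results is computed this way (for example, how OM_IM_preds feeds into it) is not stated here, so treat this as an illustration of reading the output, not of PlastoGram's internals.

```r
# Toy stand-in for Higher_level_preds: made-up sequence names, made-up location
# columns (the real data frame has eight, named by PlastoGram) and made-up values.
higher <- data.frame(
  seq_name = c("seq_1", "seq_2"),
  loc_A = c(0.70, 0.10),
  loc_B = c(0.20, 0.15),
  loc_C = c(0.10, 0.75),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row, pick the probability column with the
# largest value and report its name and probability.
summarise_top_location <- function(preds, id_col = "seq_name") {
  # Skip the ID column and, if present, an existing Localization column
  prob_cols <- setdiff(names(preds), c(id_col, "Localization"))
  prob_matrix <- as.matrix(preds[, prob_cols, drop = FALSE])
  best <- max.col(prob_matrix, ties.method = "first")
  data.frame(
    seq_name = preds[[id_col]],
    Localization = prob_cols[best],
    Probability = prob_matrix[cbind(seq_len(nrow(prob_matrix)), best)],
    stringsAsFactors = FALSE
  )
}

summary_df <- summarise_top_location(higher)
table(summary_df$Localization)
```

Using max.col() with ties.method = "first" keeps the result deterministic when two locations share the top probability.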


BioGenies/PlastoGram documentation built on May 25, 2023, 10:45 p.m.