predict_on_targeted_proteome: Predict on targeted proteome.


Description

Predict on targeted proteome.

Usage

predict_on_targeted_proteome(ptm_site, flanking_size = 12, SPIDER = T,
  positive_info_file, known_protein_fasta_file, predict_protein_fasta_file,
  output_label_training, output_label_predict, lower_bound = -1,
  upper_bound = 1, liblinear_dir, feature_file_path, cvlog_path_name,
  specificity_level, n_fold = 2,
  flag_for_score_threshold_chosen = "reference", score_threshold)

Arguments

ptm_site

The target amino acid of the given PTM type, as an upper-case single-letter code (for example "S").

flanking_size

The number of residues on each side of the center residue. The total window size is 2*flanking_size+1 (default 12).

SPIDER

A boolean indicating whether to use SPIDER3 features (default TRUE).

positive_info_file

A text file containing the positive PTM sites in the required format.

known_protein_fasta_file

A text file in FASTA format containing the protein sequences of interest with known PTM sites.

predict_protein_fasta_file

A text file in FASTA format containing the protein sequences whose PTM sites are to be predicted.

output_label_training

The string to tag the output files associated with training proteins.

output_label_predict

The string to tag the output files associated with prediction proteins.

lower_bound

The lower bound of the scaled data range (default -1).

upper_bound

The upper bound of the scaled data range (default 1).

liblinear_dir

The path to the directory containing the LIBLINEAR tool.

feature_file_path

The path for the feature files.

cvlog_path_name

The path and name of the log files, which hold the details of Liblinear procedures.

specificity_level

A number between 0 and 1 giving the specificity the user requires the classifier to achieve (default 0.99).

n_fold

The number of folds used for training and prediction in the cross-validation stage (default 2).

flag_for_score_threshold_chosen

A string indicating whether to use a reference score threshold or to derive the threshold from the user-supplied training data (default "reference").

score_threshold

A numerical value between 0 and 1 giving the reference score threshold (required in "reference" mode).
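
The relationship between flanking_size and the window size can be illustrated with a minimal sketch. The helper extract_windows below is hypothetical and is not part of PTMscape; in particular, padding the sequence ends with "X" is an assumption made only so that sites near the termini still yield a full window of 2*flanking_size+1 residues.

```r
# Hypothetical helper (not a PTMscape function): cut a window of
# 2 * flanking_size + 1 residues centred on each candidate site.
extract_windows <- function(protein_seq, ptm_site = "S", flanking_size = 12,
                            pad_char = "X") {
  # Assumption: pad both ends so terminal sites still get a full window.
  pad <- strrep(pad_char, flanking_size)
  padded <- paste0(pad, protein_seq, pad)

  # Positions of the target residue in the original (unpadded) sequence.
  residues <- strsplit(protein_seq, "")[[1]]
  positions <- which(residues == ptm_site)
  if (length(positions) == 0) {
    return(data.frame(position = integer(0), window = character(0),
                      stringsAsFactors = FALSE))
  }

  # Original position p sits at p + flanking_size in the padded string,
  # so its window spans p .. p + 2 * flanking_size there.
  windows <- substring(padded, positions, positions + 2 * flanking_size)
  data.frame(position = positions, window = windows, stringsAsFactors = FALSE)
}

w <- extract_windows("MKSAYSLT", ptm_site = "S", flanking_size = 2)
print(w)
```

With flanking_size = 2 each window has 5 characters and the target residue sits in the middle position.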

Details

This function outputs the features generated from input files.
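
As a rough illustration of what scaling features to the range given by lower_bound and upper_bound means, the following sketch rescales each feature column linearly. The helper scale_features is hypothetical; how PTMscape itself performs the scaling is not specified here.

```r
# Hypothetical helper (not a PTMscape function): linearly rescale each
# feature column to the interval [lower_bound, upper_bound].
scale_features <- function(x, lower_bound = -1, upper_bound = 1) {
  x <- as.matrix(x)
  col_min <- apply(x, 2, min)
  col_max <- apply(x, 2, max)
  span <- col_max - col_min
  # Assumption: constant columns are mapped to lower_bound
  # instead of dividing by zero.
  span[span == 0] <- 1
  unit <- sweep(sweep(x, 2, col_min, "-"), 2, span, "/")  # now in [0, 1]
  lower_bound + unit * (upper_bound - lower_bound)
}

features <- cbind(f1 = c(0, 5, 10), f2 = c(2, 2, 2))
scaled <- scale_features(features, lower_bound = -1, upper_bound = 1)
print(scaled)
```

In a train-then-predict setting, the column minima and maxima computed on the training features would normally be reused for the prediction features, so that both sets share one scale.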

Examples

library(PTMscape)

# Example call; replace the file names and paths with local ones.
# In "cv" mode no reference score_threshold is supplied.
predict_on_targeted_proteome(ptm_site = "S",
                             flanking_size = 12,
                             SPIDER = TRUE,
                             positive_info_file = "known_ps.tsv",
                             known_protein_fasta_file = "known_fasta.tsv",
                             predict_protein_fasta_file = "predict_fasta.tsv",
                             output_label_training = "ps_training",
                             output_label_predict = "ps_predict",
                             lower_bound = -1,
                             upper_bound = 1,
                             liblinear_dir = "/data/ginny/liblinear-2.11/",
                             feature_file_path = "/data/ginny/test_package/",
                             cvlog_path_name = "/data/ginny/test_package/cvlog.txt",
                             specificity_level = 0.99,
                             n_fold = 2,
                             flag_for_score_threshold_chosen = "cv",
                             score_threshold = NULL)
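
To show how a specificity_level can translate into a score threshold, here is a minimal sketch. It assumes that higher scores indicate a more likely PTM site and that a site is called positive when its score is strictly above the threshold. The helpers threshold_at_specificity and specificity_at are hypothetical and do not describe PTMscape's internal procedure.

```r
# Hypothetical helper (not a PTMscape function): smallest observed score t
# such that at least specificity_level of the negative scores are <= t.
threshold_at_specificity <- function(negative_scores, specificity_level = 0.99) {
  stopifnot(specificity_level >= 0, specificity_level <= 1)
  # type = 1 is the inverse empirical CDF, so an observed score is returned
  # rather than an interpolated value.
  unname(quantile(negative_scores, probs = specificity_level, type = 1))
}

# Fraction of negatives that would not be called positive at this threshold.
specificity_at <- function(negative_scores, threshold) {
  mean(negative_scores <= threshold)
}

negative_scores <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95)
thr <- threshold_at_specificity(negative_scores, specificity_level = 0.9)
achieved <- specificity_at(negative_scores, thr)
print(c(threshold = thr, specificity = achieved))
```

A stricter specificity_level pushes the threshold up, which reduces false positives at the cost of fewer predicted sites.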

ginnyintifa/PTMscape documentation built on Nov. 9, 2021, 10:39 p.m.