run_lncPro: Predict RNA-Protein Interaction Using lncPro Method

run_lncPro    R Documentation

Description

This function predicts lncRNA/RNA-protein interactions using the lncPro method. Model retraining and feature extraction are also supported. The program "RNAsubopt" from the "ViennaRNA Package" and the program "Predator" are required. Please note that "Predator" is only available on UNIX/Linux and 32-bit Windows OS.

Usage

run_lncPro(
  seqRNA,
  seqPro,
  mode = c("prediction", "retrain", "feature"),
  args.RNAsubopt = NULL,
  args.Predator = NULL,
  path.RNAsubopt = "RNAsubopt",
  path.Predator = "Predator/predator",
  path.stride = "Predator/stride.dat",
  workDir.Pro = getwd(),
  prediction = c("original", "retrained"),
  retrained.model = NULL,
  label = NULL,
  positive.class = NULL,
  folds.num = 10,
  ntree = 3000,
  mtry.ratios = c(0.1, 0.2, 0.4, 0.6, 0.8),
  seed = 1,
  parallel.cores = 2,
  cl = NULL,
  ...
)

Arguments

seqRNA

RNA sequences loaded by function read.fasta from the seqinr package, or a list of RNA sequences. RNA sequences will be converted to lowercase letters.

seqPro

protein sequences loaded by function read.fasta from the seqinr package, or a list of protein sequences. Protein sequences will be converted to uppercase letters. Each sequence should be a vector of single characters.

mode

a string. Set "prediction" to predict ncRNA-protein pairs and return the prediction results; set "retrain" to build a new random forest model using the input data; set "feature" to return a data frame containing the extracted features. The features generated by mode = "feature" can be used to train classifiers with other machine learning algorithms. Default: "prediction".

args.RNAsubopt, args.Predator

a string (in a format such as "-N --pfScale 1.07") specifying additional arguments for "RNAsubopt" (except "-p") and "Predator" (except "-a" and "-b"). Use these when you want to control the behaviour of the programs. For the available arguments, refer to the manuals of "RNAsubopt" and "Predator". Default: NULL.

path.RNAsubopt, path.Predator

strings specifying the locations of the "RNAsubopt" and "Predator" programs.

path.stride

a string specifying the location of file "stride.dat" required by program Predator.

workDir.Pro

a string specifying the directory for temporary files used when processing protein sequences. The temporary files are deleted automatically when the calculation is completed. If the directory does not exist, it is created automatically.

prediction

(only when mode = "prediction") set "original" to use the original lncPro algorithm, or set "retrained" to use a retrained model. The retrained model is constructed with the same features as the original version, but a random forest classifier is employed to build the model.

retrained.model

(only when mode = "prediction" and prediction = "retrained") the model used to predict ncRNA-protein pairs. If NULL, the default machine learning model is used. Otherwise, pass a model generated by this function with mode = "retrain". Default: NULL. See the examples below.

label

a string or a vector of strings or NULL. Optional when mode = "prediction" or mode = "feature": used to give labels or notes to the output result. Required when mode = "retrain": must be a vector of strings that corresponds to input sequences. Each string indicates the class of each input pair. Default: NULL.

positive.class

(only when mode = "retrain") NULL or a string indicating which class is the positive class. It should be one of the classes in label, or be left as positive.class = NULL. In the latter case, the first class in label is used as the positive class. Default: NULL.

folds.num

(only when mode = "retrain") an integer indicating the number of folds for cross validation. Default: 10 for 10-fold cross validation.

ntree

integer, number of trees to grow. See randomForest. Default: 3000.

mtry.ratios

(only when mode = "retrain") the ratios used to derive candidate mtry values when tuning the random forest classifier: mtry = ratio of mtry * number of features. Default: c(0.1, 0.2, 0.4, 0.6, 0.8).

seed

(only when mode = "retrain") an integer indicating the random seed for data splitting.

parallel.cores

an integer that indicates the number of cores for parallel computation. Default: 2. Set parallel.cores = -1 to run with all the cores. parallel.cores should be == -1 or >= 1.

cl

parallel cores to be passed to this function.

...

(only when mode = "retrain") other parameters (except ntree and mtry) passed to randomForest function.
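
The relationship between mtry.ratios and mtry, and the meaning of parallel.cores = -1, can be sketched as follows. This is only an illustration: n_features is a hypothetical value, resolve_cores is a hypothetical helper, and the rounding rule and the use of parallel::detectCores() are assumptions rather than details taken from the package source.

```r
# Hypothetical feature count; the real number depends on the lncPro feature set.
n_features <- 20
mtry.ratios <- c(0.1, 0.2, 0.4, 0.6, 0.8)

# Assumed rule: mtry = ratio * number of features, rounded, and at least 1.
mtry_candidates <- pmax(1, round(mtry.ratios * n_features))

# Assumed handling of parallel.cores: -1 means "use every detected core".
resolve_cores <- function(parallel.cores) {
  stopifnot(parallel.cores == -1 || parallel.cores >= 1)
  if (parallel.cores == -1) parallel::detectCores() else parallel.cores
}

print(mtry_candidates)
print(resolve_cores(2))
```

Each candidate mtry value would then be evaluated during cross validation; how the package selects among them is not shown here.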

Details

The method was proposed by lncPro (see References). This function, run_lncPro, improves and fixes the original code.

run_lncPro depends on the program "RNAsubopt" from the "ViennaRNA" software (http://www.tbi.univie.ac.at/RNA/index.html) and on "Predator" (https://bioweb.pasteur.fr/packages/pack@predator@2.1.2).

Parameter path.RNAsubopt can simply be left at its default, "RNAsubopt", when the OS is UNIX/Linux. On some other OS, such as Windows, users may need to specify path.RNAsubopt explicitly if the location of "RNAsubopt" has not been added to the environment variables (e.g. path.RNAsubopt = '"C:/Program Files/ViennaRNA/RNAsubopt.exe"').

Program "Predator" is only available on UNIX/Linux and 32-bit Windows OS.
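
Before calling run_lncPro, it may help to confirm that the external programs can be located. The sketch below uses only base R (Sys.which and file.exists). The helper name check_program is hypothetical and is not part of this package.

```r
# Hypothetical helper: TRUE if `path` resolves to a program on the search
# path or points to an existing file.
check_program <- function(path) {
  # Drop embedded double quotes, e.g. '"C:/Program Files/.../RNAsubopt.exe"'
  clean <- gsub('"', "", path, fixed = TRUE)
  # Sys.which() returns "" when the program is not found on the search path
  resolved <- unname(Sys.which(clean))
  nzchar(resolved) || file.exists(clean)
}

print(check_program("RNAsubopt"))
print(check_program("Predator/predator"))
```

A FALSE result suggests that the corresponding path.RNAsubopt or path.Predator argument needs to be set to the full location of the program.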

Value

If mode = "prediction", this function returns a data frame that contains the predicted results.

If mode = "retrain", this function returns a random forest classifier.

If mode = "feature", this function returns a data frame that contains the extracted features.

References

Lu Q, Ren S, Lu M, et al. Computational prediction of associations between long non-coding RNAs and proteins. BMC Genomics 2013; 14:651

Examples




# The following code only shows how to use this function
# and does not reflect the genuine performance of the tools or classifiers.

data(demoPositiveSeq)
seqRNA <- demoPositiveSeq$RNA.positive
seqPro <- demoPositiveSeq$Pro.positive

# Predicting ncRNA-protein pairs (you need to use your own paths):

path.RNAsubopt <- "RNAsubopt"
path.Predator <- "/mnt/external_drive_1/hansy/predator/predator"
path.stride <- "/mnt/external_drive_1/hansy/predator/stride.dat"
workDir.Pro <- "tmp"

Res_lncPro_1 <- run_lncPro(seqRNA = seqRNA, seqPro = seqPro, mode = "prediction",
                           path.RNAsubopt = path.RNAsubopt, path.Predator = path.Predator,
                           path.stride = path.stride, workDir.Pro = workDir.Pro,
                           prediction = "original", label = "lncPro_original",
                           parallel.cores = 10) # using original algorithm

Res_lncPro_2 <- run_lncPro(seqRNA = seqRNA, seqPro = seqPro, mode = "prediction",
                           path.RNAsubopt = path.RNAsubopt, path.Predator = path.Predator,
                           path.stride = path.stride, workDir.Pro = workDir.Pro,
                           prediction = "retrained", retrained.model = NULL,
                           label = "lncPro_retrained",
                           parallel.cores = 10) # using default rebuilt model

# Train a new model:

# Argument "label" which indicates the class of each input pair is required here.
# "label" should correspond to the classes of "seqRNA" and "seqPro".
# "positive.class" should be one of the classes in argument "label" or can be set as "NULL".
# In the latter case, the first label in "label" will be used as the positive class.
# Parameters of random forest, such as "replace", "nodesize", can be passed using "..." argument.

lncPro_model <- run_lncPro(seqRNA = seqRNA, seqPro = seqPro, mode = "retrain",
                           path.RNAsubopt = path.RNAsubopt, path.Predator = path.Predator,
                           path.stride = path.stride, workDir.Pro = workDir.Pro,
                           label = rep(c("Interact", "Non.Interact"), each = 10),
                           positive.class = NULL, folds.num = 10,
                           ntree = 100, seed = 1, parallel.cores = 2, replace = FALSE)

# Predicting with the newly built model by setting "retrained.model = lncPro_model":

Res_lncPro_3 <- run_lncPro(seqRNA = seqRNA, seqPro = seqPro, mode = "prediction",
                           path.RNAsubopt = path.RNAsubopt, path.Predator = path.Predator,
                           path.stride = path.stride, workDir.Pro = workDir.Pro,
                           prediction = "retrained", retrained.model = lncPro_model,
                           label = rep(c("Interact", "Non.Interact"), each = 10),
                           parallel.cores = 10)

# Only extracting features:

lncPro_feature_df <- run_lncPro(seqRNA = seqRNA, seqPro = seqPro,
                                mode = "feature", path.RNAsubopt = path.RNAsubopt,
                                path.Predator = path.Predator, path.stride = path.stride,
                                workDir.Pro = workDir.Pro, label = "Interact",
                                parallel.cores = 10)

# Extracted features can be used to build classifiers using other machine learning
# algorithms, which provides users with more flexibility.
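
As a minimal sketch of that last point, the code below fits a logistic regression from base R. Because the exact layout of the data frame returned by mode = "feature" is not described on this page, a synthetic stand-in is used: the column names (label, feat_1, feat_2, feat_3) and the random values are hypothetical and carry no biological meaning.

```r
set.seed(1)

# Synthetic stand-in for the output of mode = "feature".
# Column names are hypothetical; inspect the real output with str().
feature_df <- data.frame(
  label  = factor(rep(c("Interact", "Non.Interact"), each = 10)),
  feat_1 = rnorm(20),
  feat_2 = rnorm(20),
  feat_3 = rnorm(20)
)

# Logistic regression as one example of "another algorithm".
fit <- glm(label ~ ., data = feature_df, family = binomial())

# Fitted probability of the second factor level ("Non.Interact").
pred <- predict(fit, type = "response")
print(head(pred))
```

With real features, replace feature_df with the returned data frame and adjust the formula to match its actual label column.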




HAN-Siyu/ncProR documentation built on Nov. 3, 2023, 12:08 a.m.