run_lncPro | R Documentation |
This function predicts lncRNA/RNA-protein interactions using the lncPro method. Model retraining and feature extraction are also supported. The program "RNAsubopt" from the "ViennaRNA Package" and the program "Predator" are required. Note that "Predator" is only available on UNIX/Linux and 32-bit Windows.
run_lncPro(
seqRNA,
seqPro,
mode = c("prediction", "retrain", "feature"),
args.RNAsubopt = NULL,
args.Predator = NULL,
path.RNAsubopt = "RNAsubopt",
path.Predator = "Predator/predator",
path.stride = "Predator/stride.dat",
workDir.Pro = getwd(),
prediction = c("original", "retrained"),
retrained.model = NULL,
label = NULL,
positive.class = NULL,
folds.num = 10,
ntree = 3000,
mtry.ratios = c(0.1, 0.2, 0.4, 0.6, 0.8),
seed = 1,
parallel.cores = 2,
cl = NULL,
...
)
seqRNA |
RNA sequences loaded by function |
seqPro |
protein sequences loaded by function |
mode |
a string. Set |
args.RNAsubopt, args.Predator |
a string (in a format such as "-N --pfScale 1.07") specifying additional arguments for "RNAsubopt" (except "-p") and "Predator" (except "-a" and "-b"). Use this to control their behaviour. For the available arguments, refer to the manuals of "RNAsubopt" and "Predator". Default: |
path.RNAsubopt, path.Predator |
strings specifying the locations of the "RNAsubopt" and "Predator" programs. |
path.stride |
a string specifying the location of file "stride.dat" required by program Predator. |
workDir.Pro |
a string specifying the directory for temporary files used to process protein sequences. The temporary files are deleted automatically when the calculation is completed. If the directory does not exist, it is created automatically. |
prediction |
(only when |
retrained.model |
(only when |
label |
a string or a vector of strings or |
positive.class |
(only when |
folds.num |
(only when |
ntree |
integer, number of trees to grow. See |
mtry.ratios |
(only when |
seed |
(only when |
parallel.cores |
an integer that indicates the number of cores for parallel computation.
Default: |
cl |
parallel cores to be passed to this function. |
... |
(only when |
The method was proposed in lncPro (Lu et al., 2013). This function, run_lncPro, improves on the original code and fixes problems in it.
run_lncPro depends on the program "RNAsubopt" of the "ViennaRNA" package (http://www.tbi.univie.ac.at/RNA/index.html) and on "Predator" (https://bioweb.pasteur.fr/packages/pack@predator@2.1.2).
On UNIX/Linux, the parameter path.RNAsubopt can simply be left at its default, "RNAsubopt". On some other operating systems, such as Windows, users may need to specify path.RNAsubopt if the path of "RNAsubopt" has not been added to the environment variables (e.g. path.RNAsubopt = '"C:/Program Files/ViennaRNA/RNAsubopt.exe"').
The program "Predator" is only available on UNIX/Linux and 32-bit Windows.
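Because the function relies on external programs, it can help to confirm that they are reachable before starting a long run. The following is a minimal sketch using only base R; the helper check_external_tools is hypothetical (it is not part of the package), and the default paths simply mirror the defaults shown in the usage section.

```r
# Hypothetical helper (not part of the package): report whether each external
# dependency can be located, either on the search path or at an explicit path.
check_external_tools <- function(path.RNAsubopt = "RNAsubopt",
                                 path.Predator = "Predator/predator",
                                 path.stride = "Predator/stride.dat") {
  # Remove surrounding double quotes, e.g. '"C:/Program Files/.../RNAsubopt.exe"'
  unquote <- function(x) gsub('^"|"$', "", x)

  # A program counts as found if Sys.which() resolves it or the file exists.
  is_runnable <- function(x) {
    x <- unquote(x)
    nzchar(unname(Sys.which(x))) || file.exists(x)
  }

  c(RNAsubopt = is_runnable(path.RNAsubopt),
    Predator  = is_runnable(path.Predator),
    stride    = file.exists(unquote(path.stride)))
}

status <- check_external_tools()
print(status)
```

Any element that is FALSE suggests that the corresponding path argument needs to be set explicitly before calling run_lncPro.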
If mode = "prediction", this function returns a data frame that contains the predicted results.
If mode = "retrain", this function returns a random forest classifier.
If mode = "feature", this function returns a data frame that contains the extracted features.
Lu Q, Ren S, Lu M, et al. Computational prediction of associations between long non-coding RNAs and proteins. BMC Genomics 2013; 14:651
# The following code only shows how to use this function
# and does not reflect the genuine performance of tools or classifiers.
data(demoPositiveSeq)
seqRNA <- demoPositiveSeq$RNA.positive
seqPro <- demoPositiveSeq$Pro.positive
# Predicting ncRNA-protein pairs (you need to use your own paths):
path.RNAsubopt <- "RNAsubopt"
path.Predator <- "/mnt/external_drive_1/hansy/predator/predator"
path.stride <- "/mnt/external_drive_1/hansy/predator/stride.dat"
workDir.Pro <- "tmp"
Res_lncPro_1 <- run_lncPro(seqRNA = seqRNA, seqPro = seqPro, mode = "prediction",
                           path.RNAsubopt = path.RNAsubopt, path.Predator = path.Predator,
                           path.stride = path.stride, workDir.Pro = workDir.Pro,
                           prediction = "original", label = "lncPro_original",
                           parallel.cores = 10) # using original algorithm
Res_lncPro_2 <- run_lncPro(seqRNA = seqRNA, seqPro = seqPro, mode = "prediction",
                           path.RNAsubopt = path.RNAsubopt, path.Predator = path.Predator,
                           path.stride = path.stride, workDir.Pro = workDir.Pro,
                           prediction = "retrained", retrained.model = NULL,
                           label = "lncPro_retrained",
                           parallel.cores = 10) # using default rebuilt model
# Train a new model:
# Argument "label" which indicates the class of each input pair is required here.
# "label" should correspond to the classes of "seqRNA" and "seqPro".
# "positive.class" should be one of the classes in argument "label" or can be set as "NULL".
# In the latter case, the first label in "label" will be used as the positive class.
# Random forest parameters, such as "replace" and "nodesize", can be passed using the "..." argument.
lncPro_model <- run_lncPro(seqRNA = seqRNA, seqPro = seqPro, mode = "retrain",
                           path.RNAsubopt = path.RNAsubopt, path.Predator = path.Predator,
                           path.stride = path.stride, workDir.Pro = workDir.Pro,
                           label = rep(c("Interact", "Non.Interact"), each = 10),
                           positive.class = NULL, folds.num = 10,
                           ntree = 100, seed = 1, parallel.cores = 2, replace = FALSE)
# Predicting with the newly built model by setting "retrained.model = lncPro_model":
Res_lncPro_3 <- run_lncPro(seqRNA = seqRNA, seqPro = seqPro, mode = "prediction",
                           path.RNAsubopt = path.RNAsubopt, path.Predator = path.Predator,
                           path.stride = path.stride, workDir.Pro = workDir.Pro,
                           prediction = "retrained", retrained.model = lncPro_model,
                           label = rep(c("Interact", "Non.Interact"), each = 10),
                           parallel.cores = 10)
# Only extracting features:
lncPro_feature_df <- run_lncPro(seqRNA = seqRNA, seqPro = seqPro,
                                mode = "feature", path.RNAsubopt = path.RNAsubopt,
                                path.Predator = path.Predator, path.stride = path.stride,
                                workDir.Pro = workDir.Pro, label = "Interact",
                                parallel.cores = 10)
# Extracted features can be used to build classifiers using other machine learning
# algorithms, which provides users with more flexibility.
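The retraining example relies on two conventions described in its comments: "label" needs one class per input pair, and when "positive.class" is NULL the first label is taken as the positive class. The sketch below illustrates those conventions in base R. Both helpers (make_pair_labels and resolve_positive_class) are hypothetical and only mimic the documented behaviour; they are not the package's internal code.

```r
# Hypothetical helpers for illustration only; they are not part of the package.

# Build one label per RNA-protein pair: positive pairs first, then negative pairs.
make_pair_labels <- function(n.positive, n.negative,
                             positive = "Interact", negative = "Non.Interact") {
  rep(c(positive, negative), times = c(n.positive, n.negative))
}

# Mimic the documented rule: if positive.class is NULL, the first label is used;
# otherwise positive.class must be one of the classes present in label.
resolve_positive_class <- function(label, positive.class = NULL) {
  if (is.null(positive.class)) {
    return(label[1])
  }
  stopifnot(positive.class %in% label)
  positive.class
}

label <- make_pair_labels(n.positive = 10, n.negative = 10)

# Same vector as rep(c("Interact", "Non.Interact"), each = 10) in the example above.
stopifnot(identical(label, rep(c("Interact", "Non.Interact"), each = 10)))

resolve_positive_class(label)                  # "Interact"
resolve_positive_class(label, "Non.Interact")  # "Non.Interact"
```

Checking that the length of "label" matches the number of input pairs before retraining is a cheap way to catch misaligned inputs early.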