run_LION: Predict RNA-Protein Interaction Using LION Method

View source: R/Methods.R

run_LION    R Documentation

Predict RNA-Protein Interaction Using LION Method

Description

This function predicts lncRNA/RNA-protein interactions using the LION method. Model retraining and feature extraction are also supported.

Usage

run_LION(
  seqRNA,
  seqPro,
  mode = c("prediction", "retrain", "feature"),
  retrained.model = NULL,
  label = NULL,
  positive.class = NULL,
  folds.num = 10,
  ntree = 3000,
  mtry.ratios = c(0.1, 0.2, 0.4, 0.6, 0.8),
  seed = 1,
  parallel.cores = 2,
  cl = NULL,
  ...
)

Arguments

seqRNA

RNA sequences loaded by the function read.fasta from the seqinr package, or a list of RNA sequences. RNA sequences will be converted to lower case letters.

seqPro

protein sequences loaded by the function read.fasta from the seqinr package, or a list of protein sequences. Protein sequences will be converted to upper case letters. Each sequence should be a vector of single characters.

mode

a string. Set "prediction" to predict ncRNA-protein pairs and return the prediction results; set "retrain" to build a new random forest model from the input data; set "feature" to return a data frame that contains the extracted features. The features extracted with mode = "feature" can be used to train classifiers with other machine learning algorithms. Default: "prediction".
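The Usage signature lists all three modes as the default value, which is the usual R idiom for a choice argument resolved with match.arg. The sketch below illustrates that idiom with a hypothetical helper, pick_mode. It is an assumption about how the default is resolved, not code taken from the package.

```r
# Hypothetical helper illustrating the match.arg() idiom; not part of the package.
pick_mode <- function(mode = c("prediction", "retrain", "feature")) {
  # When the argument is left at its default vector, match.arg() returns
  # the first choice; otherwise it validates the supplied string.
  match.arg(mode)
}

default_mode <- pick_mode()           # no mode supplied
feature_mode <- pick_mode("feature")  # explicit choice
```

Under this idiom, an unsupported string such as "train" would raise an error instead of silently falling back to the default.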

retrained.model

(only when mode = "prediction") the model used to predict ncRNA-protein pairs. If NULL, the default machine learning model is used. Otherwise, pass a model generated by this function with mode = "retrain". Default: NULL. See examples below.

label

a string or a vector of strings or NULL. Optional when mode = "prediction" or mode = "feature": used to give labels or notes to the output result. Required when mode = "retrain": must be a vector of strings that corresponds to input sequences. Each string indicates the class of each input pair. Default: NULL.

positive.class

(only when mode = "retrain") NULL or a string indicating which class is the positive class. It should be one of the classes in label; if positive.class = NULL, the first class in label is used as the positive class. Default: NULL.
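The fallback rule for positive.class can be sketched in base R. The helper resolve_positive_class is hypothetical, and the sketch assumes that "the first class in label" means the first distinct value encountered in the vector; the package may resolve it differently (for example, by factor level order).

```r
# Hypothetical helper: choose the positive class as described above.
resolve_positive_class <- function(label, positive.class = NULL) {
  classes <- unique(label)  # distinct classes in order of first appearance
  if (is.null(positive.class)) {
    return(classes[1])      # assumed fallback: first class found in label
  }
  if (!positive.class %in% classes) {
    stop("positive.class must be one of the classes in label")
  }
  positive.class
}

label <- rep(c("Interact", "Non.Interact"), each = 10)
pos_default  <- resolve_positive_class(label)                  # NULL case
pos_explicit <- resolve_positive_class(label, "Non.Interact")  # explicit case
```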

folds.num

(only when mode = "retrain") an integer that indicates the number of folds for cross-validation. Default: 10 for 10-fold cross-validation.

ntree

integer, number of trees to grow. See randomForest. Default: 3000.

mtry.ratios

(only when mode = "retrain") the ratios used to set mtry when tuning the random forest classifier: mtry = ratio * number of features. Default: c(0.1, 0.2, 0.4, 0.6, 0.8).
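As a worked illustration of the relation between the ratios and mtry, the sketch below computes candidate mtry values for a hypothetical feature count. The feature count of 50, the rounding, and the lower bound of 1 are assumptions made for this example; the package may convert the ratios differently.

```r
mtry.ratios <- c(0.1, 0.2, 0.4, 0.6, 0.8)
n_features  <- 50  # hypothetical feature count, for illustration only

# mtry = ratio * number of features.
# Rounding and the lower bound of 1 are assumptions of this sketch.
mtry_candidates <- pmax(1, round(mtry.ratios * n_features))
```

With 50 features, the default ratios give candidate values of 5, 10, 20, 30 and 40 variables tried at each split.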

seed

(only when mode = "retrain") an integer that indicates the random seed for data splitting. Default: 1.

parallel.cores

an integer that indicates the number of cores for parallel computation. Default: 2. Set parallel.cores = -1 to run with all the cores. parallel.cores should be == -1 or >= 1.
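One plausible way to interpret the parallel.cores convention is shown below. The helper resolve_cores is hypothetical, and the use of parallel::detectCores() to expand -1 is an assumption about the implementation, not confirmed by this page.

```r
library(parallel)

# Hypothetical helper: translate parallel.cores into a worker count.
resolve_cores <- function(parallel.cores = 2) {
  if (parallel.cores == -1) {
    n <- detectCores()      # may return NA on some platforms
    if (is.na(n)) n <- 1L   # fall back to a single core
    return(as.integer(n))
  }
  if (parallel.cores < 1) {
    stop("parallel.cores should be == -1 or >= 1")
  }
  as.integer(parallel.cores)
}

two_cores <- resolve_cores(2)   # explicit core count
all_cores <- resolve_cores(-1)  # every core detected on this machine
```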

cl

parallel cores to be passed to this function.

...

(only when mode = "retrain") other parameters (except ntree and mtry) passed to the randomForest function.

Value

If mode = "prediction", this function returns a data frame that contains the predicted results.

If mode = "retrain", this function returns a random forest classifier.

If mode = "feature", this function returns a data frame that contains the extracted features.

References

Han S, Yang X, Sun H, et al. LION: an integrated R package for effective prediction of ncRNA–protein interaction. Briefings in Bioinformatics. 2022; 23(6):bbac420

Examples


# The following code only shows how to use this function
# and does not reflect the genuine performance of the tools or classifiers.

data(demoPositiveSeq)
seqRNA <- demoPositiveSeq$RNA.positive
seqPro <- demoPositiveSeq$Pro.positive

# Predicting ncRNA-protein pairs:

Res_LION_1 <- run_LION(seqRNA = seqRNA, seqPro = seqPro,
                       parallel.cores = 2) # using the default setting

# the above command is equivalent to:
Res_LION_2 <- run_LION(seqRNA = seqRNA, seqPro = seqPro, mode = "prediction",
                       retrained.model = NULL, label = NULL,
                       parallel.cores = 2)

# Train a new model:

# Argument "label" which indicates the class of each input pair is required here.
# "label" should correspond to the classes of "seqRNA" and "seqPro".
# "positive.class" should be one of the classes in argument "label" or can be set as "NULL".
# In the latter case, the first label in "label" will be used as the positive class.
# Parameters of random forest, such as "replace", can be passed using "..." argument.

LION_model <- run_LION(seqRNA = seqRNA, seqPro = seqPro, mode = "retrain",
                       label = rep(c("Interact", "Non.Interact"), each = 10),
                       positive.class = NULL, folds.num = 5, ntree = 100,
                       seed = 1, parallel.cores = 2, replace = FALSE)

# Predict with the newly built model by setting "retrained.model = LION_model":

Res_LION_3 <- run_LION(seqRNA = seqRNA, seqPro = seqPro, mode = "prediction",
                       retrained.model = LION_model,
                       label = rep(c("Interact", "Non.Interact"), each = 10),
                       parallel.cores = 2)

# Only extracting features:

LION_feature_df <- run_LION(seqRNA = seqRNA, seqPro = seqPro, mode = "feature",
                            label = "LION_feature", parallel.cores = 2)

# Extracted features can be used to build classifiers using other machine learning
# algorithms, which provides users with more flexibility.
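As a minimal sketch of that workflow, the block below fits a logistic regression from base R to a simulated stand-in for the feature data frame. The column names feat1 and feat2, the presence of a label column, and the simulated values are all assumptions made for illustration; inspect the actual output of mode = "feature" before adapting this.

```r
set.seed(1)

# Simulated stand-in for the data frame returned by mode = "feature".
# Column names and the "label" column are assumptions, not the real layout.
n <- 40
feature_df <- data.frame(
  label = factor(rep(c("Interact", "Non.Interact"), each = n / 2)),
  feat1 = c(rnorm(n / 2, mean = 1), rnorm(n / 2, mean = -1)),
  feat2 = rnorm(n)
)

# Logistic regression as an alternative classifier to random forest.
fit <- glm(label ~ feat1 + feat2, data = feature_df, family = binomial())

# For a factor response, glm() models the probability of the second
# level, which is "Non.Interact" here.
prob <- predict(fit, newdata = feature_df, type = "response")
pred <- ifelse(prob > 0.5, "Non.Interact", "Interact")

# Training-set accuracy; illustrative only, not a performance estimate.
accuracy <- mean(pred == feature_df$label)
```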


HAN-Siyu/ncProR documentation built on Nov. 3, 2023, 12:08 a.m.