run_LncADeep: Predict RNA-Protein Interaction Using LncADeep's Features

View source: R/Methods.R

run_LncADeep    R Documentation

Description

This function predicts lncRNA/RNA-protein interactions using a rebuilt model trained with LncADeep's feature set. Model retraining and feature extraction are also supported. LncADeep selects 110 features to build its classifier. Here, the top 110 features are determined by averaging the feature scores of the 33 evaluation results provided by LncADeep. LncADeep's original model is trained using a deep neural network (DNN). Because parameter tuning is difficult for a DNN architecture, we rebuild the model using the same machine learning algorithm (random forest) as the other methods. Users can still build a DNN model with the features generated by this function.
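The feature-selection step described above can be sketched as follows. This is only an illustration on simulated data: the score matrix, the size of the candidate pool (300), and the assumption that a higher score means a more useful feature are all hypothetical and are not taken from LncADeep or ncProR.

```r
# Hypothetical illustration: rank features by their mean score across runs.
set.seed(1)
n_features <- 300  # assumed size of the candidate feature pool (made up)
n_runs     <- 33   # number of evaluation results mentioned in the description
n_top      <- 110  # number of features kept

# Simulated score matrix: one row per feature, one column per evaluation run.
scores <- matrix(
  runif(n_features * n_runs),
  nrow = n_features,
  dimnames = list(paste0("feature_", seq_len(n_features)), NULL)
)

# Average each feature's score over the runs, then keep the best n_top.
mean_scores  <- rowMeans(scores)
top_features <- names(sort(mean_scores, decreasing = TRUE))[seq_len(n_top)]

head(top_features)
```

Averaging over all runs before ranking makes the selection less sensitive to any single evaluation result.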

Usage

run_LncADeep(
  seqRNA,
  seqPro,
  mode = c("prediction", "retrain", "feature"),
  retrained.model = NULL,
  label = NULL,
  positive.class = NULL,
  folds.num = 10,
  ntree = 3000,
  mtry.ratios = c(0.1, 0.2, 0.4, 0.6, 0.8),
  seed = 1,
  parallel.cores = 2,
  cl = NULL,
  ...
)

Arguments

seqRNA

RNA sequences loaded by the function read.fasta from the seqinr package, or a list of RNA sequences. RNA sequences will be converted to lower case letters.

seqPro

protein sequences loaded by the function read.fasta from the seqinr package, or a list of protein sequences. Protein sequences will be converted to upper case letters. Each sequence should be a vector of single characters.

mode

a string. Set "prediction" to predict ncRNA-protein pairs and return the prediction results; set "retrain" to build a new random forest model using the input data; set "feature" to return a data frame that contains the extracted features. Users can use the features generated by mode = "feature" to train classifiers with other machine learning algorithms. Default: "prediction".

retrained.model

(only when mode = "prediction") the model used to predict ncRNA-protein pairs. If NULL, the default machine learning model is used. Otherwise, pass a model generated by this function with mode = "retrain". Default: NULL. See the examples below.

label

a string, a vector of strings, or NULL. Optional when mode = "prediction" or mode = "feature": used to add labels or notes to the output. Required when mode = "retrain": must be a vector of strings that corresponds to the input sequences, where each string indicates the class of one input pair. Default: NULL.

positive.class

(only when mode = "retrain") NULL or a string indicating which class is the positive class. It should be one of the classes in label, or be left as positive.class = NULL. In the latter case, the first class in label will be used as the positive class. Default: NULL.

folds.num

(only when mode = "retrain") an integer indicating the number of folds for cross validation. Default: 10 for 10-fold cross validation.

ntree

integer, number of trees to grow. See randomForest. Default: 3000.

mtry.ratios

(only when mode = "retrain") the ratios used to set mtry when tuning the random forest classifier: mtry = ratio * number of features. Default: c(0.1, 0.2, 0.4, 0.6, 0.8).

seed

(only when mode = "retrain") an integer indicating the random seed for data splitting. Default: 1.

parallel.cores

an integer that indicates the number of cores for parallel computation. Default: 2. Set parallel.cores = -1 to run with all the cores. parallel.cores should be == -1 or >= 1.

cl

parallel cores to be passed to this function. Default: NULL.

...

(only when mode = "retrain") other parameters (except ntree and mtry) passed to the randomForest function.
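As a rough illustration of how mtry.ratios could translate into candidate mtry values, the sketch below applies the formula mtry = ratio * number of features to the 110 features mentioned in the description. The use of round() and the lower bound of 1 are assumptions made for this example; the package may derive the integer values differently.

```r
# Sketch: turn mtry ratios into integer mtry candidates for 110 features.
n_features  <- 110
mtry.ratios <- c(0.1, 0.2, 0.4, 0.6, 0.8)

# Assumption: round to the nearest integer and never go below 1.
mtry_candidates <- unique(pmax(1L, as.integer(round(mtry.ratios * n_features))))

print(mtry_candidates)
```

Each candidate would then be evaluated during cross validation, with folds.num controlling the number of folds.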

Value

If mode = "prediction", this function returns a data frame that contains the predicted results.

If mode = "retrain", this function returns a random forest classifier.

If mode = "feature", this function returns a data frame that contains the extracted features.
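The feature data frame can serve as input to other learning algorithms. The sketch below fits a logistic regression with base R on a simulated stand-in for such a data frame. The column names (label, feat_1, feat_2, feat_3) and the layout are hypothetical; inspect the actual output of mode = "feature" before adapting this.

```r
# Hypothetical stand-in for a feature data frame; all values are simulated.
set.seed(1)
n_pairs <- 40
feature_df <- data.frame(
  label  = factor(rep(c("Interact", "Non.Interact"), each = n_pairs / 2)),
  feat_1 = rnorm(n_pairs),
  feat_2 = rnorm(n_pairs),
  feat_3 = rnorm(n_pairs)
)

# Shift one feature so that the two classes differ slightly.
shift <- ifelse(feature_df$label == "Interact", 1, 0)
feature_df$feat_1 <- feature_df$feat_1 + shift

# Logistic regression as a simple alternative classifier.
# With a factor response, glm models the probability of the second level,
# which is "Non.Interact" here (levels are sorted alphabetically).
fit <- glm(label ~ ., data = feature_df, family = binomial())
prob_non_interact <- predict(fit, type = "response")

head(prob_non_interact)
```

Any other classifier that accepts a data frame of numeric predictors could be substituted for glm in the same way.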

References

Yang C, Yang L, Zhou M, et al. LncADeep: an ab initio lncRNA identification and functional annotation tool based on deep learning. Bioinformatics. 2018; 34(22):3825-3834.

Examples


# The following code only shows how to use this function
# and does not reflect the genuine performance of the tools or classifiers.

data(demoPositiveSeq)
seqRNA <- demoPositiveSeq$RNA.positive
seqPro <- demoPositiveSeq$Pro.positive

# Predicting ncRNA-protein pairs:

Res_LncADeep_1 <- run_LncADeep(seqRNA = seqRNA, seqPro = seqPro, mode = "prediction",
                               retrained.model = NULL, label = "LncADeep_res",
                               parallel.cores = 2) # using default rebuilt model

# Train a new model:

# Argument "label" which indicates the class of each input pair is required here.
# "label" should correspond to the classes of "seqRNA" and "seqPro".
# "positive.class" should be one of the classes in argument "label" or can be set as "NULL".
# In the latter case, the first label in "label" will be used as the positive class.
# Parameters of random forest, such as "nodesize", can be passed using "..." argument.

LncADeep_model <- run_LncADeep(seqRNA = seqRNA, seqPro = seqPro, mode = "retrain",
                               label = rep(c("Interact", "Non.Interact"), each = 10),
                               positive.class = NULL, folds.num = 5, ntree = 100,
                               seed = 1, parallel.cores = 2, nodesize = 2)

# Predicting with the newly built model by setting "retrained.model = LncADeep_model":

Res_LncADeep_2 <- run_LncADeep(seqRNA = seqRNA, seqPro = seqPro, mode = "prediction",
                               retrained.model = LncADeep_model, label = NULL,
                               parallel.cores = 2)

# Only extracting features:

LncADeep_feature_df <- run_LncADeep(seqRNA = seqRNA, seqPro = seqPro, mode = "feature",
                                    label = "feature", parallel.cores = 2)

# Extracted features can be used to build classifiers using other machine learning
# algorithms, which provides users with more flexibility.


HAN-Siyu/ncProR documentation built on Nov. 3, 2023, 12:08 a.m.