build_model: Build Users' Own Model


View source: R/LncFinder.R

Description

Builds a new SVM model from users' own lncRNA and mRNA sequences.

Usage

build_model(
  lncRNA.seq,
  mRNA.seq,
  frequencies.file,
  SS.features = FALSE,
  lncRNA.format = "DNA",
  mRNA.format = "DNA",
  parallel.cores = 2,
  folds.num = 10,
  seed = 1,
  gamma.range = (2^seq(-5, 0, 1)),
  cost.range = c(1, 4, 8, 16, 24, 32),
  ...
)

Arguments

lncRNA.seq

Long non-coding RNA sequences. Can be FASTA sequences loaded by the seqinr package or secondary structure sequences (Dot-Bracket Notation) obtained from function run_RNAfold. If lncRNA.seq contains secondary structure sequences, parameter lncRNA.format should be set to "SS".

mRNA.seq

mRNA sequences. Can be FASTA sequences loaded by read.fasta or secondary structure sequences (Dot-Bracket Notation) obtained from function run_RNAfold. If mRNA.seq contains secondary structure sequences, parameter mRNA.format should be set to "SS".

frequencies.file

String or a list obtained from function make_frequencies. Input the species name "human", "mouse" or "wheat" to use the pre-built frequencies files, or supply a user's own frequencies list (please refer to function make_frequencies for more information).

SS.features

Logical. If SS.features = TRUE, secondary structure features will be used to build the model. In this case, lncRNA.seq and mRNA.seq should be secondary structure sequences (Dot-Bracket Notation) obtained from function run_RNAfold, and parameters lncRNA.format and mRNA.format should both be set to "SS".

lncRNA.format

String. Defines the format of lncRNA.seq: "DNA" for DNA sequences or "SS" for secondary structure sequences. A model with secondary structure features (SS.features = TRUE) can be built only when both mRNA.format and lncRNA.format are set to "SS".

mRNA.format

String. Defines the format of mRNA.seq: "DNA" for DNA sequences or "SS" for secondary structure sequences. When this parameter is set to "DNA", only a model without secondary structure features can be built, so parameter SS.features should be set to FALSE.
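
The constraints among SS.features, lncRNA.format and mRNA.format can be checked before calling build_model. The following is a minimal sketch in base R; the helper name check_formats is hypothetical and is not part of LncFinder, and the package may validate its inputs differently.

```r
# Hypothetical helper (not part of LncFinder): checks that the format
# arguments are consistent with the SS.features setting described above.
check_formats <- function(SS.features = FALSE,
                          lncRNA.format = "DNA",
                          mRNA.format = "DNA") {
  formats <- c(lncRNA.format, mRNA.format)
  # Only "DNA" and "SS" are documented as valid formats.
  if (!all(formats %in% c("DNA", "SS"))) {
    stop("lncRNA.format and mRNA.format must each be \"DNA\" or \"SS\".")
  }
  # Secondary structure features need both inputs in "SS" format.
  if (isTRUE(SS.features) && !all(formats == "SS")) {
    stop("SS.features = TRUE requires both formats to be \"SS\".")
  }
  invisible(TRUE)
}

check_formats(SS.features = FALSE, lncRNA.format = "DNA", mRNA.format = "DNA")
check_formats(SS.features = TRUE,  lncRNA.format = "SS",  mRNA.format = "SS")
```

Calling check_formats(SS.features = TRUE, lncRNA.format = "SS", mRNA.format = "DNA") stops with an error, which mirrors the rule that both inputs must be in "SS" format.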

parallel.cores

Integer. The number of cores for parallel computation. The default is 2; set it to -1 to run this function with all cores. During SVM tuning, if parallel.cores exceeds folds.num (the number of folds for cross-validation), parallel.cores is automatically set to folds.num.
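
The core-count rule above can be illustrated with a short sketch. The function resolve_cores is hypothetical, and the use of parallel::detectCores() to interpret -1 is an assumption; the package's internal logic may differ.

```r
# Hypothetical illustration (not LncFinder's internal code) of how the
# effective number of cores follows from parallel.cores and folds.num.
resolve_cores <- function(parallel.cores = 2, folds.num = 10) {
  if (parallel.cores == -1) {
    # Assumption: -1 means "all cores", queried with parallel::detectCores().
    parallel.cores <- parallel::detectCores()
    # detectCores() can return NA on some platforms; fall back to one core.
    if (is.na(parallel.cores)) parallel.cores <- 1
  }
  # Never use more cores than there are cross-validation folds.
  min(parallel.cores, folds.num)
}

resolve_cores(parallel.cores = 2, folds.num = 10)  # 2
resolve_cores(parallel.cores = 4, folds.num = 2)   # 2, capped at folds.num
```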

folds.num

Integer. Specify the number of folds for cross-validation. (Default: 10)

seed

Integer. Used to set the seed for cross-validation. (Default: 1)
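
To show why a fixed seed matters for cross-validation, the sketch below assigns samples to folds reproducibly in base R. The helper make_folds is hypothetical and only illustrates the general idea; it is not how LncFinder partitions the data internally.

```r
# Hypothetical helper (not part of LncFinder): assigns n samples to
# folds.num folds; the same seed always yields the same assignment.
make_folds <- function(n, folds.num = 10, seed = 1) {
  set.seed(seed)
  # Balanced fold labels, shuffled into a random order.
  sample(rep(seq_len(folds.num), length.out = n))
}

folds_a <- make_folds(20, folds.num = 10, seed = 1)
folds_b <- make_folds(20, folds.num = 10, seed = 1)
identical(folds_a, folds_b)  # TRUE: same seed, same folds
table(folds_a)               # two samples in each of the 10 folds
```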

gamma.range

The candidate values of gamma to search during SVM tuning. (Default: 2 ^ seq(-5, 0, 1))

cost.range

The candidate values of cost to search during SVM tuning. (Default: c(1, 4, 8, 16, 24, 32))
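
As a rough guide to the size of the search space, the default gamma.range and cost.range can be expanded into a grid with base R. This sketch assumes that tuning considers every gamma/cost pair; see function svm_tune for how the search is actually performed.

```r
# Default candidate values, as given in the Usage section.
gamma.range <- 2 ^ seq(-5, 0, 1)
cost.range  <- c(1, 4, 8, 16, 24, 32)

# Assumption: every gamma/cost pair is a candidate for tuning.
param_grid <- expand.grid(gamma = gamma.range, cost = cost.range)

nrow(param_grid)   # 36 combinations (6 gamma values x 6 cost values)
head(param_grid)
```

Under that assumption, narrowing either range, as the Examples section does, reduces the number of candidate pairs and therefore the tuning time.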

...

Additional arguments passed to function svm_tune for customised SVM model training.

Details

This function builds a new model from users' own sequences. The new model can then be passed to function lnc_finder to predict sequences.

For the details of frequencies.file, please refer to function make_frequencies.

For the details of the features, please refer to function extract_features.

For the details of SVM tuning, please refer to function svm_tune.

Value

Returns an SVM model.

References

Siyu Han, Yanchun Liang, Qin Ma, Yangyi Xu, Yu Zhang, Wei Du, Cankun Wang & Ying Li. LncFinder: an integrated platform for long non-coding RNA identification utilizing sequence intrinsic composition, structural information, and physicochemical property. Briefings in Bioinformatics, 2019, 20(6):2009-2027.

Author(s)

HAN Siyu

See Also

make_frequencies, lnc_finder, extract_features, svm_tune, svm.

Examples

## Not run: 
data(demo_DNA.seq)
Seqs <- demo_DNA.seq

### Build the model with pre-built frequencies.file:
my_model <- build_model(lncRNA.seq = Seqs[1:5], mRNA.seq = Seqs[6:10],
                        frequencies.file = "human", SS.features = FALSE,
                        lncRNA.format = "DNA", mRNA.format = "DNA",
                        parallel.cores = 2, folds.num = 2, seed = 1,
                        gamma.range = (2 ^ seq(-5, -1, 2)),
                        cost.range = c(2, 6, 12, 20))

### Users can use default values of gamma.range and cost.range to find the
### best parameters.
### Use your own frequencies file by assigning a frequencies list to parameter
### "frequencies.file".

## End(Not run)

LncFinder documentation built on Dec. 11, 2021, 9:39 a.m.