APkNUCdi_RNA: Amphiphilic Pseudo-k riboNucleotide Composition-di(series)...

Description Usage Arguments Details Value Examples

View source: R/APkNUCdi_RNA.R

Description

This function calculates the amphiphilic pseudo k ribonucleotide composition(Di) (Series) for each sequence.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
APkNUCdi_RNA(
  seqs,
  selectedIdx = c("Rise (RNA)", "Roll (RNA)", "Shift (RNA)", "Slide (RNA)",
    "Tilt (RNA)", "Twist (RNA)"),
  lambda = 3,
  w = 0.05,
  l = 2,
  ORF = FALSE,
  reverseORF = TRUE,
  threshold = 1,
  label = c()
)

Arguments

seqs

is a FASTA file containing ribonucleotide sequences. The sequences start with '>'. Also, seqs could be a string vector. Each element of the vector is a ribonucleotide sequence.

selectedIdx

is a vector of Ids or indices of the desired physicochemical properties of di-ribonucleotides. Users can choose the desired indices by their ids or their names in the DI_RNA index file. The default value of this parameter is a vector with ("Rise (RNA)", "Roll (RNA)", "Shift (RNA)", "Slide (RNA)", "Tilt (RNA)","Twist (RNA)") ids.

lambda

is a tuning parameter. This integer value shows the maximum limit of spaces between di-ribonucleotide pairs. The Number of spaces changes from 1 to lambda.

w

(weight) is a tuning parameter. It changes in the range of 0 to 1. The default value is 0.05.

l

This parameter keeps the value of l in lmer composition. The lmers form the first 4^l elements of the APkNCdi descriptor.

ORF

(Open Reading Frame) is a logical parameter. If it is set to true, ORF region of each sequence is considered instead of the original sequence (i.e., 3-frame).

reverseORF

is a logical parameter. It is enabled only if ORF is true. If reverseORF is true, ORF region will be searched in the sequence and also in the reverse complement of the sequence (i.e., 6-frame).

threshold

is a number between (0 to 1]. In selectedIdx, indices with a correlation higher than the threshold will be deleted. The default value is 1.

label

is an optional parameter. It is a vector whose length is equivalent to the number of sequences. It shows the class of each entry (i.e., sequence).

Details

This function computes the pseudo ribonucleotide composition for each physicochemical property of di-ribonucleotides. We have provided users with the ability to choose among the 22 properties in the di-ribonucleotide index database.

Value

It is a feature matrix. The number of columns is 4^l+(number of the chosen indices*lambda) and the number of rows is equal to the number of sequences.

Examples

1
2
fileLNC<-system.file("extdata/Carica_papaya101RNA.txt",package="ftrCOOL")
mat<-APkNUCdi_RNA(seqs=fileLNC,ORF=TRUE,threshold=0.8)

ftrCOOL documentation built on Nov. 30, 2021, 1:07 a.m.