APkNUCTri_DNA: Amphiphilic Pseudo-k Nucleotide Composition-Tri(series)...


View source: R/APkNUCTri_DNA.R

Description

This function calculates the amphiphilic pseudo k-nucleotide composition (Tri) (Series) for each sequence.

Usage

APkNUCTri_DNA(
  seqs,
  selectedIdx = c("Dnase I", "Bendability (DNAse)"),
  lambda = 3,
  w = 0.05,
  l = 3,
  ORF = FALSE,
  reverseORF = TRUE,
  threshold = 1,
  label = c()
)

Arguments

seqs

is a FASTA file containing nucleotide sequences. Each sequence entry starts with a header line beginning with '>'. Alternatively, seqs can be a character (string) vector in which each element is a nucleotide sequence.

selectedIdx

is a vector of the ids or names of the desired physicochemical indices of trinucleotides. Users can select indices by their ids or by their names in the TRI_DNA index file. The default value is c("Dnase I", "Bendability (DNAse)").

lambda

is a tuning parameter. This integer value sets the maximum number of spaces between trinucleotide pairs. The number of spaces ranges from 1 to lambda.
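
To illustrate what "spaces from 1 to lambda" means, here is a minimal base R sketch that lists the trinucleotide pairs at each gap. It assumes overlapping trinucleotides taken with a stride of one, which is the usual convention for pseudo k-tuple compositions; the package's internal windowing is not described on this page. The helper name pairs_at_gap is hypothetical.

```r
# Assumption: trinucleotides overlap and are read with stride 1.
dna <- "ATGCATGC"
n <- nchar(dna)

# All overlapping trinucleotides of the sequence.
tri <- substring(dna, 1:(n - 2), 3:n)

# Hypothetical helper: pair each trinucleotide with the one `gap` positions ahead.
pairs_at_gap <- function(tri, gap) {
  idx <- seq_len(length(tri) - gap)
  data.frame(first = tri[idx], second = tri[idx + gap])
}

lambda <- 3
pairs <- lapply(seq_len(lambda), function(g) pairs_at_gap(tri, g))

# Number of pairs available at gaps 1, 2, and 3.
n_pairs <- sapply(pairs, nrow)
print(n_pairs)
```

Larger gaps leave fewer pairs, so lambda presumably needs to be smaller than the number of trinucleotides in the shortest sequence.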

w

(weight) is a tuning parameter. Its value ranges from 0 to 1. The default value is 0.05.

l

is the length l of the l-mers in the l-mer composition. The l-mer composition forms the first 4^l elements of the APkNUCTri descriptor.
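
The 4^l figure comes from enumerating every possible l-mer over the alphabet A, C, G, T. The following base R sketch counts overlapping l-mers in a single sequence. It is only an illustration: the function name lmer_counts is hypothetical, and whether the package reports raw counts or normalized frequencies is not stated here.

```r
# Hypothetical helper: count overlapping l-mers of one DNA string.
lmer_counts <- function(dna, l) {
  bases <- c("A", "C", "G", "T")
  # Enumerate all 4^l l-mers; reversing the columns gives lexicographic order.
  grid <- expand.grid(rep(list(bases), l), stringsAsFactors = FALSE)
  all_lmers <- do.call(paste0, unname(as.list(rev(grid))))

  # Slide a window of width l across the sequence.
  n <- nchar(dna)
  observed <- substring(dna, seq_len(n - l + 1), l:n)

  # Tabulate against the full l-mer list so absent l-mers get a zero.
  counts <- table(factor(observed, levels = all_lmers))
  setNames(as.integer(counts), all_lmers)
}

x <- lmer_counts("ATGCATGC", l = 3)
length(x)   # 4^3 = 64 entries, one per possible 3-mer
x[["ATG"]]  # "ATG" occurs twice in the sequence
```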

ORF

(Open Reading Frame) is a logical parameter. If it is TRUE, the ORF region of each sequence is used instead of the original sequence (i.e., 3-frame).

reverseORF

is a logical parameter. It takes effect only if ORF is TRUE. If reverseORF is TRUE, the ORF region is searched for both in the sequence and in its reverse complement (i.e., 6-frame).
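
For readers unfamiliar with the 3-frame versus 6-frame distinction, the base R sketch below builds the reverse complement of a sequence and lists the six reading frames (three offsets on each strand). The helper names are hypothetical, and the sketch does not reproduce the package's ORF search; it only shows which strings such a search would scan.

```r
# Hypothetical helper: reverse complement of a DNA string.
reverse_complement <- function(dna) {
  comp <- chartr("ACGT", "TGCA", toupper(dna))
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Hypothetical helper: the three reading frames of one strand (offsets 0, 1, 2).
frames <- function(dna) substring(dna, 1:3)

dna <- "ATGCCG"
rc <- reverse_complement(dna)

# 3-frame: forward strand only. 6-frame: forward strand plus reverse complement.
three_frames <- frames(dna)
six_frames <- c(frames(dna), frames(rc))
print(six_frames)
```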

threshold

is a number in the interval (0, 1]. Indices in selectedIdx whose correlation with another selected index is higher than the threshold are removed. The default value is 1.
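
The effect of the threshold can be sketched in base R as follows. This is an assumption-laden illustration: the property table holds made-up values with hypothetical names, the filter uses absolute Pearson correlation, and it keeps the first index of any correlated pair. The package may break ties or measure correlation differently.

```r
# Toy table: rows are hypothetical indices, columns are items (values are made up).
props <- rbind(
  propA = c(1, 2, 3, 4, 5),
  propB = c(2, 4, 6, 8, 10.5),  # nearly collinear with propA
  propC = c(5, 1, 4, 2, 3)
)

# Hypothetical helper: greedily keep an index only if its absolute correlation
# with every index kept so far does not exceed the threshold.
filter_correlated <- function(props, threshold) {
  cors <- abs(cor(t(props)))
  keep <- character(0)
  for (name in rownames(props)) {
    if (all(cors[name, keep] <= threshold)) keep <- c(keep, name)
  }
  keep
}

kept_strict <- filter_correlated(props, threshold = 0.9)
kept_default <- filter_correlated(props, threshold = 1)
print(kept_strict)   # propB is dropped because it tracks propA closely
print(kept_default)  # no correlation exceeds 1, so nothing is dropped
```

Under this reading, the default threshold of 1 effectively disables the filter.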

label

is an optional parameter. It is a vector whose length equals the number of sequences. It gives the class of each entry (i.e., sequence).

Details

This function computes the pseudo nucleotide composition for each selected physicochemical property of trinucleotides. Users can choose among the 12 properties in the trinucleotide index database.

Value

A feature matrix. The number of columns is 4^l + (number of chosen indices * lambda), and the number of rows equals the number of sequences.
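
As a quick sanity check on the column formula, the short sketch below evaluates it for the default arguments (l = 3, two selected indices, lambda = 3). The function name n_columns is hypothetical and simply restates the formula; if the threshold removes correlated indices, the number of chosen indices would shrink accordingly.

```r
# Hypothetical helper restating the column-count formula from the Value section.
n_columns <- function(l, n_indices, lambda) {
  4^l + n_indices * lambda
}

# Defaults: l = 3, two indices in selectedIdx, lambda = 3.
expected <- n_columns(l = 3, n_indices = 2, lambda = 3)
print(expected)  # 64 + 2 * 3 = 70
```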

Examples

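# Path to the example FASTA file shipped with ftrCOOL
fileLNC <- system.file("extdata/Athaliana_LNCRNA.fa", package = "ftrCOOL")
# Compute the feature matrix with 3-mers and no correlation filtering
mat <- APkNUCTri_DNA(seqs = fileLNC, l = 3, threshold = 1)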

ftrCOOL documentation built on Nov. 30, 2021, 1:07 a.m.