PCPseDNC: Parallel Correlation Pseudo Dinucleotide Composition...

View source: R/PCPseDNC.R

Description

This function works like PSEkNUCdi_DNA, except that the default value of the selectedIdx parameter is different.

Usage

PCPseDNC(
  seqs,
  selectedIdx = c("Base stacking", "Protein induced deformability", "B-DNA twist",
    "A-philicity", "Propeller twist", "Duplex stability:(freeenergy)",
    "DNA denaturation", "Bending stiffness", "Protein DNA twist", "Aida_BA_transition",
    "Breslauer_dG", "Breslauer_dH", "Electron_interaction", "Hartman_trans_free_energy",
    "Helix-Coil_transition", "Lisser_BZ_transition", "Polar_interaction",
    "SantaLucia_dG", "SantaLucia_dS", "Sarai_flexibility", "Stability", "Sugimoto_dG",
    "Sugimoto_dH", "Sugimoto_dS", "Duplex tability(disruptenergy)",     
    "Stabilising energy of Z-DNA", "Breslauer_dS", "Ivanov_BA_transition",
    "SantaLucia_dH", "Stacking_energy", "Watson-Crick_interaction",
    "Dinucleotide GC Content", "Rise", "Roll", "Shift", "Slide", "Tilt", "Twist"),
  lambda = 3,
  w = 0.05,
  l = 2,
  ORF = FALSE,
  reverseORF = TRUE,
  threshold = 1,
  label = c()
)

Arguments

seqs

is a FASTA file containing nucleotide sequences, where each sequence record starts with '>'. Alternatively, seqs can be a string vector in which each element is a nucleotide sequence.

selectedIdx

is a vector of ids or names of the desired physicochemical properties of dinucleotides. Users can select properties either by their ids or by their names in the DI_DNA index file. The default value is a vector with the following 38 entries: "Base stacking", "Protein induced deformability", "B-DNA twist", "A-philicity", "Propeller twist", "Duplex stability:(freeenergy)", "DNA denaturation", "Bending stiffness", "Protein DNA twist", "Aida_BA_transition", "Breslauer_dG", "Breslauer_dH", "Electron_interaction", "Hartman_trans_free_energy", "Helix-Coil_transition", "Lisser_BZ_transition", "Polar_interaction", "SantaLucia_dG", "SantaLucia_dS", "Sarai_flexibility", "Stability", "Sugimoto_dG", "Sugimoto_dH", "Sugimoto_dS", "Duplex tability(disruptenergy)", "Stabilising energy of Z-DNA", "Breslauer_dS", "Ivanov_BA_transition", "SantaLucia_dH", "Stacking_energy", "Watson-Crick_interaction", "Dinucleotide GC Content", "Rise", "Roll", "Shift", "Slide", "Tilt", "Twist".

lambda

is a tuning parameter. This integer value sets the maximum number of spaces between dinucleotide pairs. The number of spaces ranges from 1 to lambda.
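
To make the role of lambda concrete, the following base R sketch lists the dinucleotide pairs that lie 1 to lambda positions apart in a short sequence. The helper names dinucleotides and pairs_at_gap are hypothetical and are not part of ftrCOOL; the sketch only illustrates which pairs are compared, not how the package scores them.

```r
# Hypothetical helper: split a sequence into overlapping dinucleotides.
dinucleotides <- function(s) {
  n <- nchar(s)
  if (n < 2) return(character(0))
  substring(s, 1:(n - 1), 2:n)
}

# Hypothetical helper: list the dinucleotide pairs that are `gap` positions apart.
pairs_at_gap <- function(s, gap) {
  d <- dinucleotides(s)
  m <- length(d)
  if (gap >= m) return(character(0))
  paste(d[1:(m - gap)], d[(1 + gap):m], sep = "-")
}

lambda <- 3
pairs <- lapply(seq_len(lambda), function(j) pairs_at_gap("ACGTAC", j))
names(pairs) <- paste0("gap_", seq_len(lambda))
print(pairs)
```

Each additional gap contributes one more correlation tier, which is why lambda appears in the number of output columns.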

w

(weight) is a tuning parameter. It takes values in the range 0 to 1. The default value is 0.05.
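
As an illustration of how a weight of this kind is typically used in pseudo nucleotide composition, the sketch below combines l-mer frequencies with lambda correlation factors. It assumes the commonly used PseKNC-style normalisation; whether ftrCOOL uses exactly this formula is an assumption, and pse_vector, freq and theta are hypothetical names with placeholder inputs.

```r
# Hypothetical helper: combine l-mer frequencies and correlation factors.
# Assumes the common PseKNC-style normalisation, not necessarily ftrCOOL's.
pse_vector <- function(freq, theta, w = 0.05) {
  denom <- sum(freq) + w * sum(theta)
  c(freq, w * theta) / denom
}

freq  <- rep(1 / 16, 16)     # placeholder: uniform dinucleotide frequencies (l = 2)
theta <- c(0.2, 0.1, 0.3)    # placeholder: one correlation factor per gap (lambda = 3)
v <- pse_vector(freq, theta, w = 0.05)
length(v)                    # 4^l + lambda entries
```

Under this assumption, a larger w shifts more of the vector's total mass from the composition part to the correlation part.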

l

This parameter sets the value of l in the l-mer composition. The l-mers form the first 4^l elements of the APkNCdi descriptor.
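
The 4^l l-mer slots can be sketched in base R as follows. The helpers all_lmers and lmer_counts are hypothetical, and the sketch returns raw counts; any normalisation applied by the package is not shown here.

```r
# Hypothetical helper: enumerate all 4^l possible l-mers over A, C, G, T.
all_lmers <- function(l) {
  grid <- expand.grid(rep(list(c("A", "C", "G", "T")), l),
                      stringsAsFactors = FALSE)
  sort(do.call(paste0, grid))
}

# Hypothetical helper: count overlapping l-mers in a sequence (raw counts).
lmer_counts <- function(s, l) {
  n <- nchar(s)
  stopifnot(n >= l)
  found <- substring(s, 1:(n - l + 1), l:n)
  table(factor(found, levels = all_lmers(l)))
}

length(all_lmers(2))       # 16 slots for l = 2
lmer_counts("ACGTAC", 2)
```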

ORF

(Open Reading Frame) is a logical parameter. If it is set to TRUE, the ORF region of each sequence is used instead of the original sequence (i.e., 3-frame).

reverseORF

is a logical parameter. It takes effect only if ORF is TRUE. If reverseORF is TRUE, the ORF region is searched for both in the sequence and in its reverse complement (i.e., 6-frame).
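
The difference between 3-frame and 6-frame search can be illustrated with a short base R sketch. The helpers reverse_complement and reading_frames are hypothetical and only enumerate the candidate frames; they do not reproduce the ORF detection performed by the package.

```r
# Hypothetical helper: reverse complement of a DNA string.
reverse_complement <- function(s) {
  comp <- chartr("ACGTacgt", "TGCAtgca", s)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Hypothetical helper: the reading frames in which an ORF could be searched.
# Three frames on the given strand, plus three on the reverse complement.
reading_frames <- function(s, reverse = TRUE) {
  strands <- if (reverse) c(s, reverse_complement(s)) else s
  unlist(lapply(strands, function(x) substring(x, 1:3, nchar(x))))
}

reading_frames("ATGAAC", reverse = FALSE)  # 3 frames
reading_frames("ATGAAC", reverse = TRUE)   # 6 frames
```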

threshold

is a number in the interval (0, 1]. Indices in selectedIdx whose correlation is higher than the threshold will be deleted. The default value is 1.
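
One way to picture this filtering is the base R sketch below. It assumes a greedy rule based on the absolute Pearson correlation between property rows; the rule actually used by ftrCOOL may differ. The function filter_correlated is hypothetical, and the matrix toy holds placeholder numbers, not real physicochemical indices.

```r
# Hypothetical helper: keep a property only if its absolute Pearson
# correlation with every previously kept property is at most `threshold`.
filter_correlated <- function(props, threshold = 1) {
  keep <- character(0)
  for (name in rownames(props)) {
    r <- vapply(keep,
                function(k) abs(cor(props[name, ], props[k, ])),
                numeric(1))
    if (all(r <= threshold)) keep <- c(keep, name)
  }
  keep
}

# Placeholder values only; these are NOT real physicochemical indices.
toy <- rbind(
  propA = c(1, 2, 3, 4, 5, 6),
  propB = c(2, 4, 6, 8, 10, 12.5),  # nearly collinear with propA
  propC = c(3, 1, 4, 1, 5, 9)
)

filter_correlated(toy, threshold = 0.8)  # drops the near-duplicate property
filter_correlated(toy, threshold = 1)    # keeps everything
```

With the default threshold of 1, no correlation can exceed the threshold, so no property is removed.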

label

is an optional parameter. It is a vector whose length is equal to the number of sequences. It gives the class of each entry (i.e., sequence).

Details

This function computes the pseudo nucleotide composition for each physicochemical property of di-nucleotides. Users can choose among the 148 properties in the di-nucleotide index database.

Value

A feature matrix in which the number of columns is 4^l+lambda and the number of rows is equal to the number of sequences.

Examples

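# Path to the example lncRNA FASTA file shipped with ftrCOOL
fileLNC <- system.file("extdata/Athaliana_LNCRNA.fa", package = "ftrCOOL")
# Build the feature matrix from the ORF regions with l = 2
mat <- PCPseDNC(seqs = fileLNC, l = 2, ORF = TRUE, threshold = 0.8)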

ftrCOOL documentation built on Nov. 30, 2021, 1:07 a.m.
