POS.Feature: Transformation of nucleic acid sequences into numeric vectors...
In EncDNA: Encoding of Nucleotide Sequences into Numeric Feature Vectors


This encoding scheme was devised by Li et al. (2012). The frequencies of the four nucleotides are first computed at each position for both the positive and negative datasets, giving two 4*L probability tables (one per class) for sequences of length L. A 4*L statistical difference table is then obtained by elementwise subtraction of the two probability tables and is used to encode the sequences. Under sparse encoding, the nucleotides A, T, G and C are encoded as (1,0,0,0), (0,1,0,0), (0,0,1,0) and (0,0,0,1) respectively. In POS encoding, the value 1 in the sparse code of the nucleotide at each position is replaced with the corresponding value from the difference table. POS feature encoding can thus be viewed as a blend of the MN-FDTF (Huang et al., 2006) and sparse encoding (Meher et al., 2016) techniques.
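The steps described above can be illustrated with a minimal base R sketch. This is not the source code of POS.Feature: the function name pos_encode_sketch is hypothetical, sequences are plain character strings instead of DNAStringSet objects, and the direction of the subtraction (positive minus negative) is an assumption, since the description does not state it.

```r
# Illustrative sketch of POS-style encoding (hypothetical helper, not EncDNA code).
# Inputs are character vectors of equal-length sequences over A, T, G, C.
pos_encode_sketch <- function(positive, negative, test) {
  nts <- c("A", "T", "G", "C")  # order follows the sparse encoding in the text

  # Split equal-length strings into a character matrix, one row per sequence.
  to_matrix <- function(seqs) do.call(rbind, strsplit(seqs, "", fixed = TRUE))

  # 4 x L table of per-position nucleotide frequencies.
  freq_table <- function(seqs) {
    m <- to_matrix(seqs)
    vapply(seq_len(ncol(m)), function(j) {
      as.numeric(table(factor(m[, j], levels = nts))) / nrow(m)
    }, numeric(4))
  }

  # Difference table; positive minus negative is an assumed direction.
  diff_tab <- freq_table(positive) - freq_table(negative)

  tm <- to_matrix(test)
  L <- ncol(tm)
  out <- matrix(0, nrow = nrow(tm), ncol = 4 * L)
  for (i in seq_len(nrow(tm))) {
    for (j in seq_len(L)) {
      k <- match(tm[i, j], nts)
      # Replace the 1 of the sparse code with the difference value.
      if (!is.na(k)) out[i, 4 * (j - 1) + k] <- diff_tab[k, j]
    }
  }
  out
}

# Toy data, made up for illustration only.
positive <- c("AAGT", "ACGT", "AAGT", "ATGT")
negative <- c("CAGA", "TCGT", "GAAT", "ATCC")
test <- c("AAGT", "CCGA")
enc <- pos_encode_sketch(positive, negative, test)
dim(enc)
```

Each test sequence of length 4 becomes a row of 16 values, with at most one nonzero entry in each block of four columns.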

POS.Feature(positive_class, negative_class, test_seq)

`positive_class`	Sequence dataset of the positive class, must be an object of class `DNAStringSet`.
`negative_class`	Sequence dataset of the negative class, must be an object of class `DNAStringSet`.
`test_seq`	Sequences to be encoded into numeric vectors, must be an object of class `DNAStringSet`.

The `DNAStringSet` object can be obtained by reading sequences in FASTA format with the function `readDNAStringSet`, available in the Biostrings package of Bioconductor.
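As a small illustration of reading FASTA input, the sketch below writes two made-up sequences to a temporary file and reads them back. The file contents and the variable names fasta_path and seqs are placeholders, and the read step runs only if Biostrings is installed.

```r
# Write a tiny FASTA file to a temporary location (illustrative data only).
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(">seq1", "AAGTACGT", ">seq2", "CCGATTGA"), fasta_path)

# Read the file as a DNAStringSet, provided Biostrings is available.
if (requireNamespace("Biostrings", quietly = TRUE)) {
  seqs <- Biostrings::readDNAStringSet(fasta_path, format = "fasta")
  print(class(seqs))   # class of the object expected by POS.Feature
  print(length(seqs))  # number of sequences read
  print(names(seqs))   # FASTA header names
}
```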

A numeric matrix of order m*4n, where m is the number of sequences in test_seq and n is the length of sequence.

In this encoding procedure, dependencies of nucleotides are not taken into consideration. Both positive and negative datasets are required for encoding of nucleotide sequences. Each sequence of length L can be transformed into a numeric vector of length 4*L with this encoding technique.

Prabina Kumar Meher, Indian Agricultural Statistics Research Institute, New Delhi-110012, INDIA

Huang, J., Li, T., Chen, K. and Wu, J. (2006). An approach of encoding for prediction of splice sites using SVM. Biochimie, 88(7): 923-929.
Li, J.L., Wang, L.F., Wang, H.Y., Bai, L.Y., Yuan, Z.M. (2012). High-accuracy splice sites prediction based on sequence component and position features. Genetics and Molecular Research, 11(3): 3432-3451.
Meher, P.K., Sahu, T.K., Rao, A.R. and Wahi, S.D. (2016). A computational approach for prediction of donor splice sites with improved accuracy. Journal of Theoretical Biology, 404: 285-294.

MN.Fdtf.Feature, Bayes.Feature, WMM.Feature

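library(EncDNA)

# Load the example dataset shipped with the package
data(droso)
positive <- droso$positive
negative <- droso$negative
test <- droso$test

# Use the first 200 sequences of each class to build the difference table
pos <- positive[1:200]
neg <- negative[1:200]
tst <- test

# Encode the test sequences into a numeric matrix
enc <- POS.Feature(positive_class=pos, negative_class=neg, test_seq=tst)
enc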

EncDNA documentation built on May 28, 2019, 9 a.m.
