This encoding scheme was devised by Li et al. (2012). The frequencies of the four nucleotides are first computed at each position for both the positive and negative datasets, giving two 4*L probability tables (one per class) for sequences of length L. A 4*L statistical difference table is obtained by elementwise subtraction of the two probability tables and is then used to encode the sequences. As in sparse encoding, the nucleotides A, T, G and C are encoded as (1,0,0,0), (0,1,0,0), (0,0,1,0) and (0,0,0,1) respectively. The value 1 in each sparse vector is then replaced with the corresponding value from the difference table for the nucleotide at that position. POS feature encoding can thus be viewed as a blend of the MN-FDTF (Huang et al., 2006) and sparse encoding (Meher et al., 2016) techniques.
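The steps described above can be sketched in base R. This is only an illustration, not the package's own implementation: `pos_encode_sketch()` is a hypothetical helper, the sequences are plain character strings rather than DNAStringSet objects, the difference is assumed to be taken as positive minus negative (the description does not state the order), and all sequences are assumed to have equal length and contain only A, T, G and C.

```r
# Hypothetical helper illustrating the POS encoding idea; not POS.Feature() itself.
pos_encode_sketch <- function(positive, negative, test) {
  nucs <- c("A", "T", "G", "C")  # order used by the sparse encoding
  stopifnot(length(unique(nchar(c(positive, negative, test)))) == 1)

  # 4 x L table of per-position nucleotide proportions for one dataset
  prob_table <- function(seqs) {
    chars <- do.call(rbind, strsplit(toupper(seqs), ""))  # m x L character matrix
    tab <- vapply(
      seq_len(ncol(chars)),
      function(j) as.numeric(table(factor(chars[, j], levels = nucs))) / nrow(chars),
      numeric(4)
    )
    rownames(tab) <- nucs
    tab
  }

  # Assumption: difference table = positive table minus negative table
  diff_table <- prob_table(positive) - prob_table(negative)

  # Replace the 1 of each sparse (one-hot) vector with the difference value
  encode_one <- function(s) {
    chars <- strsplit(toupper(s), "")[[1]]
    out <- matrix(0, nrow = 4, ncol = length(chars))
    idx <- cbind(match(chars, nucs), seq_along(chars))  # (nucleotide row, position)
    out[idx] <- diff_table[idx]
    as.vector(out)  # column-major: 4 values for position 1, then position 2, ...
  }

  # One row per test sequence, 4 * L columns
  t(vapply(test, encode_one, numeric(4 * ncol(diff_table)), USE.NAMES = FALSE))
}

# Toy data, made up for illustration
pos  <- c("ATGC", "ATGG", "AAGC")
neg  <- c("TTGC", "CAGT", "GTCC")
test <- c("ATGC", "CAGT")

enc <- pos_encode_sketch(pos, neg, test)
dim(enc)  # 2 rows (sequences), 16 columns (4 * L with L = 4)
```

Each row contains at most L non-zero entries, one per position, because only the slot of the observed nucleotide receives a difference value.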
POS.Feature(positive_class, negative_class, test_seq)
positive_class | Sequence dataset of the positive class; must be an object of class DNAStringSet.
negative_class | Sequence dataset of the negative class; must be an object of class DNAStringSet.
test_seq | Sequences to be encoded into numeric vectors; must be an object of class DNAStringSet.
The DNAStringSet object can be obtained by reading sequences in FASTA format with the function readDNAStringSet, available in the Biostrings package of Bioconductor.
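As a minimal sketch of reading FASTA sequences into such an object, assuming the Bioconductor package Biostrings is installed: the temporary file and the two sequences below are made up for illustration, and the code is skipped when Biostrings is not available.

```r
# Sketch only: assumes the Bioconductor package Biostrings is installed.
# The temporary FASTA file and its sequences are made up for illustration.
seqs <- NULL
if (requireNamespace("Biostrings", quietly = TRUE)) {
  fasta_path <- tempfile(fileext = ".fasta")
  writeLines(c(">seq1", "ATGCAT", ">seq2", "TTGACA"), fasta_path)

  # Read the FASTA file into a DNAStringSet object
  seqs <- Biostrings::readDNAStringSet(fasta_path, format = "fasta")
  print(class(seqs))
  print(length(seqs))
}
```

An object read this way could then be supplied as positive_class, negative_class or test_seq.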
A numeric matrix of order m*4n, where m is the number of sequences in test_seq and n is the sequence length.
In this encoding procedure, dependencies of nucleotides are not taken into consideration. Both positive and negative datasets are required for encoding of nucleotide sequences. Each sequence of length L can be transformed into a numeric vector of length 4*L with this encoding technique.
Prabina Kumar Meher, Indian Agricultural Statistics Research Institute, New Delhi-110012, INDIA
Huang, J., Li, T., Chen, K. and Wu, J. (2006). An approach of encoding for prediction of splice sites using SVM. Biochimie, 88(7): 923-929.
Li, J.L., Wang, L.F., Wang, H.Y., Bai, L.Y. and Yuan, Z.M. (2012). High-accuracy splice sites prediction based on sequence component and position features. Genetics and Molecular Research, 11(3): 3432-3451.
Meher, P.K., Sahu, T.K., Rao, A.R. and Wahi, S.D. (2016). A computational approach for prediction of donor splice sites with improved accuracy. Journal of Theoretical Biology, 404: 285-294.
MN.Fdtf.Feature, Bayes.Feature, WMM.Feature