Sparse.Feature: Nucleotide sequence encoding with 0 and 1.

View source: R/Sparse.Feature.R

Description

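In this encoding approach, each nucleotide is represented by a short binary code. In the three-bit scheme introduced by Golam Bari et al. (2014), A, T, G and C are encoded as (1,1,1), (1,0,0), (0,1,0) and (0,0,1), respectively. Alternatively, each nucleotide can be encoded with four bits, i.e., A as (1,0,0,0), T as (0,1,0,0), G as (0,0,1,0) and C as (0,0,0,1), as in Meher et al. (2016). The output length of 4*n given under Value corresponds to the four-bit scheme.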
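The four-bit scheme described above can be illustrated with a minimal base-R sketch. The helper name `sparse_encode_sketch` is hypothetical and is not part of EncDNA; it takes a plain character string rather than a DNAStringSet and is only meant to show the mapping, not to reproduce the package's implementation.

```r
# Hypothetical helper (not the EncDNA implementation): four-bit sparse
# encoding of a single DNA sequence supplied as a character string.
sparse_encode_sketch <- function(seq) {
  # Code table taken from the Description: A, T, G, C -> one-hot of length 4
  codes <- list(
    A = c(1, 0, 0, 0),
    T = c(0, 1, 0, 0),
    G = c(0, 0, 1, 0),
    C = c(0, 0, 0, 1)
  )
  # Split the sequence into single upper-case characters
  bases <- strsplit(toupper(seq), "")[[1]]
  # Assumption: only A, T, G and C are handled; anything else is an error
  stopifnot(all(bases %in% names(codes)))
  # Concatenate the per-base codes into one numeric vector
  unlist(codes[bases], use.names = FALSE)
}

enc <- sparse_encode_sketch("ATGC")
length(enc)  # 4 bits per base, so 4 * 4 positions
```

For a sequence of n nucleotides the sketch returns 4*n values, which matches the length stated under Value.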

Usage

Sparse.Feature(test_seq)

Arguments

test_seq

Sequence dataset to be encoded into a numeric vector of 0s and 1s. Must be an object of class DNAStringSet.

Details

Each sequence is encoded independently; no positive-class or negative-class datasets are required.

Value

A vector of length 4*n for a sequence of n nucleotides in test_seq.

Note

Longer sequences produce higher-dimensional feature vectors, because the vector length grows as 4*n.

Author(s)

Prabina Kumar Meher, Indian Agricultural Statistics Research Institute, New Delhi-110012, INDIA

References

  1. Bari, A.T.M.G., Reaz, M.R. and Jeong, B.S. (2014). Effective DNA encoding for splice site prediction using SVM. MATCH Commun. Math. Comput. Chem., 71: 241-258.

  2. Meher, P.K., Sahu, T.K., Rao, A.R. and Wahi, S.D. (2016). A computational approach for prediction of donor splice sites with improved accuracy. Journal of Theoretical Biology, 404: 285-294.

Examples

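library(EncDNA)
data(droso)                          # load the droso example data
test <- droso$test                   # extract the test sequences
tst <- test
enc <- Sparse.Feature(test_seq=tst)  # encode the sequences as 0/1 features
enc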

EncDNA documentation built on May 28, 2019, 9 a.m.