MM2.Feature: Mapping nucleotide sequences onto numeric feature vectors...

View source: R/MM2.Feature.R

Description

This encoding procedure is similar to the MM1 encoding; the only difference is that it considers second-order dependencies, whereas MM1.Feature considers first-order dependencies. The technique was first conceptualized by Rajapakse and Ho (2005) and later adopted by Maji and Garg (2014). MM2 requires 64 parameters to be estimated, compared with 16 for MM1. Only the positive class dataset is used to encode the sequences.
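To make the idea of second-order dependencies concrete, the following base-R sketch encodes each position from the third onward as the conditional probability of its base given the two preceding bases. It is not the EncDNA source code: the function name mm2_sketch, the position-specific estimation, the add-one pseudocount, and the toy sequences are all assumptions made for illustration.

```r
# Illustrative second-order Markov encoding in base R (NOT the EncDNA source).
# Assumptions: probabilities are position-specific, estimated from the positive
# class only, and smoothed with a pseudocount (hypothetical choice).
mm2_sketch <- function(positive, test, pseudo = 1) {
  bases <- c("A", "C", "G", "T")
  n <- nchar(positive[1])
  stopifnot(all(nchar(c(positive, test)) == n), n >= 3)

  # One row per sequence, one column per position
  pos_mat <- do.call(rbind, strsplit(positive, ""))
  tst_mat <- do.call(rbind, strsplit(test, ""))

  enc <- matrix(NA_real_, nrow = length(test), ncol = n - 2)
  for (j in 3:n) {
    # 4 x 4 x 4 array of counts (base at j-2, base at j-1, base at j):
    # 64 entries, in line with the 64 parameters mentioned above
    counts <- unclass(table(factor(pos_mat[, j - 2], bases),
                            factor(pos_mat[, j - 1], bases),
                            factor(pos_mat[, j], bases))) + pseudo
    # Normalise over the third dimension: P(x_j | x_{j-2}, x_{j-1})
    cond <- sweep(counts, c(1, 2), apply(counts, c(1, 2), sum), "/")
    # Look up the probability of each test sequence's trinucleotide at j
    idx <- cbind(match(tst_mat[, j - 2], bases),
                 match(tst_mat[, j - 1], bases),
                 match(tst_mat[, j], bases))
    enc[, j - 2] <- cond[idx]
  }
  enc
}

# Toy data (hypothetical): three positive sequences, two test sequences
positive <- c("ACGTA", "ACGTT", "ACGAA")
test <- c("ACGTA", "TTTTT")
enc <- mm2_sketch(positive, test)
dim(enc)  # 2 rows, 3 columns (n - 2 with n = 5)
```

Under these assumptions, the first two positions have no second-order context, which is why the sketch produces n-2 columns per sequence.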

Usage

MM2.Feature(positive_class, test_seq)

Arguments

positive_class

Sequence dataset of the positive class, must be an object of class DNAStringSet.

test_seq

Sequences to be encoded into numeric vectors, must be an object of class DNAStringSet.

Details

To obtain an object of class DNAStringSet, read the FASTA sequences with the function readDNAStringSet, available in the Biostrings package.
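A minimal sketch of that step, assuming the Biostrings package is installed. The temporary file and the two toy sequences are hypothetical stand-ins for a real FASTA file.

```r
library(Biostrings)

# Write two toy sequences to a temporary FASTA file (hypothetical data)
fa_path <- tempfile(fileext = ".fasta")
writeLines(c(">seq1", "ACGTACGT", ">seq2", "TTGACCAG"), fa_path)

# readDNAStringSet() returns a DNAStringSet, the class MM2.Feature expects
seqs <- readDNAStringSet(fa_path, format = "fasta")
class(seqs)
width(seqs)  # sequence lengths
```

With real data, fa_path would instead point to the FASTA file of the positive class or of the sequences to be encoded.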

Value

A numeric matrix of order m*(n-2), where m is the number of sequences in test_seq and n is the length of each sequence.

Author(s)

Prabina Kumar Meher, Indian Agricultural Statistics Research Institute, New Delhi-110012, INDIA

References

  1. Rajapakse, J. and Ho, L.S. (2005). Markov encoding for detecting signals in genomic sequences. IEEE/ACM Trans Comput Biol Bioinf., 2(2): 131-142.

  2. Maji, S. and Garg, D. (2014). Hybrid approach using SVM and MM2 in splice site junction identification. Current Bioinformatics, 9(1): 76-85.

See Also

MM1.Feature, WAM.Feature

Examples

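library(EncDNA)

# Load the example dataset shipped with the package
data(droso)
positive <- droso$positive
test <- droso$test

# Use the first 200 positive-class sequences to estimate the model
pos <- positive[1:200]
tst <- test

# Encode the test sequences and print the resulting numeric matrix
enc <- MM2.Feature(positive_class=pos, test_seq=tst)
enc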

EncDNA documentation built on May 28, 2019, 9 a.m.
