This function maps a set of barcode sequences onto numeric feature vectors based on gap pair composition features. It requires the barcode sequences as a DNAStringSet object and the species label of each sequence as a factor. The output can be used directly as input to train the random forest based prediction model.
seq_funbarRF(reference_seq, seq_id)
reference_seq
    Barcode sequences of class DNAStringSet. It can also be an object generated using the function
seq_id
    A vector of species labels as a factor. The length of the vector must equal the number of sequences in reference_seq.
For the argument seq_id, the user must supply the species label of each sequence in the specified format. For example, the species label Absidia caerulea should be written as Absidia_caerulea. The seq_id argument must be of class factor.
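The label format described above can be prepared with base R alone. The following is a minimal sketch, not code from the package: the vector raw_labels and the species in it are made-up illustration data, and the only assumption about the format is the one stated above (spaces replaced by underscores, result stored as a factor).

```r
# Illustrative labels only; replace with one label per barcode sequence.
raw_labels <- c("Absidia caerulea", "Absidia caerulea", "Mucor hiemalis")

# Trim stray whitespace, replace spaces with underscores, convert to factor.
seq_id <- factor(gsub(" ", "_", trimws(raw_labels), fixed = TRUE))

levels(seq_id)

# Basic sanity checks before passing seq_id on:
stopifnot(is.factor(seq_id))
stopifnot(!any(grepl(" ", levels(seq_id), fixed = TRUE)))
# With real data, also confirm length(seq_id) equals length(reference_seq).
```

Converting with factor() after the substitution keeps each species as a single level, whichever spelling the raw labels used for the space.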
ref_label
    Species labels of the barcode sequences, as a factor.
ref_gpc
    A matrix of dimension N x 96, where N is the number of sequences and the 96 columns hold the gap pair composition features for gaps of 0, 1, 2, 3, 4 and 5 taken together (16 nucleotide pairs for each of the 6 gap sizes).
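To make the shape of ref_gpc concrete, here is a rough base R sketch of how a 96-column gap pair composition matrix can be built. It is an illustration under stated assumptions, not the package's implementation: the helper gap_pair_composition is hypothetical, the column naming and column order are invented for the example, and dividing each count by the number of position pairs is an assumed normalization that may differ from what seq_funbarRF does internally.

```r
# Hypothetical helper (not part of funbarRF): gap pair composition of one
# DNA string for gaps 0..max_gap. Normalization by pair count is an assumption.
gap_pair_composition <- function(seq, max_gap = 5) {
  bases <- c("A", "C", "G", "T")
  pairs <- as.vector(outer(bases, bases, paste0))  # 16 ordered nucleotide pairs
  chars <- strsplit(toupper(seq), "")[[1]]
  n <- length(chars)
  unlist(lapply(0:max_gap, function(g) {
    k <- n - g - 1                                 # position pairs at this gap
    feats <- setNames(numeric(length(pairs)), paste0(pairs, "_g", g))
    if (k > 0) {
      # Pair the base at position i with the base at position i + g + 1.
      observed <- paste0(chars[1:k], chars[(g + 2):n])
      # Pairs containing symbols other than A, C, G, T are not counted.
      feats[] <- as.numeric(table(factor(observed, levels = pairs))) / k
    }
    feats
  }))
}

# Toy sequences, for illustration only.
seqs <- c(s1 = "ACGTACGTAC", s2 = "AAAAACCCCC")
ref_gpc_sketch <- t(sapply(seqs, gap_pair_composition))
dim(ref_gpc_sketch)  # 2 96
```

Each sequence contributes one row, and each of the 6 gap sizes contributes a block of 16 columns, which is where the 96 comes from.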
Prabina Kumar Meher, Division of Statistical Genetics, Indian Agricultural Statistics Research Institute, New Delhi-110012, INDIA
Yu C.S., Chen Y.C., Lu C.H., and Hwang J.K. (2006). Prediction of protein subcellular localization. Proteins, 64(3), 643-651.
Meher P.K., Sahu T.K., Gahoi S., and Rao A.R. (2018). ir-HSP: Improved recognition of heat shock proteins, their families and sub-types based on g-spaced di-peptide features and support vector machine. Front. Genet., 8, 235.
Li H. (2016). BioSeqClass: Classification for biological Sequences. R package version 1.32.0.
seq_funbarRF_manual, featureGapPairComposition