KNNProtein: K-Nearest Neighbor for Protein (KNNProtein)


View source: R/KNNProtein.R

Description

This function works like KNNPeptide, except that the similarity score is computed with the Needleman-Wunsch algorithm.
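To make the similarity score concrete, here is a minimal base R sketch of a Needleman-Wunsch global alignment score. The function name nw_score and the scoring values (match, mismatch, gap) are assumptions chosen for illustration; the scoring scheme that KNNProtein uses internally is not stated on this page and may differ.

```r
# Minimal Needleman-Wunsch global alignment score (illustrative only).
# nw_score and its scoring values are hypothetical, not ftrCOOL's internals.
nw_score <- function(a, b, match = 1, mismatch = -1, gap = -1) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  # score[i + 1, j + 1] is the best score for aligning x[1:i] with y[1:j]
  score <- matrix(0, nrow = n + 1, ncol = m + 1)
  score[, 1] <- gap * (0:n)  # prefix of x aligned against gaps only
  score[1, ] <- gap * (0:m)  # prefix of y aligned against gaps only
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      sub <- if (x[i] == y[j]) match else mismatch
      score[i + 1, j + 1] <- max(
        score[i, j] + sub,      # align x[i] with y[j]
        score[i, j + 1] + gap,  # align x[i] with a gap
        score[i + 1, j] + gap   # align y[j] with a gap
      )
    }
  }
  score[n + 1, m + 1]
}
```

A higher score means the two sequences align better end to end, which is what lets the score serve as a similarity measure between a query sequence and each training sequence.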

Usage

KNNProtein(seqs, trainSeq, percent = 30, labeltr = c(), label = c())

Arguments

seqs

A FASTA file of amino acid sequences, in which each sequence entry starts with a '>' character. Alternatively, a character vector in which each element is a protein sequence.

trainSeq

A FASTA file of amino acid sequences, in which each sequence entry starts with a '>' character. Alternatively, a character vector in which each element is a protein sequence. Each sequence in the training set is associated with a label, which is given in the parameter labeltr.

percent

Determines the threshold used to identify the sequences in the training set that are similar to the input sequence.

labeltr

A vector whose length equals the number of sequences in the training set. It gives the class of each sequence in the training set.

label

An optional parameter. It is a vector whose length equals the number of sequences. It gives the class of each entry (i.e., sequence).

Value

This function returns a feature matrix in which the number of columns is the number of classes multiplied by percent, and the number of rows equals the number of sequences.
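The column count described above can be illustrated with a rough sketch of how KNN-style features might be built for a single query sequence: for each p from 1 to percent, take the top p% most similar training sequences and record the fraction that belongs to each class. The names knn_features and sim, the rounding rule for the number of neighbours, and the column ordering are all assumptions made for this sketch; they are not taken from the ftrCOOL source.

```r
# Illustrative sketch of KNN-style features for ONE query sequence.
# knn_features, sim and the rounding rule are hypothetical, not ftrCOOL's code.
# sim:     similarity of the query to each training sequence
# labeltr: class label of each training sequence
knn_features <- function(sim, labeltr, percent = 30) {
  classes <- sort(unique(labeltr))
  # Training labels ordered from most to least similar to the query
  ranked <- labeltr[order(sim, decreasing = TRUE)]
  unlist(lapply(seq_len(percent), function(p) {
    # Number of neighbours in the top p% (at least one; assumed rounding)
    k <- max(1, ceiling(length(ranked) * p / 100))
    top <- ranked[seq_len(k)]
    # Fraction of the top-k neighbours that fall in each class
    vapply(classes, function(cl) mean(top == cl), numeric(1))
  }))
}

# Six training sequences, two classes, percent = 5
f <- knn_features(sim = c(5, 4, 3, 2, 1, 0),
                  labeltr = c(1, 1, 1, 0, 0, 0),
                  percent = 5)
length(f)
```

Stacking one such vector per query sequence gives a matrix with one row per sequence and (number of classes) x percent columns, which matches the shape stated above.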

References

Chen, Zhen, et al. "iFeature: a python package and web server for features extraction and selection from protein and peptide sequences." Bioinformatics 34.14 (2018): 2499-2502.

Examples

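library(ftrCOOL)

# Locate the example data shipped with the package
ptmSeqsADR <- system.file("extdata/", package = "ftrCOOL")

# Query sequences: keep the first two, truncated to their first 31 residues
ptmSeqsVect <- as.vector(read.csv(paste0(ptmSeqsADR, "/ptmVect101AA.csv"))[, 2])
ptmSeqsVect <- ptmSeqsVect[1:2]
ptmSeqsVect <- sapply(ptmSeqsVect, function(seq) { substr(seq, 1, 31) })

# Training sequences: three positive and three negative examples
posSeqs <- as.vector(read.csv(paste0(ptmSeqsADR, "/poSeqPTM101.csv"))[, 2])
negSeqs <- as.vector(read.csv(paste0(ptmSeqsADR, "/negSeqPTM101.csv"))[, 2])

posSeqs <- posSeqs[1:3]
negSeqs <- negSeqs[1:3]

# Truncate the training sequences to 31 residues as well
posSeqs <- sapply(posSeqs, function(seq) { substr(seq, 1, 31) })
negSeqs <- sapply(negSeqs, function(seq) { substr(seq, 1, 31) })

trainSeq <- c(posSeqs, negSeqs)

# Class labels for the training set: 1 = positive, 0 = negative
labelPos <- rep(1, length(posSeqs))
labelNeg <- rep(0, length(negSeqs))

labeltr <- c(labelPos, labelNeg)

# Build the KNN feature matrix for the query sequences
mat <- KNNProtein(seqs = ptmSeqsVect, trainSeq = trainSeq, percent = 5, labeltr = labeltr)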

ftrCOOL documentation built on Nov. 30, 2021, 1:07 a.m.
