knnVCN: k-Nearest Neighbour Classification of Versatile Distance Version

Description Usage Arguments Details Value Note Author(s) References See Also Examples

Description

k-nearest neighbour classification, versatile-distance version, for a test set from a training set. For each row of the test set, the k nearest training set vectors (under the chosen distance measure) are found, and the classification is decided by majority vote. This function lets you measure the distance between vectors in six different ways. A K threshold value check and handling of tied votes (the "same K_i" problem) are also included.

Usage

knnVCN(TrnX, OrigTrnG, TstX, K = 1, ShowObs = FALSE, method = "euclidean", p = 2)

Arguments

TrnX

matrix or data frame of training set cases.

OrigTrnG

matrix or data frame of true classifications of training set.

TstX

matrix or data frame of test set cases. A vector will be interpreted as a row vector for a single case.

K

the number of nearest neighbours considered.

ShowObs

logical; if TRUE, the function includes the test set cases in the returned result.

method

the distance measure to be used. This must be one of "euclidean", "maximum", "manhattan","canberra", "binary" or "minkowski". Any unambiguous substring can be given.

p

The power of the Minkowski distance.

Details

K is required to be no larger than the smallest class size in the training set; otherwise the function stops with an error message.
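As an illustration of this check (using the iris species labels as a stand-in training set; the variable names here are illustrative, not the package's internals):

```r
# Illustrative sketch of the K threshold check (not the package's code):
# K must not exceed the size of the smallest class in the training labels.
OrigTrnG <- iris$Species
smallest_class <- min(table(OrigTrnG))   # each iris species has 50 cases
K <- 5
K <= smallest_class                      # TRUE, so this K is acceptable
```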

Sometimes a case may receive the same number of votes ("ballots") from class A and class B (or even C, D, ...). In that event a weighted voting process is activated. The weights are the actual distances between the test case and its K neighbours in each tied class: the test case is assigned to the class with the smaller total distance.
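A minimal sketch of this tie-breaking rule, using made-up neighbour classes and distances (not the package's internal variables):

```r
# Illustrative sketch of the tie-breaking rule (not the package's code).
# Suppose K = 4 and the neighbours' classes and distances are:
nbr_class <- c("A", "B", "A", "B")   # two votes each: a tie
nbr_dist  <- c(0.8, 0.3, 0.9, 0.4)   # distances to the test case

totals <- tapply(nbr_dist, nbr_class, sum)  # total distance per tied class
names(which.min(totals))                    # "B": smaller total distance wins
```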

The multiple distance measures are implemented by delegating to the function dist(). For the convenience of users, we quote the details of dist() here.

Available distance measures are :

euclidean: Usual distance between the two vectors (2 norm).

maximum: Maximum distance between two components of x and y (supremum norm).

manhattan: Absolute distance between the two vectors (1 norm).

canberra: sum(|x_i - y_i| / |x_i + y_i|). Terms with zero numerator and denominator are omitted from the sum and treated as if the values were missing.

This is intended for non-negative values (e.g. counts): taking the absolute value of the denominator is a 1998 R modification to avoid negative distances.

binary: (aka asymmetric binary): The vectors are regarded as binary bits, so non-zero elements are "on" and zero elements are "off". The distance is the proportion of bits in which only one is on amongst those in which at least one is on.

minkowski: The p norm, the pth root of the sum of the pth powers of the differences of the components.
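The measures above can be compared on a toy pair of vectors; the values in the comments follow directly from the definitions:

```r
# Toy comparison of dist()'s measures on two 3-dimensional vectors.
m <- rbind(x = c(1, 2, 3),
           y = c(4, 6, 5))

dist(m, method = "euclidean")         # sqrt(3^2 + 4^2 + 2^2) = sqrt(29)
dist(m, method = "maximum")           # max(3, 4, 2) = 4
dist(m, method = "manhattan")         # 3 + 4 + 2 = 9
dist(m, method = "canberra")          # 3/5 + 4/8 + 2/8 = 1.35
dist(m, method = "minkowski", p = 3)  # (3^3 + 4^3 + 2^3)^(1/3) = 99^(1/3)
```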

Missing values are allowed, and are excluded from all computations involving the rows within which they occur. Further, when Inf values are involved, all pairs of values are excluded when their contribution to the distance gave NaN or NA. If some columns are excluded in calculating a Euclidean, Manhattan, Canberra or Minkowski distance, the sum is scaled up proportionally to the number of columns used. If all pairs are excluded when calculating a particular distance, the value is NA.
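The proportional scaling for excluded columns can be seen on a small example: with one of three coordinates missing, the Manhattan sum over the remaining two columns is scaled up by 3/2.

```r
# Sketch of dist()'s scaling rule for missing values.
m <- rbind(c(1, 2, NA),
           c(2, 4, 8))
dist(m, method = "manhattan")   # (|1-2| + |2-4|) * 3/2 = 4.5
```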

Value

the classification results for the test set are returned. (When TstX is NULL, the function assumes the user is testing the knn algorithm on the training set itself; a test result table and an accuracy report are then printed on the R console.)

Note

If you want to use the distance measure "binary", the vectors must be binary bits: non-zero elements are "on" and zero elements are "off".

Author(s)

Xinmiao Wang

References

Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge.

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.

See Also

knnMCN, dataFiller

Examples

library(knnGarden)
data(iris)
## Define data
TrnX<-iris[,1:4]
OrigTrnG<-iris[,5]
#
TstX<-iris[c(1:20,50:70,120:140),1:4]
#or
TstX<-NULL
## Call function
knnVCN(TrnX=TrnX,OrigTrnG=OrigTrnG,TstX=TstX,ShowObs=FALSE,K=5,method="euclidean",p = 2)


## The function is currently defined as
function (TrnX, OrigTrnG, TstX = NULL, K = 1, ShowObs = F, method = "euclidean", 
    p = 2) 
{
    OrigTrnG = as.factor(OrigTrnG)
    TrnG = as.numeric(OrigTrnG)
    CodeMeaning = data.frame(TrnG, OrigTrnG)
    TK = sort(as.matrix(table(TrnG)), decreasing = F)
    if (K > TK[1]) {
        stop(c("\nNOTES: \nsorry, the value of K ", "(K=", K, 
            ") ", "you have selected is bigger than the capacity of one class in your training data set", 
            "(", "the capacity is ", TK[1], ")", ",", "please choose a less value for K"))
    }
    if (is.null(TstX) == T) {
        IsTst = 1
        TstX <- as.matrix(TrnX)
    }
    else {
        IsTst = 0
    }
    if (is.matrix(TstX) == F) {
        TstX <- as.matrix(TstX)
    }
    TrnX <- as.matrix(TrnX)
    ElmTrnG = union(TrnG, TrnG)
    LevTrnG = length(ElmTrnG)
    TrnTotal = cbind(TrnG, TrnX)
    NTstX = nrow(TstX)
    NTrnTotal = nrow(TrnTotal)
    VoteResult = NULL
    VoteResultList = NULL
    for (i in 1:nrow(TstX)) {
        RankBoardI <- NULL
        RankBoardIJ <- NULL
        Total = rbind(TstX[i, ], TrnX)
        RankBoardI = as.matrix(dist(Total, method = method, p = p)[1:nrow(TrnX)])
        RankBoardIJ = cbind(TrnG, RankBoardI)
        VoteAndWeight = RankBoardIJ[sort(RankBoardIJ[, 2], index.return = T)$ix[1:K], 
            1:2]
        TempVote4TstXI = RankBoardIJ[sort(RankBoardIJ[, 2], index.return = T)$ix[1:K], 
            1]
        ElmVote = union(TempVote4TstXI, TempVote4TstXI)
        CountVote = as.matrix(sort(table(TempVote4TstXI), decreasing = T))
        TempWinner = as.numeric(rownames(CountVote))
        if (length(CountVote) == 1 | K == 1) {
            Winner = TempWinner[1]
            TstXIBelong = union(CodeMeaning$OrigTrnG[which(CodeMeaning$TrnG == 
                Winner)], CodeMeaning$OrigTrnG[which(CodeMeaning$TrnG == 
                Winner)])
            VoteResultNode = data.frame(TstXIBelong)
            VoteResultList = rbind(VoteResultList, VoteResultNode)
        }
        else {
            NumOfTie = CountVote[1]
            FinalList = NULL
            j = 1
            TempWeight = sum(VoteAndWeight[which(VoteAndWeight[, 
                1] == TempWinner[j]), 2])
            FinalList = data.frame(TempWinner[j], TempWeight)
            while (CountVote[j] == CountVote[j + 1] & j < length(CountVote)) {
                TempWeight = sum(VoteAndWeight[which(VoteAndWeight[, 
                  1] == TempWinner[j + 1]), 2])
                FinalListNode = c(TempWinner[j + 1], TempWeight)
                FinalList = rbind(FinalList, FinalListNode)
                j = j + 1
            }
            FinalList = FinalList[sort(FinalList$TempWeight, 
                index.return = T)$ix[1], ]
            TstXIBelong = union(CodeMeaning$OrigTrnG[which(CodeMeaning$TrnG == 
                FinalList[1, 1])], CodeMeaning$OrigTrnG[which(CodeMeaning$TrnG == 
                FinalList[1, 1])])
            VoteResultNode = data.frame(TstXIBelong)
            VoteResultList = rbind(VoteResultList, VoteResultNode)
        }
    }
    if (IsTst == 1) {
        CheckT = as.matrix(table(data.frame(VoteResultList, OrigTrnG)))
        AccuStat = 1 - sum(CheckT - diag(diag(CheckT)))/length(TrnG)
        print(CheckT)
        cat("the classification accuracy of this algorithm on this training dataset is: ", 
            AccuStat * 100, "%", "\n\n\n")
    }
    if (IsTst == 1 & ShowObs == F) {
        result = data.frame(VoteResultList, OrigTrnG)
    }
    else {
        if (IsTst == 1 & ShowObs == T) {
            result = data.frame(TstX, VoteResultList, OrigTrnG)
        }
        else {
            if (ShowObs == F) {
                result = data.frame(VoteResultList)
            }
            else {
                result = data.frame(TstX, VoteResultList)
            }
        }
    }
    return(result)
  }

Example output

Loading required package: cluster
            OrigTrnG
TstXIBelong  setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         47         2
  virginica       0          3        48
the classification accuracy of this algorithm on this training dataset is:  96.66667 % 


    TstXIBelong   OrigTrnG
1        setosa     setosa
2        setosa     setosa
3        setosa     setosa
4        setosa     setosa
5        setosa     setosa
6        setosa     setosa
7        setosa     setosa
8        setosa     setosa
9        setosa     setosa
10       setosa     setosa
11       setosa     setosa
12       setosa     setosa
13       setosa     setosa
14       setosa     setosa
15       setosa     setosa
16       setosa     setosa
17       setosa     setosa
18       setosa     setosa
19       setosa     setosa
20       setosa     setosa
21       setosa     setosa
22       setosa     setosa
23       setosa     setosa
24       setosa     setosa
25       setosa     setosa
26       setosa     setosa
27       setosa     setosa
28       setosa     setosa
29       setosa     setosa
30       setosa     setosa
31       setosa     setosa
32       setosa     setosa
33       setosa     setosa
34       setosa     setosa
35       setosa     setosa
36       setosa     setosa
37       setosa     setosa
38       setosa     setosa
39       setosa     setosa
40       setosa     setosa
41       setosa     setosa
42       setosa     setosa
43       setosa     setosa
44       setosa     setosa
45       setosa     setosa
46       setosa     setosa
47       setosa     setosa
48       setosa     setosa
49       setosa     setosa
50       setosa     setosa
51   versicolor versicolor
52   versicolor versicolor
53   versicolor versicolor
54   versicolor versicolor
55   versicolor versicolor
56   versicolor versicolor
57   versicolor versicolor
58   versicolor versicolor
59   versicolor versicolor
60   versicolor versicolor
61   versicolor versicolor
62   versicolor versicolor
63   versicolor versicolor
64   versicolor versicolor
65   versicolor versicolor
66   versicolor versicolor
67   versicolor versicolor
68   versicolor versicolor
69   versicolor versicolor
70   versicolor versicolor
71    virginica versicolor
72   versicolor versicolor
73    virginica versicolor
74   versicolor versicolor
75   versicolor versicolor
76   versicolor versicolor
77   versicolor versicolor
78   versicolor versicolor
79   versicolor versicolor
80   versicolor versicolor
81   versicolor versicolor
82   versicolor versicolor
83   versicolor versicolor
84    virginica versicolor
85   versicolor versicolor
86   versicolor versicolor
87   versicolor versicolor
88   versicolor versicolor
89   versicolor versicolor
90   versicolor versicolor
91   versicolor versicolor
92   versicolor versicolor
93   versicolor versicolor
94   versicolor versicolor
95   versicolor versicolor
96   versicolor versicolor
97   versicolor versicolor
98   versicolor versicolor
99   versicolor versicolor
100  versicolor versicolor
101   virginica  virginica
102   virginica  virginica
103   virginica  virginica
104   virginica  virginica
105   virginica  virginica
106   virginica  virginica
107  versicolor  virginica
108   virginica  virginica
109   virginica  virginica
110   virginica  virginica
111   virginica  virginica
112   virginica  virginica
113   virginica  virginica
114   virginica  virginica
115   virginica  virginica
116   virginica  virginica
117   virginica  virginica
118   virginica  virginica
119   virginica  virginica
120  versicolor  virginica
121   virginica  virginica
122   virginica  virginica
123   virginica  virginica
124   virginica  virginica
125   virginica  virginica
126   virginica  virginica
127   virginica  virginica
128   virginica  virginica
129   virginica  virginica
130   virginica  virginica
131   virginica  virginica
132   virginica  virginica
133   virginica  virginica
134   virginica  virginica
135   virginica  virginica
136   virginica  virginica
137   virginica  virginica
138   virginica  virginica
139   virginica  virginica
140   virginica  virginica
141   virginica  virginica
142   virginica  virginica
143   virginica  virginica
144   virginica  virginica
145   virginica  virginica
146   virginica  virginica
147   virginica  virginica
148   virginica  virginica
149   virginica  virginica
150   virginica  virginica

knnGarden documentation built on May 2, 2019, 11:02 a.m.
