Description

k-nearest neighbour classification of a test set from a training set, with a choice of distance measures. For each row of the test set, the k nearest training set vectors (under the chosen distance) are found, and the classification is decided by majority vote. The function lets you measure the distance between vectors in six different ways. A threshold check on K, and a tie-breaking rule for cases where several classes receive the same number of votes, are also provided.
Usage

knnVCN(TrnX, OrigTrnG, TstX, K = 1, ShowObs = FALSE, method = "euclidean", p = 2)
Arguments

TrnX: matrix or data frame of training set cases.

OrigTrnG: matrix or data frame of true classifications of the training set.

TstX: matrix or data frame of test set cases. A vector will be interpreted as a row vector for a single case.

K: number of nearest neighbours considered.

ShowObs: logical; when TRUE, the function also outputs the case observations alongside the classifications.

method: the distance measure to be used. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". Any unambiguous substring can be given.

p: the power of the Minkowski distance.
Details

K must not exceed the minimum class size in the training set; otherwise the function stops with an error.
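This check can be reproduced directly from the training labels; the following sketch uses the iris species labels as an example:

```r
## K must not exceed the size of the smallest class in the
## training labels, or knnVCN() stops with an error.
data(iris)
TrnG <- iris$Species
smallest_class <- min(table(TrnG))  # 50 for iris
K <- 5
stopifnot(K <= smallest_class)
```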
Sometimes a test case receives the same number of "ballots" from class A and class B (or even C, D, ...). In that event a weighted voting process is activated: for each tied class, the distances between the test case and its voting neighbours among the K nearest are summed, and the test case is assigned to the tied class with the smaller total distance.
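The tie-breaking rule can be sketched as follows; the neighbour labels and distances here are hypothetical toy values, not produced by the package:

```r
## K = 4 neighbours: classes A and B receive two votes each,
## so the tied class with the smaller total distance wins.
neighbour_class <- c("A", "B", "A", "B")
neighbour_dist  <- c(0.3, 0.1, 0.4, 0.2)
totals <- tapply(neighbour_dist, neighbour_class, sum)  # A: 0.7, B: 0.3
winner <- names(which.min(totals))                      # "B"
```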
The distance measures are implemented by delegating to the function dist(). For the convenience of users, we quote the details of dist() here.

Available distance measures are:
euclidean: Usual distance between the two vectors (2 norm).

maximum: Maximum distance between two components of x and y (supremum norm).

manhattan: Absolute distance between the two vectors (1 norm).

canberra: sum(abs(Xi - Yi) / abs(Xi + Yi)). Terms with zero numerator and denominator are omitted from the sum and treated as if the values were missing. This is intended for non-negative values (e.g. counts): taking the absolute value of the denominator is a 1998 R modification to avoid negative distances.

binary (aka asymmetric binary): The vectors are regarded as binary bits, so non-zero elements are "on" and zero elements are "off". The distance is the proportion of bits in which only one is on amongst those in which at least one is on.

minkowski: The p norm, the pth root of the sum of the pth powers of the differences of the components.
Missing values are allowed, and are excluded from all computations involving the rows within which they occur. Further, when Inf values are involved, all pairs of values are excluded when their contribution to the distance gave NaN or NA. If some columns are excluded in calculating a Euclidean, Manhattan, Canberra or Minkowski distance, the sum is scaled up proportionally to the number of columns used. If all pairs are excluded when calculating a particular distance, the value is NA.
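The six measures can be compared on a pair of toy vectors by calling dist() directly, which is what knnVCN() does internally:

```r
## Two rows; componentwise differences are 3, 4, 5.
x <- rbind(c(1, 2, 3), c(4, 6, 8))
dist(x, method = "euclidean")        # sqrt(9 + 16 + 25), about 7.07
dist(x, method = "maximum")          # 5
dist(x, method = "manhattan")        # 12
dist(x, method = "minkowski", p = 3) # (27 + 64 + 125)^(1/3) = 6
```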
Value

The classifications of the test set are returned. (When TstX is NULL, the function assumes the user wants to test the knn algorithm on the training set itself; a confusion table and an accuracy report are then printed to the R console.)
Note

If you want to use the distance measure "binary", the vectors must be binary bits: non-zero elements are "on" and zero elements are "off".
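For example, with 0/1 vectors the binary distance is the proportion of positions where exactly one vector is "on" among the positions where at least one is "on":

```r
x <- rbind(c(1, 0, 1, 1),
           c(1, 1, 0, 0))
dist(x, method = "binary")  # 3 of the 4 "on" positions disagree: 0.75
```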
Author(s)

Xinmiao Wang
References

Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge.
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.
Examples

library(knnGarden)
data(iris)
## Define data
TrnX<-iris[,1:4]
OrigTrnG<-iris[,5]
#
TstX<-iris[c(1:20,50:70,120:140),1:4]
#or
TstX<-NULL
## Call function
knnVCN(TrnX=TrnX,OrigTrnG=OrigTrnG,TstX=TstX,ShowObs=FALSE,K=5,method="euclidean",p = 2)
## The function is currently defined as
function (TrnX, OrigTrnG, TstX = NULL, K = 1, ShowObs = F,
    method = "euclidean", p = 2)
{
    OrigTrnG = as.factor(OrigTrnG)
    TrnG = as.numeric(OrigTrnG)
    CodeMeaning = data.frame(TrnG, OrigTrnG)
    TK = sort(as.matrix(table(TrnG)), decreasing = F)
    if (K > TK[1]) {
        stop(c("\nNOTES: \nsorry, the value of K ", "(K=", K, ") ",
            "you have selected is bigger than the capacity of one class in your training data set",
            "(", "the capacity is ", TK[1], ")", ",",
            "please choose a less value for K"))
    }
    if (is.null(TstX) == T) {
        IsTst = 1
        TstX <- as.matrix(TrnX)
    }
    else {
        IsTst = 0
    }
    if (is.matrix(TstX) == F) {
        TstX <- as.matrix(TstX)
    }
    TrnX <- as.matrix(TrnX)
    ElmTrnG = union(TrnG, TrnG)
    LevTrnG = length(ElmTrnG)
    TrnTotal = cbind(TrnG, TrnX)
    NTstX = nrow(TstX)
    NTrnTotal = nrow(TrnTotal)
    VoteResult = NULL
    VoteResultList = NULL
    for (i in 1:nrow(TstX)) {
        RankBoardI <- NULL
        RankBoardIJ <- NULL
        Total = rbind(TstX[i, ], TrnX)
        RankBoardI = as.matrix(dist(Total, method = method, p = p)[1:nrow(TrnX)])
        RankBoardIJ = cbind(TrnG, RankBoardI)
        VoteAndWeight = RankBoardIJ[sort(RankBoardIJ[, 2], index.return = T)$ix[1:K], 1:2]
        TempVote4TstXI = RankBoardIJ[sort(RankBoardIJ[, 2], index.return = T)$ix[1:K], 1]
        ElmVote = union(TempVote4TstXI, TempVote4TstXI)
        CountVote = as.matrix(sort(table(TempVote4TstXI), decreasing = T))
        TempWinner = as.numeric(rownames(CountVote))
        if (length(CountVote) == 1 | K == 1) {
            Winner = TempWinner[1]
            TstXIBelong = union(CodeMeaning$OrigTrnG[which(CodeMeaning$TrnG == Winner)],
                CodeMeaning$OrigTrnG[which(CodeMeaning$TrnG == Winner)])
            VoteResultNode = data.frame(TstXIBelong)
            VoteResultList = rbind(VoteResultList, VoteResultNode)
        }
        else {
            NumOfTie = CountVote[1]
            FinalList = NULL
            j = 1
            TempWeight = sum(VoteAndWeight[which(VoteAndWeight[, 1] == TempWinner[j]), 2])
            FinalList = data.frame(TempWinner[j], TempWeight)
            while (CountVote[j] == CountVote[j + 1] & j < length(CountVote)) {
                TempWeight = sum(VoteAndWeight[which(VoteAndWeight[, 1] == TempWinner[j + 1]), 2])
                FinalListNode = c(TempWinner[j + 1], TempWeight)
                FinalList = rbind(FinalList, FinalListNode)
                j = j + 1
            }
            FinalList = FinalList[sort(FinalList$TempWeight, index.return = T)$ix[1], ]
            TstXIBelong = union(CodeMeaning$OrigTrnG[which(CodeMeaning$TrnG == FinalList[1, 1])],
                CodeMeaning$OrigTrnG[which(CodeMeaning$TrnG == FinalList[1, 1])])
            VoteResultNode = data.frame(TstXIBelong)
            VoteResultList = rbind(VoteResultList, VoteResultNode)
        }
    }
    if (IsTst == 1) {
        CheckT = as.matrix(table(data.frame(VoteResultList, OrigTrnG)))
        AccuStat = 1 - sum(CheckT - diag(diag(CheckT)))/length(TrnG)
        print(CheckT)
        cat("the classification accuracy of this algorithm on this training dataset is: ",
            AccuStat * 100, "%", "\n\n\n")
    }
    if (IsTst == 1 & ShowObs == F) {
        result = data.frame(VoteResultList, OrigTrnG)
    }
    else {
        if (IsTst == 1 & ShowObs == T) {
            result = data.frame(TstX, VoteResultList, OrigTrnG)
        }
        else {
            if (ShowObs == F) {
                result = data.frame(VoteResultList)
            }
            else {
                result = data.frame(TstX, VoteResultList)
            }
        }
    }
    return(result)
}
Loading required package: cluster
OrigTrnG
TstXIBelong setosa versicolor virginica
setosa 50 0 0
versicolor 0 47 2
virginica 0 3 48
the classification accuracy of this algorithm on this training dataset is: 96.66667 %
TstXIBelong OrigTrnG
1 setosa setosa
2 setosa setosa
3 setosa setosa
4 setosa setosa
5 setosa setosa
6 setosa setosa
7 setosa setosa
8 setosa setosa
9 setosa setosa
10 setosa setosa
11 setosa setosa
12 setosa setosa
13 setosa setosa
14 setosa setosa
15 setosa setosa
16 setosa setosa
17 setosa setosa
18 setosa setosa
19 setosa setosa
20 setosa setosa
21 setosa setosa
22 setosa setosa
23 setosa setosa
24 setosa setosa
25 setosa setosa
26 setosa setosa
27 setosa setosa
28 setosa setosa
29 setosa setosa
30 setosa setosa
31 setosa setosa
32 setosa setosa
33 setosa setosa
34 setosa setosa
35 setosa setosa
36 setosa setosa
37 setosa setosa
38 setosa setosa
39 setosa setosa
40 setosa setosa
41 setosa setosa
42 setosa setosa
43 setosa setosa
44 setosa setosa
45 setosa setosa
46 setosa setosa
47 setosa setosa
48 setosa setosa
49 setosa setosa
50 setosa setosa
51 versicolor versicolor
52 versicolor versicolor
53 versicolor versicolor
54 versicolor versicolor
55 versicolor versicolor
56 versicolor versicolor
57 versicolor versicolor
58 versicolor versicolor
59 versicolor versicolor
60 versicolor versicolor
61 versicolor versicolor
62 versicolor versicolor
63 versicolor versicolor
64 versicolor versicolor
65 versicolor versicolor
66 versicolor versicolor
67 versicolor versicolor
68 versicolor versicolor
69 versicolor versicolor
70 versicolor versicolor
71 virginica versicolor
72 versicolor versicolor
73 virginica versicolor
74 versicolor versicolor
75 versicolor versicolor
76 versicolor versicolor
77 versicolor versicolor
78 versicolor versicolor
79 versicolor versicolor
80 versicolor versicolor
81 versicolor versicolor
82 versicolor versicolor
83 versicolor versicolor
84 virginica versicolor
85 versicolor versicolor
86 versicolor versicolor
87 versicolor versicolor
88 versicolor versicolor
89 versicolor versicolor
90 versicolor versicolor
91 versicolor versicolor
92 versicolor versicolor
93 versicolor versicolor
94 versicolor versicolor
95 versicolor versicolor
96 versicolor versicolor
97 versicolor versicolor
98 versicolor versicolor
99 versicolor versicolor
100 versicolor versicolor
101 virginica virginica
102 virginica virginica
103 virginica virginica
104 virginica virginica
105 virginica virginica
106 virginica virginica
107 versicolor virginica
108 virginica virginica
109 virginica virginica
110 virginica virginica
111 virginica virginica
112 virginica virginica
113 virginica virginica
114 virginica virginica
115 virginica virginica
116 virginica virginica
117 virginica virginica
118 virginica virginica
119 virginica virginica
120 versicolor virginica
121 virginica virginica
122 virginica virginica
123 virginica virginica
124 virginica virginica
125 virginica virginica
126 virginica virginica
127 virginica virginica
128 virginica virginica
129 virginica virginica
130 virginica virginica
131 virginica virginica
132 virginica virginica
133 virginica virginica
134 virginica virginica
135 virginica virginica
136 virginica virginica
137 virginica virginica
138 virginica virginica
139 virginica virginica
140 virginica virginica
141 virginica virginica
142 virginica virginica
143 virginica virginica
144 virginica virginica
145 virginica virginica
146 virginica virginica
147 virginica virginica
148 virginica virginica
149 virginica virginica
150 virginica virginica