separationkNN | R Documentation |
This function takes a square matrix dmx that contains item by item distances, and a factor classes (with as many items as there are rows, and thus columns, in dmx) that assigns a class to each item.
The function returns a measure q of how well the distances in dmx 'capture' the classification in classes, where distances are taken to 'capture a classification' to the extent that items are (immediately) surrounded by other items from the same class, and not by items from some other class.
In addition to an overall cluster quality for all the data taken together,
the function also returns the cluster quality of individual points
and the cluster quality of individual classes
(as well as the mean cluster quality over classes).
All these measures are called q in the output of the function.
separationkNN(dmx, classes, k = NULL, weights = c("linear", "s-curve", "none"))
dmx |
A square matrix containing item by item distances |
classes |
A factor of the same length as the number of rows and columns in dmx |
k |
The value of |
weights |
The |
The q measures are calculated as follows: first, for each item an item-specific cluster quality is calculated. This is done by calculating the proportion of 'same class items' among its k nearest neighbours. The higher the measure, the better the cluster quality for that item. However, what is calculated is not simply the proportion, but rather the weighted mean of the values of the k nearest neighbours, where a 'same class neighbour' has value one, a 'different class neighbour' has value zero, and the weights of the neighbours can have different settings (see below). In the default settings, weights decrease linearly with their rank of 'distance from the item', and all weights add up to one. For instance, if k is one then the weight is 1. If k is 2, then the weights, starting from the closest nearest neighbour, are .67 and .33. If k is 3, then the weights are .5, .33, and .17. If k is 4, they are .4, .3, .2, and .1. Etc.
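The linear weighting and the item-specific quality described above can be sketched as follows. This is a minimal illustration, not the function's actual source: the helper names linear_weights and point_quality are hypothetical, and the sketch assumes that an item is not counted among its own neighbours and that ties in distance are broken by item order.

```r
# Hypothetical helper (not part of the documented function):
# linearly decreasing weights for k neighbours, normalised to sum to 1.
linear_weights <- function(k) {
  w <- k:1
  w / sum(w)
}

# Hypothetical helper: weighted proportion of same-class items among
# the k nearest neighbours of item i.
# Assumption: the item itself is excluded from its neighbours.
point_quality <- function(dmx, classes, i, k) {
  d <- dmx[i, ]
  d[i] <- Inf                      # exclude the item itself
  nn <- order(d)[seq_len(k)]       # k nearest neighbours, closest first
  same <- as.numeric(classes[nn] == classes[i])
  sum(linear_weights(k) * same)    # weighted mean (weights sum to 1)
}

round(linear_weights(3), 2)  # 0.50 0.33 0.17, matching the text above

# Toy data: six items on a line, two well separated classes.
toy_dmx <- as.matrix(dist(c(0, 1, 2, 10, 11, 12)))
toy_classes <- factor(c("a", "a", "a", "b", "b", "b"))
point_quality(toy_dmx, toy_classes, i = 3, k = 3)
```

In the toy data, the three nearest neighbours of item 3 are two items of class "a" followed by one of class "b", so the different-class neighbour only contributes the smallest weight.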
The overall cluster quality of the data is then calculated as the mean cluster quality of all items. Additionally, the cluster quality for every class in classes is calculated as the mean cluster quality of the items belonging to that class. The mean class quality, finally, is the mean of all class-specific class quality measures.
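The aggregation steps can be illustrated with a few lines of base R. The numbers below are made up for illustration, and the variable names merely mirror the components of the returned object; this is a sketch of the arithmetic, not the function's own code.

```r
# Illustrative values only: item-specific qualities for six items
# in two classes of unequal size.
pointqual <- c(1.0, 0.8, 0.6, 0.5, 0.5, 0.2)
classes   <- factor(c("a", "a", "a", "a", "b", "b"))

globqual      <- mean(pointqual)                   # mean over all items
classqual     <- tapply(pointqual, classes, mean)  # mean per class
meanclassqual <- mean(classqual)                   # mean of the class means

globqual       # 0.6
classqual      # a: 0.725, b: 0.35
meanclassqual  # 0.5375
```

With classes of unequal size the global quality and the mean class quality differ, because the latter gives every class the same weight regardless of how many items it contains.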
An object of the class clustqualkNN, which is a list containing at least the following components:
globqual |
The global cluster quality q |
meanclassqual |
The mean of all class-specific cluster quality values q |
classqual |
A table with, for each class, its class-specific cluster quality q |
pointqual |
A numeric vector with for each item its item-specific cluster quality q |
weights |
A numeric vector with the weights that were used |
k |
A number indicating which |
# we create a 'point cloud', with points belonging to two classes
points <- rbind(matrix(rnorm(100, 2, 2), ncol=2),
                matrix(rnorm(100, 4, 2), ncol=2))
dst <- dist(points, diag=TRUE, upper=TRUE)
classes <- as.factor(rep(c("a","b"), c(50, 50)))

# we analyse the cluster quality, letting the procedure choose k
fitkNN <- separationkNN(dst, classes)
summary(fitkNN)
fitkNN$globqual       # global cluster quality
fitkNN$meanclassqual  # mean class quality
fitkNN$classqual      # class-specific quality

# we analyse the cluster quality, setting k to 25
fitkNN <- separationkNN(dst, classes, k=25)
summary(fitkNN)