This artificial dataset was generated to contain five clusters: one large circle, two small circles, and two ellipses. It is intended to test whether a clustering algorithm can identify and distinguish all five clusters. The dataset is generated by the following script:
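# Draw N points uniformly from the unit disc by rejection sampling.
makecircle <- function(N, seed) {
  n <- 0
  x <- NULL
  set.seed(seed)
  while (n < N) {
    tmp <- runif(2, min = -1, max = 1)
    # Keep the candidate only if it lies inside the unit circle.
    if (t(tmp) %*% tmp < 1) {
      n <- n + 1
      x <- rbind(x, tmp)
    }
  }
  return(x)
}

# Build the full dataset: five clusters, a short line, and uniform noise.
makedata <- function(n, seed) {
  # Relative cluster sizes: cluster i receives n * f[i] points.
  f <- c(10, 3, 3, 1, 1)
  # One row per cluster: (x, y) coordinates of the centre.
  center <- matrix(
    c(-.3, -.3, -.55, .8, .55, .8, .9, 0, .9, -.6),
    nrow = 5, ncol = 2, byrow = TRUE
  )
  # One row per cluster: scale (radius) along the x and y axes.
  s <- matrix(
    c(.7, .7, .45, .2, .45, .2, .1, .1, .1, .1),
    nrow = 5, ncol = 2, byrow = TRUE
  )
  x <- NULL
  for (i in 1:5) {
    tmp <- makecircle(n * f[i], seed + i)
    # Stretch the unit disc and move it to the cluster centre.
    tmp[, 1] <- tmp[, 1] * s[i, 1] + center[i, 1]
    tmp[, 2] <- tmp[, 2] * s[i, 2] + center[i, 2]
    x <- rbind(x, tmp)
  }
  # Short horizontal line at y = 0.8, lying between the two ellipses.
  line <- cbind(runif(floor(n / 3), min = -.1, max = .1), rep(.8, floor(n / 3)))
  # 4 * n background noise points, uniform on [-1, 1] x [-1, 1].
  noise <- matrix(runif(8 * n, min = -1, max = 1), nrow = 4 * n, ncol = 2)
  return(rbind(x, line, noise))
}

shape <- makedata(50, 1000)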
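Because the script stacks the rows in a fixed order (the five clusters, then the line, then the noise), the composition of `shape` can be worked out from the parameters alone. The sketch below does that arithmetic for `n = 50` and builds a matching label vector. The name `truth` and the label coding (1 to 5 for the clusters, 6 for the line, 0 for noise) are assumptions made for illustration and are not part of the original script or dataset.

```r
# Parameters copied from the generating script
n <- 50
f <- c(10, 3, 3, 1, 1)        # relative cluster sizes

cluster_sizes <- n * f        # points in each of the five clusters
n_line  <- floor(n / 3)       # points on the short line at y = 0.8
n_noise <- 4 * n              # uniform background noise points

# Hypothetical ground-truth labels, in the same row order as rbind(x, line, noise):
# 1..5 = clusters, 6 = line, 0 = noise
truth <- c(rep(1:5, times = cluster_sizes),
           rep(6L, n_line),
           rep(0L, n_noise))

cluster_sizes   # 500 150 150 50 50
length(truth)   # 1116, the expected number of rows in shape
```

A vector like this could be cross-tabulated against the output of a clustering algorithm, for example with `table(truth, predicted)`, to see which of the five clusters were recovered and whether the line caused the two ellipses to be merged.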
Guha, S., R. Rastogi, and K. Shim. 2001. CURE: An Efficient Clustering Algorithm for Large Databases. Information Systems 26 (1): 35–58.