Function to perform subspace clustering where the clusters are concentrated in different cluster specific subspaces of the data.

`x` |
A matrix or data frame containing the explanatory variables. The method is restricted to numerical data. |

`k` |
Prespecifies the final number of clusters. |

`l` |
Prespecifies the dimension of the final cluster-specific subspaces (equal for all clusters). |

`k0` |
Initial number of clusters (that are computed in the entire data space). Must be greater than |

`a` |
Prespecified factor for the cluster number reduction in each iteration step of the algorithm. |

`inner.loops` |
Number of repetitive iterations (i.e. recomputation of clustering and cluster-specific subspaces) while the number of clusters and the subspace dimension are kept constant. |

`verbose` |
Logical indicating whether the iteration process should be displayed. |

`...` |
Currently not used. |
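
The interplay of `k`, `l`, `k0` and `a` can be illustrated with a small sketch of the reduction schedule as described by Aggarwal and Yu (2000): in each step the number of clusters is multiplied by `a`, and the subspace dimension is multiplied by a factor `beta` chosen so that both reach their targets after the same number of steps. The helper `orclus_schedule` and its argument `d` (dimension of the data) are hypothetical names introduced for illustration; the package's internal bookkeeping may differ.

```r
# Sketch of the cluster-number / dimension reduction schedule
# (hypothetical helper, not part of the package).
orclus_schedule <- function(k, l, k0, d, a) {
  stopifnot(k0 > k, d >= l, a > 0, a < 1)
  # factor for the dimension so that d -> l takes as many steps as k0 -> k
  beta <- exp(-log(d / l) * log(1 / a) / log(k0 / k))
  kc <- k0   # current (unrounded) number of clusters
  lc <- d    # current (unrounded) subspace dimension
  ks <- kc
  ls <- lc
  while (kc > k) {
    kc <- max(k, a * kc)
    lc <- max(l, beta * lc)
    ks <- c(ks, kc)
    ls <- c(ls, lc)
  }
  data.frame(clusters = round(ks), dimension = round(ls))
}

sched <- orclus_schedule(k = 3, l = 4, k0 = 15, d = 15, a = 0.75)
print(sched)
```

With these settings the rounded schedule runs through 15, 11, 8, 6, 5, 4, 3 clusters and subspace dimensions 15, 12, 9, 7, 6, 5, 4. A smaller `a` shortens the schedule, while a larger `k0` lengthens it.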

The function performs ORCLUS subspace clustering (Aggarwal and Yu, 2000).
Cluster assignments and cluster-specific subspaces are computed simultaneously.
Each observation is assigned to the cluster whose center has minimal Euclidean distance in the corresponding subspace.
As an extension to the originally proposed algorithm, initialization in the full data space is done by calling `kmeans` for `k0` clusters. Further, `inner.loops` specifies a number of repetitions during the iteration process for each number of clusters and subspace dimension. An outlier option has not been implemented.
Even though increasing the initialization parameter `k0` most strongly affects the computation time, it should be chosen as large as possible (at least several times greater than `k`).
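
To make the notion of a cluster-specific subspace concrete, the following base-R sketch follows the idea in Aggarwal and Yu (2000): the subspace of a cluster is spanned by the `l` eigenvectors of its covariance matrix with the smallest eigenvalues, and distances are measured after projecting onto that basis. The names `pts`, `basis` and `proj_dist` are made up for this illustration; it is not the package's code.

```r
set.seed(1)
# one artificial cluster scattered along the direction (1, -1)
u <- runif(50)
pts <- cbind(u, 0.5 - u) + matrix(rnorm(100, sd = 0.01), ncol = 2)
center <- colMeans(pts)

# eigen() returns eigenvalues in decreasing order, so the last l
# eigenvectors span the directions of least spread
l <- 1
ev <- eigen(cov(pts))
basis <- ev$vectors[, (ncol(pts) - l + 1):ncol(pts), drop = FALSE]

# Euclidean distance to the center after projection onto the basis
proj_dist <- function(x, center, basis) {
  sqrt(rowSums((sweep(x, 2, center) %*% basis)^2))
}

d_proj <- proj_dist(pts, center, basis)
d_full <- sqrt(rowSums(sweep(pts, 2, center)^2))
summary(d_proj)
summary(d_full)
```

Because the basis is orthonormal, the projected distance can never exceed the full-space distance; for points of the cluster itself it is much smaller, which is what makes the cluster look concentrated in its subspace.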

Returns an object of class `orclus`. Its structure is similar to objects resulting from calling `kmeans`.

`cluster` |
Returns the final cluster labels. |

`centers` |
A matrix where each row corresponds to a cluster center (in the original space). |

`size` |
The final number of observations in each cluster. |

`subspaces` |
List of matrices for projection of the data onto the cluster-specific subspaces by post-multiplication. |

`subspace.dimension` |
Dimension of the final subspaces. |

`within.projenss` |
Corresponds to |

`sparsity.coefficient` |
Sparsity coefficient of the clustering result. If its value is close to 1 the subspace dimension may have been chosen too large. A small value close to 0 can be interpreted as a hint that a strong cluster structure has been found. |

`orclus.call` |
(Matched) function call. |

Gero Szepannek

Aggarwal, C. and Yu, P. (2000): *Finding generalized projected clusters in high dimensional spaces*,
Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 70-81.

```
# generate simple artificial example of two clusters
clus1.v1 <- runif(100)
clus2.v1 <- runif(100)
xample <- rbind(cbind(clus1.v1, 0.5 - clus1.v1), cbind(clus2.v1, -0.5 + clus2.v1))
plot(xample, col=rep(1:2, each=100))
# try standard kmeans clustering
kmeans.res <- kmeans(xample, 2)
plot(xample, col = kmeans.res$cluster)
# use orclus instead
orclus.res <- orclus(x = xample, k = 2, l = 1, k0 = 8, a = 0.5)
plot(xample, col = orclus.res$cluster)
# show data in cluster-specific subspaces
par(mfrow = c(1, 2))
for (i in seq_along(orclus.res$size)) {
  plot(xample %*% orclus.res$subspaces[[i]], col = orclus.res$cluster,
       ylab = paste("Identified subspace for cluster", i))
}
### second 'more multivariate' example to play with...
# definition of a function for parameterized data simulation
sim.orclus <- function(k = 3, nk = 100, d = 10, l = 4,
                       sd.cl = 0.05, sd.rest = 1, locshift = 1){
  ### input parameters for data generation
  # k        number of clusters
  # nk       observations per cluster
  # d        original dimension of the data
  # l        subspace dimension where the clusters are concentrated
  # sd.cl    (within cluster subspace) standard deviations for data generation
  # sd.rest  standard deviations in the remaining space
  # locshift parameter of a uniform distribution to sample different cluster means
  x <- NULL
  for(i in 1:k){
    # cluster centers
    apts <- locshift * matrix(runif(l * k), ncol = l)
    # sample points in original space
    xi.original <- cbind(matrix(rnorm(nk * l, sd = sd.cl), ncol = l) +
                           matrix(rep(apts[i, ], nk), ncol = l, byrow = TRUE),
                         matrix(rnorm(nk * (d - l), sd = sd.rest), ncol = (d - l)))
    # subspace generation
    sym.mat <- matrix(nrow = d, ncol = d)
    for(m in 1:d){
      for(n in 1:m){
        sym.mat[m, n] <- sym.mat[n, m] <- runif(1)
      }
    }
    subspace <- eigen(sym.mat)$vectors
    # transformation
    xi.transformed <- xi.original %*% subspace
    x <- rbind(x, xi.transformed)
  }
  clids <- rep(1:k, each = nk)
  result <- list(x = x, cluster = clids)
  return(result)
}
# simulate data, you can play with different parameterizations...
simdata <- sim.orclus(k = 3, nk = 200, d = 15, l = 4,
sd.cl = 0.05, sd.rest = 1, locshift = 1)
# apply kmeans and orclus
kmeans.res2 <- kmeans(simdata$x, 3)
orclus.res2 <- orclus(x = simdata$x, k = 3, l = 4, k0 = 15, a = 0.75)
cat("SC: ", orclus.res2$sparsity.coefficient, "\n")
# compare results
table(kmeans.res2$cluster, simdata$cluster)
table(orclus.res2$cluster, simdata$cluster)
```

```
iteration : 1
Initialization with 8 clusters
Actual Subspace dimension : 2
New number of clusters : 4
iteration : 2
Actual Subspace dimension : 1
New number of clusters : 2
Final reassigment...
iteration : 1
Initialization with 15 clusters
Actual Subspace dimension : 15
New number of clusters : 11
iteration : 2
Actual Subspace dimension : 12
New number of clusters : 8
iteration : 3
Actual Subspace dimension : 9
New number of clusters : 6
iteration : 4
Actual Subspace dimension : 7
New number of clusters : 5
iteration : 5
Actual Subspace dimension : 6
New number of clusters : 4
iteration : 6
Actual Subspace dimension : 5
New number of clusters : 3
iteration : 7
Actual Subspace dimension : 4
New number of clusters : 3
Final reassigment...
SC: 0.007938949
      1   2   3
  1  41  38 176
  2 132  36   3
  3  27 126  21
      1   2   3
  1   0 200   0
  2 200   0   0
  3   0   0 200
```
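
The two contingency tables at the end of the output can be condensed into a single agreement figure. The sketch below re-enters the printed counts by hand and computes cluster purity (the share of observations falling into the dominant true class of their assigned cluster); `purity` is a hypothetical helper, and purity is only one of several possible agreement measures.

```r
# Counts copied from the printed tables (rows: found clusters, columns: true clusters)
kmeans_tab <- matrix(c( 41,  38, 176,
                       132,  36,   3,
                        27, 126,  21), nrow = 3, byrow = TRUE)
orclus_tab <- matrix(c(  0, 200,   0,
                       200,   0,   0,
                         0,   0, 200), nrow = 3, byrow = TRUE)

# Share of observations in the majority class of their cluster
purity <- function(tab) sum(apply(tab, 1, max)) / sum(tab)

purity(kmeans_tab)  # 434 / 600
purity(orclus_tab)  # 600 / 600
```

For this run, `kmeans` in the full space places 434 of the 600 observations in the majority class of their cluster, whereas the `orclus` result separates the three simulated clusters completely.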
