supc1: Self-Updating Process Clustering

Description

The self-updating process (SUP) is a distance-based clustering method. The idea resembles gravitational attraction: every sample is pulled towards the others. The algorithm applies this attraction iteratively, moving all samples at each iteration until the system becomes stable, by which point the samples have merged into clusters in the sample space.
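
The process described above can be sketched in a few lines of base R. This is an illustration only, not the package's implementation. It assumes a truncated exponential influence (weight exp(-d / temp) for pairs no farther apart than r, zero otherwise); the function name sup_sketch, the fixed temperature temp, the max_iter cap and the stopping rule (largest coordinate change below tolerance) are hypothetical choices made for the example.

```r
# Minimal base-R sketch of a self-updating process (not the supc code).
# Assumption: influence of point j on point i is exp(-d_ij / temp) when
# d_ij <= r and 0 otherwise; temp is kept constant over the iterations.
sup_sketch <- function(x, r, temp, tolerance = 1e-4, max_iter = 1000L) {
  x <- as.matrix(x)
  iter <- 0L
  for (iter in seq_len(max_iter)) {
    d <- as.matrix(dist(x))                 # pairwise Euclidean distances
    w <- ifelse(d <= r, exp(-d / temp), 0)  # truncated exponential weights
    x_new <- (w %*% x) / rowSums(w)         # move each point to its weighted mean
    moved <- max(abs(x_new - x))            # largest coordinate change
    x <- x_new
    if (moved < tolerance) break            # assumed convergence rule
  }
  list(result = x, iteration = iter)
}

# Two well-separated groups of 10 points each.
set.seed(42)
pts <- rbind(matrix(rnorm(20, 0, 0.1), ncol = 2),
             matrix(rnorm(20, 5, 0.1), ncol = 2))
fit <- sup_sketch(pts, r = 1, temp = 0.2)
```

Because every point always has weight 1 on itself, the row sums are never zero, and points with no neighbour within r simply stay where they are.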

Usage

supc1(
  x,
  r = NULL,
  rp = NULL,
  t = c("static", "dynamic"),
  tolerance = 1e-04,
  cluster.tolerance = 10 * tolerance,
  drop = TRUE,
  implementation = c("cpp", "R", "cpp2"),
  sort = TRUE,
  verbose = (nrow(x) > 10000)
)

Arguments

x

data matrix. Each row is an instance of the data.

r

numeric vector or NULL. The parameter r of the self-updating process.

rp

numeric vector or NULL. If r is NULL, then rp will be used. The corresponding r is the rp-percentile of the pairwise distances of the data. If both r and rp are NULL, then the default value is rp = c(0.0005, 0.001, 0.01, 0.1, 0.3).

t

either a numeric vector, a list of functions, or one of "static" or "dynamic". The parameter T(t) of the self-updating process.

tolerance

numeric value. The threshold of convergence.

cluster.tolerance

numeric value. After the iterations, two points whose distance is smaller than cluster.tolerance are assigned to the same cluster.

drop

logical value. Whether to drop the list structure when the result has length 1.

implementation

either "R", "cpp" or "cpp2". Selects the engine used to compute the result. The "cpp2" engine computes the distances in parallel in C++ with OpenMP, which is not supported under OS X, and uses early stopping to speed up the calculation.

sort

logical value. Whether to sort the cluster id by size.

verbose

logical value. Whether to show the iteration history.
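
The relation between rp and r described under the rp argument can be illustrated with base R. This is a sketch under an assumption: rp is treated as a probability passed to quantile(), applied to all pairwise Euclidean distances. The quantile type the package actually uses is not stated here, and the toy data x_toy is hypothetical.

```r
# Sketch: derive candidate r values from rp (assumed to be probabilities).
set.seed(1)
x_toy <- matrix(rnorm(200), ncol = 2)        # 100 toy points in 2-D
rp <- c(0.0005, 0.001, 0.01, 0.1, 0.3)       # the documented default for rp
d_all <- as.vector(dist(x_toy))              # all pairwise distances
r_from_rp <- quantile(d_all, probs = rp, names = FALSE)
r_from_rp
```

Small values of rp give a small r, so each point is influenced only by its nearest neighbours; larger values widen the neighbourhood.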

Details

Please check the vignettes via vignette("supc", package = "supc") for details.

Value

supc1 returns a list of objects of class "supc".

Each "supc" object contains the following elements:

x

The input matrix.

d0

The pairwise distance matrix of x or NULL.

r

The value of r of the clustering.

t

The function T(t) of the clustering.

cluster

The cluster id of each instance.

centers

The center of each cluster.

size

The size of each cluster.

iteration

The number of iterations before convergence.

result

The position of data after iterations.
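
To make the relation between result, cluster, size and centers concrete, the following base-R sketch derives the last three from a toy matrix of final positions. It is not the package's code. Grouping by a single-linkage cut at cluster.tolerance, numbering the largest cluster as 1, and averaging the rows of result to obtain centers are all assumptions made for the illustration, and the values in result are made up.

```r
# Toy final positions after the iterations (hypothetical values).
result <- rbind(c(0, 0), c(0, 1e-5),
                c(5, 5), c(5, 5), c(5 + 1e-5, 5),
                c(9, 1))
cluster.tolerance <- 1e-3

# Single-linkage tree cut at cluster.tolerance: points linked by a chain of
# distances no larger than cluster.tolerance receive the same label.
raw_id <- cutree(hclust(dist(result), method = "single"), h = cluster.tolerance)

# Relabel so that id 1 is the largest cluster (assumed sort direction).
ids_by_size <- order(tabulate(raw_id), decreasing = TRUE)
cluster <- match(raw_id, ids_by_size)

size <- tabulate(cluster)                     # number of points per cluster
centers <- rowsum(result, cluster) / size     # mean position per cluster
```

Here the three nearly coincident points around (5, 5) form the largest group, so they receive id 1 under the assumed ordering.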

References

Shiu, Shang-Ying, and Ting-Li Chen. 2016. "On the Strengths of the Self-Updating Process Clustering Algorithm." Journal of Statistical Computation and Simulation 86 (5): 1010–1031. doi: 10.1080/00949655.2015.1049605.

Examples

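set.seed(1)
# Simulate 45 clustered points around 9 centers plus 20 noise points.
X <- local({
 # Coordinates of the 9 cluster centers
 mu <- list(
   x = c(0, 2, 1, 6, 8, 7, 3, 5, 4),
   y = c(0, 0, 1, 0, 0, 1, 3, 3, 4)
 )
 # Draw one point per center, 5 times, with standard deviation 1/5
 X <- lapply(1:5, function(i) {
   cbind(rnorm(9, mu$x, 1/5), rnorm(9, mu$y, 1/5))
 })
 X <- do.call(rbind, X)
 n <- nrow(X)
 # Reserve 20 rows for the noise points
 X <- rbind(X, matrix(0, 20, 2))
 k <- 1
 while(k <= 20) {
   # Candidate noise point, drawn uniformly from a rectangle
   tmp <- c(13*runif(1)-2.5, 8*runif(1)-2.5)
   y1 <- mu$x - tmp[1]
   y2 <- mu$y - tmp[2]
   y <- sqrt(y1^2+y2^2)
   # Keep the candidate only if it is farther than 2 from every center
   if (min(y)> 2){
     X[k+n,] <- tmp
     k <- k+1
   }
 }
 X
})
# Cluster with three values of r and the "dynamic" T(t)
X.supcs <- supc1(X, r = c(0.9, 1.7, 2.5), t = "dynamic", implementation = "R")
X.supcs$cluster
plot(X.supcs[[1]], type = "heatmap", major.size = 2)
plot(X.supcs[[2]], type = "heatmap", col = cm.colors(24), major.size = 5)

# Cluster with two values of r, each paired with a user-defined T(t)
X.supcs <- supc1(X, r = c(1.7, 2.5), t = list(
 function(t) {1.7 / 20 + exp(t) * (1.7 / 50)},
 function(t) {exp(t)}
), implementation = "R")
plot(X.supcs[[1]], type = "heatmap", major.size = 2)
plot(X.supcs[[2]], type = "heatmap", col = cm.colors(24), major.size = 5)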

supc documentation built on Dec. 11, 2021, 5:07 p.m.