supc.random: Randomized Self-Updating Process Clustering
In supc: The Self-Updating Process Clustering Algorithms


The Randomized Self-Updating Process Clustering (randomized SUP) is a modification of the original SUP algorithm. At each iteration, the randomized SUP randomly partitions the instances into groups and runs the self-updating process independently within each group. This reduces both computation time and memory usage.
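The partition-then-update idea can be sketched in base R. This is a minimal sketch, not the supc implementation: the helper names `sup_step()` and `random_sup_iteration()` are hypothetical, and two details are assumptions made for illustration. First, the update moves each point to a weighted average of the points within distance `r`, with weights `exp(-d^2 / T_t)`. Second, the random partition uses near-equal group sizes.

```r
# Sketch of one randomized SUP iteration. ASSUMPTIONS (not from supc):
# - the update is a weighted average of the points within distance r,
#   with truncated Gaussian weights exp(-d^2 / T_t);
# - the partition is drawn uniformly at random with near-equal group sizes.
sup_step <- function(x, r, T_t) {
  d <- as.matrix(dist(x))                  # pairwise Euclidean distances
  w <- ifelse(d <= r, exp(-d^2 / T_t), 0)  # zero weight beyond radius r
  (w %*% x) / rowSums(w)                   # row i: weighted mean of its neighbours
}

# Hypothetical helper: partition the rows into k random groups and apply
# sup_step() to each group independently, so no n-by-n matrix is built.
random_sup_iteration <- function(x, r, T_t, k) {
  group <- sample(rep_len(seq_len(k), nrow(x)))  # random partition
  out <- x
  for (g in seq_len(k)) {
    idx <- which(group == g)
    if (length(idx) == 0) next                   # possible only when k > nrow(x)
    out[idx, ] <- sup_step(x[idx, , drop = FALSE], r, T_t)
  }
  out
}

# Usage: one iteration on 200 random 2-D points split into 4 groups.
set.seed(1)
x <- matrix(runif(400), ncol = 2)
x_next <- random_sup_iteration(x, r = 0.2, T_t = 0.01, k = 4)
```

Because `dist()` is called on only about n/k rows at a time, the largest distance matrix has about (n/k)^2 entries instead of n^2. This is where the memory saving comes from.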

supc.random(
  x,
  r = NULL,
  rp = NULL,
  t = c("static", "dynamic"),
  k = NULL,
  groups = NULL,
  tolerance = 1e-04,
  cluster.tolerance = 10 * tolerance,
  drop = TRUE,
  implementation = c("cpp", "R"),
  sort = TRUE,
  verbose = (nrow(x) > 10000)
)

`x`	data matrix. Each row is an instance of the data.
`r`	numeric vector or `NULL`. The parameter r of the self-updating process.
`rp`	numeric vector or `NULL`. Used only when `r` is `NULL`: `r` is then set to the `rp`-quantile of the pairwise distances of the data. If both `r` and `rp` are `NULL`, the default is `rp = c(0.0005, 0.001, 0.01, 0.1, 0.3)`.
`t`	either a numeric vector, a list of functions, or one of `"static"` and `"dynamic"`. The parameter T(t) of the self-updating process.
`k`	integer value. The number of partitions.
`groups`	list. The first element is the partition used in the first iteration, the second element is the partition used in the second iteration, and so on. If the number of iterations exceeds `length(groups)`, a new partition is generated.
`tolerance`	numeric value. The threshold of convergence.
`cluster.tolerance`	numeric value. After the iterations, two points whose distance is smaller than `cluster.tolerance` are assigned to the same cluster.
`drop`	logical value. Whether to delete the list structure if its length is 1.
`implementation`	either `"R"` or `"cpp"`. The engine used to compute the result.
`sort`	logical value. Whether to sort the cluster ids by cluster size.
`verbose`	logical value. Whether to show the iteration history.
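Two of the argument behaviours above, deriving `r` from `rp` and reusing or regenerating partitions from `groups`, can be illustrated with a short base-R sketch. The helpers `r_from_rp()` and `partition_for_iteration()` are hypothetical. Two further details are assumptions rather than facts taken from the supc source: storing a partition as an integer vector of group ids, and using `stats::quantile` with its default type to get the `rp`-quantile.

```r
# Hypothetical helpers for two argument behaviours (not the supc source).

# ASSUMPTION: r is the rp-quantile of all pairwise Euclidean distances.
r_from_rp <- function(x, rp = c(0.0005, 0.001, 0.01, 0.1, 0.3)) {
  d <- as.vector(dist(x))           # the n * (n - 1) / 2 pairwise distances
  unname(quantile(d, probs = rp))
}

# ASSUMPTION: each element of `groups` is an integer vector of group ids,
# one per instance. Reuse the supplied partition for early iterations and
# draw a fresh random partition into k groups once `groups` runs out.
partition_for_iteration <- function(groups, iter, n, k) {
  if (iter <= length(groups)) {
    groups[[iter]]
  } else {
    sample(rep_len(seq_len(k), n))
  }
}

# Usage: three points with pairwise distances 5, 10 and 5.
x <- matrix(c(0, 0, 3, 4, 6, 8), ncol = 2, byrow = TRUE)
r_from_rp(x, rp = c(0.5, 1))   # 5 10
```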

See the vignette, `vignette("supc", package = "supc")`, for details.

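`supc.random` returns a list of objects of class "supc". When `drop = TRUE` and only one value of `r` is used, the single "supc" object is returned directly, as in the example below.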

Each "supc" object contains the following elements:

`x`	The input matrix.
`d0`	The pairwise distance matrix of `x`.
`r`	The value of r of the clustering.
`t`	The function T(t) of the clustering.
`cluster`	The cluster id of each instance.
`centers`	The center of each cluster.
`size`	The size of each cluster.
`iteration`	The number of iterations before convergence.
`groups`	The partitions used in each iteration.
`result`	The positions of the data points after the iterations.
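The `cluster`, `centers` and `size` elements are tied to `result` through the `cluster.tolerance` and `sort` arguments. The base-R sketch below works under stated assumptions. `assign_clusters()` is a hypothetical helper. It uses a swapped-in technique, single-linkage hierarchical clustering (`stats::hclust` plus `cutree`) cut at `cluster.tolerance`, to mimic the documented rule; it is not the package's own code. Unlike the strictly pairwise rule, single linkage also joins points connected through a chain of close neighbours.

```r
# Hypothetical post-processing sketch (not the supc source).
# Swapped-in technique: single-linkage clustering cut at cluster.tolerance,
# so points linked by a chain of distances <= cluster.tolerance share a cluster.
assign_clusters <- function(result, cluster.tolerance, sort = TRUE) {
  if (nrow(result) == 1) {
    cluster <- 1L
  } else {
    hc <- hclust(dist(result), method = "single")
    cluster <- cutree(hc, h = cluster.tolerance)
  }
  if (sort) {
    # Relabel so that cluster 1 is the largest, cluster 2 the next, and so on.
    by_size <- base::sort(table(cluster), decreasing = TRUE)
    cluster <- match(cluster, as.integer(names(by_size)))
  }
  cluster <- unname(cluster)
  size <- as.vector(table(factor(cluster, levels = seq_len(max(cluster)))))
  centers <- rowsum(result, cluster) / size   # mean position of each cluster
  list(cluster = cluster, centers = centers, size = size)
}

# Usage: converged positions with three distinct locations.
res <- rbind(matrix(0, 3, 2), matrix(1, 5, 2), c(5, 5))
out <- assign_clusters(res, cluster.tolerance = 1e-3)
out$size      # 5 3 1
out$cluster   # 2 2 2 1 1 1 1 1 3
```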

Shiu, Shang-Ying, and Ting-Li Chen. 2016. "On the Strengths of the Self-Updating Process Clustering Algorithm." Journal of Statistical Computation and Simulation 86 (5): 1010–1031. doi: 10.1080/00949655.2015.1049605.

# The shape data has a structure of five clusters and a number of noise data points.

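makecircle=function(N, seed){
 # Draw N points uniformly from the unit disk by rejection sampling.
 set.seed(seed)
 n=0
 x=matrix(NA, nrow=N, ncol=2)
 while (n<N){
   tmp=runif(2, min=0, max=1)*2-1   # candidate point in [-1, 1]^2
   if (sum(tmp^2)<1) {              # keep it only if inside the unit circle
      n=n+1
      x[n,]=tmp
   }
 }
 return(x)
}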

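makedata <- function(ns, seed) {
 # Relative sizes, centers and x/y scales of the five elliptical clusters.
 size=c(10,3,3,1,1)
 mu=rbind(c(-0.3, -0.3), c(-0.55, 0.8), c(0.55, 0.8), c(0.9, 0), c(0.9, -0.6))
 sd=rbind(c(0.7, 0.7), c(0.45, 0.2), c(0.45, 0.2), c(0.1, 0.1), c(0.1, 0.1))
 x=NULL

 for (i in 1:5){
    # Stretch a unit-disk sample into an ellipse and shift it to its center.
    tmp=makecircle(ns*size[i], seed+i)
    tmp[,1]=tmp[,1]*sd[i,1]+mu[i,1]
    tmp[,2]=tmp[,2]*sd[i,2]+mu[i,2]
    x=rbind(x, tmp)
 }

 # A thin horizontal segment at y = 0.8 bridging the two upper clusters.
 tmp=runif(floor(ns/3), min=0, max=1)/5-0.1
 tmp=cbind(tmp, 0.8*rep(1, floor(ns/3)))
 x=rbind(x, tmp)
 # Uniform background noise on [-1, 1]^2.
 x=rbind(x, matrix(runif(2*2*ns, min=0, max=1), nrow=2*ns, ncol=2)*2-1)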
 return(x)
}

shape1 <- makedata(250, 100)
dim(shape1)  # 5083 x 2: 4500 cluster points, 83 segment points, 500 noise points
plot(shape1)

X.supc=supc.random(shape1, r=0.5, t="dynamic", k = 500, implementation = "R")
plot(shape1, col=X.supc$cluster)