supc.random: Randomized Self-Updating Process Clustering


View source: R/supc1.R

Description

The Randomized Self-Updating Process Clustering (randomized SUP) is a modification of the original SUP algorithm. At each iteration, the randomized SUP randomly partitions the instances and then runs the self-updating process independently within each partition, which reduces both computation and memory use.
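
To make the partitioning step concrete, the sketch below draws a fresh random partition of n instances into k groups for each iteration, using base R only. It is an illustration rather than the package's implementation: the helper name random_partition and the representation of a partition as a vector of group labels are assumptions. The last lines count the pairwise distances involved, which is where the savings come from.

```r
# Hypothetical helper (not part of supc): assign n instances to k groups
# of near-equal size, in random order.
random_partition <- function(n, k) {
  sample(rep_len(seq_len(k), n))
}

set.seed(1)
n <- 1000
k <- 10

# A new partition is drawn for every iteration (three iterations here).
partitions <- lapply(1:3, function(i) random_partition(n, k))

# Pairwise distances per iteration: all pairs vs. pairs within groups only.
group_sizes <- tabulate(partitions[[1]], nbins = k)
pairs_full <- choose(n, 2)
pairs_partitioned <- sum(choose(group_sizes, 2))
c(full = pairs_full, partitioned = pairs_partitioned)
```

With equal group sizes, restricting the process to pairs within a group cuts the number of distances by roughly a factor of k.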

Usage

supc.random(
  x,
  r = NULL,
  rp = NULL,
  t = c("static", "dynamic"),
  k = NULL,
  groups = NULL,
  tolerance = 1e-04,
  cluster.tolerance = 10 * tolerance,
  drop = TRUE,
  implementation = c("cpp", "R"),
  sort = TRUE,
  verbose = (nrow(x) > 10000)
)

Arguments

x

data matrix. Each row is an instance of the data.

r

numeric vector or NULL. The parameter r of the self-updating process.

rp

numeric vector or NULL. If r is NULL, then rp will be used. The corresponding r is the rp-percentile of the pairwise distances of the data. If both r and rp are NULL, then the default value is rp = c(0.0005, 0.001, 0.01, 0.1, 0.3).

t

either a numeric vector, a list of functions, or one of "static" or "dynamic". The parameter T(t) of the self-updating process.

k

integer value. The number of partitions.

groups

list. The first element is the partition used in the first iteration, the second element is the partition used in the second iteration, and so on. If the number of iterations exceeds length(groups), new partitions are generated.

tolerance

numeric value. The threshold of convergence.

cluster.tolerance

numeric value. After the iterations, two points whose distance is smaller than cluster.tolerance are assigned to the same cluster.

drop

logical value. Whether to drop the list structure when the result has length 1.

implementation

either "R" or "cpp". The engine used to compute the result.

sort

logical value. Whether to sort the cluster id by size.

verbose

logical value. Whether to show the iteration history.
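
As a rough illustration of how rp relates to r, the sketch below computes quantiles of the pairwise Euclidean distances of a small matrix with base R. It assumes that "rp-percentile" means the rp-quantile of the distances as returned by stats::quantile with its default settings; the package may compute it differently.

```r
set.seed(42)
x <- matrix(rnorm(200), ncol = 2)   # 100 instances in 2 dimensions

# Pairwise Euclidean distances between the rows of x.
d <- dist(x)

# Default rp stated in the argument description above.
rp <- c(0.0005, 0.001, 0.01, 0.1, 0.3)

# Assumed mapping: one r per rp, taken as a quantile of the distances.
r <- quantile(as.vector(d), probs = rp, names = FALSE)
r
```

Each rp gives one candidate r, so a vector of five rp values corresponds to five values of r.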

Details

Please check the vignettes via vignette("supc", package = "supc") for details.

Value

supc.random returns a list of objects of class "supc".

Each "supc" object contains the following elements:

x

The input matrix.

d0

The pairwise distance matrix of x.

r

The value of r of the clustering.

t

The function T(t) of the clustering.

cluster

The cluster id of each instance.

centers

The center of each cluster.

size

The size of each cluster.

iteration

The number of iterations before convergence.

groups

The partition of each iteration.

result

The position of data after iterations.
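
The cluster, size and centers elements can be pictured with a small base R sketch. Assuming result holds the final positions, instances are grouped when they lie within cluster.tolerance of one another, and the ids are then renumbered so that the largest cluster gets id 1. Single-linkage hierarchical clustering is used here as a stand-in for the grouping step, and "largest first" is an assumed reading of sort = TRUE; neither is confirmed to match the package, and the variable names are illustrative.

```r
# Toy "result": positions after the iterations, with points collapsed
# onto three locations (values chosen for illustration only).
result <- rbind(
  c(5, 5), c(5, 5 + 1e-5),
  c(0, 0), c(0, 1e-5), c(1e-5, 0),
  c(10, 10)
)
cluster.tolerance <- 1e-3

# Stand-in grouping: single linkage cut at cluster.tolerance joins points
# linked by a chain of distances no larger than the tolerance.
raw_id <- cutree(hclust(dist(result), method = "single"), h = cluster.tolerance)

# Renumber ids by decreasing cluster size (assumed meaning of sort = TRUE).
by_size <- order(tabulate(raw_id), decreasing = TRUE)
cluster <- match(raw_id, by_size)
size <- tabulate(cluster)

# Cluster centers as the mean position of the members of each cluster.
centers <- rowsum(result, cluster) / size
```

In this toy case the three points near the origin form the largest group and therefore receive id 1, although they appear after the first two rows.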

References

Shiu, Shang-Ying, and Ting-Li Chen. 2016. "On the Strengths of the Self-Updating Process Clustering Algorithm." Journal of Statistical Computation and Simulation 86 (5): 1010–1031. doi: 10.1080/00949655.2015.1049605.

Examples

## Not run: 
# The shape data has a structure of five clusters and a number of noise data points.
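# Draw N points uniformly inside the unit disc by rejection sampling.
makecircle=function(N, seed){
 set.seed(seed)  # use the seed argument so the sample is reproducible
 n=0
 x=matrix(NA, nrow=N, ncol=2)
 while (n<N){
   # candidate point in the square [-1, 1] x [-1, 1]
   tmp=runif(2, min=0, max=1)*2-1
   # keep it only if it falls inside the unit circle
   if (sum(tmp^2)<1) {
      n=n+1
      x[n,]=tmp
   }
 }
 return(x)
}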

makedata <- function(ns, seed) {
 size=c(10,3,3,1,1)
 mu=rbind(c(-0.3, -0.3), c(-0.55, 0.8), c(0.55, 0.8), c(0.9, 0), c(0.9, -0.6))
 sd=rbind(c(0.7, 0.7), c(0.45, 0.2), c(0.45, 0.2), c(0.1, 0.1), c(0.1, 0.1))
 x=NULL

 for (i in 1:5){
    tmp=makecircle(ns*size[i], seed+i)
    tmp[,1]=tmp[,1]*sd[i,1]+mu[i,1]
    tmp[,2]=tmp[,2]*sd[i,2]+mu[i,2]
    x=rbind(x, tmp)
 }
 
 tmp=runif(floor(ns/3), min=0, max=1)/5-0.1
 tmp=cbind(tmp, 0.8*rep(1, floor(ns/3)))
 x=rbind(x, tmp)
 x=rbind(x, matrix(1, nrow=2*ns, ncol=2)*2-1)
 return(x)
}

shape1 <- makedata(5000, 1000)
dim(shape1)
plot(shape1)

X.supc=supc.random(shape1, r=0.5, t="dynamic", k = 500)
plot(shape1, col=X.supc$cluster)

## End(Not run)

wush978/supc documentation built on Oct. 12, 2021, 3:24 p.m.