CirClust: Circular Data Clustering

Description

Perform clustering on circular data to minimize the within-cluster sum of squared distances.

Usage

CirClust(O, K, Circumference, method = c("FOCC", "HEUC", "BOCC"))

Arguments

O

a vector of circular data points, given either as coordinates (arc lengths) along the circle or as angles around the circle.

K

the number of clusters

Circumference

the circumference of the circle on which the data lie, in the same units as O (for example, 360 if O contains angles in degrees).

method

the circular clustering method. "FOCC": fast and optimal, the default method; "HEUC": based on heuristic k-means, fast but not necessarily optimal; "BOCC": brute-force based on Ckmeans.1d.dp, slow but optimal, included to provide a baseline.
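
Because O may hold either arc lengths or angles, the two representations differ only by a scale factor. The following minimal base-R sketch converts angles in degrees to arc-length coordinates. The helper deg_to_arc is hypothetical and not part of OptCirClust. Passing the converted values with the physical circumference should describe the same clustering problem as passing the raw degrees with a circumference of 360.

```r
# Hypothetical helper (not part of OptCirClust): map angles in degrees to
# arc-length coordinates on a circle with the given circumference.
deg_to_arc <- function(theta_deg, Circumference) {
  # Wrap angles into [0, 360) before rescaling to [0, Circumference).
  (theta_deg %% 360) / 360 * Circumference
}

# 90 degrees and 450 degrees land at the same point on the circle.
print(deg_to_arc(c(0, 90, 450), 42))
```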

Details

By circular data, we broadly refer to data points on any non-self-intersecting loop. When clustering N circular points into K clusters, the "FOCC" algorithm is reproducible and runs in O(K N log^2 N) time [Debnath21]. The "HEUC" algorithm repeatedly calls the kmeans function and is not always reproducible. The "BOCC" algorithm repeatedly calls the Ckmeans.1d.dp function and is reproducible but slow, with runtime O(K N^2).
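
The following minimal brute-force sketch in base R makes the objective concrete. It assumes that distances are measured along the circle and that every cluster is a contiguous arc. It follows the idea suggested by the "BOCC" description: try every starting point (frame), unroll the circle into a line, and cluster the line optimally. Instead of Ckmeans.1d.dp, it uses a plain O(K N^2)-per-frame dynamic program, which makes it much slower. The functions linear_opt_ssq and circ_clust_brute are hypothetical and are not the OptCirClust implementation.

```r
# Hypothetical sketch (not OptCirClust code): brute-force optimal circular
# clustering minimizing the within-cluster sum of squares (SSQ).

# Optimal split of increasing linear data y into K contiguous groups (DP).
linear_opt_ssq <- function(y, K) {
  N <- length(y)
  S1 <- c(0, cumsum(y))
  S2 <- c(0, cumsum(y^2))
  # SSQ of y[i..j] about its own mean, via prefix sums
  cost <- function(i, j) {
    (S2[j + 1] - S2[i]) - (S1[j + 1] - S1[i])^2 / (j - i + 1)
  }
  # D[k + 1, j + 1]: minimum SSQ of y[1..j] split into k groups
  D <- matrix(Inf, K + 1, N + 1)
  B <- matrix(0L, K + 1, N + 1)  # start index of the last group
  D[1, 1] <- 0
  for (k in 1:K) {
    for (j in k:N) {
      for (i in k:j) {
        v <- D[k, i] + cost(i, j)  # k - 1 groups cover y[1..(i - 1)]
        if (v < D[k + 1, j + 1]) {
          D[k + 1, j + 1] <- v
          B[k + 1, j + 1] <- i
        }
      }
    }
  }
  # Backtrack group labels from the last group to the first
  cluster <- integer(N)
  j <- N
  for (k in K:1) {
    i <- B[k + 1, j + 1]
    cluster[i:j] <- k
    j <- i - 1
  }
  list(ssq = D[K + 1, N + 1], cluster = cluster)
}

circ_clust_brute <- function(O, K, Circumference) {
  stopifnot(K >= 1, K <= length(O))
  x0 <- O %% Circumference
  ord <- order(x0)
  x <- x0[ord]
  N <- length(x)
  best <- list(ssq = Inf)
  for (s in seq_len(N)) {
    # Frame starting at the s-th sorted point; points before it wrap around
    # and are shifted by one circumference so the frame stays increasing.
    idx <- c(s:N, seq_len(s - 1))
    y <- x[idx]
    if (s > 1) y[(N - s + 2):N] <- y[(N - s + 2):N] + Circumference
    fit <- linear_opt_ssq(y, K)
    if (fit$ssq < best$ssq) {
      cluster <- integer(N)
      cluster[ord[idx]] <- fit$cluster  # map back to the original order of O
      best <- list(ssq = fit$ssq, cluster = cluster, start = s)
    }
  }
  best
}

O <- c(1, 2, 10:15, 27:32, 40, 41)
res <- circ_clust_brute(O, K = 3, Circumference = 42)
print(res$ssq)
print(res$cluster)
```

On the data from the Examples section, this sketch finds a minimum SSQ of 45. The three clusters are {40, 41, 1, 2}, {10, ..., 15}, and {27, ..., 32}, and the first cluster wraps across the point where the coordinates restart at 0.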

Value

An object of class "CirClust", for which a plot method is available. It is a list with the following components:

cluster

an integer vector giving the cluster assigned to each element of O. Clusters are indexed by integers from 1 to K.

centers

a numeric vector of the means for each cluster in the circular data.

withinss

a numeric vector of the within-cluster sum of squares for each cluster.

size

a vector of the number of elements in each cluster.

totss

the total sum of squared distances between each element and the sample mean. This statistic does not depend on the clustering result.

tot.withinss

the total sum of within-cluster squared distances between each element and its cluster mean. This statistic is minimized given the number of clusters.

betweenss

the sum of squared distances between each cluster mean and the sample mean. This statistic is maximized given the number of clusters.

ID

the starting index of the frame with the minimum within-cluster sum of squares (SSQ)

Border

the borders of the K clusters

Border.mid

the midpoints between the last point of each cluster and the first point of the next, consecutive cluster.

O_name

a character string giving the actual name of the O argument.

Circumference

the circumference of the circular or periodic data.
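
The following sketch illustrates, under stated assumptions, how the per-cluster quantities above can be computed for points on a circle. The helpers circ_cluster_stats and circ_midpoint are hypothetical and are not part of OptCirClust.

circ_cluster_stats assumes that a cluster covers less than the whole circle. It unrolls the cluster at its largest gap and then computes the mean and SSQ on the resulting line. circ_midpoint takes the midpoint of the forward arc between two points. Whether CirClust computes centers and Border.mid in exactly this way is an assumption.

```r
# Hypothetical helpers (not part of OptCirClust).

# Center, within-cluster SSQ, and size of one cluster on a circle, assuming
# the cluster occupies less than the full circle.
circ_cluster_stats <- function(x, Circumference) {
  x <- sort(x %% Circumference)
  n <- length(x)
  # gaps[i]: forward distance from x[i] to the next point, wrapping at the end
  gaps <- c(diff(x), x[1] + Circumference - x[n])
  i <- which.max(gaps)
  # Unroll so the sequence starts right after the largest gap
  start <- if (i == n) 1 else i + 1
  idx <- c(start:n, seq_len(start - 1))
  y <- x[idx]
  if (start > 1) y[(n - start + 2):n] <- y[(n - start + 2):n] + Circumference
  m <- mean(y)
  list(center = m %% Circumference, withinss = sum((y - m)^2), size = n)
}

# Midpoint of the forward arc from a (last point of one cluster) to
# b (first point of the next cluster).
circ_midpoint <- function(a, b, Circumference) {
  (a + ((b - a) %% Circumference) / 2) %% Circumference
}

st <- circ_cluster_stats(c(40, 41, 1, 2), 42)
print(st$center)                # 0
print(st$withinss)              # 10
print(circ_midpoint(2, 10, 42)) # 6
print(circ_midpoint(41, 1, 42)) # 0
```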

References

\insertAllCited

Examples

library(OptCirClust)

O <- c(1,2, 10,11,12,13,14,15, 27,28,29,30,31,32, 40,41)

K <- 3

Circumference <- 42

# Perform circular clustering:
output <- CirClust(O, K, Circumference)

# Visualize the circular clusters:
opar <- par(mar=c(1,1,2,1))
plot(output)
par(opar)

OptCirClust documentation built on July 28, 2021, 9:06 a.m.