Probabilistic Distance Clustering

Description

Probabilistic distance clustering (PD-clustering) is an iterative, distribution free, probabilistic clustering method. PD clustering assigns units to a cluster according to their probability of membership, under the constraint that the product of the probability and the distance of each point to any cluster centre is a constant.

Usage

1
PDclust(data = NULL, k = 2)

Arguments

data

A matrix or data frame such that rows correspond to observations and columns correspond to variables.

k

A numerical parameter giving the number of clusters

Value

A list with components

label

A vector of integers indicating the cluster membership for each unit

centers

A matrix of cluster centers

probability

A matrix of probability of each point belonging to each cluster

JDF

The value of the Joint distance function

iter

The number of iterations

Author(s)

Cristina Tortora and Paul D. McNicholas

References

Ben-Israel A, Iyigun. Probabilistic D-Clustering. Journal of Classification, 25(1), 5–26, 2008.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
#Normally generated clusters
c1 = c(+2,+2,2,2)
c2 = c(-2,-2,-2,-2)
c3 = c(-3,3,-3,3)
n=200
x1 = cbind(rnorm(n, c1[1]), rnorm(n, c1[2]), rnorm(n, c1[3]), rnorm(n, c1[4]) )
x2 = cbind(rnorm(n, c2[1]), rnorm(n, c2[2]),rnorm(n, c2[3]), rnorm(n, c2[4]) )
x3 = cbind(rnorm(n, c3[1]), rnorm(n, c3[2]),rnorm(n, c3[3]), rnorm(n, c3[4]) )
x = rbind(x1,x2,x3)
pdn=PDclust(x,3)
plot(x[,1:2],col=pdn$label)
plot(x[,3:4],col=pdn$label)