P3C: The P3C Algorithm for Projected Clustering

Description


The main idea of the P3C algorithm is to use statistical tests to find clusters. Each dimension is first split into 1 + log_2(nrow(data)) bins, and a chi-square test is used to compute the probability that the sizes of these bins are uniformly distributed. If this probability is greater than 1 - ChiSquareAlpha, the dimension is left unchanged. Otherwise, the largest bins are removed until the remaining bins pass the test. The bins removed in this way are then used to find clusters, after adjacent removed bins have been merged. A cluster is formed by taking a bin from one dimension and computing, for each bin from another dimension, the probability of sharing as many points with it as it actually does. The bin is then intersected with the bin from the other dimension for which this probability is lowest, provided that the probability is below 10^-PoissonThreshold. This is repeated until no such bin is found.
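The uniformity check on a single dimension can be sketched in base R as follows. This is only an illustration of the description above, not the package's implementation: the function name `mark_dense_bins`, the parameter `p_cutoff`, the equal-width bins, and rounding the bin count down are all assumptions. The cutoff is passed explicitly, and the value 0.05 in the toy call is illustrative rather than derived from ChiSquareAlpha.

```r
# Sketch of the per-dimension uniformity check (hypothetical helper,
# not part of the subspace package).
mark_dense_bins <- function(x, p_cutoff) {
  n <- length(x)
  k <- floor(1 + log2(n))  # assumption: bin count is rounded down
  # Assumption: equal-width bins spanning the observed range.
  breaks <- seq(min(x), max(x), length.out = k + 1)
  bin_of_point <- findInterval(x, breaks, all.inside = TRUE)
  counts <- tabulate(bin_of_point, nbins = k)

  remaining <- seq_len(k)  # bins still included in the test
  marked <- integer(0)     # bins removed because they are too large
  while (length(remaining) >= 2 && sum(counts[remaining]) > 0) {
    # Chi-square goodness-of-fit test against equal bin sizes.
    p_uniform <- chisq.test(counts[remaining])$p.value
    if (p_uniform > p_cutoff) break
    # Remove the largest remaining bin and test again.
    largest <- remaining[which.max(counts[remaining])]
    marked <- c(marked, largest)
    remaining <- setdiff(remaining, largest)
  }
  list(counts = counts, marked = sort(marked))
}

# Toy data: 100 evenly spread points plus 100 points piled up at 0.42.
x <- c((seq_len(100) - 0.5) / 100, rep(0.42, 100))
res <- mark_dense_bins(x, p_cutoff = 0.05)
print(res$counts)  # points per bin
print(res$marked)  # indices of the bins that were removed
```

In this toy data only the bin that contains the pile-up at 0.42 should be removed; the remaining bins are close enough to uniform to pass the test.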


Usage

P3C(data, ChiSquareAlpha = 0.005, PoissonThreshold = 19)



Arguments

data
A matrix of input data.

ChiSquareAlpha
The probability of not being uniformly distributed that the points in a dimension may have before a cluster is assumed to be visible from that dimension.

PoissonThreshold
Sets the maximum probability allowed for a bin in another dimension to deviate from the observed bin as much as it does. The value used is 1.00*10^-PoissonThreshold.
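To make the scale of PoissonThreshold concrete, the sketch below compares a Poisson tail probability against 10^-PoissonThreshold. It is an assumption-laden illustration, not the package's code: the helper `poisson_significant` is hypothetical, and modelling the number of shared points as Poisson with a given expected count is an assumption about how the probability is computed.

```r
# Hypothetical helper: is an observed number of shared points
# significantly larger than expected, at the level 10^-PoissonThreshold?
poisson_significant <- function(observed, expected, PoissonThreshold = 19) {
  # P(X >= observed) for X ~ Poisson(expected)
  p_tail <- ppois(observed - 1, lambda = expected, lower.tail = FALSE)
  list(p_tail = p_tail,
       significant = p_tail < 10^(-PoissonThreshold))
}

# 100 shared points where about 10 are expected: a very strong deviation.
strong <- poisson_significant(observed = 100, expected = 10)
# 15 shared points where about 10 are expected: a mild deviation.
mild <- poisson_significant(observed = 15, expected = 10)

print(strong$significant)
print(mild$significant)
```

With the default of 19 the cutoff is 1e-19, so only very strong deviations count. A smaller PoissonThreshold gives a larger cutoff and therefore accepts weaker deviations.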


References

Gabriela Moise, Jörg Sander and Martin Ester. P3C: A Robust Projected Clustering Algorithm. In Proc. 6th IEEE International Conference on Data Mining, 2006.

See Also

Other subspace.clustering.algorithms: CLIQUE; FIRES; ProClus; SubClu


