The main idea of the P3C algorithm is to use statistical distributions to find
clusters. First, each dimension is split into `1 + log2(nrow(data))` bins, and
the chi-square test is used to compute the probability that the bin sizes are
uniformly distributed. If this probability is greater than 1 - *ChiSquareAlpha*,
no bins are removed from that dimension. Otherwise, the largest bins are removed
one at a time until the condition holds.

The bins removed in this way are then used to find clusters. Adjacent removed
bins are first merged. Clusters are then formed by taking a bin from one
dimension and computing, for each bin of another dimension, the probability
that the two bins share as many points as they actually do. The bin is
intersected with the bin from another dimension for which this probability is
lowest, provided that it is below 10^-*PoissonThreshold*. This step is repeated
until no such bin remains.
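The per-dimension step (binning, chi-square check, removing the largest bins, merging adjacent ones) can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: `marked_intervals()` is a hypothetical helper, equal-width bins and rounding the bin count up (as `grDevices::nclass.Sturges()` does) are assumptions, and the stopping rule encodes the comparison as this page states it (keep the remaining bins once the chi-square p-value exceeds 1 - `ChiSquareAlpha`).

```r
# Hypothetical sketch of the per-dimension step (not the package code):
# bin one dimension, mark the largest bins until the rest look uniform,
# then merge adjacent marked bins into intervals.
marked_intervals <- function(x, ChiSquareAlpha = 0.001) {
  n <- length(x)
  k <- ceiling(1 + log2(n))   # assumption: rounded up, like nclass.Sturges()
  # Assumption: equal-width bins over the observed range of x.
  breaks <- seq(min(x), max(x), length.out = k + 1)
  bin <- findInterval(x, breaks, rightmost.closed = TRUE, all.inside = TRUE)
  counts <- tabulate(bin, nbins = k)

  marked <- logical(k)
  repeat {
    rest <- counts[!marked]
    if (length(rest) < 2 || sum(rest) == 0) break
    # With a single count vector, chisq.test() tests for equal proportions.
    p <- suppressWarnings(chisq.test(rest)$p.value)
    # Assumed rule, as described on this page: stop once p > 1 - ChiSquareAlpha.
    if (p > 1 - ChiSquareAlpha) break
    # Otherwise mark the largest bin that is still unmarked.
    marked[which(!marked)[which.max(rest)]] <- TRUE
  }

  # Merge runs of adjacent marked bins into [lower, upper] intervals.
  runs <- rle(marked)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  data.frame(lower = breaks[starts[runs$values]],
             upper = breaks[ends[runs$values] + 1])
}

# Toy data: evenly spread points plus two dense spots in adjacent bins.
x <- c(seq(0, 1, length.out = 1000), rep(0.55, 300), rep(0.6, 300))
iv <- marked_intervals(x)
iv   # a single merged interval, [0.5, 0.667]
```

Removing the largest bin first drives the remaining counts toward equal proportions, so the loop stops once only the "background" bins are left; `rle()` then collapses each run of marked bins into one interval.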

`data`: A matrix of input data.

`ChiSquareAlpha`: The probability of non-uniformity that the points in a dimension may show without it being assumed that a cluster is visible in that dimension.

`PoissonThreshold`: Controls the maximum allowed probability that a bin in another dimension deviates from the observed bin as much as it does. The threshold actually used is 10^-*PoissonThreshold*, so larger values make the test stricter.
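The greedy intersection step and its `PoissonThreshold` cutoff can be sketched in base R. In this hypothetical sketch, bins are logical vectors over the rows of the data, `grow_cluster()` and `poisson_tail()` are illustrative helpers rather than package functions, and the expected overlap of two bins is assumed to be `|A| * |B| / n`, i.e. what two independent dimensions would give; the exact expectation used by P3C may differ.

```r
# Hypothetical sketch of the greedy intersection step (not the package code).

# P(X >= observed) for X ~ Poisson(expected).
poisson_tail <- function(observed, expected) {
  ppois(observed - 1, lambda = expected, lower.tail = FALSE)
}

# start:      logical vector marking the points in the starting bin
# start_dim:  the dimension the starting bin comes from
# candidates: list of list(dim = , members = ) bins from other dimensions
grow_cluster <- function(start, start_dim, candidates, PoissonThreshold = 3) {
  threshold <- 10^-PoissonThreshold
  n <- length(start)
  members <- start
  used <- start_dim
  repeat {
    best_p <- Inf
    best <- NULL
    for (cand in candidates) {
      if (cand$dim %in% used) next            # at most one bin per dimension
      observed <- sum(members & cand$members)
      # Assumption: expected overlap if the two dimensions were independent.
      expected <- sum(members) * sum(cand$members) / n
      p <- poisson_tail(observed, expected)
      if (p < best_p) {
        best_p <- p
        best <- cand
      }
    }
    # Stop when no remaining bin is significant at 10^-PoissonThreshold.
    if (is.null(best) || best_p >= threshold) break
    members <- members & best$members
    used <- c(used, best$dim)
  }
  list(members = members, dims = sort(used))
}

# Toy example with 100 points: rows 1-20 form a cluster in dimensions 1 and 2.
n <- 100
as_bin <- function(rows) { v <- logical(n); v[rows] <- TRUE; v }
candidates <- list(
  list(dim = 2, members = as_bin(1:20)),
  list(dim = 3, members = as_bin(seq(1, n, by = 5)))
)
res <- grow_cluster(as_bin(1:20), start_dim = 1, candidates)
res$dims            # 1 2
sum(res$members)    # 20
```

The bin in dimension 2 overlaps the starting bin far more than independence predicts, so it is intersected; the bin in dimension 3 overlaps by exactly the expected amount, so its tail probability stays well above 10^-3 and the loop stops.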

Gabriela Moise, Jörg Sander and Martin Ester. *P3C: A Robust Projected
Clustering Algorithm*. In Proc. 6th IEEE International Conference on Data
Mining, 2006.

Other subspace.clustering.algorithms: `CLIQUE`; `FIRES`; `ProClus`; `SubClu`

```
library(subspace)
data("subspace_dataset")
P3C(subspace_dataset, PoissonThreshold = 3)
```
