pdfClassification | R Documentation

Allocates low-density data points in a multi-stage procedure after cluster cores have been detected by `pdfCluster`.

pdfClassification(obj, n.stage = 5, se = TRUE, hcores = FALSE)

| Argument | Description |
|---|---|
| `obj` | An object of `pdfCluster-class`. |
| `n.stage` | Allocation of low-density data is performed by a multi-stage procedure in `n.stage` stages. Default value is 5. |
| `se` | Logical. Should the standard error of the density estimates be taken into account to define the order of allocation? Default value is TRUE. See details below. |
| `hcores` | Logical. Set this value to TRUE to build the cluster density estimates with the same bandwidths used to form the cluster cores. Otherwise, bandwidths specific to each cluster are selected. Default value is FALSE. See details below. |

The basic idea of the classification stage of the procedure is as follows: for an unallocated data point *x_0*,
compute the estimated density *\hat{f}_m(x_0)* based on the data already assigned to group *m*, *m = 1, 2, …, M*,
and assign *x_0* to the group with the highest log ratio *\log[\hat{f}_m(x_0)/\max_{k \neq m} \hat{f}_k(x_0)]*.
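A minimal base-R sketch of this allocation rule follows. It is not the pdfCluster source: the helper `log_ratios()` and the toy density matrix are hypothetical, and the sketch assumes that the log ratio compares *\hat{f}_m(x_0)* with the largest density among the *other* groups (*k \neq m*), which requires at least two groups. Each row of `dens` holds the estimates *\hat{f}_1(x_0), …, \hat{f}_M(x_0)* for one unallocated point.

```r
# Hypothetical helper, not part of pdfCluster: log ratio of each group's
# density to the largest density among the other groups (k != m).
log_ratios <- function(dens) {
  stopifnot(is.matrix(dens), ncol(dens) >= 2)
  r <- matrix(NA_real_, nrow(dens), ncol(dens))
  for (m in seq_len(ncol(dens))) {
    other_max <- apply(dens[, -m, drop = FALSE], 1, max)
    r[, m] <- log(dens[, m]) - log(other_max)
  }
  r
}

# Toy estimates f_m(x_0) for three unallocated points and M = 2 groups
dens <- rbind(c(0.30, 0.10),
              c(0.05, 0.20),
              c(0.12, 0.11))
r <- log_ratios(dens)
# Assign each point to the group with the highest log ratio
group <- max.col(r, ties.method = "first")
group
```

Here `group` is 1, 2, 1. The third point is a near tie: its best log ratio is only about 0.087.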

If *\hat{f}_m(x_0) = 0* for all *m = 1, 2, …, M*, then *x_0* is considered an outlier: the procedure issues a warning
message, the outlier remains unclassified, and its cluster label is set to zero.
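The outlier rule can be illustrated with a hypothetical helper, `label_points()`, which is not a pdfCluster function. Rows of `dens` that are zero for every group get label 0 and trigger a warning. The other rows take the group with the largest density, since that group also has the largest log ratio.

```r
# Hypothetical sketch: points with zero estimated density in every group are
# treated as outliers, left unclassified and labelled 0.
label_points <- function(dens) {
  labels <- max.col(dens, ties.method = "first")
  outlier <- rowSums(dens > 0) == 0
  if (any(outlier)) {
    warning(sum(outlier), " point(s) left unclassified")
  }
  labels[outlier] <- 0L
  labels
}

dens <- rbind(c(0.30, 0.10),
              c(0.00, 0.00),   # zero density for both groups: an outlier
              c(0.05, 0.20))
labels <- suppressWarnings(label_points(dens))
labels
```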

The current implementation of this idea proceeds in `n.stage` stages: it allocates a block of points at a time,
updates the estimates *\hat{f}_m(\cdot)* using the new members of each group, and then allocates the next block of points.
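The stage-wise updating can be sketched in one dimension. The following assumptions are not taken from pdfCluster:
- a Gaussian kernel with a single fixed bandwidth `h`;
- the unallocated points are split into `n.stage` blocks of near-equal size;
- within each stage, the points with the largest log ratio are allocated first.

`kde_at()` and `allocate_stages()` are hypothetical helpers, and a label of 0 marks a point that is not yet allocated.

```r
# Hypothetical 1-D sketch of multi-stage allocation; not the pdfCluster code.
kde_at <- function(x0, pts, h) {
  # Gaussian kernel density estimate built on `pts`, evaluated at each x0
  vapply(x0, function(z) mean(dnorm((z - pts) / h)) / h, numeric(1))
}

allocate_stages <- function(x, labels, n.stage = 5, h = 0.5) {
  M <- max(labels)
  stopifnot(M >= 2)
  todo <- which(labels == 0)            # 0 marks unallocated points
  # Near-equal block sizes that add up to the number of unallocated points
  sizes <- diff(round(seq(0, length(todo), length.out = n.stage + 1)))
  for (s in seq_len(n.stage)) {
    if (sizes[s] == 0) next
    # Re-estimate every group density from its current members
    dens <- matrix(sapply(seq_len(M), function(m)
      kde_at(x[todo], x[labels == m], h)), nrow = length(todo))
    r <- matrix(sapply(seq_len(M), function(m)
      log(dens[, m]) - log(apply(dens[, -m, drop = FALSE], 1, max))),
      nrow = length(todo))
    # Allocate the block of points whose best log ratio is largest
    pick <- order(apply(r, 1, max), decreasing = TRUE)[seq_len(sizes[s])]
    labels[todo[pick]] <- max.col(r, ties.method = "first")[pick]
    todo <- todo[-pick]
  }
  labels
}

# Toy data: two well-separated groups, half of each group still unallocated
x <- c(seq(-1, 1, length.out = 20), seq(4, 6, length.out = 20))
labels <- rep(c(1L, 0L, 2L, 0L), each = 10)
res <- allocate_stages(x, labels, n.stage = 4)
table(res)
```

Each stage re-estimates the densities from the enlarged groups. Points allocated early therefore contribute to the estimates used for later, harder points.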

When `se = TRUE`, classification is performed by further weighting the log ratios inversely by their approximate standard
errors, so that points whose density estimates have the highest precision are allocated first.
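The effect of `se = TRUE` on the order of allocation can be shown with made-up numbers. Here `r` holds each point's best log ratio and `se` a hypothetical approximate standard error. How pdfCluster approximates the standard error is not reproduced; the example shows only the reordering.

```r
# Made-up best log ratios and approximate standard errors for three points
r  <- c(a = 1.2, b = 0.9, c = 0.4)
se <- c(a = 0.8, b = 0.3, c = 0.1)

# se = FALSE: allocate by the log ratio alone
order_plain <- names(sort(r, decreasing = TRUE))
# se = TRUE: weight each log ratio inversely by its standard error
order_se <- names(sort(r / se, decreasing = TRUE))
order_plain
order_se
```

Point `a` has the largest raw log ratio but the least precise estimate, so it moves from first to last place.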

Each *\hat{f}_m(\cdot)* is built using either the same bandwidths *h_0* that were used to form the cluster cores (when `hcores = TRUE`) or cluster-specific bandwidths (the default, `hcores = FALSE`), obtained as follows:

*h_m^{*} = \exp [(1-a_m) \log(h_0) + a_m \log(h_m)],*

where *a_m* is the proportion of data points in the *m*-th cluster core, and *h_m* is either the bandwidth that is asymptotically optimal for a normal distribution of the *m*-th cluster, when the kernel estimator has fixed bandwidth, or the bandwidth computed according to the approach of Silverman (1986), when the kernel estimator has adaptive bandwidth.
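The bandwidth formula is a weighted geometric mean, *h_m^{*} = h_0^{1-a_m} h_m^{a_m}*. Small cluster cores therefore keep bandwidths close to *h_0*, while large ones move towards *h_m*. The sketch below computes it in base R. `h_star()` and `h_normal()` are hypothetical helpers and the numbers are toy values. `h_normal()` assumes the usual normal-reference rule for a *d*-dimensional Gaussian kernel, *h_j = \sigma_j \{4/[(d+2)n]\}^{1/(d+4)}*, as one reading of "asymptotically optimal for a normal distribution".

```r
# Hypothetical helpers illustrating the bandwidth rules described above.
h_star <- function(h0, hm, am) exp((1 - am) * log(h0) + am * log(hm))

# Normal-reference bandwidth, one value per column of x (assumed rule)
h_normal <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- ncol(x)
  apply(x, 2, sd) * (4 / ((d + 2) * n))^(1 / (d + 4))
}

h0 <- c(0.8, 1.2)  # bandwidths used for the cluster cores (toy values)
hm <- c(0.4, 0.6)  # bandwidths of the m-th cluster (toy values)
am <- 0.25         # proportion of points in the m-th cluster core
h_star(h0, hm, am)

# Example h_m from a toy two-variable cluster of 50 points
set.seed(1)
cluster_m <- matrix(rnorm(100), ncol = 2)
h_normal(cluster_m)
```

With these values, `h_star()` returns about 0.673 and 1.009. Setting `am` to 0 or 1 recovers *h_0* or *h_m* (up to rounding).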

An object of `pdfCluster-class` with slot `stages` of class `"list"`, having length equal to `n.stage`.
See `pdfCluster-class` for further details.

Function `pdfClassification` is called internally by `pdfCluster` when the argument `n.stage` is set to a value greater than zero. Alternatively, it may be called externally by providing an object of `pdfCluster-class` as its argument.
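The external call can be used in two steps, shown below with only functions, arguments and slots that appear on this page. First, cluster cores are detected with `n.stage = 0`, so that no allocation stage is run inside `pdfCluster`. Then the low-density points are allocated by calling `pdfClassification` directly. The choice of three stages and `se = FALSE` is arbitrary.

```r
library(pdfCluster)

data(wine)
x <- wine[, c(2, 5, 8)]
# Detect the cluster cores only: with n.stage = 0 no allocation stage is run
cores <- pdfCluster(x, n.stage = 0)
# Allocate the remaining points externally, in 3 stages, ignoring the se
cl2 <- pdfClassification(cores, n.stage = 3, se = FALSE)
table(groups(cl2))
# The `stages` slot is a list with one element per stage
length(cl2@stages)
```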

When `pdfClassification` is called internally from `pdfCluster` and only one group is detected,
the slot `stages` is a list with `n.stage` elements, each of them being a vector whose length equals the number
of data points and whose elements are all equal to 1.
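The list below has the shape described for the one-group case. It assumes a hypothetical data set of 6 points and the default `n.stage = 5`.

```r
n <- 6          # number of data points (hypothetical)
n.stage <- 5    # default number of stages
# One vector of 1s per stage, each as long as the data
stages <- replicate(n.stage, rep(1, n), simplify = FALSE)
str(stages)
```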

Azzalini, A. and Torelli, N. (2007). Clustering via nonparametric density estimation. *Statistics and Computing*, 17, 71-80.

Silverman, B. W. (1986). *Density Estimation for Statistics and Data Analysis*. Chapman and Hall, London.

`pdfCluster`, `pdfCluster-class`

    library(pdfCluster)
    # load data
    data(wine)
    # select a subset of variables
    x <- wine[, c(2, 5, 8)]
    # whole procedure, including the classification phase
    cl <- pdfCluster(x)
    summary(cl)
    table(groups(cl))
    # re-run the classification using the same bandwidths that formed the cluster cores
    cl1 <- pdfClassification(cl, hcores = TRUE)
    table(groups(cl1))
