classifdist: Classification of unclustered points
In fpc: Flexible Procedures for Clustering

classifdist

R Documentation

Description

Various methods for classifying unclustered points on the basis of already clustered points, for use within the functions nselectboot and prediction.strength.

Usage

classifdist(cdist,clustering,
                      method="averagedist",
                      centroids=NULL,nnk=1)

classifnp(data,clustering,
                      method="centroid",cdist=NULL,
                      centroids=NULL,nnk=1)

Arguments

`cdist`	dissimilarity matrix or `dist`-object. Necessary for `classifdist` but optional for `classifnp` and there only used if `method="averagedist"` (if not provided, `dist` is applied to `data`).
`data`	something that can be coerced into an `n*p` data matrix.
`clustering`	integer vector. Gives the cluster number (between 1 and k for k clusters) for clustered points and should be -1 for points to be classified.
`method`	one of `"averagedist", "centroid", "qda", "knn"`; the details additionally describe `"lda"` and `"fn"`. See details.
`centroids`	for `classifnp` a k times p matrix of cluster centroids. For `classifdist` a vector of numbers of centroid objects as provided by `pam`. Only used if `method="centroid"`; in that case mandatory for `classifdist` but optional for `classifnp`, where cluster mean vectors are computed if `centroids=NULL`.
`nnk`	number of nearest neighbours if `method="knn"`.
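
As an illustration of how `clustering` and `centroids` fit together for `classifdist`, here is a minimal sketch. It assumes the clustered points come from `pam` in package cluster, run on a subset of the data; the simulated data and the names `holdout`, `kept` and `medoids` are made up for this example.

```r
library(cluster)  # pam
library(fpc)      # classifdist

set.seed(1)
x <- rbind(matrix(rnorm(40), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))
n <- nrow(x)

# Hold out a few points and cluster only the remaining ones.
holdout <- c(1, 2, 21, 22)
kept <- setdiff(seq_len(n), holdout)
fit <- pam(x[kept, ], k = 2)

# Cluster numbers for clustered points, -1 for points to be classified.
clustering <- rep(-1, n)
clustering[kept] <- fit$clustering

# pam reports medoids as row numbers of its own input; map them back
# to row numbers of the full data so that they match dist(x).
medoids <- kept[fit$id.med]

pred <- classifdist(dist(x), clustering, method = "centroid",
                    centroids = medoids)
pred[holdout]
```

The mapping step matters because `centroids` is interpreted relative to the rows of `cdist`, not relative to the subset that was clustered.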

Details

classifdist is for data given as a dissimilarity matrix; classifnp is for data given as an n times p data matrix. The following methods are supported:

"centroid": assigns each observation to the cluster with the closest cluster centroid, as specified in argument centroids (this is associated with k-means and pam/clara clustering).
"qda": only in classifnp. Classifies by quadratic discriminant analysis (this is associated with Gaussian clusters with flexible covariance matrices), calling qda with default settings. If qda gives an error (usually because a class was too small), lda is used.
"lda": only in classifnp. Classifies by linear discriminant analysis (this is associated with Gaussian clusters with equal covariance matrices), calling lda with default settings.
"averagedist": assigns each observation to the cluster for which its average dissimilarity to all points in the cluster is smallest (this is associated with average linkage clustering).
"knn": classifies by nnk nearest neighbours (for nnk=1, this is associated with single linkage clustering). classifnp calls knn for this.
"fn": assigns each observation to the cluster for which the distance to its farthest neighbour in the cluster is smallest (this is associated with complete linkage clustering).

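The dissimilarity-based rules above differ only in how the dissimilarities from a new point to the members of a cluster are summarised. The following base R sketch makes this concrete. The function `classify_sketch` is a hypothetical helper written for this illustration, not part of fpc, and its rule `"nn"` corresponds to `"knn"` with nnk=1.

```r
# Hypothetical helper, not part of fpc: classify points marked -1
# from a dissimilarity matrix using one of three summary rules.
classify_sketch <- function(cdist, clustering,
                            rule = c("averagedist", "fn", "nn")) {
  rule <- match.arg(rule)
  d <- as.matrix(cdist)
  k <- max(clustering)
  summarise <- switch(rule,
                      averagedist = mean,  # average dissimilarity to the cluster
                      fn = max,            # distance to the farthest member
                      nn = min)            # distance to the nearest member
  out <- clustering
  for (i in which(clustering < 0)) {
    # Only the originally clustered points are used as reference.
    score <- vapply(seq_len(k),
                    function(j) summarise(d[i, clustering == j]),
                    numeric(1))
    out[i] <- which.min(score)
  }
  out
}

# One-dimensional toy data: cluster 1 is spread out, cluster 2 is compact.
x <- c(0, 8, 10, 11, 7)
clustering <- c(1, 1, 2, 2, -1)

classify_sketch(dist(x), clustering, "nn")[5]           # 1
classify_sketch(dist(x), clustering, "fn")[5]           # 2
classify_sketch(dist(x), clustering, "averagedist")[5]  # 2
```

The new point at 7 is nearest to a member of cluster 1 (distance 1 to the point at 8), but its farthest and average distances are smaller for the compact cluster 2, so the rules disagree.
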
Value

An integer vector giving cluster numbers for all observations; those for the observations already clustered in the input are the same as in the input.
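
The structure of the return value can be checked with a short sketch such as the one below. The simulated data and the name `unlabelled` are assumptions made for this illustration. The last comparison relies on the description of `cdist` above, according to which classifnp applies `dist` to the data when `method="averagedist"` and no `cdist` is supplied.

```r
library(fpc)

set.seed(42)
x <- rbind(matrix(rnorm(60), ncol = 2),
           matrix(rnorm(60, mean = 6), ncol = 2))
clustering <- rep(1:2, each = 30)
unlabelled <- c(5, 35)
clustering[unlabelled] <- -1

out_np   <- classifnp(x, clustering, method = "averagedist")
out_dist <- classifdist(dist(x), clustering, method = "averagedist")

# Points that were already clustered keep their input cluster numbers.
all(out_np[-unlabelled] == clustering[-unlabelled])
# Only the -1 entries have been replaced by cluster numbers.
out_np[unlabelled]
# Both interfaces are expected to agree in this setting.
all(out_np == out_dist)
```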

Author(s)

Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en/

Examples

  
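set.seed(20000)
# Three groups on the first coordinate; the second coordinate is noise.
x1 <- rnorm(50)
y <- rnorm(100)
x2 <- rnorm(40,mean=20)
x3 <- rnorm(10,mean=25,sd=100)
x <- cbind(c(x1,x2,x3),y)
truec <- c(rep(1,50),rep(2,40),rep(3,10))
# Mark five observations as unclustered (-1) so that they get classified.
topredict <- c(1,2,51,52,91)
clumin <- truec
clumin[topredict] <- -1

# Data matrix input.
classifnp(x,clumin, method="averagedist")
classifnp(x,clumin, method="qda")
# Dissimilarity input; centroids are given as observation numbers.
classifdist(dist(x),clumin, centroids=c(3,53,93),method="centroid")
classifdist(dist(x),clumin,method="knn")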

fpc documentation built on Sept. 24, 2024, 9:07 a.m.