# classDist: Compute and predict the distances to class centroids

In caret: Classification and Regression Training

 classDist R Documentation

## Compute and predict the distances to class centroids

### Description

This function computes the class centroids and covariance matrix for a training set for determining Mahalanobis distances of samples to each class centroid.

### Usage

``````
classDist(x, ...)

## Default S3 method:
classDist(x, y, groups = 5, pca = FALSE, keep = NULL, ...)

## S3 method for class 'classDist'
predict(object, newdata, trans = log, ...)
``````

### Arguments

| Argument | Description |
| --- | --- |
| `x` | a matrix or data frame of predictor variables |
| `...` | optional arguments to pass (not currently used) |
| `y` | a numeric or factor vector of class labels |
| `groups` | an integer for the number of bins for splitting a numeric outcome |
| `pca` | a logical: should principal components analysis be applied to the dataset prior to splitting the data by class? |
| `keep` | an integer for the number of PCA components that should be used to predict new samples (`NULL` uses all within a tolerance of `sqrt(.Machine$double.eps)`) |
| `object` | an object of class `classDist` |
| `newdata` | a matrix or data frame. If `vars` was previously specified, these columns should be in `newdata` |
| `trans` | an optional function that can be applied to each class distance. `trans = NULL` will not apply a function |

### Details

For factor outcomes, the data are split into one group per class, and the mean vector and covariance matrix are calculated for each group. These are then used to compute Mahalanobis distances to the class centers (using `predict.classDist`). The function checks that each class covariance matrix is non-singular.
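The per-class computation described above can be sketched in base R. This is a minimal illustration under stated assumptions, not caret's internal code: the names `train_idx`, `class_stats`, `dists`, and `log_dists` are hypothetical, the split is deterministic only for reproducibility, and the final log step mirrors the documented default `trans = log`. Note that base R's `mahalanobis` returns squared distances.

```r
# Deterministic split of iris: hold out every third row (illustrative choice).
train_idx <- setdiff(seq_len(nrow(iris)), seq(3, nrow(iris), by = 3))
x_train <- iris[train_idx, 1:4]
y_train <- iris$Species[train_idx]
x_new   <- iris[-train_idx, 1:4]

# Per-class centroid (mean vector) and covariance matrix.
class_stats <- lapply(split(x_train, y_train), function(u) {
  list(means = colMeans(u), cov = cov(u))
})

# Squared Mahalanobis distance from each new sample to each class centroid.
dists <- sapply(class_stats, function(s) {
  mahalanobis(as.matrix(x_new), center = s$means, cov = s$cov)
})
colnames(dists) <- paste0("dist.", colnames(dists))

# Apply the log transform, as the documented default trans = log would.
log_dists <- log(dists)
head(log_dists)
```

Each row of `log_dists` corresponds to one held-out sample and each column to one class, so the smallest value in a row marks the nearest centroid.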

For numeric outcomes, the data are split into roughly equal sized bins based on `groups`. Percentiles are used to split the data.
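Percentile-based binning of a numeric outcome can be sketched with `quantile` and `cut`. This is an assumption about the general approach rather than a copy of caret's code; `Sepal.Length` stands in for a numeric `y`, and `breaks` and `bins` are hypothetical names.

```r
# Numeric outcome: Sepal.Length is used here as a stand-in for y.
y <- iris$Sepal.Length
groups <- 5

# Percentile cut points at 0%, 20%, ..., 100%; unique() guards against ties.
breaks <- unique(quantile(y, probs = seq(0, 1, length.out = groups + 1)))

# Assign each sample to a bin; include.lowest keeps the minimum value.
bins <- cut(y, breaks = breaks, include.lowest = TRUE)

# Bin sizes are only roughly equal because of tied values in y.
table(bins)
```

The resulting factor `bins` can then play the role of the class labels when the data are split by class.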

### Value

For `classDist`, an object of class `classDist` with elements:

| Element | Description |
| --- | --- |
| `values` | a list with elements for each class. Each element contains a mean vector for the class centroid and the inverse of the class covariance matrix |
| `classes` | a character vector of class labels |
| `pca` | the results of `prcomp` when `pca = TRUE` |
| `call` | the function call |
| `p` | the number of variables |
| `n` | a vector of sample sizes per class |

For `predict.classDist`, a matrix with one column per class. The column names are the class names with the prefix `dist.`. In the case of numeric `y`, the class labels are the percentiles. For example, if `groups = 9`, the column names would be `dist.11.11`, `dist.22.22`, etc.
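The percentile labels in that example can be reproduced with a little arithmetic. The sketch below assumes each label is the upper percentile of its bin, rounded to two decimals. That rule is an assumption chosen to match the documented names, and `pct` and `col_names` are hypothetical.

```r
groups <- 9

# Upper percentile of each bin (100/9, 200/9, ...), rounded to two decimals.
pct <- round(seq_len(groups) / groups * 100, 2)

# Prefix each label with "dist." to form the column names.
col_names <- paste0("dist.", pct)
col_names[1:2]
# [1] "dist.11.11" "dist.22.22"
```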

### Author(s)

Max Kuhn

### References

Forina et al. CAIMAN brothers: A family of powerful classification and class modeling techniques. Chemometrics and Intelligent Laboratory Systems (2009) vol. 96 (2) pp. 239-245

### See Also

`mahalanobis`

### Examples

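``````
library(caret)    # provides classDist()
library(lattice)  # provides splom()

# Randomly select 100 of the 150 iris rows for training
trainSet <- sample(1:150, 100)

# Estimate the class centroids and covariance matrices
distData <- classDist(iris[trainSet, 1:4],
                      iris$Species[trainSet])

# Class distances (log scale by default) for the held-out rows
newDist <- predict(distData,
                   iris[-trainSet, 1:4])

splom(newDist, groups = iris$Species[-trainSet])
``````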

caret documentation built on March 31, 2023, 9:49 p.m.