View source: R/NominalDistances.R
NominalDistances | R Documentation |
This function computes several measures of distance (or similarity) among individuals from a nominal data matrix.
NominalDistances(X, method = 1, diag = FALSE, upper = FALSE, similarity = TRUE)
X | Matrix or data.frame with the nominal variables. |
method | An integer between 1 and 6. See Details. |
diag | A logical value indicating whether the diagonal of the distance matrix should be printed. |
upper | A logical value indicating whether the upper triangle of the distance matrix should be printed. |
similarity | A logical value indicating whether the similarity matrix should be computed. |
Let X be the table of nominal data. All these distances are of the form d=\sqrt{1-s}, where s is a similarity coefficient.
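As a small illustration of this transformation, the sketch below converts a similarity matrix into distances. The matrix `s` is made-up data for three individuals, not output of NominalDistances.

```r
# Made-up similarity matrix for three individuals (values in [0, 1]).
s <- matrix(c(1.00, 0.75, 0.00,
              0.75, 1.00, 0.50,
              0.00, 0.50, 1.00),
            nrow = 3, byrow = TRUE)

# Element-wise transformation d = sqrt(1 - s).
d <- sqrt(1 - s)

# Identical individuals (s = 1) end up at distance 0,
# completely dissimilar ones (s = 0) at distance 1.
print(round(d, 3))
```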
The overlap measure simply counts the number of attributes that match in the two data instances.
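A minimal base-R sketch of the overlap idea follows. It is not the package's implementation: the helper name `overlap_similarity` and the `toy` data are hypothetical, and the match count is averaged over attributes (an assumption, following Boriah et al.) so that s stays in [0, 1].

```r
# Overlap similarity between all pairs of rows of a nominal table.
# Assumption: matches are averaged over attributes so that s lies in [0, 1].
overlap_similarity <- function(X) {
  X <- as.matrix(X)  # character/factor columns become a character matrix
  n <- nrow(X)
  S <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # Proportion of attributes on which rows i and j agree.
      S[i, j] <- mean(X[i, ] == X[j, ])
    }
  }
  S
}

# Hypothetical toy data: three individuals, two nominal attributes.
toy <- data.frame(colour = c("red", "red", "blue"),
                  shape  = c("round", "square", "square"))
S <- overlap_similarity(toy)
D <- sqrt(1 - S)  # distance of the form d = sqrt(1 - s)
```

In this toy table, individuals 1 and 2 agree on one of two attributes, while individuals 1 and 3 agree on none.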
Eskin et al. proposed a normalization kernel for record-based network intrusion detection data. The original measure is distance-based and assigns a weight of \frac{2}{n_{k}^{2}} for mismatches, where n_{k} is the number of values that attribute k takes; when adapted to similarity, this becomes a weight of \frac{n_{k}^{2}}{n_{k}^{2}+2}. This measure gives more weight to mismatches that occur on attributes that take many values.
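The weighting can be sketched in base R as below. This is an illustrative reading of the description, not the package's code: the helper name `eskin_similarity` and the `toy` data are hypothetical, matches are assumed to score 1, and per-attribute scores are assumed to be averaged.

```r
# Eskin-style similarity: a match scores 1, a mismatch on attribute k
# scores n_k^2 / (n_k^2 + 2), where n_k is the number of distinct values.
# Assumption: per-attribute scores are averaged over attributes.
eskin_similarity <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X)
  n_k <- apply(X, 2, function(col) length(unique(col)))
  w <- n_k^2 / (n_k^2 + 2)  # mismatch score for each attribute
  S <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      same <- X[i, ] == X[j, ]
      S[i, j] <- mean(ifelse(same, 1, w))
    }
  }
  S
}

# Hypothetical toy data: colour takes 2 values, shape takes 3.
toy <- data.frame(colour = c("red", "red", "blue"),
                  shape  = c("round", "square", "oval"))
S <- eskin_similarity(toy)
```

Here a mismatch on `shape` (three values) scores 9/11, while a mismatch on `colour` (two values) scores 4/6, so mismatches on attributes with many values are penalized less.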
The inverse occurrence frequency (IOF) measure assigns lower similarity to mismatches on more frequent values. It is related to the concept of inverse document frequency, which comes from information retrieval, where it is used to signify the relative number of documents that contain a specific word.
The occurrence frequency (OF) measure gives the opposite weighting of the IOF measure for mismatches, i.e., mismatches on less frequent values are assigned lower similarity and mismatches on more frequent values are assigned higher similarity.
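The two frequency-based weightings can be compared with the sketch below. It assumes the per-attribute formulas given by Boriah et al. (2008): a match scores 1, and a mismatch scores 1 / (1 + log f(x) log f(y)) for IOF or 1 / (1 + log(N / f(x)) log(N / f(y))) for OF, where f is the count of a value and N the number of rows. The helper names and the `toy` data are hypothetical, and this is not the package's code.

```r
# freq[i, k]: number of rows sharing row i's value on attribute k.
value_counts <- function(X) {
  apply(X, 2, function(col) as.numeric(table(col)[col]))
}

# IOF / OF similarity, averaged over attributes (assumed formulas from
# Boriah et al., 2008; matches score 1).
frequency_similarity <- function(X, type = c("IOF", "OF")) {
  type <- match.arg(type)
  X <- as.matrix(X)
  n <- nrow(X)
  f <- value_counts(X)
  # Log term per cell: log f for IOF, log(N / f) for OF.
  lg <- if (type == "IOF") log(f) else log(n / f)
  S <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      same <- X[i, ] == X[j, ]
      S[i, j] <- mean(ifelse(same, 1, 1 / (1 + lg[i, ] * lg[j, ])))
    }
  }
  S
}

# Hypothetical toy data: four individuals, two nominal attributes.
toy <- data.frame(colour = c("red", "red", "blue", "blue"),
                  shape  = c("round", "square", "square", "square"))
S_iof <- frequency_similarity(toy, "IOF")
S_of  <- frequency_similarity(toy, "OF")
```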
This measure assigns a high similarity if the matching values are infrequent regardless of the frequencies of the other values.
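This description matches the measure that Boriah et al. (2008) call Goodall3; assuming that is the measure meant here, a match on value x scores 1 - f(x)(f(x) - 1) / (N(N - 1)) and a mismatch scores 0. The sketch below illustrates that reading with a hypothetical helper `goodall3_similarity` and made-up `toy` data; it is not taken from the package.

```r
# Goodall3-style similarity (assumed formula from Boriah et al., 2008):
# a match on value x scores 1 - f(x) * (f(x) - 1) / (N * (N - 1)),
# a mismatch scores 0; scores are averaged over attributes.
goodall3_similarity <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X)
  # f[i, k]: number of rows sharing row i's value on attribute k.
  f <- apply(X, 2, function(col) as.numeric(table(col)[col]))
  p2 <- f * (f - 1) / (n * (n - 1))
  S <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      same <- X[i, ] == X[j, ]
      S[i, j] <- mean(ifelse(same, 1 - p2[i, ], 0))
    }
  }
  S
}

# Hypothetical toy data: four individuals, two nominal attributes.
toy <- data.frame(colour = c("red", "red", "blue", "blue"),
                  shape  = c("round", "square", "square", "square"))
S <- goodall3_similarity(toy)
```

In this toy table a match on the rarer value "red" (two occurrences) contributes 5/6, whereas a match on the common value "square" (three occurrences) contributes only 1/2. Under this formula the self-similarity of an individual can be below 1.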
This measure gives higher weight to matches on frequent values, and lower weight to mismatches on infrequent values.
An object of class distance
Jose L. Vicente-Villardon
Boriah, S., Chandola, V. & Kumar, V. (2008). Similarity measures for categorical data: A comparative evaluation. In Proceedings of the Eighth SIAM International Conference on Data Mining, pp. 243–254.
BinaryDistances, ContinuousDistances
## Not run:
data(Env)
Distance <- NominalDistances(Env, upper = TRUE, diag = TRUE, similarity = FALSE, method = 1)
## End(Not run)