Dissimilarity based discrepancy


Description

Compute the discrepancy from the pairwise dissimilarities between objects. The discrepancy is a measure of dispersion of the set of objects.

Usage

dissvar(diss, weights=NULL, squared = FALSE)

Arguments

diss

A dissimilarity matrix or a dist object (see dist).

weights

Optional numeric vector of weights, one per object.

squared

Logical. If TRUE, the dissimilarities in diss are squared before the discrepancy is computed.
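A minimal base-R sketch of how the three arguments fit together. The helper `discrepancy_sketch()` is hypothetical and is not TraMineR's implementation of `dissvar()`. In particular, the weighted formula s^2 = sum_i sum_j w_i w_j d_ij / (2 W^2), with W = sum_i w_i, is an assumption, chosen so that unit weights reproduce the unweighted definition given under Details and an integer weight behaves like duplicating the object.

```r
## Hypothetical helper, NOT TraMineR's dissvar(): a sketch of the discrepancy
## with optional weights and optional squaring of the dissimilarities.
discrepancy_sketch <- function(diss, weights = NULL, squared = FALSE) {
  # Accept either a full dissimilarity matrix or a "dist" object
  d <- as.matrix(diss)
  # squared = TRUE: square each dissimilarity before summing
  if (squared) d <- d^2
  n <- nrow(d)
  # No weights: every object counts once (same as unit weights)
  if (is.null(weights)) weights <- rep(1, n)
  stopifnot(length(weights) == n)
  W <- sum(weights)
  # ASSUMPTION: weighted form sum_i sum_j w_i w_j d_ij / (2 W^2)
  sum(outer(weights, weights) * d) / (2 * W^2)
}

## Usage: three objects on a line, Euclidean distances from dist()
y <- c(1, 2, 4)
discrepancy_sketch(dist(y))                        # 0.6666667
discrepancy_sketch(dist(y), squared = TRUE)        # 1.555556
discrepancy_sketch(dist(y), weights = c(2, 1, 1))  # 0.625
```

With squared = TRUE on Euclidean distances, the sketch returns 14/9, which is the variance of y computed with divisor n rather than n - 1.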

Details

The discrepancy is an extension of the concept of variance to any kind of object for which pairwise dissimilarities can be computed. For n objects with pairwise dissimilarities d_{ij}, the discrepancy s^2 is defined as:

s^2 = \frac{1}{2n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij}
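The arithmetic of the definition on a toy example (the dissimilarity values are made up for illustration): three objects with d_12 = 2, d_13 = 4 and d_23 = 6. The double sum runs over all ordered pairs, so each dissimilarity is counted twice and the zero diagonal adds nothing. For a symmetric matrix this makes s^2 equal to the sum over pairs i < j divided by n^2.

```r
## Toy symmetric dissimilarity matrix (illustrative values only)
d <- matrix(c(0, 2, 4,
              2, 0, 6,
              4, 6, 0), nrow = 3, byrow = TRUE)
n <- nrow(d)

## Double sum over all i, j: 2 * (2 + 4 + 6) = 24
total <- sum(d)

## s^2 = total / (2 n^2) = 24 / 18
s2 <- total / (2 * n^2)
s2                          # 1.333333

## Equivalent form using each unordered pair once: 12 / 9
sum(d[upper.tri(d)]) / n^2  # 1.333333
```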

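Mathematical ground: in the Euclidean case, the sum of squares can be expressed as:

SS = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{1}{2n} \sum_{i=1}^{n} \sum_{j=1}^{n} (y_i - y_j)^2

The concept of discrepancy generalizes this equation by replacing the (y_i - y_j)^2 term with any measure of dissimilarity d_{ij}. Dividing the resulting sum of squares by n gives s^2, just as dividing SS by n gives the variance (with divisor n).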
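A quick numerical check of the Euclidean identity in base R, on an arbitrary sample chosen for illustration. Both sides of the SS equation agree, and plugging d_ij = (y_i - y_j)^2 into the discrepancy formula yields SS / n, i.e. the variance with divisor n (R's var() uses n - 1).

```r
## Arbitrary numeric sample (illustrative values only)
y <- c(3, 7, 8, 12)
n <- length(y)

## Left-hand side: sum of squared deviations from the mean
ss_mean <- sum((y - mean(y))^2)

## Right-hand side: (1/(2n)) * double sum of squared pairwise differences
sq_diff <- outer(y, y, "-")^2
ss_pairs <- sum(sq_diff) / (2 * n)
c(ss_mean, ss_pairs)    # 41 41

## Discrepancy with d_ij = (y_i - y_j)^2 equals SS / n
s2 <- sum(sq_diff) / (2 * n^2)
s2                      # 10.25
var(y) * (n - 1) / n    # 10.25
```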

Value

A single numeric value: the discrepancy s^2 of the set of objects.

Author(s)

Matthias Studer (with Gilbert Ritschard for the help page)

References

Studer, M., G. Ritschard, A. Gabadinho and N. S. Müller (2011). Discrepancy analysis of state sequences. Sociological Methods and Research, Vol. 40(3), pp. 471-510.

Studer, M., G. Ritschard, A. Gabadinho and N. S. Müller (2010). Discrepancy analysis of complex objects using dissimilarities. In F. Guillet, G. Ritschard, D. A. Zighed and H. Briand (Eds.), Advances in Knowledge Discovery and Management, Studies in Computational Intelligence, Vol. 292, pp. 3-19. Berlin: Springer.

Studer, M., G. Ritschard, A. Gabadinho and N. S. Müller (2009). Analyse de dissimilarités par arbre d'induction. In EGC 2009, Revue des Nouvelles Technologies de l'Information, Vol. E-15, pp. 7-18.

Anderson, M. J. (2001). A new method for non-parametric multivariate analysis of variance. Austral Ecology, Vol. 26, pp. 32-46.

Batagelj, V. (1988). Generalized Ward and related clustering problems. In H. Bock (Ed.), Classification and Related Methods of Data Analysis, pp. 67-74. Amsterdam: North-Holland.

See Also

dissassoc to test the association between objects represented by their dissimilarities and a covariate.
disstree for an induction tree analysis of objects characterized by a dissimilarity matrix.
disscenter to compute the distance of each object to its group center from pairwise dissimilarities.
dissmfac to perform a multi-factor analysis of variance from pairwise dissimilarities.

Examples

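## Load TraMineR, which provides dissvar(), seqdef(), seqdist() and the mvad data
library(TraMineR)

## Defining a state sequence object (columns 17:86 hold the monthly states)
data(mvad)
mvad.seq <- seqdef(mvad[, 17:86])

## Building dissimilarities (any dissimilarity measure can be used)
mvad.ham <- seqdist(mvad.seq, method = "HAM")

## Pseudo-variance (discrepancy) of the sequences
print(dissvar(mvad.ham))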
