Compute distances to the center of a group

Description

Computes the dissimilarity between objects and their group center from their pairwise dissimilarity matrix.

Usage

disscenter(diss, group=NULL, medoids.index=NULL,
           allcenter = FALSE, weights=NULL, squared=FALSE)

Arguments

diss

a dissimilarity matrix such as generated by seqdist, or a dist object (see dist)

group

if NULL (default), the whole data set is considered. Otherwise, a separate center is computed for each distinct value of the group variable.

medoids.index

if NULL, returns the dissimilarity to the center. If set to "first", returns the index of the first encountered most central sequence. If group is set, an index is returned per group. When set to "all", indexes of all medoids (one list per group) are returned.

allcenter

logical. If TRUE, returns a data.frame containing the dissimilarity between each object and its group center, each column corresponding to a group.

weights

optional numerical vector containing weights.

squared

Logical. If TRUE, diss is squared.
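To illustrate what the "first" and "all" settings of medoids.index refer to, here is a minimal base R sketch. It is not the TraMineR implementation: it assumes the most central object of a group is the one with the smallest summed dissimilarity to the other members, it ignores weights, and the data and the names x, group, medoids_all and medoids_first are made up for the illustration.

```r
# Toy dissimilarities between five objects placed on a line (hypothetical data)
x <- c(0, 1, 2, 10, 12)
d <- as.matrix(dist(x))
group <- c("a", "a", "a", "b", "b")

# For each group, keep the object(s) whose summed dissimilarity to the
# members of the same group is smallest (ties give several medoids)
medoids_all <- lapply(split(seq_along(x), group), function(idx) {
  s <- rowSums(d[idx, idx, drop = FALSE])
  idx[s == min(s)]
})

# "First encountered" medoid of each group
medoids_first <- sapply(medoids_all, function(m) m[1])

print(medoids_all)
print(medoids_first)
```

In group "b" the two objects are tied, so the list of all medoids has two entries for that group while the "first" variant keeps only one.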

Details

This function computes the dissimilarity between given objects and their group center. The group center need not belong to the space formed by the objects (just as the average of integers is not necessarily an integer). This distance can also be understood as the object's contribution to the discrepancy (see dissvar). Note that when the dissimilarity measure does not respect the triangle inequality, the dissimilarity between a given object and its group center may be negative.

It can be shown that this dissimilarity is equal to (see Batagelj 1988):

d_{xg} = \frac{1}{n} \left( \sum_{i=1}^{n} d_{xi} - SS \right)

where n is the number of objects in the group and SS is the sum of squares (see dissvar).
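The formula can be checked by hand on a small example. The sketch below uses base R only and rests on one assumption: SS is taken to be the sum of the pairwise dissimilarities, each pair counted once, divided by n, which is how we read the dissvar help page. The data and the helper name dist_to_center are hypothetical.

```r
# Hypothetical helper applying the formula to a full dissimilarity matrix
dist_to_center <- function(d) {
  n <- nrow(d)
  ss <- sum(d[upper.tri(d)]) / n   # assumed definition of SS
  unname((rowSums(d) - ss) / n)
}

# Case 1: squared Euclidean distances between three points on a line.
# The result should equal the squared distance to the mean.
x <- c(0, 2, 7)
d_euclid <- as.matrix(dist(x))^2
dc_euclid <- dist_to_center(d_euclid)
print(all.equal(dc_euclid, (x - mean(x))^2))

# Case 2: a dissimilarity that violates the triangle inequality
# (d[1, 2] = 10 although objects 1 and 2 are both at 1 from object 3)
d_nonmetric <- matrix(c( 0, 10, 1,
                        10,  0, 1,
                         1,  1, 0), nrow = 3, byrow = TRUE)
dc_nonmetric <- dist_to_center(d_nonmetric)
print(dc_nonmetric)
```

In the second case the value for the third object is negative, which illustrates the remark about dissimilarities that do not respect the triangle inequality.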

Value

A vector with the dissimilarity to the group center for each object, or a list of medoid indexes.

Author(s)

Matthias Studer (with Gilbert Ritschard for the help page)

References

Studer, M., G. Ritschard, A. Gabadinho and N. S. Müller (2011). Discrepancy analysis of state sequences, Sociological Methods and Research, Vol. 40(3), 471-510.

Studer, M., G. Ritschard, A. Gabadinho and N. S. Müller (2010) Discrepancy analysis of complex objects using dissimilarities. In F. Guillet, G. Ritschard, D. A. Zighed and H. Briand (Eds.), Advances in Knowledge Discovery and Management, Studies in Computational Intelligence, Volume 292, pp. 3-19. Berlin: Springer.

Studer, M., G. Ritschard, A. Gabadinho and N. S. Müller (2009) Analyse de dissimilarités par arbre d'induction. In EGC 2009, Revue des Nouvelles Technologies de l'Information, Vol. E-15, pp. 7–18.

Batagelj, V. (1988) Generalized Ward and related clustering problems. In H. Bock (Ed.), Classification and related methods of data analysis, Amsterdam: North-Holland, pp. 67–74.

See Also

dissvar to compute the pseudo variance from dissimilarities and for a basic introduction to concepts of pseudo variance analysis
dissassoc to test association between objects represented by their dissimilarities and a covariate.
disstree for an induction tree analysis of objects characterized by a dissimilarity matrix.
dissmfac to perform multi-factor analysis of variance from pairwise dissimilarities.

Examples

## Defining a state sequence object
data(mvad)
mvad.seq <- seqdef(mvad[, 17:86])

## Building dissimilarities (any dissimilarity measure can be used)
mvad.ham <- seqdist(mvad.seq, method="HAM")

## Compute distance to center according to group gcse5eq
dc <- disscenter(mvad.ham, group=mvad$gcse5eq)

## Plotting the distribution of dissimilarity to center
boxplot(dc~mvad$gcse5eq, col="cyan")

## Retrieving index of the first medoids, one per group
dc <- disscenter(mvad.ham, group=mvad$Grammar, medoids.index="first")
print(dc)

## Retrieving index of all medoids in each group
dc <- disscenter(mvad.ham, group=mvad$Grammar, medoids.index="all")
print(dc)
