# seqemlt: Euclidean Coordinates for Longitudinal Timelines (TraMineRextras: TraMineR Extension)

## Description

Computes the Euclidean coordinates of sequences. The EMLT distance between sequences, introduced in Rousset et al. (2012), is obtained from these coordinates.

## Usage

```
seqemlt(seqdata, a = 1, b = 1, weighted = TRUE)
```

## Arguments

- `seqdata`: a state sequence object defined with the `seqdef` function.
- `a`: optional argument for the weighting mechanism that controls the balance between short-term and long-term transitions. The weighting function is 1/(a*s+b), where s is the transition step.
- `b`: see argument `a`.
- `weighted`: logical. Should the weights in the sequence object `seqdata` be used?
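As a rough illustration of how `a` and `b` shape the weighting function 1/(a*s+b), the base-R sketch below evaluates the formula for a few transition steps. The helper name `emlt_weight` is hypothetical and is not part of TraMineRextras. It only reproduces the formula stated above, not the internal code of `seqemlt`.

```r
# Hypothetical helper (not a TraMineRextras function): weight given to a
# transition observed s steps ahead, following the formula 1/(a*s + b).
emlt_weight <- function(s, a = 1, b = 1) {
  1 / (a * s + b)
}

steps <- 1:5

# Default setting a = 1, b = 1
w_default <- emlt_weight(steps)

# Larger a: the weight decays faster as the step grows
w_short <- emlt_weight(steps, a = 5, b = 1)

# In the formula, a = 0 gives the same weight to every step
w_flat <- emlt_weight(steps, a = 0, b = 1)

print(rbind(default = w_default, short_term = w_short, flat = w_flat))
```

In this sketch, increasing `a` relative to `b` shifts the emphasis toward short-term transitions.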

## Details

The EMLT distance is the sum, over the successive positions, of the dissimilarities between the pairs of states observed at each position. The dissimilarity between states at a position is defined as the Chi-squared distance between the normalized vectors of transition probabilities (profiles of situations) from the current state to the states observed later in the sequence. Transition probabilities are down-weighted with the time distance to avoid giving exaggerated importance to transitions over long periods. The adjustment weight is 1/(a*s+b), where s is the length of the period over which the transition probability is measured.
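To make the notion of a Chi-squared distance between profiles concrete, here is a minimal base-R sketch on made-up numbers. The toy table `trans`, the situation names, and the use of column masses as the Chi-squared metric are assumptions chosen for illustration. The exact normalization used inside `seqemlt` may differ.

```r
# Toy weighted transition counts from two situations (rows) to three
# future situations (columns). All values are made up for illustration.
trans <- rbind(
  sit_A = c(6, 3, 1),
  sit_B = c(2, 2, 6)
)
colnames(trans) <- c("fut_1", "fut_2", "fut_3")

# Row profiles: each row rescaled so that it sums to 1
profiles <- prop.table(trans, margin = 1)

# Column masses: share of each future situation in the whole table
# (assumed here as the weights of the Chi-squared metric)
col_mass <- colSums(trans) / sum(trans)

# Chi-squared distance between the two profiles
chisq_dist <- sqrt(sum((profiles["sit_A", ] - profiles["sit_B", ])^2 / col_mass))
print(chisq_dist)
```

Dividing by the column masses gives more influence to differences on rare future situations than a plain Euclidean distance between profiles would.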

The EMLT distance between two sequences is obtained as the Euclidean distance between the returned numerical sequence coordinates. Providing the `coord` component as the data input to any clustering algorithm that uses the Euclidean metric is therefore equivalent to clustering with the EMLT metric.
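The sketch below shows how a pairwise distance matrix follows from a coordinate matrix using base R only. The small matrix `coord` is a made-up stand-in. With real data it would be the `coord` component returned by `seqemlt`.

```r
# Toy coordinate matrix standing in for the `coord` component
# (one row per sequence). The values are made up for illustration.
coord <- rbind(
  seq1 = c(0, 0, 1),
  seq2 = c(3, 4, 1),
  seq3 = c(0, 0, 2)
)

# Pairwise Euclidean distances between the rows
d <- dist(coord, method = "euclidean")
print(d)

# Any dissimilarity-based method can then work on d,
# for example hierarchical clustering
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)
print(groups)
```

Algorithms that take raw coordinates, such as `kmeans`, can use the coordinate matrix directly, while methods that expect a dissimilarity object can use the output of `dist`.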

Each time-indexed state is called a situation, and the distance between two states at a position t is derived from the transition probabilities to other observed situations. The distance between any situation and a situation that does not occur is coded as `NA`. Such non-occurring situations have no influence on the distance between sequences.

The obtained numerical representations of sequences may be used as input to any Euclidean algorithm (clustering algorithms, ...).

## Value

An object of class `emlt` with the following components:

- `coord`: matrix in which each row holds the EMLT numerical coordinates of the corresponding sequence.
- `states`: list of states.
- `situations`: list of situations (timestamped states).
- `sit.freq`: situation frequencies.
- `sit.transrate`: matrix of transition probabilities from each situation to future situations.
- `sit.profil`: profiles of situations. Each profile is the normalized vector of transition probabilities to future situations, adjusted to down-weight transitions over longer periods.
- `sit.cor`: matrix of correlations between situations. Two situations are highly correlated when their profiles are similar (i.e., when their transitions towards the future are similar).

## Author(s)

Patrick Rousset, Senior researcher at Cereq, with the help of Matthias Studer. Help page by Gilbert Ritschard.

## References

Rousset, Patrick and Jean-François Giret (2007), Classifying Qualitative Time Series with SOM: The Typology of Career Paths in France, in F. Sandoval, A. Prieto and M. Grana (Eds) Computational and Ambient Intelligence, Lecture Notes in Computer science, vol 4507, Berlin: Springer, pp 757-764.

Rousset, Patrick, Jean-François Giret and Yvette Grelet (2012) Typologies De Parcours et Dynamique Longitudinale, Bulletin de méthodologie sociologique, 114(1), 5-34.

Rousset, Patrick and Jean-François Giret (2008) A longitudinal Analysis of Labour Market Data with SOM, in J. Rabuñal Dopico, J. Dorado, & A. Pazos (Eds.) Encyclopedia of Artificial Intelligence, Hershey, PA: Information Science Reference, pp 1029-1035.

Studer, Matthias and Gilbert Ritschard (2014) A comparative review of sequence dissimilarity measures. LIVES Working Paper, 33 http://www.lives-nccr.ch/sites/default/files/pdf/publication/33_lives_wp_studer_sequencedissmeasures.pdf

## See Also

`plot.emlt`
## Examples

```
library(TraMineR)        # seqdef(), seqdplot(), alphabet() and the mvad data
library(TraMineRextras)  # seqemlt() and its plot method

data(mvad)
mvad.seq <- seqdef(mvad[1:100, 17:41])
alphabet(mvad.seq)
head(labels(mvad.seq))

## Computing distance
mvad.emlt <- seqemlt(mvad.seq)

## typology1 with kmeans in 3 clusters
km <- kmeans(mvad.emlt$coord, 3)

## Plotting by clusters of typology1
seqdplot(mvad.seq, group = km$cluster)

## typology2: 3 clusters by applying hierarchical ward
## on the centers of the 25 group kmeans solution
km <- kmeans(mvad.emlt$coord, 25)
## "ward.D" is the current name of the hclust method formerly called "ward"
hc <- hclust(dist(km$centers, method = "euclidean"), method = "ward.D")
zz <- cutree(hc, k = 3)

## Plotting by clusters of typology2
seqdplot(mvad.seq, group = zz[km$cluster])

## Plotting the evolution of the correlation between states
plot(mvad.emlt, from = "employment", to = "joblessness", type = "cor")
plot(mvad.emlt, from = c("employment", "HE", "school", "FE"),
     to = "joblessness", delay = 0, leg = TRUE)
plot(mvad.emlt, from = "joblessness", to = "employment", delay = 6)
plot(mvad.emlt, type = "pca", cex = 0.4, compx = 1, compy = 2)
```