Multidimensional Scaling of Discrete Probability Distributions

knitr::opts_chunk$set(echo = TRUE)
library(dad)

Introduction: example and objective of the method

The dataset dspg of the dad package is a list of $T = 7$ matrices. For each of the $T$ years 1968, 1975, 1982, 1990, 1999, 2010 and 2015, we have the contingency table of Diploma × Socioprofessional group in France. Each table has:

data("dspg")
print(dspg)

After the computation of the distances or divergences between each pair of occasions, that is the distances $(\delta_{ts})$ between their corresponding distributions, the MDS technique looks for a representation of the distributions by $T$ points in a low dimensional space such that the distances between these points are as similar as possible to the $(\delta_{ts})$.
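The idea can be sketched with base R only. This is a minimal illustration and not the internals of mdsdd: the three small tables below are made-up numbers (not taken from dspg), and the Euclidean distance between probability vectors is assumed as the choice of $(\delta_{ts})$.

```r
# Three toy joint distributions (2 x 2 tables of probabilities), one per occasion.
# Hypothetical values for illustration only.
tabs <- list(
  t1 = matrix(c(0.40, 0.30, 0.20, 0.10), nrow = 2),
  t2 = matrix(c(0.35, 0.30, 0.20, 0.15), nrow = 2),
  t3 = matrix(c(0.10, 0.20, 0.30, 0.40), nrow = 2)
)

# Flatten each table into a probability vector: one row per occasion
probs <- t(sapply(tabs, as.vector))

# Pairwise distances delta_ts (Euclidean distance, assumed here)
delta <- dist(probs)

# Classical MDS: points whose mutual distances approximate delta
coords <- cmdscale(delta, k = 2)

print(round(as.matrix(delta), 3))
print(round(coords, 3))
```

With only three occasions and Euclidean distances, two dimensions reproduce the distances exactly; with more occasions or another divergence, the low-dimensional representation is in general only an approximation.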

The dad package includes functions for all the calculations required to implement such a method and to interpret its outputs:

The mdsdd function

MDS of discrete probability distributions can be carried out using the mdsdd function. This function applies to

The following example applies mdsdd to a list of arrays. The mdsdd function is built on the cmdscale function of R. It is applied to the dataset dspg as follows:

resultmds <- mdsdd(dspg)

In addition to the add argument of cmdscale, the mdsdd function has two sets of optional arguments:

Interpretation of mdsdd outputs

The mdsdd function returns an object of S3 class "mdsdd", consisting of a list of 9 elements, including the scores, also called principal coordinates, and the marginal and joint distributions of the variables per occasion.

names(resultmds)
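To make the terms "joint" and "marginal" distributions concrete, here is a small base R sketch on a single contingency table. The counts and the category labels are hypothetical, and the code does not reproduce how mdsdd stores these distributions.

```r
# A toy contingency table of counts for one occasion (hypothetical values)
counts <- matrix(c(30, 10, 20, 40), nrow = 2,
                 dimnames = list(diploma = c("none", "degree"),
                                 group   = c("manual", "office")))

# Joint distribution: each cell count divided by the grand total
joint <- prop.table(counts)

# Marginal distributions of each variable
marg_diploma <- rowSums(joint)
marg_group   <- colSums(joint)

print(joint)
print(marg_diploma)
print(marg_group)
```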

The outputs are displayed with the print function:

print(resultmds)

Graphical representations on the principal planes are generated with the plot function:

plot(resultmds, fontsize.points = 1)

In this example, a single axis is enough to explain the general trends; the first principal coordinate explains 92% of the inertia.

The graph shows that the first principal score increases over time: it takes its highest values for the most recent years.
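The share of inertia carried by each principal coordinate can be illustrated with cmdscale itself. The sketch below uses made-up distributions and assumes that the percentage is the ratio of each positive eigenvalue to the sum of the positive eigenvalues; the exact formula used by mdsdd may differ.

```r
# Four toy distributions over three categories (hypothetical values)
toy <- rbind(a = c(0.5, 0.3, 0.2),
             b = c(0.4, 0.3, 0.3),
             c = c(0.2, 0.3, 0.5),
             d = c(0.1, 0.4, 0.5))

# Classical MDS, keeping the eigenvalues
fit <- cmdscale(dist(toy), k = 2, eig = TRUE)

# Keep the strictly positive eigenvalues (numerical zeros are discarded)
pos_eig <- fit$eig[fit$eig > 1e-12]

# Percentage of inertia per principal coordinate (assumed definition)
inertia_pct <- 100 * pos_eig / sum(pos_eig)
print(round(inertia_pct, 1))
```

When the first percentage is close to 100, as in the dspg example, reading the first axis alone is a reasonable simplification.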

The interpretation of outputs is based on the relationships between the principal scores and the marginal or joint frequencies. These relationships are quantified by correlation coefficients and are represented graphically by plotting the scores against the frequencies. These interpretation tools are provided by the interpret function, which has two optional arguments: nscore, indicating the indices of the score columns to be interpreted, and mma, whose default value is "marg1" (the probability distributions of each variable).

interpret(resultmds, nscore = 1)
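The kind of relationship quantified here can be sketched in base R. The scores and frequencies below are hypothetical numbers, not outputs of mdsdd, and the Pearson correlation is assumed for the illustration.

```r
# Made-up first principal scores for six occasions, in chronological order
score1 <- c(-0.30, -0.18, -0.05, 0.06, 0.20, 0.27)

# Made-up marginal frequencies of a two-level variable for the same occasions
low  <- c(0.70, 0.62, 0.55, 0.45, 0.38, 0.30)
high <- 1 - low
marg <- cbind(low = low, high = high)

# Pearson correlation between the score and each frequency (assumed measure)
r <- cor(score1, marg)[1, ]
print(round(r, 3))

# Graphical counterpart: frequency plotted against the score
plot(score1, marg[, "high"], xlab = "PC1", ylab = "Frequency of 'high'")
```

A strongly positive correlation for a category means that its frequency is higher on the occasions with high scores; a strongly negative one means the opposite.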

From the correlations between the principal coordinates (PC) and the distributions of the variables, we deduce that:

So, recalling that $PC1$ gets higher for recent years, these results highlight that in France, since 1968:




dad documentation built on Aug. 9, 2021, 1:06 a.m.