fmdsd: Multidimensional scaling of probability densities



Applies multidimensional scaling (MDS) to probability densities in order to describe a data folder consisting of T groups of individuals on which p variables are observed. It applies cmdscale to the matrix of distances between the T densities and returns an object of class fmdsd.


fmdsd(xf, groupname = "group", gaussiand = TRUE, distance = c("jeffreys", "hellinger",
    "wasserstein", "l2", "l2norm"), windowh = NULL, data.centered = FALSE,
    data.scaled = FALSE, common.variance = FALSE, add = TRUE, nb.factors = 3,
    nb.values = 10, sub.title = "", plot.eigen = TRUE, plot.score = FALSE, nscore = 1:3,
    filename = NULL)



xf

object of class "folder" or data.frame.

  • If it is an object of class "folder", its elements are data frames with p numeric columns. Non-numeric columns cause an error. The t^{th} element (t = 1, \ldots, T) corresponds to the t^{th} group.

  • If it is a data frame, the column whose name is given by the groupname argument is a factor giving the groups. All other columns must be numeric; otherwise, an error is raised.


groupname

  • If xf is an object of class "folder", it is the name of the grouping variable in the returned results. The default is groupname = "group".

  • If xf is a data frame, it is the name of the column of xf containing the groups.


gaussiand

logical. If TRUE (default), the probability densities are assumed to be Gaussian. If FALSE, the densities are estimated using the Gaussian kernel method.


distance

The distance or divergence used to compute the distance matrix between the densities.

If gaussiand = TRUE, the densities are parametrically estimated and the distance can be:

  • "jeffreys" (default) Jeffreys measure (symmetrised Kullback-Leibler divergence),

  • "hellinger" the Hellinger (Matusita) distance,

  • "wasserstein" the Wasserstein distance,

  • "l2" the L^2 distance,

  • "l2norm" the densities are normed and the L^2 distance between these normed densities is used;

If gaussiand = FALSE, the densities are estimated by the Gaussian kernel method and the distance can be "l2" (default) or "l2norm".
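To make the "l2" and "l2norm" options concrete, here is a minimal base R sketch for two univariate Gaussian densities. It does not call the package; the helper names (l2_inner, l2_dist, l2norm_dist) are hypothetical, and the package's own implementation may differ in detail. It relies on the standard identity that the integral of the product of the N(m1, s1^2) and N(m2, s2^2) densities equals the N(0, s1^2 + s2^2) density evaluated at m1 - m2.

```r
# L2 inner product of the densities of N(m1, s1^2) and N(m2, s2^2)
# (hypothetical helper, not part of the dad package)
l2_inner <- function(m1, s1, m2, s2) {
  dnorm(m1 - m2, mean = 0, sd = sqrt(s1^2 + s2^2))
}

# L2 distance: ||f - g||^2 = <f,f> + <g,g> - 2 <f,g>
l2_dist <- function(m1, s1, m2, s2) {
  sq <- l2_inner(m1, s1, m1, s1) + l2_inner(m2, s2, m2, s2) -
    2 * l2_inner(m1, s1, m2, s2)
  sqrt(max(sq, 0))  # guard against tiny negative rounding errors
}

# L2 distance between normed densities f/||f|| and g/||g||:
# || f/||f|| - g/||g|| ||^2 = 2 - 2 <f,g> / (||f|| ||g||)
l2norm_dist <- function(m1, s1, m2, s2) {
  cosine <- l2_inner(m1, s1, m2, s2) /
    sqrt(l2_inner(m1, s1, m1, s1) * l2_inner(m2, s2, m2, s2))
  sqrt(max(2 - 2 * cosine, 0))
}

# Numerical cross-check of the closed form for N(0, 1) versus N(1, 4)
f <- function(x) dnorm(x, mean = 0, sd = 1)
g <- function(x) dnorm(x, mean = 1, sd = 2)
l2_numeric <- sqrt(integrate(function(x) (f(x) - g(x))^2, -Inf, Inf)$value)

l2_closed <- l2_dist(0, 1, 1, 2)
l2n_closed <- l2norm_dist(0, 1, 1, 2)
```

The normed variant is bounded above by sqrt(2), which makes groups with very different spreads easier to compare on one scale.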


windowh

either a list of T bandwidths (one per density associated with a group), or a strictly positive number. If windowh = NULL (default), the bandwidths are computed automatically. See Details.

Ignored when distance is "hellinger", "jeffreys" or "wasserstein" (see Details).


data.centered

logical. If TRUE (default is FALSE), the data of each group are centered.


data.scaled

logical. If TRUE (default is FALSE), the data of each group are centered (even if data.centered = FALSE) and scaled.


common.variance

logical. If TRUE, a common covariance matrix (or correlation matrix if data.scaled = TRUE), computed on the whole data, is used. If FALSE (default), one covariance (or correlation) matrix per group is used.


add

logical indicating whether an additive constant should be computed and added to the non-diagonal dissimilarities so that the modified dissimilarities are Euclidean (default TRUE; see the add argument of cmdscale).
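As a rough illustration of what the additive constant does, the sketch below calls base R's cmdscale directly on a small hand-made dissimilarity matrix (hypothetical values, unrelated to the package data). The matrix violates the triangle inequality, so classical scaling without the constant yields a negative eigenvalue; with add = TRUE, cmdscale reports the constant in the ac component of its result.

```r
# Hypothetical 4 x 4 dissimilarity matrix; 4 > 1 + 1 breaks the
# triangle inequality, so the dissimilarities are not Euclidean
d <- matrix(c(0, 1, 4, 1,
              1, 0, 1, 4,
              4, 1, 0, 1,
              1, 4, 1, 0), nrow = 4, byrow = TRUE)

# Classical MDS without and with the additive constant
fit_raw <- cmdscale(d, k = 2, eig = TRUE, add = FALSE)
fit_add <- cmdscale(d, k = 2, eig = TRUE, add = TRUE)

min_eig_raw <- min(fit_raw$eig)  # negative: not Euclidean
add_const   <- fit_add$ac        # constant added to off-diagonal entries
min_eig_add <- min(fit_add$eig)  # zero up to rounding error
```

Comparing min_eig_raw and min_eig_add shows the effect: once the constant is added, no eigenvalue is negative beyond numerical noise.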


nb.factors

numeric. Number of returned principal coordinates (default nb.factors = 3).

Warning: The plot.fmdsd and interpret.fmdsd functions cannot take into account more than nb.factors principal factors.


nb.values

numeric. Number of returned eigenvalues (default nb.values = 10).


sub.title

string. Subtitle for the graphs (default "").


plot.eigen

logical. If TRUE (default), the barplot of the eigenvalues is plotted.


plot.score

logical. If TRUE (default is FALSE), the graphs of the new coordinates are plotted. A new graphics device is opened for each pair of coordinates defined by the nscore argument.


nscore

numeric vector. If plot.score = TRUE, the indices of the principal coordinates to plot (default nscore = 1:3). No component may exceed nb.factors.


filename

string. Name of the file in which the results are saved. By default (filename = NULL) they are not saved.


In order to compute the distances/dissimilarities between the groups, the T probability densities f_t corresponding to the T groups of individuals are either parametrically estimated (gaussiand = TRUE) or estimated using the Gaussian kernel method (gaussiand = FALSE). In the latter case, the windowh argument provides the list of the bandwidths to be used. Notice that in the multivariate case (p>1), the bandwidths are positive-definite matrices.

If windowh is a numerical value h, the matrix bandwidth is of the form h S, where S is either the square root of the covariance matrix (p > 1) or the standard deviation (p = 1) of the estimated density.

If windowh = NULL (default), h in the above formula is computed using the bandwidth.parameter function.
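The form h S can be sketched in base R as follows, for one group of simulated data (hypothetical values). The matrix square root is taken through an eigendecomposition of the covariance matrix. The formula used for h is an assumption: it is the usual normal-reference rule, chosen here only for illustration, and is not confirmed by this page to be what bandwidth.parameter computes.

```r
set.seed(1)
# One hypothetical group: n = 50 individuals, p = 2 variables
x <- cbind(rnorm(50), rnorm(50, sd = 2))
n <- nrow(x)
p <- ncol(x)

# S: symmetric square root of the covariance matrix, so that S %*% S = cov(x)
S2 <- cov(x)
e  <- eigen(S2, symmetric = TRUE)
S  <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)

# ASSUMPTION: normal-reference rule for the scalar h (illustrative only;
# the package's bandwidth.parameter function may use a different formula)
h <- (4 / (n * (p + 2)))^(1 / (p + 4))

# Matrix bandwidth of the form h S
H <- h * S
```

Because S is symmetric positive definite and h > 0, the resulting H is positive definite, as required of a multivariate bandwidth.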

The distance or dissimilarity between the estimated densities is either the L^2 distance, the Hellinger distance, Jeffreys measure (symmetrised Kullback-Leibler divergence) or the Wasserstein distance.

  • If it is the L^2 distance (distance="l2" or distance="l2norm"), the densities can be either parametrically estimated or estimated using the Gaussian kernel.

  • If it is the Hellinger distance (distance="hellinger"), Jeffreys measure (distance="jeffreys") or the Wasserstein distance (distance="wasserstein"), the densities are considered Gaussian and necessarily parametrically estimated.
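For Gaussian densities these three measures have closed forms, which is why parametric estimation is required. The sketch below writes them out for the univariate case in base R. The function names are hypothetical, and the normalisations are assumptions: Jeffreys is taken as the plain sum of the two Kullback-Leibler divergences, Hellinger as sqrt(1 - BC) with BC the Bhattacharyya coefficient, and Wasserstein as the 2-Wasserstein distance. The package may scale these differently (Matusita's version of the Hellinger distance, for example, differs by a factor sqrt(2)).

```r
# Kullback-Leibler divergence KL(N(m1, s1^2) || N(m2, s2^2))
kl_gauss <- function(m1, s1, m2, s2) {
  log(s2 / s1) + (s1^2 + (m1 - m2)^2) / (2 * s2^2) - 0.5
}

# Jeffreys measure: symmetrised Kullback-Leibler divergence
# (ASSUMPTION: plain sum of both directions, no extra factor)
jeffreys_gauss <- function(m1, s1, m2, s2) {
  kl_gauss(m1, s1, m2, s2) + kl_gauss(m2, s2, m1, s1)
}

# Hellinger distance, here sqrt(1 - BC), where BC is the integral of sqrt(f * g)
hellinger_gauss <- function(m1, s1, m2, s2) {
  v  <- s1^2 + s2^2
  bc <- sqrt(2 * s1 * s2 / v) * exp(-(m1 - m2)^2 / (4 * v))
  sqrt(max(1 - bc, 0))
}

# 2-Wasserstein distance between two univariate Gaussians
wasserstein_gauss <- function(m1, s1, m2, s2) {
  sqrt((m1 - m2)^2 + (s1 - s2)^2)
}

# Example: N(0, 1) versus N(1, 4)
jeff <- jeffreys_gauss(0, 1, 1, 2)
hell <- hellinger_gauss(0, 1, 1, 2)
wass <- wasserstein_gauss(0, 1, 1, 2)
```

All three vanish when the two densities coincide and depend only on the estimated means and standard deviations, so no bandwidth is involved.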


Returns an object of class fmdsd, i.e. a list including:


data frame of the eigenvalues and percentages of inertia.


data frame of the nb.factors first principal coordinates.


list of the means.


list of the covariance matrices.


list of the correlation matrices.


list of the skewness coefficients.


list of the kurtosis coefficients.


Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard


Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.

Delicado, P. (2011). Dimensionality reduction when data are density functions. Computational Statistics & Data Analysis, 55, 401-420.

Yousfi, S., Boumaza, R., Aissani, D., Adjabi, S. (2014). Optimal bandwidth matrices in functional principal component analysis of density functions. Journal of Statistical Computation and Simulation, 85 (11), 2315-2330.

Cox, T.F., Cox, M.A.A. (2001). Multidimensional Scaling, second ed. Chapman & Hall/CRC.

See Also

fpcad, print.fmdsd, plot.fmdsd, interpret.fmdsd, bandwidth.parameter


library(dad)
data(roses)
rosesf <- as.folder(roses[,c("Sha","Den","Sym","rose")])

# MDS on Gaussian densities (on sensory data)

# using the Jeffreys measure (default):
resultjeff <- fmdsd(rosesf, distance = "jeffreys")

## Not run: 
# Applied to a data frame:
resultjeffdf <- fmdsd(roses[,c("Sha","Den","Sym","rose")],
                      distance = "jeffreys", groupname = "rose")

## End(Not run)

# using the Hellinger distance:
resulthellin <- fmdsd(rosesf, distance = "hellinger")

# using the Wasserstein distance:
resultwass <- fmdsd(rosesf, distance = "wasserstein")

# Gaussian case, using the L2-distance:
resultl2 <- fmdsd(rosesf, distance = "l2")

# Gaussian case, using the L2-distance between normed densities:
resultl2norm <- fmdsd(rosesf, distance = "l2norm")

## Not run: 
# Non-Gaussian case, using the L2-distance;
# the densities are estimated using the Gaussian kernel method:
result <- fmdsd(rosesf, distance = "l2", gaussiand = FALSE, groupname = "rose")

## End(Not run)

dad documentation built on Aug. 30, 2023, 5:06 p.m.