# fmdsd: Multidimensional scaling of probability densities In dad: Three-Way / Multigroup Data Analysis Through Densities

 fmdsd R Documentation

## Multidimensional scaling of probability densities

### Description

Applies multidimensional scaling (MDS) to probability densities in order to describe a data folder consisting of `T` groups of individuals on which `p` variables are observed. It applies `cmdscale` to the matrix of distances between the `T` densities and returns an object of class `fmdsd`.
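As an illustration of the idea, here is a minimal base R sketch that does not call `fmdsd` itself. It fits one univariate Gaussian per group of the built-in `chickwts` data, builds the matrix of Hellinger distances between the fitted densities, and passes it to `cmdscale`. The helper `hellinger_gauss`, the normalisation `H^2 = 1 - BC` (normalisations of the Hellinger distance differ between authors), and the way the percentages of inertia are derived are assumptions made for the example, not the package's internals.

```r
# One Gaussian density per group: T = 6 feed groups, p = 1 variable
groups <- split(chickwts$weight, chickwts$feed)
m <- vapply(groups, mean, numeric(1))
s <- vapply(groups, sd, numeric(1))

# Hypothetical helper: Hellinger distance between two univariate Gaussians,
# using the convention H^2 = 1 - Bhattacharyya coefficient (an assumption)
hellinger_gauss <- function(m1, s1, m2, s2) {
  bc <- sqrt(2 * s1 * s2 / (s1^2 + s2^2)) *
    exp(-(m1 - m2)^2 / (4 * (s1^2 + s2^2)))
  sqrt(pmax(0, 1 - bc))
}

# T x T matrix of distances between the fitted densities
idx <- seq_along(m)
D <- outer(idx, idx, function(i, j) hellinger_gauss(m[i], s[i], m[j], s[j]))
dimnames(D) <- list(names(m), names(m))

# Classical MDS with an additive constant, keeping 3 principal coordinates
mds <- cmdscale(as.dist(D), k = 3, eig = TRUE, add = TRUE)
scores <- mds$points

# Percentages of inertia relative to the positive eigenvalues (an assumption)
inertia <- 100 * mds$eig / sum(mds$eig[mds$eig > 0])
```

Here `k = 3` and `add = TRUE` mirror the defaults `nb.factors = 3` and `add = TRUE` of `fmdsd`.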

### Usage

``````
fmdsd(xf, group.name = "group", gaussiand = TRUE,
      distance = c("jeffreys", "hellinger", "wasserstein", "l2", "l2norm"),
      windowh = NULL, data.centered = FALSE, data.scaled = FALSE,
      common.variance = FALSE, add = TRUE, nb.factors = 3, nb.values = 10,
      sub.title = "", plot.eigen = TRUE, plot.score = FALSE, nscore = 1:3,
      filename = NULL)
``````

### Arguments

| Argument | Description |
|---|---|
| `xf` | object of class `"folder"` or data frame. If it is an object of class `"folder"`, its elements are data frames with `p` numeric columns; a non-numeric column raises an error. The `t^{th}` element (`t = 1, \ldots, T`) corresponds to the `t^{th}` group. If it is a data frame, the column named by the `group.name` argument is a factor giving the groups, and all other columns must be numeric; otherwise an error is raised. |
| `group.name` | string. If `xf` is an object of class `"folder"`, it is the name of the grouping variable in the returned results. If `xf` is a data frame, it is the name of the column of `xf` containing the groups. The default is `group.name = "group"`. |
| `gaussiand` | logical. If `TRUE` (default), the probability densities are assumed to be Gaussian. If `FALSE`, the densities are estimated using the Gaussian kernel method. |
| `distance` | the distance or divergence used to compute the distance matrix between the densities. If `gaussiand = TRUE`, the densities are parametrically estimated and the distance can be: `"jeffreys"` (default), Jeffreys measure (symmetrised Kullback-Leibler divergence); `"hellinger"`, the Hellinger (Matusita) distance; `"wasserstein"`, the Wasserstein distance; `"l2"`, the `L^2` distance; `"l2norm"`, the `L^2` distance between the normed densities. If `gaussiand = FALSE`, the densities are estimated by the Gaussian kernel method and the distance can be `"l2"` (default) or `"l2norm"`. |
| `windowh` | either a list of `T` bandwidths (one per density associated to a group), or a strictly positive number. If `windowh = NULL` (default), the bandwidths are computed automatically. Ignored when `distance` is `"hellinger"`, `"jeffreys"` or `"wasserstein"`. See Details. |
| `data.centered` | logical. If `TRUE` (default is `FALSE`), the data of each group are centered. |
| `data.scaled` | logical. If `TRUE` (default is `FALSE`), the data of each group are centered (even if `data.centered = FALSE`) and scaled. |
| `common.variance` | logical. If `TRUE`, a common covariance matrix (or correlation matrix if `data.scaled = TRUE`), computed on the whole data, is used. If `FALSE` (default), one covariance (or correlation) matrix per group is used. |
| `add` | logical indicating whether an additive constant should be computed and added to the non-diagonal dissimilarities so that the modified dissimilarities are Euclidean (default `TRUE`; see the `add` argument of `cmdscale`). |
| `nb.factors` | numeric. Number of returned principal coordinates (default `nb.factors = 3`). Warning: the `plot.fmdsd` and `interpret.fmdsd` functions cannot take into account more than `nb.factors` principal factors. |
| `nb.values` | numeric. Number of returned eigenvalues (default `nb.values = 10`). |
| `sub.title` | string. Subtitle for the graphs (default `""`). |
| `plot.eigen` | logical. If `TRUE` (default), the barplot of the eigenvalues is plotted. |
| `plot.score` | logical. If `TRUE` (default is `FALSE`), the graphs of the new coordinates are plotted. A new graphic device is opened for each pair of coordinates defined by the `nscore` argument. |
| `nscore` | numeric vector. If `plot.score = TRUE`, the numbers of the principal coordinates to plot (default `nscore = 1:3`). Its components cannot be greater than `nb.factors`. |
| `filename` | string. Name of the file in which the results are saved. By default (`filename = NULL`) they are not saved. |
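To make the effect of `data.centered` and `data.scaled` concrete, the following base R sketch centers, and then centers and scales, each group of the built-in `iris` data separately. It only illustrates what "per group" means; the use of `scale()` and the variable names are assumptions for the example, not the preprocessing code used inside `fmdsd`.

```r
# Numeric variables and grouping factor (T = 3 groups, p = 4 variables)
num <- iris[, 1:4]
grp <- iris$Species

# Per-group centering, analogous to data.centered = TRUE
centered <- lapply(split(num, grp), scale, center = TRUE, scale = FALSE)

# Per-group centering and scaling, analogous to data.scaled = TRUE
scaled <- lapply(split(num, grp), scale, center = TRUE, scale = TRUE)

# Within each group, every column now has mean 0 (and unit variance if scaled)
round(colMeans(centered$setosa), 10)
round(apply(scaled$setosa, 2, sd), 10)
```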

### Details

In order to compute the distances or dissimilarities between the groups, the `T` probability densities `f_t` corresponding to the `T` groups of individuals are either parametrically estimated (`gaussiand = TRUE`) or estimated using the Gaussian kernel method (`gaussiand = FALSE`). In the latter case, the `windowh` argument provides the list of the bandwidths to be used. Note that in the multivariate case (`p` > 1), the bandwidths are positive-definite matrices.

If `windowh` is a numerical value `h`, the bandwidth of each density is of the form `h S`, where `S` is either the square root of the covariance matrix (`p` > 1) or the standard deviation (`p` = 1) of the corresponding group.

If `windowh = NULL` (default), `h` in the above formula is computed using the `bandwidth.parameter` function.
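The following base R sketch shows how a bandwidth matrix of the form `h S` can be built for one group. The matrix square root is taken through an eigendecomposition. The helper `h_normal_reference` uses a normal-reference rule, `(4 / (n (p + 2)))^(1 / (p + 4))`, which is only an assumption about how `h` might be chosen; refer to the `bandwidth.parameter` documentation for the rule actually used by the package. Both helper names are hypothetical.

```r
# Hypothetical helper: normal-reference smoothing parameter (an assumption)
h_normal_reference <- function(n, p) (4 / (n * (p + 2)))^(1 / (p + 4))

# Hypothetical helper: symmetric square root of a covariance matrix
sqrt_cov <- function(V) {
  e <- eigen(V, symmetric = TRUE)
  vals <- sqrt(pmax(e$values, 0))
  e$vectors %*% diag(vals, nrow = length(vals)) %*% t(e$vectors)
}

# One group of the iris data: n = 50 individuals, p = 4 variables
x <- as.matrix(iris[iris$Species == "setosa", 1:4])
n <- nrow(x)
p <- ncol(x)

S <- sqrt_cov(var(x))        # S %*% S recovers the covariance matrix
h <- h_normal_reference(n, p)
H <- h * S                   # positive-definite bandwidth matrix
```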

The distance or dissimilarity between the estimated densities is either the `L^2` distance, the Hellinger distance, Jeffreys measure (symmetrised Kullback-Leibler divergence) or the Wasserstein distance.

• If it is the `L^2` distance (`distance="l2"` or `distance="l2norm"`), the densities can be either parametrically estimated or estimated using the Gaussian kernel.

• If it is the Hellinger distance (`distance="hellinger"`), Jeffreys measure (`distance="jeffreys"`) or the Wasserstein distance (`distance="wasserstein"`), the densities are considered Gaussian and necessarily parametrically estimated.
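For Gaussian densities these measures have closed forms. The sketch below gives the univariate (`p` = 1) versions of Jeffreys measure, the 2-Wasserstein distance and the `L^2` distance, and checks the `L^2` formula against numerical integration. The function names are hypothetical, and the formulas are the standard textbook expressions under the stated conventions; they are not taken from the package source.

```r
# Jeffreys measure: sum of the two Kullback-Leibler divergences
jeffreys_gauss <- function(m1, s1, m2, s2) {
  0.5 * ((s1^2 / s2^2 + s2^2 / s1^2 - 2) +
           (m1 - m2)^2 * (1 / s1^2 + 1 / s2^2))
}

# 2-Wasserstein distance between two univariate Gaussians
wasserstein_gauss <- function(m1, s1, m2, s2) {
  sqrt((m1 - m2)^2 + (s1 - s2)^2)
}

# L2 distance: sqrt(int f^2 + int g^2 - 2 int f g), each term in closed form
l2_gauss <- function(m1, s1, m2, s2) {
  sq <- 1 / (2 * s1 * sqrt(pi)) + 1 / (2 * s2 * sqrt(pi)) -
    2 * dnorm(m1 - m2, mean = 0, sd = sqrt(s1^2 + s2^2))
  sqrt(max(sq, 0))
}

# Numerical check of the L2 formula for N(0, 1) versus N(1, 2^2)
l2_numeric <- sqrt(integrate(
  function(x) (dnorm(x, 0, 1) - dnorm(x, 1, 2))^2, -Inf, Inf
)$value)
all.equal(l2_gauss(0, 1, 1, 2), l2_numeric, tolerance = 1e-4)
```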

### Value

Returns an object of class `fmdsd`, i.e. a list including:

| Element | Description |
|---|---|
| `inertia` | data frame of the eigenvalues and percentages of inertia. |
| `scores` | data frame of the `nb.factors` first principal coordinates. |
| `means` | list of the means. |
| `variances` | list of the covariance matrices. |
| `correlations` | list of the correlation matrices. |
| `skewness` | list of the skewness coefficients. |
| `kurtosis` | list of the kurtosis coefficients. |
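To give an idea of what the per-group summaries contain, the following base R sketch computes comparable lists of means, covariance matrices and correlation matrices for the built-in `iris` data. It assumes one list element per group and is not a reproduction of the exact structure returned by `fmdsd`.

```r
# One data frame of numeric variables per group
groups <- split(iris[, 1:4], iris$Species)

# Lists indexed by group name
means        <- lapply(groups, colMeans)  # vectors of length p
variances    <- lapply(groups, var)       # p x p covariance matrices
correlations <- lapply(groups, cor)       # p x p correlation matrices

means$setosa
variances$setosa
```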

### Author(s)

Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard

### References

Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.

Delicado, P. (2011). Dimensionality reduction when data are density functions. Computational Statistics & Data Analysis, 55, 401-420.

Yousfi, S., Boumaza, R., Aissani, D., Adjabi, S. (2014). Optimal bandwidth matrices in functional principal component analysis of density function. Journal of Statistical Computation and Simulation, 85 (11), 2315-2330.

Cox, T.F., Cox, M.A.A. (2001). Multidimensional Scaling, second ed. Chapman & Hall/CRC.

### Examples

``````
library(dad)
data(roses)
rosesf <- as.folder(roses[,c("Sha","Den","Sym","rose")])

# MDS on Gaussian densities (on sensory data)

# using jeffreys measure (default):
resultjeff <- fmdsd(rosesf, distance = "jeffreys")
print(resultjeff)
plot(resultjeff)

## Not run:
# Applied to a data frame:
resultjeffdf <- fmdsd(roses[,c("Sha","Den","Sym","rose")],
distance = "jeffreys", group.name = "rose")
print(resultjeffdf)
plot(resultjeffdf)

## End(Not run)

# using the Hellinger distance:
resulthellin <- fmdsd(rosesf, distance = "hellinger")
print(resulthellin)
plot(resulthellin)

# using the Wasserstein distance:
resultwass <- fmdsd(rosesf, distance = "wasserstein")
print(resultwass)
plot(resultwass)

# Gaussian case, using the L2-distance:
resultl2 <- fmdsd(rosesf, distance = "l2")
print(resultl2)
plot(resultl2)

# Gaussian case, using the L2-distance between normed densities:
resultl2norm <- fmdsd(rosesf, distance = "l2norm")
print(resultl2norm)
plot(resultl2norm)

## Not run:
# Non Gaussian case, using the L2-distance,
# the densities are estimated using the Gaussian kernel method:
result <- fmdsd(rosesf, distance = "l2", gaussiand = FALSE, group.name = "rose")
print(result)
plot(result)

## End(Not run)
``````

dad documentation built on Aug. 30, 2023, 5:06 p.m.