# fdiscd.predict: Predicting the class of a group of individuals with... In dad: Three-Way / Multigroup Data Analysis Through Densities

## Description

Assigns several groups of individuals, one group after another, to one of K classes of groups: the class whose density function is closest, in terms of the chosen distance or divergence, to the density function associated with the group being assigned.
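The assignment rule can be illustrated with a minimal base R sketch. The distance matrix `d` below is hypothetical (made-up values, not output of the package); the sketch only shows the "minimum distance" step, not how the densities or distances are computed.

```r
# Hypothetical distances between 3 groups (rows) and 2 classes (columns).
d <- matrix(c(0.2, 0.9,
              0.7, 0.1,
              0.4, 0.5),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("g1", "g2", "g3"), c("A", "B")))

# Each group is assigned to the class with the smallest distance.
assigned <- colnames(d)[apply(d, 1, which.min)]
names(assigned) <- rownames(d)
assigned
```

Here `g1` and `g3` are closer to class `A`, and `g2` is closer to class `B`.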

## Usage

```r
fdiscd.predict(xf, class.var, gaussiand = TRUE,
               distance = c("jeffreys", "hellinger", "wasserstein", "l2", "l2norm"),
               crit = 1, windowh = NULL, misclass.ratio = FALSE)
```

## Arguments

- `xf`: object of class `folderh` with two data frames:
  - The first one has at least two columns. One column contains the names of the T groups (all the names must be different). Another column is a factor with K levels partitioning the T groups into K classes.
  - The second one has (p+1) columns. The first p columns are numeric (otherwise, there is an error). The last column is a factor with T levels defining T groups. Each group, say t, consists of n_t individuals.

  Note that in versions earlier than 2.0, `fdiscd.predict` was applied to two data frames.
- `class.var`: string. The name of the class variable.
- `distance`: the distance or divergence used to compute the distance matrix between the densities. It can be:
  - `"jeffreys"` (default): the Jeffreys measure (symmetrised Kullback-Leibler divergence),
  - `"hellinger"`: the Hellinger (Matusita) distance,
  - `"wasserstein"`: the Wasserstein distance,
  - `"l2"`: the L^2 distance,
  - `"l2norm"`: the densities are normed and the L^2 distance between these normed densities is used.

  If `gaussiand = FALSE`, the densities are estimated by the Gaussian kernel method and the distance is `"l2"` or `"l2norm"`.
- `crit`: 1, 2 or 3. Selects how the densities associated with the classes are estimated (see Details). If `distance` is `"hellinger"`, `"jeffreys"` or `"wasserstein"`, `crit` is necessarily `1`.
- `gaussiand`: logical. If `TRUE` (default), the probability densities are assumed to be Gaussian. If `FALSE`, densities are estimated using the Gaussian kernel method. If `distance` is `"hellinger"`, `"jeffreys"` or `"wasserstein"`, `gaussiand` is necessarily `TRUE`.
- `windowh`: strictly positive number. If `windowh = NULL` (default), the bandwidths are computed using the `bandwidth.parameter` function. Ignored when `distance` is `"hellinger"`, `"jeffreys"` or `"wasserstein"` (see Details).
- `misclass.ratio`: logical (default `FALSE`). If `TRUE`, the confusion matrix and misclassification ratio are computed on the groups whose prior class is known.

To compute the misclassification ratio by the leave-one-out method, use the `fdiscd.misclass` function.
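The layout expected for the two data frames of `xf` can be sketched with toy data in base R. All names and values below (`groups`, `individuals`, `x1`, `x2`) are hypothetical, and coding an unknown class as `NA` is an assumption based on the description of `class.known` in the Value section.

```r
# Hypothetical toy data, only to illustrate the expected layout.
# First data frame: one row per group, with its class (NA assumed if unknown).
groups <- data.frame(
  group = c("g1", "g2", "g3"),
  class = factor(c("A", "B", NA)),
  stringsAsFactors = FALSE
)

# Second data frame: p = 2 numeric columns, then the group factor last.
set.seed(1)
individuals <- data.frame(
  x1 = rnorm(9),
  x2 = rnorm(9),
  group = factor(rep(c("g1", "g2", "g3"), each = 3))
)

# Checks mirroring the constraints stated for `xf`.
stopifnot(!anyDuplicated(groups$group))                              # unique group names
stopifnot(all(sapply(individuals[-ncol(individuals)], is.numeric)))  # first p columns numeric
stopifnot(setequal(levels(individuals$group), groups$group))         # same T groups in both

# With the dad package loaded, the two data frames could then be combined
# in the same way as in the Examples section:
# xf <- folderh(groups, "group", individuals)
```

The checks are not part of the package; they only make the documented constraints explicit before building the `folderh` object.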

## Details

A density f_t is associated with each group t, and a density g_k is associated with each class k, which consists of T_k groups. The `crit` argument selects the method used to estimate the K densities g_k:

1. The density g_k is estimated using all the data of the class, that is, the rows of the second data frame of `xf` that belong to the T_k groups of class k.

2. The T_k densities f_t are estimated using the corresponding data, then averaged to obtain an estimate of the density g_k, that is g_k = (1/T_k) ∑ f_t.

3. Each density f_t is weighted by n_t (the number of rows of the second data frame of `xf` corresponding to group t) before averaging, that is g_k = (1/∑ n_t) ∑ n_t f_t.

The last two methods are available only for the L^2 distance. If the divergences between densities are computed using the Hellinger distance, the Wasserstein distance or the Jeffreys measure, only the first method is available.
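The three estimation methods can be compared with a small base R sketch in the univariate Gaussian case. This is an illustration under stated assumptions, not the package's implementation: the data, the evaluation grid and the helper `gauss` are all hypothetical.

```r
# Hypothetical class made of two groups of unequal size (univariate case).
set.seed(42)
groups <- list(t1 = rnorm(10, mean = 0), t2 = rnorm(40, mean = 3))
grid <- seq(-6, 10, length.out = 801)

# Gaussian density fitted to a sample and evaluated on the grid
# (hypothetical helper, parametric as with gaussiand = TRUE).
gauss <- function(x) dnorm(grid, mean = mean(x), sd = sd(x))

f <- sapply(groups, gauss)           # one column per group density f_t
n <- sapply(groups, length)          # group sizes n_t

g1 <- gauss(unlist(groups))          # crit = 1: pooled data of the class
g2 <- rowMeans(f)                    # crit = 2: (1/T_k) * sum of f_t
g3 <- as.vector(f %*% (n / sum(n)))  # crit = 3: sum of n_t * f_t / sum of n_t

# Each estimate should integrate to about 1 over the grid.
step <- diff(grid)[1]
round(c(crit1 = sum(g1), crit2 = sum(g2), crit3 = sum(g3)) * step, 3)
```

With unequal group sizes, the second and third estimates differ: the third gives more weight to the larger group, while the second treats both groups equally.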

## Value

Returns an object of class `fdiscd.predict`, that is a list including:

- `prediction`: data frame with 3 columns:
  - a factor giving the group name; its column name is the same as that of the last column (column p+1) of the second data frame of `xf`,
  - `class.known`: the prior class of the group if it is available, or `NA` if not,
  - `class.predict`: the class allocation predicted by the discriminant analysis method. If `misclass.ratio = TRUE`, the class allocations are computed for all groups. Otherwise (default), they are computed only for the groups whose class is unknown.
- `distances`: matrix with T rows and K columns of the distances d_{tk}, where d_{tk} is the distance between group t and class k, computed with the measure given by the `distance` argument.
- `proximities`: matrix of the proximities (in percent). The proximity of a group t to the class k is computed as follows: (1/d_{tk}) / ∑_{l=1}^{K} (1/d_{tl}).
- `confusion.mat`: the confusion matrix (if `misclass.ratio = TRUE`).
- `misclassed`: the misclassification ratio (if `misclass.ratio = TRUE`).
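The proximity formula can be checked by hand with a short base R sketch. The distance values are hypothetical, and multiplying by 100 to obtain percentages is an assumption about the scaling, not something taken from the package code.

```r
# Hypothetical distances d_tk: 2 groups (rows), 3 classes (columns).
d <- matrix(c(1, 2, 4,
              5, 5, 10),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("g1", "g2"), c("A", "B", "C")))

# Inverse distances, normalised within each row.
inv <- 1 / d
# Scaling by 100 to express percentages is an assumption here.
prox <- 100 * inv / rowSums(inv)
round(prox, 1)
```

By construction each row of `prox` sums to 100, and the class with the smallest distance gets the largest proximity.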

## Author(s)

Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard

## References

Boumaza, R. (2004). Discriminant analysis with independently repeated multivariate measurements: an L^2 approach. Computational Statistics & Data Analysis, 47, 823-843.

Rudrauf, J.M., Boumaza, R. (2001). Contribution à l'étude de l'architecture médiévale: les caractéristiques des pierres à bossage des châteaux forts alsaciens. Centre de Recherches Archéologiques Médiévales de Saverne, 5, 5-38.

## Examples

```r
library(dad)

data(castles.dated)
data(castles.nondated)
castles.stones <- rbind(castles.dated$stones, castles.nondated$stones)
castles.periods <- rbind(castles.dated$periods, castles.nondated$periods)
castlesfh <- folderh(castles.periods, "castle", castles.stones)

# With the L^2-distance

# - crit=1
resultl2.1 <- fdiscd.predict(castlesfh, "period", distance="l2", crit=1)
print(resultl2.1)

# - crit=2
## Not run: 
resultl2.2 <- fdiscd.predict(castlesfh, "period", distance="l2", crit=2)
print(resultl2.2)
## End(Not run)

# - crit=3
resultl2.3 <- fdiscd.predict(castlesfh, "period", distance="l2", crit=3)
print(resultl2.3)

# With the Hellinger distance
resulthelling <- fdiscd.predict(castlesfh, "period", distance="hellinger")
print(resulthelling)

# With the Jeffreys measure
resultjeff <- fdiscd.predict(castlesfh, "period", distance="jeffreys")
print(resultjeff)
```

dad documentation built on March 16, 2021, 9:05 a.m.