# discdd.misclass: Misclassification ratio in functional discriminant analysis... In dad: Three-Way / Multigroup Data Analysis Through Densities

## Misclassification ratio in functional discriminant analysis of discrete probability distributions.

### Description

Computes the leave-one-out misclassification ratio of the rule that assigns each of T groups of individuals, one group after another, to one of K classes of groups: the class whose probability distribution is closest, for the chosen distance or divergence, to the probability distribution associated with the group to assign.
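The rule described above can be written compactly as follows. This is only a sketch: the notation g_k^{(-t)} (class distribution estimated without group t), \hat{k}(t) (allocated class), k(t) (prior class) and T' (number of groups whose prior class is known) is introduced here for illustration and does not come from the package documentation.

```latex
\hat{k}(t) = \arg\min_{k \in \{1, \ldots, K\}} d\left(f_t,\; g_k^{(-t)}\right),
\qquad
\text{misclassification ratio} = \frac{1}{T'} \sum_{t} \mathbf{1}\left[\hat{k}(t) \neq k(t)\right]
```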

### Usage

discdd.misclass(xf, class.var, distance =  c("l1", "l2", "chisqsym", "hellinger",
"jeffreys", "jensen", "lp"), crit = 1, p)


### Arguments

 xf object of class folderh with two data frames, or list of arrays (or tables).

• If it is a folderh: the first data frame has at least two columns. One column contains the names of the T groups (all the names must be different). Another column is a factor with K levels partitioning the T groups into K classes. The second data frame has (q+1) columns. The first q columns are factors (otherwise, they are coerced into factors). The last column is a factor with T levels defining T groups. Each group, say t, consists of n_t individuals.

• If it is a list of arrays or tables: the t^{th} element (t = 1, \ldots, T) is the table of the joint distribution (absolute or relative frequencies) of the t^{th} group. These arrays have the same shape. Each array (or table) xf[[i]] has:

• the same dimension(s). If q = 1 (univariate), dim(xf[[i]]) is an integer. If q > 1 (multivariate), dim(xf[[i]]) is an integer vector of length q.

• the same dimension names dimnames(xf[[i]]), which must be non-NULL. These dimnames are the names of the variables.

 class.var string (if xf is an object of class "folderh") or data.frame (if xf is a list of arrays).

• If xf is of class "folderh", class.var is the name of the class variable.

• If xf is a list of arrays or a list of tables, class.var is a data.frame with at least two columns named "group" and "class". The "group" column contains the names of the T groups (all the names must be different). The "class" column is a factor with K levels partitioning the T groups into K classes.

 distance The distance or dissimilarity used to compute the distance matrix between the densities. It can be:

• "l1" (default) the L^p distance with p = 1

• "l2" the L^p distance with p = 2

• "chisqsym" the symmetric Chi-squared distance

• "hellinger" the Hellinger metric (Matusita distance)

• "jeffreys" Jeffreys distance (symmetrised Kullback-Leibler divergence)

• "jensen" the Jensen-Shannon distance

• "lp" the L^p distance with p given by the argument p of the function.

 crit 1 or 2. Selects how the densities associated with the classes are estimated. See Details.

 p integer. Optional. When distance = "lp" (L^p distance with p>2), p is the parameter of the distance.
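For orientation, several of the listed distances can be sketched in base R using their usual textbook definitions. The helper names lp_dist, hellinger, chisq_sym and jeffreys are hypothetical and are not functions of dad; the normalising constants used inside the package may differ from the ones assumed here.

```r
# Textbook forms of some of the listed dissimilarities between two discrete
# distributions p and q on the same support. These are assumed definitions:
# the package's own scaling may differ.
lp_dist <- function(p, q, pow = 1) sum(abs(p - q)^pow)^(1 / pow)

hellinger <- function(p, q) sqrt(sum((sqrt(p) - sqrt(q))^2))

chisq_sym <- function(p, q) {
  keep <- (p + q) > 0  # skip cells that are empty in both distributions
  sum((p[keep] - q[keep])^2 / (p[keep] + q[keep]))
}

# Symmetrised Kullback-Leibler divergence; requires p > 0 and q > 0 everywhere.
jeffreys <- function(p, q) sum((p - q) * log(p / q))

# Hypothetical toy distributions over three categories.
p <- c(0.5, 0.3, 0.2)
q <- c(0.2, 0.3, 0.5)

lp_dist(p, q, 1)  # L1 distance
lp_dist(p, q, 2)  # L2 distance
hellinger(p, q)
chisq_sym(p, q)
jeffreys(p, q)
```

All four helpers are symmetric in p and q, which matches the description of "jeffreys" and "chisqsym" as symmetrised quantities.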

### Details

• If xf is an object of class "folderh" containing the data:

The T probability distributions f_t corresponding to the T groups of individuals are estimated by frequency distributions within each group.

The class k, consisting of T_k groups, is associated with the probability distribution g_k. Because the method is leave-one-out, the group to assign is not included in its class k when g_k is estimated. The crit argument selects the estimation method of the g_k's.

• crit=1 The probability distribution g_k is estimated using the whole data of this class, that is the rows of the second data frame of xf corresponding to the T_k groups of the class k.

The estimation of the g_k's uses the same method as the estimation of the f_t's.

• crit=2 The T_k probability distributions f_t are estimated using the corresponding data from xf. Then they are averaged to obtain an estimate of g_k, that is g_k = \frac{1}{T_k} \, \sum f_t.

• If xf is a list of arrays (or list of tables):

The t^{th} array is the joint frequency distribution of the t^{th} group. The frequencies can be absolute or relative.

The class k, consisting of T_k groups, is associated with the probability distribution g_k. Because the method is leave-one-out, the group to assign is not included in its class k when g_k is estimated. The crit argument selects the estimation method of the g_k's.

• crit=1 g_k = \frac{1}{\sum n_t} \sum n_t f_t, where n_t is the total of xf[[t]].

Notice that when xf[[t]] contains relative frequencies, its total is 1. Every n_t then equals 1, so crit=1 is equivalent to crit=2.

• crit=2 g_k = \frac{1}{T_k} \, \sum f_t.
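The two formulas for a list of arrays can be illustrated with a small base R sketch. The toy tables and the helper class_dist are hypothetical and are not part of dad; the sketch only reproduces the weighted (crit=1) and unweighted (crit=2) averages after leaving one group out of its class.

```r
# Hypothetical toy data: three groups of one class, one variable with 3 levels.
lev <- c("a", "b", "c")
xf <- list(
  g1 = array(c(10, 20, 10), dim = 3, dimnames = list(x = lev)),
  g2 = array(c(5, 5, 10),   dim = 3, dimnames = list(x = lev)),
  g3 = array(c(30, 20, 10), dim = 3, dimnames = list(x = lev))
)

# Class distribution g_k built from the groups of the class, excluding the
# group `leave_out` (the group currently being assigned).
class_dist <- function(tabs, leave_out, crit = 1) {
  kept <- tabs[setdiff(names(tabs), leave_out)]
  n <- sapply(kept, sum)                           # n_t: total of each table
  f <- lapply(kept, function(tab) tab / sum(tab))  # f_t: relative frequencies
  # crit = 1: weights n_t / sum(n_t); crit = 2: equal weights 1 / T_k
  w <- if (crit == 1) n / sum(n) else rep(1 / length(kept), length(kept))
  Reduce(`+`, Map(`*`, f, w))                      # weighted sum of the f_t
}

g_crit1 <- class_dist(xf, leave_out = "g1", crit = 1)
g_crit2 <- class_dist(xf, leave_out = "g1", crit = 2)

# With relative frequencies as input, every n_t is 1, so crit = 1 gives the
# same result as crit = 2.
xf_rel <- lapply(xf, function(tab) tab / sum(tab))
g_rel1 <- class_dist(xf_rel, leave_out = "g1", crit = 1)
```

With absolute frequencies, the crit=1 result coincides with the relative frequencies of the pooled counts of the remaining groups, which is why larger groups weigh more under crit=1.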

### Value

Returns an object of class discdd.misclass, that is a list including:

 classification data frame with 4 columns:

• factor giving the group name. The column name is the same as that of the column (q+1) of the second data frame of xf,

• the prior class of the group if it is available, or NA if not,

• alloc: the class allocation computed by the discriminant analysis method,

• misclassed: boolean. TRUE if the group is misclassed, FALSE if it is well-classed, NA if the prior class of the group is unknown.

 confusion.mat confusion matrix,

 misalloc.per.class the misclassification ratio per class,

 misclassed the misclassification ratio,

 distances matrix with T rows and K columns, of the distances (d_{tk}): d_{tk} is the distance between the group t and the class k,

 proximities matrix of the proximity indices (in percents) between the groups and the classes. The proximity between the group t and the class k is: (1/d_{tk})/\sum_{l=1}^{K}(1/d_{tl}).
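The relation between the distances, the allocation, the proximities and the misclassification ratio can be sketched in base R. The distance matrix and prior classes below are made-up values, and computing the overall ratio only over groups with a known prior class is an assumption made for this illustration, not a statement about the package internals.

```r
# Hypothetical distances d_tk between T = 4 groups (rows) and K = 2 classes.
distances <- matrix(c(0.2, 0.8,
                      0.6, 0.3,
                      0.5, 0.4,
                      0.1, 0.9),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(paste0("g", 1:4), c("A", "B")))
prior <- c(g1 = "A", g2 = "B", g3 = "A", g4 = NA)  # NA: prior class unknown

# Allocation: each group goes to the class at minimum distance.
alloc <- colnames(distances)[apply(distances, 1, which.min)]

# Proximities in percent: 100 * (1/d_tk) / sum_l (1/d_tl).
inv <- 1 / distances
proximities <- 100 * inv / rowSums(inv)

# Misclassification indicator (NA where the prior class is unknown) and
# overall ratio over the groups with a known prior class (assumed convention).
misclassed <- alloc != prior
ratio <- mean(misclassed, na.rm = TRUE)
```

Each row of proximities sums to 100, and the largest proximity in a row always corresponds to the allocated class, since it is a decreasing function of the distance.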

### Author(s)

Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard

### References

Rudrauf, J.M., Boumaza, R. (2001). Contribution à l'étude de l'architecture médiévale: les caractéristiques des pierres à bossage des châteaux forts alsaciens, Centre de Recherches Archéologiques médiévales de Saverne, 5, 5-38.

### Examples

# Example 1 with a folderh obtained by converting numeric variables
data("castles.dated")
stones <- castles.dated$stones
periods <- castles.dated$periods
stones$height <- cut(stones$height, breaks = c(19, 27, 40, 71), include.lowest = TRUE)
stones$width <- cut(stones$width, breaks = c(24, 45, 62, 144), include.lowest = TRUE)
stones$edging <- cut(stones$edging, breaks = c(0, 3, 4, 8), include.lowest = TRUE)
stones$boss <- cut(stones$boss, breaks = c(0, 6, 9, 20), include.lowest = TRUE )

castlefh <- folderh(periods, "castle", stones)

# Default: distance="l1", crit=1
discdd.misclass(castlefh, "period")

# Hellinger distance, crit=2
discdd.misclass(castlefh, "period", distance = "hellinger", crit = 2)

# Example 2 with a list of 96 arrays
data("dspgd2015")
data("departments")
classes <- departments[, c("coded", "namer")]
names(classes) <- c("group", "class")

# Default: distance="l1", crit=1
discdd.misclass(dspgd2015, classes)

# Hellinger distance, crit=2
discdd.misclass(dspgd2015, classes, distance = "hellinger", crit = 2)


dad documentation built on Aug. 30, 2023, 5:06 p.m.