Predicting the class of a group of individuals with discriminant analysis of probability densities.

Description

Allocates several groups of individuals, one group after another, to one of K classes of groups, using the L^2 distances between the density function associated with the group to allocate and the density functions associated with the K classes.
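
The allocation principle described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: the helper l2_distance, the univariate Gaussian densities and the grid-based integration are all assumptions made for illustration, as is the rule of allocating the group to the class with the smallest distance.

```r
# L^2 distance between two univariate densities, approximated by a
# Riemann sum on a regular grid (hypothetical helper, not part of the package).
l2_distance <- function(f, g, lower = -10, upper = 10, n = 2001) {
  grid <- seq(lower, upper, length.out = n)
  step <- grid[2] - grid[1]
  sqrt(sum((f(grid) - g(grid))^2) * step)
}

# Density of the group to allocate and densities of two classes
# (all three are made-up Gaussian densities).
f_group  <- function(u) dnorm(u, mean = 0,   sd = 1)
g_class1 <- function(u) dnorm(u, mean = 0.2, sd = 1)
g_class2 <- function(u) dnorm(u, mean = 3,   sd = 1)

# Distances from the group to each class.
d <- c(class1 = l2_distance(f_group, g_class1),
       class2 = l2_distance(f_group, g_class2))

# Assumed rule: allocate the group to the nearest class.
allocated <- names(d)[which.min(d)]
```

In this sketch the group density is much closer to the first class density, so the group would be allocated to class1.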

Usage

fdiscd.predict(x, classe, crit = 1, gaussiand = TRUE, kern = NULL, windowh = NULL, 
               misclass.ratio = FALSE)

Arguments

x

data frame with (p+1) columns. The first p columns are numeric. The last column is a factor with T levels defining T groups. Each group, say t, consists of n_t individuals.

classe

data frame with two columns. The first column contains the names of the T groups (all the names must be different). The second column is a factor with K levels partitioning the T groups into K classes.

crit

1, 2 or 3. Selects the method used to estimate the densities associated with the classes. See Details.

gaussiand

logical. If TRUE (default), the probability densities are assumed to be Gaussian. If FALSE, densities are estimated using the Gaussian kernel method.

kern

string. If gaussiand = FALSE, this argument sets the kernel used in the estimation method. Currently, only the Gaussian kernel is available: the settings kern = "gauss" and kern = NULL are equivalent.

windowh

strictly positive number. If windowh=NULL (default), the bandwidths are computed using the bandwidth.parameter function.

misclass.ratio

logical (default FALSE). If TRUE, the confusion matrix and the misclassification ratio are computed on the groups whose prior class is known. To compute the misclassification ratio by the leave-one-out method, use the fdiscd.misclass function.
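
As an illustration of the expected shapes of x and classe, here is a small sketch with made-up data (p = 2 variables, T = 4 groups, K = 2 classes). The column names and values are hypothetical. Marking the group to allocate with NA is an assumption, inferred from the description of class.known under Value.

```r
set.seed(1)

# x: p numeric columns followed by a factor defining the T groups.
x <- data.frame(
  v1    = rnorm(12),
  v2    = rnorm(12),
  group = factor(rep(c("g1", "g2", "g3", "g4"), each = 3))
)

# classe: one row per group, with the class of each group.
# Assumption: a group whose class is unknown is marked NA.
classe <- data.frame(
  group = c("g1", "g2", "g3", "g4"),
  class = factor(c("A", "A", "B", NA))
)

# Basic consistency checks between the two data frames.
stopifnot(!anyDuplicated(classe$group))
stopifnot(all(levels(x$group) %in% classe$group))
```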

Details

Each group t is associated with a density denoted f_t. Each class k, consisting of T_k groups, is associated with a density denoted g_k. The crit argument selects the method used to estimate the K densities g_k.

  1. The density g_k is estimated using the whole data of this class, that is the rows of x corresponding to the T_k groups of the class k.

  2. The T_k densities f_t are estimated using the corresponding data from x. They are then averaged to obtain an estimate of the density g_k, that is g_k = (1/T_k) ∑ f_t.

  3. Each density f_t, estimated as in method 2, is weighted by n_t (the number of rows of x corresponding to f_t) before averaging, that is g_k = (1/∑ n_t) ∑ n_t f_t.
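
The three options can be compared on a toy univariate class made of two groups of unequal sizes. This is only a sketch under stated assumptions: the data are simulated, the densities are taken to be Gaussian and evaluated on a grid, and the helper gauss_density is hypothetical. The package's own estimators may differ in detail.

```r
set.seed(42)

# Hypothetical class k made of two groups with n_t = 20 and n_t = 80.
groups <- list(t1 = rnorm(20, mean = 0), t2 = rnorm(80, mean = 1))
n_t    <- lengths(groups)
grid   <- seq(-5, 6, length.out = 501)
step   <- grid[2] - grid[1]

# Gaussian density fitted to a sample, evaluated on the grid.
gauss_density <- function(s) dnorm(grid, mean = mean(s), sd = sd(s))

# crit = 1: pool all the data of the class, then estimate a single density.
g_crit1 <- gauss_density(unlist(groups))

# crit = 2: estimate each f_t, then take the unweighted mean.
f_t     <- sapply(groups, gauss_density)   # one column per group
g_crit2 <- rowMeans(f_t)

# crit = 3: weight each f_t by n_t before averaging.
g_crit3 <- as.vector(f_t %*% n_t) / sum(n_t)
```

With groups of equal sizes, options 2 and 3 coincide. Here the larger group pulls the option 3 estimate toward its own density.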

Value

Returns an object of class fdiscd.predict, that is a list including:

prediction

data frame with 3 columns:

  • factor giving the group name. The column name is the same as that of the column (p+1) of x,

  • class.known: the prior class of the group if it is available, or NA if not,

  • class.predict: the class allocation predicted by the discriminant analysis method.

distances

matrix with T rows and K columns, of the distances (d_{tk}): d_{tk} is the distance between the group t and the class k,

proximities

matrix of the proximities (in percent). The proximity of a group t to the class k is computed as follows: (1/d_{tk})/∑_{l=1}^{l=K}(1/d_{tl}).

confusion.mat

the confusion matrix (if misclass.ratio = TRUE)

misclassed

the misclassification ratio (if misclass.ratio = TRUE)
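
The proximities formula, and the way a confusion matrix and a misclassification ratio can be derived from the predictions, can be sketched in base R as follows. The distance matrix and the known classes are made up. Predicting the nearest class and defining the ratio as the proportion of misallocated groups among those with a known class are assumptions, not a description of the package internals.

```r
# Hypothetical distances d_tk: 3 groups (rows) by 2 classes (columns).
distances <- matrix(c(0.1, 0.4,
                      0.5, 0.25,
                      0.2, 0.3),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("g1", "g2", "g3"), c("A", "B")))

# Proximities in percent: (1/d_tk) / sum over l of (1/d_tl).
inv         <- 1 / distances
proximities <- 100 * inv / rowSums(inv)   # each row sums to 100

# Assumed allocation rule: the class with the smallest distance.
class_predict <- colnames(distances)[apply(distances, 1, which.min)]

# Confusion matrix and misclassification ratio for groups of known class.
class_known <- c("A", "B", "B")
confusion   <- table(known = class_known, predicted = class_predict)
misclassed  <- mean(class_known != class_predict)
```

In this toy case g1 has proximities of 80 and 20 percent to classes A and B, and one group out of three (g3) is misallocated.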

Author(s)

Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard.

References

Boumaza, R. (2004). Discriminant analysis with independently repeated multivariate measurements: an L^2 approach. Computational Statistics & Data Analysis, 47, 823-843.

Rudrauf, J.M., Boumaza, R. (2001). Contribution à l'étude de l'architecture médiévale: les caractéristiques des pierres à bossage des châteaux forts alsaciens. Centre de Recherches Archéologiques Médiévales de Saverne, 5, 5-38.

Examples

library(dad)
data(castles.dated)
data(castles.nondated)
castles.stones <- rbind(castles.dated$stones, castles.nondated$stones)
castles.periods <- rbind(castles.dated$periods, castles.nondated$periods)


# The densities are assumed to be Gaussian and are estimated parametrically
result1 <- fdiscd.predict(castles.stones, castles.periods)

print(result1)