damda: Dimension-Adaptive Mixture Discriminant Analysis


View source: R/damda.R

Description

Implements the Dimension-Adaptive Mixture Discriminant Analysis classifier for settings where the test data might include additional unknown classes and/or extra dimensions. The function performs automatic selection of the number of hidden classes using BIC.

Usage

damda(learn, data, 
      K = learn$K, 
      H = 0:2, 
      regularize = FALSE, 
      control_em = damda::control_em(), 
      control_reg = damda::control_reg(), 
      verbose = TRUE)
      
## S3 method for class 'damda'
predict(object, newdata, ...)

Arguments

learn

A list containing the class-specific parameters estimated in the training phase (also called the learning phase). These are typically the parameters of a Gaussian mixture discriminant analysis classifier. The list must include the following slots:

pro

A vector containing the class mixing proportions (class proportions).

mu

The mean for each class, arranged column-wise, i.e. columns denote the classes.

sigma

An array containing the class-specific covariance matrices.

K

The number of classes observed in the training set.

data

A matrix or data.frame containing the test data.

K

The number of classes observed in the training data. It does not need to be specified if the list passed as argument learn already includes the number of classes in the training set.

H

An integer vector specifying the numbers of extra classes for which the BIC is to be calculated. The default, 0:2, considers from 0 to 2 extra classes in the test data.

regularize

A logical argument indicating whether Bayesian regularization should be performed. Defaults to FALSE.

control_em

A list of control parameters used in the EM algorithm for inductive model estimation; see also control_em.

control_reg

A list of hyperparameters for Bayesian regularization. Only used when regularize = TRUE; see also control_reg.

verbose

If TRUE, a progress bar is shown.

object

An object of class 'damda' resulting from a call to function damda.

newdata

A data frame or matrix giving the data for which predictions are to be obtained. If missing, the data supplied in the call to damda are classified.

...

Further arguments passed to or from other methods.
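
As an illustration of the learn argument described above, the following is a minimal sketch of how such a list might be assembled from labelled training data using base R only. The helper make_learn is hypothetical and not part of damda, and the p x p x K layout of sigma is an assumption; the package may expect a different arrangement or additional slots.

```r
# Hypothetical helper (not part of damda): estimate class-specific Gaussian
# parameters from labelled training data using base R only.
make_learn <- function(x, labels) {
  x <- as.matrix(x)
  labels <- factor(labels)
  classes <- levels(labels)
  K <- length(classes)                         # number of training classes
  p <- ncol(x)                                 # number of training variables

  pro <- as.numeric(table(labels)) / nrow(x)   # class mixing proportions
  mu <- matrix(NA_real_, nrow = p, ncol = K)   # one column per class
  sigma <- array(NA_real_, dim = c(p, p, K))   # assumed p x p x K layout

  for (k in seq_len(K)) {
    xk <- x[labels == classes[k], , drop = FALSE]
    mu[, k] <- colMeans(xk)                    # class mean vector
    sigma[, , k] <- cov(xk)                    # class sample covariance matrix
  }
  list(pro = pro, mu = mu, sigma = sigma, K = K)
}

# Train on two of the four iris measurements only.
learn <- make_learn(iris[, 1:2], iris$Species)
```

With the package installed, a list of this kind would then be passed as the first argument, for example damda(learn, data = test_data, H = 0:2), where test_data is a placeholder for a test matrix that may contain further variables or classes.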

Details

The function implements the Dimension-Adaptive Mixture Discriminant Analysis (D-AMDA) classifier for supervised learning in settings where the test data might include unknown classes and/or additional dimensions/variables. The model estimation procedure is based on an EM algorithm embedded in an inductive estimation framework.

Note that only the test data are required as input; no training data need to be provided. Instead, the function requires the parameter estimates obtained during the training stage, and these must correspond to class-specific proportions, means, and covariance matrices. The training stage can in principle be performed using any type of classifier, as long as the corresponding class-related parameters are provided. In keeping with the D-AMDA framework, a mixture discriminant analysis classifier would typically be used in the training phase.

Model selection in the context of adaptive classification corresponds to the detection of hidden classes (if any) that were not observed during the training phase. For this purpose, the BIC is employed to select the optimal model.
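
The BIC-based choice of the number of hidden classes can be sketched as follows. This is an illustration only: it assumes the convention BIC = 2 * loglik - npar * log(N), under which larger values are preferred, and the page does not state which convention damda uses. The function bic_value and all numbers in fits are made up for the example.

```r
# Assumed convention (not confirmed by this page):
# BIC = 2 * loglik - npar * log(N), so larger values indicate a better model.
bic_value <- function(loglik, npar, N) {
  2 * loglik - npar * log(N)
}

# Made-up log-likelihoods and parameter counts for H = 0, 1, 2 hidden classes.
fits <- data.frame(H = 0:2,
                   loglik = c(-520, -480, -478),
                   npar = c(10, 16, 22))
N <- 200                                  # number of test observations

fits$BIC <- bic_value(fits$loglik, fits$npar, N)

# Keep the candidate with the largest BIC.
best_H <- fits$H[which.max(fits$BIC)]
```

In this toy example the gain in log-likelihood from a second hidden class is too small to offset the penalty for the extra parameters, so the candidate with one hidden class is retained.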

Value

An object of class 'damda' containing the optimal D-AMDA classifier. The returned object is a list containing:

learn

The parameters learned during the training phase.

K

Number of classes observed in the training data.

H

Selected optimal number of hidden classes according to BIC.

parameters

A list including the parameters of the training phase provided in input and those estimated during the discovery phase.

z

A matrix whose [i,k]th entry is the probability that observation i of the test data belongs to the kth class.

classification

Predicted classification of the observations in the test set, corresponding to the maximum a posteriori of matrix z.

loglik

Value of the maximized log-likelihood.

N

Number of observations in the test data.

npar

Number of estimated parameters.

obs

A vector containing the indexes of the variables in the test set observed also in the training set.

ext

A vector containing the indexes of the additional variables present in the test set but not observed in the training set.

bic

Optimal BIC value.

BIC

All BIC values.

data

The test data matrix provided in input.
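
The relation between the z and classification components listed above can be illustrated with a small sketch. The matrix below is made up, and returning the column index as the class label is an assumption; the labels actually stored in a 'damda' object may differ.

```r
# Made-up posterior probabilities: 3 observations (rows), 3 classes (columns).
z <- matrix(c(0.90, 0.05, 0.05,
              0.10, 0.70, 0.20,
              0.20, 0.30, 0.50),
            nrow = 3, byrow = TRUE)

# Maximum a posteriori rule: assign each observation to the class
# (column) with the largest posterior probability.
classification <- apply(z, 1, which.max)

# Largest posterior probability per observation, a simple measure of how
# confident each assignment is.
max_prob <- apply(z, 1, max)
```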

References

Fop, M., Mattei, P. A., Bouveyron, C., Murphy, T. B. (2021). Unobserved classes and extra variables in high-dimensional discriminant analysis. Advances in Data Analysis and Classification, accepted.


michaelfop/damda documentation built on Dec. 21, 2021, 5:57 p.m.