damda_varsel: Inductive Variable Selection for Dimension-Adaptive Mixture...



Description

The function implements a fast inductive variable selection approach for the Dimension-Adaptive Mixture Discriminant Analysis classifier. The method searches for the subset of variables that carries the most useful information for discriminating the classes in the test data. A greedy forward-stepwise search is used, which can optionally be run in parallel.

Usage

damda_varsel(learn, data,
             K = learn$K,
             H = 0:1,
             regularize = FALSE,
             start = NULL,
             control_em = damda::control_em(),
             control_reg = damda::control_reg(),
             parallel = FALSE,
             verbose = TRUE,
             maxit = 100)

Arguments

learn

A list containing a collection of class-specific parameters estimated in the training phase, a.k.a. the learning phase. The parameters typically are those corresponding to a Gaussian mixture discriminant analysis classifier. The list must include the following slots:

pro

A vector containing the class mixing proportions.

mu

The mean for each class, arranged column-wise, i.e. columns denote the classes.

sigma

An array containing the class-specific covariance matrices.

K

The number of classes observed in the training set.
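
As an illustration of the slots listed above, the following base R sketch builds a learn list from labelled training data. It uses two species of the built-in iris data as the training classes. The d x d x K layout of sigma and the use of sample covariances (cov) are assumptions made for this example; the text above only states that sigma is an array of class-specific covariance matrices.

```r
# Training data: two of the three iris species
train <- iris[iris$Species != "virginica", ]
X  <- as.matrix(train[, 1:4])
cl <- droplevels(train$Species)

K <- nlevels(cl)   # number of classes observed in training
d <- ncol(X)       # number of variables

# Class mixing proportions
pro <- as.numeric(table(cl)) / nrow(X)

# Class means arranged column-wise: a d x K matrix
mu <- sapply(levels(cl), function(k) colMeans(X[cl == k, , drop = FALSE]))

# Class-specific covariance matrices (assumed layout: d x d x K)
sigma <- array(NA_real_, dim = c(d, d, K))
for (k in seq_len(K)) {
  sigma[, , k] <- cov(X[cl == levels(cl)[k], , drop = FALSE])
}

learn <- list(pro = pro, mu = mu, sigma = sigma, K = K)
str(learn)
```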

data

A matrix or data.frame containing the test data.

K

The number of classes observed in the training data. It does not need to be specified if the list passed to learn already includes the number of classes in the training set.

H

An integer vector specifying the numbers of extra classes for which the BIC is to be calculated. The default, 0:1, considers zero or one extra class in the test data.

regularize

A logical argument indicating whether Bayesian regularization should be performed. Defaults to FALSE.

start

An optional vector containing the indexes of variables used to initialize the variable selection algorithm. If NULL, all the variables present in the learning phase are used as the initial set of variables useful for classification.

control_em

A list of control parameters used in the EM algorithm for inductive model estimation; see also control_em.

control_reg

A list of hyperparameters for Bayesian regularization. Only used when regularize = TRUE; see also control_reg.

parallel

A logical argument indicating whether parallel computation should be used for the stepwise variable selection search. If TRUE, all available cores are used. The argument can also be set to an integer specifying the number of cores to use.

verbose

A logical argument. If TRUE, a progress bar is shown.

maxit

The maximum number of iterations (of addition and removal steps) the variable selection algorithm is allowed to run for.
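
A minimal sketch of a call using the arguments documented above. The train/test split of the iris data, the way learn is assembled (sample covariances in a d x d x K array), and the object names are assumptions made for illustration. The call is skipped if the damda package is not installed, and it has not been verified that this particular data set runs without regularization.

```r
# Odd rows for training, even rows for testing
idx_train <- seq(1, 150, by = 2)
train <- iris[idx_train, ]
train <- train[train$Species != "virginica", ]  # virginica is unobserved in training
test_data <- iris[-idx_train, 1:4]              # test data contains all three species

X  <- as.matrix(train[, 1:4])
cl <- droplevels(train$Species)

# Learning-phase parameters (assumed layout, see lead-in)
learn <- list(
  pro   = as.numeric(table(cl)) / nrow(X),
  mu    = sapply(levels(cl), function(k) colMeans(X[cl == k, ])),
  sigma = sapply(levels(cl), function(k) cov(X[cl == k, ]), simplify = "array"),
  K     = nlevels(cl)
)

if (requireNamespace("damda", quietly = TRUE)) {
  out <- damda::damda_varsel(learn, data = test_data,
                             H = 0:1,           # zero or one extra class
                             parallel = FALSE,  # serial search
                             verbose = FALSE)   # no progress bar
  print(out$variables)  # selected classification variables
  print(out$time)       # time taken by the selection procedure
}
```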

Details

The function implements an inductive variable selection procedure for the Dimension-Adaptive Mixture Discriminant Analysis (D-AMDA) classifier. The model space is explored with a greedy forward-stepwise search, in which variables are added to and removed from the current set of classification variables in turn.

To assess the discriminating power of a variable, at each iteration the selection procedure computes the BIC difference between a model in which the variable is useful for classification and a model in which the variable is uninformative or redundant. In addition to performing variable selection, the algorithm returns the optimal number of hidden classes in the test data (if any).
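
The idea of the BIC comparison can be illustrated with a simplified base R sketch. This is not the computation performed by damda_varsel: it assumes fully labelled data with no hidden classes, and the function name bic_diff is hypothetical. The "redundant" model is a single linear regression of the candidate variable on the currently selected variables, shared by all classes; the "useful" model fits a separate regression within each class. The BIC is taken here as 2 * log-likelihood minus the number of parameters times log(n), so larger values are better.

```r
# BIC difference for candidate variable j given selected variables S
# (illustrative only; labelled data, no hidden classes)
bic_diff <- function(X, cl, j, S) {
  dat <- data.frame(y = X[, j], X[, S, drop = FALSE])
  n <- nrow(dat)

  # Redundant model: one regression of y on the selected variables
  ll0  <- logLik(lm(y ~ ., data = dat))
  bic0 <- 2 * as.numeric(ll0) - attr(ll0, "df") * log(n)

  # Useful model: a separate regression within each class
  ll1 <- 0
  df1 <- 0
  for (k in levels(cl)) {
    llk <- logLik(lm(y ~ ., data = dat[cl == k, , drop = FALSE]))
    ll1 <- ll1 + as.numeric(llk)
    df1 <- df1 + attr(llk, "df")
  }
  bic1 <- 2 * ll1 - df1 * log(n)

  # Positive values favour treating variable j as useful
  bic1 - bic0
}

X  <- as.matrix(iris[, 1:4])
cl <- iris$Species

# Evidence for Petal.Length given Sepal.Length and Sepal.Width
delta <- bic_diff(X, cl, j = 3, S = c(1, 2))
print(delta)
```

A positive difference is evidence that the variable carries class information beyond what the selected variables already explain. In a stepwise search, a difference of this kind would decide whether a variable is added or removed.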

Value

A list including the following slots:

variables

The set of selected relevant classification variables.

model

An object of class damda containing the D-AMDA model fitted on the selected variables.

time

The time taken to run the variable selection procedure.

References

Fop, M., Mattei, P. A., Bouveyron, C., Murphy, T. B. (2021). Unobserved classes and extra variables in high-dimensional discriminant analysis. Advances in Data Analysis and Classification, accepted.


michaelfop/damda documentation built on Dec. 21, 2021, 5:57 p.m.