pdaDim: Finding optimal dimensionality in pda


pdaDim R Documentation

Finding optimal dimensionality in pda

Description

Fits pda models with 1,...,max.dim dimensions and selects the optimal dimension by cross-validation, regularized by the McNemar-test.

Usage

pdaDim(
  y,
  X,
  reg = 0.5,
  prior = NULL,
  max.dim = NULL,
  selected = NULL,
  n.seg = 10,
  verbose = TRUE
)

Arguments

y

Vector of responses, must be a factor with exactly 2 levels.

X

Matrix of predictor values.

reg

The regularization parameter, i.e. the rejection level of the McNemar-test, see Details.

prior

Vector of prior probabilities, one value for each factor level in y.

max.dim

Integer, the maximum number of dimensions to consider.

selected

Vector of logicals indicating a variable selection, passed on to pda, see Details.

n.seg

Integer, the number of cross-validation segments.

verbose

Logical, turns on/off output during computations.

Details

The PLS method, which is part of the pda method, requires a decision on the number of dimensions (components) to use. This is usually found by cross-validation, trying out many different dimensions and searching for the best performance, e.g. the classification accuracy.

In this algorithm a search for maximum accuracy is also conducted, but the dimension actually used is the smallest one whose accuracy is not significantly poorer than the maximum. Significance is assessed by the McNemar-test, comparing the classifications of the maximum-accuracy model with those of the reduced model. See mcnemar.test for details. This procedure is inspired by the CVANOVA idea of Indahl & Naes (1998).

This implementation splits the data into n.seg cross-validation segments. For each segment, a pda model is fitted to the remaining data and used to classify the held-out elements. The accuracy for each dimension is the fraction of correctly classified elements after the cross-validation.
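The cross-validation loop described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's code: the data are simulated, the random segment assignment is a guess at the splitting scheme, and nearest.centroid is a hypothetical stand-in classifier used in place of pda (here "dimension d" simply means the first d columns of X). The corrects matrix is assumed to have one row per element of y and one column per dimension.

```r
set.seed(1)

# Simulated two-class data (hypothetical, not the microbiome data set)
n <- 40
y <- factor(rep(c("A", "B"), each = n / 2))
X <- matrix(rnorm(n * 5), nrow = n)
X[y == "B", 1] <- X[y == "B", 1] + 2  # shift class B along the first column

n.seg   <- 10
max.dim <- 3

# Assign every sample to one of n.seg segments at random (assumed scheme)
segment <- sample(rep(seq_len(n.seg), length.out = n))

# Stand-in classifier (NOT pda): nearest class centroid on the first d columns
nearest.centroid <- function(X.train, y.train, X.test, d) {
  lev <- levels(y.train)
  sq.dist <- sapply(lev, function(l) {
    centre <- colMeans(X.train[y.train == l, seq_len(d), drop = FALSE])
    rowSums(sweep(X.test[, seq_len(d), drop = FALSE], 2, centre)^2)
  })
  sq.dist <- matrix(sq.dist, ncol = length(lev))  # keep matrix shape for 1 test row
  factor(lev[apply(sq.dist, 1, which.min)], levels = lev)
}

# One row per sample, one column per dimension: TRUE if correctly classified
corrects <- matrix(FALSE, nrow = n, ncol = max.dim)
for (s in seq_len(n.seg)) {
  test <- segment == s                       # held-out samples of this segment
  for (d in seq_len(max.dim)) {
    pred <- nearest.centroid(X[!test, , drop = FALSE], y[!test],
                             X[test, , drop = FALSE], d)
    corrects[test, d] <- pred == y[test]
  }
}

# Accuracy per dimension: fraction of correctly classified samples
accuracy <- colMeans(corrects)
```

Every sample is classified exactly once per dimension, always by a model that was fitted without it, so accuracy is a cross-validated estimate.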

The McNemar-test is used to determine whether a simpler model is significantly poorer than the more complex model giving the maximum accuracy. The argument reg is the rejection level of this test: a reg value close to 0.0 means stronger regularization, and in general a smaller dimension is selected. Setting it to 1.0 means no regularization, i.e. the dimension with the maximum accuracy is selected.
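The selection rule can be illustrated with a small sketch. The function select.dim and the toy corrects matrix below are hypothetical and written only for this illustration; the exact comparison used inside pdaDim (for instance, whether the p-value must be strictly greater than reg) is an assumption. The sketch takes a logical matrix with one row per sample and one column per dimension, finds the dimension with maximum accuracy, and returns the smallest dimension whose McNemar-test p-value against that maximum exceeds reg.

```r
# Hypothetical helper, not part of the mpda package
select.dim <- function(corrects, reg = 0.5) {
  accuracy <- colMeans(corrects)
  best <- which.max(accuracy)          # first dimension with maximum accuracy
  lev <- c(FALSE, TRUE)
  for (d in seq_len(best - 1)) {
    # 2x2 table of correct/incorrect for the reduced model vs. the best model
    tab <- table(factor(corrects[, d], levels = lev),
                 factor(corrects[, best], levels = lev))
    p <- mcnemar.test(tab)$p.value
    # Not significantly poorer at level reg: accept the smaller dimension
    if (!is.na(p) && p > reg) return(d)
  }
  best
}

# Toy data: 20 samples, 3 dimensions with 10, 17 and 18 correct classifications
corrects <- cbind(
  c(rep(TRUE, 10), rep(FALSE, 10)),
  c(rep(TRUE, 17), rep(FALSE, 3)),
  c(rep(TRUE, 18), rep(FALSE, 2))
)

select.dim(corrects, reg = 0.5)  # 2: dimension 2 is not significantly poorer than 3
select.dim(corrects, reg = 1.0)  # 3: no regularization, maximum accuracy wins
```

With a strict comparison, reg = 1.0 can never accept a smaller dimension, which matches the behaviour described above, while a reg close to 0.0 accepts almost any smaller dimension.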

The argument selected is used for variable selection, and is passed on unchanged to pda.

Value

A list with the elements Dimension and Corrects. Dimension is the number of dimensions selected by this algorithm (integer). Corrects is a matrix of logicals indicating which elements of y are correctly classified (TRUE) for each dimension 1,...,max.dim.

Author(s)

Lars Snipen.

References

Indahl, UG, Naes, T (1998). Evaluation of alternative spectral feature extraction methods of textural images for multivariate modeling. J. Chemometrics, 12:261-278.

See Also

eliminator, mpda.

Examples

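# Load the example data set
data(microbiome)
# Use the first 40 rows: column 1 is the response, the remaining columns the predictors
y <- microbiome[1:40, 1]
X <- as.matrix(microbiome[1:40, -1])
# Try up to 10 dimensions, with equal priors and rejection level 0.1
lst <- pdaDim(y, X, reg = 0.1, prior = c(0.5, 0.5), max.dim = 10)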


larssnip/mpda documentation built on March 28, 2022, 3:37 p.m.