pdaDim | R Documentation |
Fits pda models of 1,...,max.dim dimensions and finds the optimal dimension by cross-validation and a regularization based on the McNemar test.
pdaDim( y, X, reg = 0.5, prior = NULL, max.dim = NULL, selected = NULL, n.seg = 10, verbose = TRUE )
y | Vector of responses, must be a factor with exactly 2 levels. |
X | Matrix of predictor values. |
reg | The regularization parameter, see below. |
prior | Vector of prior probabilities, one value for each factor level in |
max.dim | Integer, the maximum number of dimensions to consider. |
selected | Vector of logicals, indicating a variable selection, see below. |
n.seg | Integer, the number of cross-validation segments. |
verbose | Logical, turns on/off output during computations. |
The PLS method, which is part of the pda method, requires a decision on the number of dimensions (components) to use. This is usually found by cross-validation, trying out many different dimensions and searching for the best performance, e.g. the highest classification accuracy.
This algorithm also searches for the maximum accuracy, but then uses the smallest dimension whose accuracy is not significantly poorer than the maximum. Significance is assessed with the McNemar test, comparing the classifications of the maximum-accuracy model to those of the reduced model. See mcnemar.test for details. This procedure is inspired by the CVANOVA idea of Indahl & Naes (1998).
This implementation splits the data into cross-validation segments, fits a pda model and classifies. The accuracy for each dimension is the fraction of correctly classified elements after the cross-validation.
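The cross-validation step can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code used by pdaDim: the helper names cv_corrects and toy_classify are hypothetical, the interleaved segment assignment is an assumption, the samples-by-dimensions layout of the logical matrix is an assumption, and a nearest-centroid rule on the first n.dim columns stands in for the actual pda fit.

```r
# Toy stand-in for a fit-and-classify step (NOT the pda method):
# nearest class centroid using only the first n.dim columns of X.
toy_classify <- function(y.train, X.train, X.test, n.dim) {
  lev <- levels(y.train)
  cols <- seq_len(n.dim)
  # One centroid per class, stacked as rows of a matrix
  centroids <- do.call(rbind, lapply(lev, function(l) {
    colMeans(X.train[y.train == l, cols, drop = FALSE])
  }))
  X.sub <- X.test[, cols, drop = FALSE]
  # Squared Euclidean distance from every test row to every centroid
  dist2 <- sapply(seq_along(lev), function(k) {
    rowSums(sweep(X.sub, 2, centroids[k, ], "-")^2)
  })
  dist2 <- matrix(dist2, nrow = nrow(X.sub))
  factor(lev[max.col(-dist2, ties.method = "first")], levels = lev)
}

# Hypothetical helper: cross-validate every dimension 1,...,max.dim and
# record which samples are correctly classified (rows = samples, assumed).
cv_corrects <- function(y, X, max.dim, n.seg, classify) {
  n <- length(y)
  segment <- rep(seq_len(n.seg), length.out = n)  # interleaved segments (assumed)
  corrects <- matrix(FALSE, nrow = n, ncol = max.dim)
  for (s in seq_len(n.seg)) {
    test <- which(segment == s)
    for (d in seq_len(max.dim)) {
      pred <- classify(y[-test], X[-test, , drop = FALSE],
                       X[test, , drop = FALSE], d)
      corrects[test, d] <- pred == y[test]
    }
  }
  corrects
}

# Simulated data: two classes separated along the first predictor
set.seed(1)
y <- factor(rep(c("A", "B"), each = 20))
X <- matrix(rnorm(40 * 5), nrow = 40)
X[y == "B", 1] <- X[y == "B", 1] + 2

corrects <- cv_corrects(y, X, max.dim = 5, n.seg = 10, classify = toy_classify)
accuracy <- colMeans(corrects)  # fraction correctly classified per dimension
print(accuracy)
```

Every sample is left out exactly once, so each column of the matrix holds one cross-validated classification result per element of y.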
The McNemar test is used to determine whether a simpler model is not significantly poorer than the more complex model giving the maximum accuracy. The argument reg is the rejection level of this test, i.e. a reg value close to 0.0 means a harder regularization, and a smaller dimension is in general selected. Setting it to 1.0 means the dimension with the maximum accuracy (no regularization) is selected.
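The selection rule described above can be sketched with mcnemar.test from the stats package. This is a minimal illustration, not the pdaDim source: the function name select_dim is hypothetical, the logical matrix is assumed to have samples as rows and dimensions as columns, the two-sided test with default continuity correction is an assumption, and treating an undefined p-value (identical classifications) as 1 is also an assumption.

```r
# Hypothetical helper: pick the smallest dimension whose classifications are
# not significantly different from those of the maximum-accuracy dimension.
select_dim <- function(corrects, reg = 0.5) {
  accuracy <- colMeans(corrects)
  best <- which.max(accuracy)  # first dimension reaching the maximum accuracy
  lv <- c(FALSE, TRUE)         # fixed levels so the table is always 2 x 2
  for (d in seq_len(best - 1)) {
    tab <- table(factor(corrects[, d], levels = lv),
                 factor(corrects[, best], levels = lv))
    p <- mcnemar.test(tab)$p.value
    if (is.na(p)) p <- 1       # no discordant pairs: no evidence of a difference
    if (p > reg) return(d)     # not rejected at level reg: accept smaller dimension
  }
  best
}

# Small constructed example with 40 samples and 3 dimensions
corrects <- cbind(
  rep(c(TRUE, FALSE), each = 20),      # dimension 1: 20 of 40 correct
  c(rep(TRUE, 38), rep(FALSE, 2)),     # dimension 2: 38 of 40 correct
  c(rep(TRUE, 39), FALSE)              # dimension 3: 39 of 40 correct
)

print(select_dim(corrects, reg = 0.1))
print(select_dim(corrects, reg = 1.0))
```

Because a p-value can never exceed 1, a reg of 1.0 never accepts a reduced model in this sketch, which matches the "no regularization" case; lowering reg makes it easier for a smaller dimension to pass.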
The argument selected is used for variable selection, and is just passed on to pda.
A list with the elements Dimension and Corrects. Dimension is the number of dimensions selected by this algorithm (integer). Corrects is a matrix of logicals indicating which elements of y are correctly classified (TRUE) for each dimension 1,...,max.dim.
Lars Snipen.
Indahl, UG, Naes, T (1998). Evaluation of alternative spectral feature extraction methods of textural images for multivariate modeling. J. Chemometrics, 12:261-278.
eliminator, mpda.
data(microbiome)
y <- microbiome[1:40, 1]
X <- as.matrix(microbiome[1:40, -1])
lst <- pdaDim(y, X, reg = 0.1, prior = c(0.5, 0.5), max.dim = 10)