pda | R Documentation |
A classification method that first uses PLS for dimension reduction and then applies LDA in the truncated score-space.
pda(y, X, prior = NULL, max.dim = NULL, selected = NULL)
y |
Vector of responses, must be a factor with exactly 2 levels. See |
X |
Numeric matrix of predictor values. |
prior |
Vector of prior probabilities, one value for each factor level in |
max.dim |
Integer, the maximum number of dimensions to consider in PLS. |
selected |
Vector of logicals, indicating a variable selection, see below. |
This classification method is designed for highly multivariate problems, i.e. where the
predictor matrix X has many and/or highly correlated columns (variables).
First, the response factor is dummy-coded as 0's and 1's. This vector is then used together
with X to fit a PLS-model using the oscorespls algorithm, see plsr for details. The idea is that
PLS will find linear combinations of the original variables, denoted PLS-components, that serve as
an orthogonal basis spanning the predictor space in such a way that objects from the two factor
levels are separated as much as possible. The score-matrix from this step is the original data
objects transformed into this subspace.
Next, the score-matrix is truncated, i.e. only max.dim dimensions are used. The PLS-components
are all ordered such that the first component has the largest linear discriminative power. Thus, only a small
subspace is usually needed for separating between the two classes. This truncated score-matrix is used
as the predictor-matrix in LDA. One LDA-model is fitted for each dimension 1,...,max.dim, see lda for details.
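The steps described above can be sketched roughly as follows. This is an illustration only, not the source code of pda: it assumes the pls and MASS packages are installed, uses simulated data, and the object names (y.dummy, Z, lda.fits) are hypothetical.

```r
library(pls)   # plsr(), scores()
library(MASS)  # lda()

# Simulated two-class data (an assumption for illustration only)
set.seed(1)
n <- 40
p <- 100
y <- factor(rep(c("A", "B"), each = n / 2))
X <- matrix(rnorm(n * p), nrow = n)
X[y == "B", 1:5] <- X[y == "B", 1:5] + 1   # shift a few columns for class B

max.dim <- 3

# Step 1: dummy-code the two-level factor as 0's and 1's
y.dummy <- as.numeric(y) - 1

# Step 2: fit PLS with the orthogonal scores algorithm (X centered, not scaled)
pls.fit <- plsr(y.dummy ~ X, ncomp = max.dim, method = "oscorespls", scale = FALSE)

# Step 3: the truncated score-matrix, one column per PLS-component
Z <- unclass(scores(pls.fit))

# Step 4: one LDA-model for each dimension 1,...,max.dim
lda.fits <- lapply(seq_len(max.dim), function(d) {
  lda(Z[, 1:d, drop = FALSE], grouping = y, prior = c(0.5, 0.5))
})
```

Here lda.fits[[d]] is the LDA-model that uses the first d PLS-components, which mirrors the description of one fitted model per dimension.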
The predictor matrix X is centered, but not scaled. If you want scaled variables you need to do this
(with scale) before you call pda.
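A minimal sketch of scaling the predictors beforehand, using base R only. The small matrix is made up for illustration, and the call to pda is shown as a comment because it requires a matching response y.

```r
# Toy predictor matrix with columns on very different scales (illustrative data)
X <- matrix(c(1, 2, 3, 10, 20, 60), ncol = 2)

# scale() centers each column and divides it by its standard deviation
X.scaled <- scale(X)

# The scaled matrix would then be passed on instead of X, e.g.
# m <- pda(y, X.scaled)
```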
The argument selected may be used to select a subset of the predictor variables (columns of X),
e.g. after a variable selection (see eliminator). This must be a vector of logicals (TRUE/FALSE)
indicating the selected variables, and the reduced predictor matrix becomes X[,which(selected)].
The main reason for this option is the use of pda in mpda.
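The effect of a logical selection vector on the predictor matrix can be illustrated with base R. The matrix and column names below are made up for the example.

```r
# Toy predictor matrix with four named columns (illustrative data)
X <- matrix(1:12, nrow = 3, dimnames = list(NULL, c("a", "b", "c", "d")))

# One logical per column of X; TRUE marks a selected variable
selected <- c(TRUE, FALSE, TRUE, FALSE)
stopifnot(length(selected) == ncol(X))

# The reduced predictor matrix keeps only the selected columns
X.red <- X[, which(selected)]
```

In this sketch X.red holds columns "a" and "c" only.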
A pda object, which is a list with elements PLS, LDA, Response and Selected. The
element PLS is simply the object returned from plsr. The element LDA is a
list with the fitted lda objects for each dimension. The elements Response
and Selected are copies of the arguments y and selected.
Lars Snipen.
predict.pda, pdaDim.
# Two-class microbiome data, equal priors, up to 10 PLS dimensions
data(microbiome)
y <- microbiome[1:40, 1]
X <- as.matrix(microbiome[1:40, -1])
m.trn <- pda(y, X, prior = c(0.5,0.5), max.dim = 10)

# Poems by two authors, restricted to a subset of the variables
data(poems)
y <- factor(poems[11:28,1], levels = c("Blake","Eliot"))
X <- as.matrix(poems[11:28, -1])
selection <- rep(FALSE, ncol(X))
selection[c(1,5,9,15,21)] <- TRUE # using letters a, e, i, o and u only
p.trn <- pda(y, X, prior = c(1,1), selected = selection)