| pda | R Documentation |
A classification method that first uses PLS for dimension reduction and then applies LDA in the truncated score-space.
pda(y, X, prior = NULL, max.dim = NULL, selected = NULL)
y |
Vector of responses, must be a factor with exactly 2 levels. See |
X |
Numeric matrix of predictor values. |
prior |
Vector of prior probabilities, one value for each factor level in |
max.dim |
Integer, the maximum number of dimensions to consider in PLS. |
selected |
Vector of logicals, indicating a variable selection, see below. |
This classification method is designed for highly multivariate problems, i.e. where the
predictor matrix X has many and/or highly correlated columns (variables).
First, the response factor is dummy-coded as 0's and 1's. This vector is then used together
with X to fit a PLS-model using the oscorespls algorithm, see plsr
for details. The idea is that PLS will find linear combinations, denoted PLS-components, of the
original variables to be used as an orthogonal basis for spanning the predictor space in such a way that objects
from the two factor levels are separated as much as possible. The score-matrix from this step contains the original
data objects transformed into this subspace.
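The dummy-coding and score computation described above can be sketched in base R. This is only an illustration under stated assumptions: the toy data are hypothetical, and the loop is a hand-written NIPALS-style PLS1, swapped in for the oscorespls algorithm that plsr actually uses, so that the sketch needs no extra packages. It is not the code inside pda.

```r
set.seed(1)
# Hypothetical toy data: 20 objects, 6 predictors, two classes
y <- factor(rep(c("A", "B"), each = 10))
X <- matrix(rnorm(20 * 6), nrow = 20)
X[y == "B", 1:2] <- X[y == "B", 1:2] + 2   # shift class B in two variables

# Dummy-code the two-level factor as 0's and 1's
y01 <- as.numeric(y == levels(y)[2])

# Center (but do not scale) the predictors; center the response
Xc <- scale(X, center = TRUE, scale = FALSE)
yc <- y01 - mean(y01)

# NIPALS-style PLS1 with deflation of X (illustration only, not oscorespls)
n.comp <- 3
scores <- matrix(0, nrow = nrow(Xc), ncol = n.comp)
E <- Xc
for (a in seq_len(n.comp)) {
  w <- crossprod(E, yc)                  # weights proportional to cov(x_j, y)
  w <- w / sqrt(sum(w^2))                # normalise to unit length
  t.a <- E %*% w                         # score vector for component a
  p <- crossprod(E, t.a) / sum(t.a^2)    # loadings
  E <- E - t.a %*% t(p)                  # remove this component from X
  scores[, a] <- t.a
}

# Off-diagonal elements are (numerically) zero: the scores are orthogonal
round(crossprod(scores), 10)
```

Each column of scores is one PLS-component evaluated on the data objects; the deflation step is what makes the columns mutually orthogonal.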
Next, the score-matrix is truncated, i.e. only max.dim dimensions are used. The PLS-components
are all ordered such that the first component has the largest linear discriminative power. Thus, only a small
subspace is usually needed for separating between the two classes. This truncated score-matrix is used
as the predictor-matrix in LDA. One LDA-model is fitted for each dimension 1,...,max.dim,
see lda for details.
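As a rough sketch of the truncation step, the loop below fits one LDA-model per dimension using lda from the MASS package. The score-matrix here is a hypothetical random stand-in rather than real PLS output, and the object names (scores, lda.list) are chosen for illustration; how pda stores things internally may differ.

```r
library(MASS)  # provides lda()

set.seed(2)
y <- factor(rep(c("A", "B"), each = 15))
# Hypothetical stand-in for a PLS score-matrix: 30 objects, 5 components
scores <- matrix(rnorm(30 * 5), nrow = 30)
scores[y == "B", 1] <- scores[y == "B", 1] + 3   # first component separates the classes

max.dim <- 3
lda.list <- vector("list", max.dim)
for (d in seq_len(max.dim)) {
  # Truncate to the first d columns; drop = FALSE keeps a matrix when d = 1
  lda.list[[d]] <- lda(scores[, 1:d, drop = FALSE], grouping = y,
                       prior = c(0.5, 0.5))
}

# With two classes, each model has a single discriminant with d coefficients
sapply(lda.list, function(m) nrow(m$scaling))
```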
The predictor matrix X is centered, but not scaled. If you want scaled variables you need to do this
(with scale) before you call pda.
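A minimal sketch of scaling beforehand, assuming a hypothetical predictor matrix whose columns live on very different scales. The call to pda is left commented out because it needs the package that provides it.

```r
set.seed(3)
# Hypothetical predictors with very different spreads
X <- cbind(a = rnorm(10, sd = 1), b = rnorm(10, sd = 100))

# scale() centers each column and divides it by its standard deviation
Xs <- scale(X)
apply(Xs, 2, sd)

# The scaled matrix is then passed on instead of X, e.g.
# m <- pda(y, Xs)
```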
The argument selected may be used to select a subset of the predictor variables (columns of X),
e.g. after a variable selection (see eliminator). This must be a
vector of logicals (TRUE/FALSE) indicating the selected variables, and the reduced predictor
matrix becomes X[,which(selected)]. The main reason for this option is the use of pda
in mpda.
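The effect of selected on the predictor matrix can be illustrated with a small hypothetical matrix. The drop = FALSE argument is an addition in this sketch, so that the result stays a matrix even when a single variable is selected.

```r
# Hypothetical predictor matrix with four named variables
X <- matrix(1:12, nrow = 3, dimnames = list(NULL, letters[1:4]))

# One logical per column of X
selected <- c(TRUE, FALSE, TRUE, FALSE)

# The reduced predictor matrix keeps only the selected columns
X.red <- X[, which(selected), drop = FALSE]
colnames(X.red)
```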
A pda object, which is a list with elements PLS, LDA, Response and Selected.
The element PLS is simply the object returned from plsr. The element LDA is a
list with the fitted lda objects for each dimension. The elements Response and Selected
are copies of the arguments y and selected.
Lars Snipen.
predict.pda, pdaDim.
data(microbiome)
y <- microbiome[1:40, 1]
X <- as.matrix(microbiome[1:40, -1])
m.trn <- pda(y, X, prior = c(0.5,0.5), max.dim = 10)
data(poems)
y <- factor(poems[11:28,1], levels = c("Blake","Eliot"))
X <- as.matrix(poems[11:28, -1])
selection <- rep(FALSE, ncol(X))
selection[c(1,5,9,15,21)] <- TRUE # using letters a, e, i, o and u only
p.trn <- pda(y, X, prior = c(1,1), selected = selection)