pda: Pattern recognition with PLS+LDA

pda R Documentation

Description

A classification method that uses first PLS for dimension-reduction and then LDA in the truncated score-space.

Usage

pda(y, X, prior = NULL, max.dim = NULL, selected = NULL)

Arguments

y

Vector of responses; must be a factor with exactly 2 levels. See mpda for multi-level problems.

X

Numeric matrix of predictor values.

prior

Vector of prior probabilities, one value for each factor level in y.

max.dim

Integer, the maximum number of dimensions to consider in PLS.

selected

Vector of logicals, indicating a variable selection, see below.

Details

This classification method is designed for highly multivariate problems, i.e. where the predictor matrix X has many and/or highly correlated columns (variables).

First, the response factor is dummy-coded as 0's and 1's. This vector is then used together with X to fit a PLS-model using the oscorespls algorithm; see plsr for details. The idea is that PLS finds linear combinations of the original variables, denoted PLS-components, that form an orthogonal basis spanning the predictor space in such a way that objects from the two factor levels are separated as much as possible. The score-matrix from this step contains the original data objects transformed into this subspace.
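The first step can be illustrated by hand for a single component. The sketch below is not the code inside pda: the toy data, the choice of which factor level is coded as 0, and the manual computation of the first weight vector are assumptions made for illustration. It relies only on the standard fact that the first PLS direction for a single response is proportional to the covariance between the centered predictors and the centered response.

```r
# Toy data (hypothetical): 6 objects, 3 predictors, two classes.
set.seed(1)
y <- factor(rep(c("A", "B"), each = 3))
X <- matrix(rnorm(18), nrow = 6, ncol = 3)
X[y == "B", 1] <- X[y == "B", 1] + 3   # shift class B along variable 1

# Dummy-code the two-level factor as 0/1.
# Assumption: the first level maps to 0 and the second to 1.
y.dummy <- as.integer(y) - 1L

# Center the predictors and the dummy response (no scaling).
Xc <- scale(X, center = TRUE, scale = FALSE)
yc <- y.dummy - mean(y.dummy)

# First PLS weight vector: the unit-length direction in predictor space
# with maximal covariance with the response.
w1 <- crossprod(Xc, yc)
w1 <- w1 / sqrt(sum(w1^2))

# First score vector: the objects projected onto that direction.
t1 <- Xc %*% w1
```

In practice the full set of components comes from plsr; the hand computation only shows what a single PLS-component and its scores are.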

Next, the score-matrix is truncated, i.e. only max.dim dimensions are used. The PLS-components are ordered such that the first component has the largest linear discriminative power, so a small subspace is usually enough to separate the two classes. This truncated score-matrix is used as the predictor-matrix in LDA. One LDA-model is fitted for each dimension 1,...,max.dim; see lda for details.
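The second step can be sketched as follows. This is an illustration under stated assumptions, not the code inside pda: the score-matrix here is simulated rather than taken from a PLS fit, and the model for dimension d is assumed to use the first d score columns. The lda function is the one from the MASS package.

```r
library(MASS)  # provides lda()

set.seed(1)
y <- factor(rep(c("A", "B"), each = 10))

# Stand-in for a PLS score-matrix (hypothetical): 20 objects, 4 components,
# with the classes separated mostly along the first column.
scores <- matrix(rnorm(80), nrow = 20, ncol = 4)
scores[y == "B", 1] <- scores[y == "B", 1] + 3

max.dim <- 3

# One LDA model per dimension d.
# Assumption: model d is fitted on the first d score columns only.
lda.list <- lapply(seq_len(max.dim), function(d) {
  lda(scores[, seq_len(d), drop = FALSE], grouping = y, prior = c(0.5, 0.5))
})
```

Keeping one fitted model per dimension makes it possible to compare classification performance across dimensions afterwards without refitting.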

The predictor matrix X is centered, but not scaled. If you want scaled variables, you must scale X yourself (with scale) before calling pda.
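A minimal sketch of scaling beforehand, using a small made-up matrix. The base R function scale centers each column and divides it by its standard deviation; the commented-out pda call only indicates where the scaled matrix would go.

```r
# Hypothetical predictor matrix with columns on very different scales.
X <- matrix(c(1, 2, 3, 4,
              10, 20, 30, 40), ncol = 2)

# Center each column and scale it to unit standard deviation.
X.scaled <- scale(X)

# The scaled matrix is then passed on in place of X, e.g.
# m <- pda(y, X.scaled, max.dim = 2)
```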

The argument selected may be used to select a subset of the predictor variables (columns of X), e.g. after a variable selection (see eliminator). This must be a vector of logicals (TRUE/FALSE) indicating the selected variables, and the reduced predictor matrix becomes X[,which(selected)]. The main reason for this option is the use of pda in mpda.
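The effect of a logical selection vector on the predictor matrix can be seen on a small made-up example. The matrix and column names are hypothetical; drop = FALSE is added here only so that the result stays a matrix even if a single column is selected.

```r
# Hypothetical predictor matrix with 4 named variables.
X <- matrix(1:12, nrow = 3,
            dimnames = list(NULL, c("a", "b", "c", "d")))

# Logical vector with one entry per column of X.
selected <- c(TRUE, FALSE, TRUE, FALSE)

# The reduced predictor matrix keeps only the selected columns.
X.red <- X[, which(selected), drop = FALSE]
```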

Value

A pda object, which is a list with elements PLS, LDA, Response and Selected. The element PLS is simply the object returned from plsr. The element LDA is a list with the fitted lda objects for each dimension. The elements Response and Selected are copies of the arguments y and selected.

Author(s)

Lars Snipen.

See Also

predict.pda, pdaDim.

Examples

data(microbiome)
y <- microbiome[1:40, 1]
X <- as.matrix(microbiome[1:40, -1])
m.trn <- pda(y, X, prior = c(0.5,0.5), max.dim = 10)

data(poems)
y <- factor(poems[11:28,1], levels = c("Blake","Eliot"))
X <- as.matrix(poems[11:28, -1])
selection <- rep(FALSE, ncol(X))
selection[c(1,5,9,15,21)] <- TRUE   # using letters a, e, i, o and u only
p.trn <- pda(y, X, prior = c(1,1), selected = selection)


larssnip/mpda documentation built on March 28, 2022, 3:37 p.m.