pda: Pattern recognition with PLS+LDA

pda	R Documentation

Pattern recognition with PLS+LDA

Description

A classification method that first uses PLS for dimension reduction and then applies LDA in the truncated score-space.

Usage

pda(y, X, prior = NULL, max.dim = NULL, selected = NULL)

Arguments

`y`	Vector of responses, must be a factor with exactly 2 levels. See `mpda` for multi-level problems.
`X`	Numeric matrix of predictor values.
`prior`	Vector of prior probabilities, one value for each factor level in `y`.
`max.dim`	Integer, the maximum number of dimensions to consider in PLS.
`selected`	Vector of logicals, indicating a variable selection, see below.

Details

This classification method is designed for highly multivariate problems, i.e. where the predictor matrix X has many and/or highly correlated columns (variables).

First, the response factor is dummy-coded as 0's and 1's. This vector is then used together with X to fit a PLS-model using the oscorespls algorithm, see plsr for details. The idea is that PLS finds linear combinations of the original variables, denoted PLS-components, that serve as an orthogonal basis for the predictor space, chosen such that objects from the two factor levels are separated as much as possible. The score-matrix from this step holds the original data objects transformed into this subspace.
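The PLS step described above can be sketched as follows. This is a minimal illustration on simulated data, not the internal code of pda; it assumes the pls package is installed, and the object names (y01, fit, Z) and the simulated data are made up for the example.

```r
# Sketch only: simulated data, not the package's internal code.
library(pls)

set.seed(1)
n <- 40
p <- 100
y <- factor(rep(c("A", "B"), each = n / 2))
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
X[y == "B", 1:10] <- X[y == "B", 1:10] + 1   # shift ten variables for class B

# Dummy-code the two-level factor as 0's and 1's
y01 <- as.numeric(y == levels(y)[2])

# Fit PLS with the orthogonal scores algorithm; X is centered but not scaled
max.dim <- 5
fit <- plsr(y01 ~ X, ncomp = max.dim, method = "oscorespls", scale = FALSE)

# Score-matrix: one row per object, one column per PLS-component
Z <- scores(fit)
dim(Z)
```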

Next, the score-matrix is truncated, i.e. only the first max.dim dimensions are used. The PLS-components are ordered such that the first component has the largest linear discriminative power. Thus, a small subspace is usually enough to separate the two classes. This truncated score-matrix is used as the predictor-matrix in LDA. One LDA-model is fitted for each dimension 1,...,max.dim, see lda for details.
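The truncation and the per-dimension LDA fits might look roughly like the sketch below. It assumes the pls and MASS packages, uses simulated data, and the name lda.list is hypothetical; how pda actually stores and fits its models may differ in detail.

```r
# Sketch only: simulated data, not the package's internal code.
library(pls)
library(MASS)

set.seed(1)
n <- 40
p <- 100
y <- factor(rep(c("A", "B"), each = n / 2))
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
X[y == "B", 1:10] <- X[y == "B", 1:10] + 1   # shift ten variables for class B

# PLS step: dummy-coded response, truncated to max.dim components
max.dim <- 5
y01 <- as.numeric(y == levels(y)[2])
fit <- plsr(y01 ~ X, ncomp = max.dim, method = "oscorespls", scale = FALSE)
Z <- unclass(scores(fit))                     # plain numeric score-matrix

# One LDA-model per dimension d = 1,...,max.dim,
# each using the first d columns of the score-matrix
lda.list <- lapply(seq_len(max.dim), function(d) {
  lda(Z[, 1:d, drop = FALSE], grouping = y, prior = c(0.5, 0.5))
})

# Class predictions on the training objects from the 3-dimensional model
pred3 <- predict(lda.list[[3]], Z[, 1:3, drop = FALSE])$class
```

Keeping one model per dimension makes it possible to compare classification results across subspace sizes without refitting PLS.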

The predictor matrix X is centered, but not scaled. If you want scaled variables, you must scale X yourself (with scale) before calling pda.
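As a small illustration of scaling beforehand, the sketch below standardizes a simulated matrix with base R's scale. The data and the name X.scaled are made up for the example, and the final call to pda is shown only as a comment.

```r
# Sketch only: two simulated variables on very different scales
set.seed(1)
X <- cbind(small = rnorm(20, sd = 0.1), large = rnorm(20, sd = 100))

# Center each column and divide by its standard deviation
X.scaled <- scale(X, center = TRUE, scale = TRUE)
apply(X.scaled, 2, sd)   # both columns now have unit standard deviation

# The scaled matrix would then be passed on, e.g.
# m <- pda(y, X.scaled)
```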

The argument selected may be used to select a subset of the predictor variables (columns of X), e.g. after a variable selection (see eliminator). This must be a vector of logicals (TRUE/FALSE) indicating the selected variables, and the reduced predictor matrix becomes X[,which(selected)]. The main reason for this option is the use of pda in mpda.
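The sketch below shows one way to build such a logical vector and the reduced matrix it implies. The data and column names are made up, and the selection is done by column name purely for illustration.

```r
# Sketch only: a small simulated predictor matrix with named columns
set.seed(1)
X <- matrix(rnorm(6 * 5), nrow = 6,
            dimnames = list(NULL, paste0("v", 1:5)))

# One logical per column of X: TRUE for the variables to keep
selected <- colnames(X) %in% c("v2", "v4")

# The reduced predictor matrix, as described above
X.red <- X[, which(selected)]
colnames(X.red)
```

Note that selected must have one element per column of X, so that the positions of the TRUE values identify the columns to keep.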

Value

A pda object, which is a list with elements PLS, LDA, Response and Selected. The element PLS is simply the object returned from plsr. The element LDA is a list with the fitted lda objects for each dimension. The elements Response and Selected are copies of the arguments y and selected.

Author(s)

Lars Snipen.

Examples

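data(microbiome)
# first 40 rows: column 1 as response, remaining columns as predictors
y <- microbiome[1:40, 1]
X <- as.matrix(microbiome[1:40, -1])
# equal priors, at most 10 PLS dimensions
m.trn <- pda(y, X, prior = c(0.5,0.5), max.dim = 10)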

data(poems)
# rows 11 to 28, response as a two-level factor
y <- factor(poems[11:28,1], levels = c("Blake","Eliot"))
X <- as.matrix(poems[11:28, -1])
# logical vector with one element per column of X
selection <- rep(FALSE, ncol(X))
selection[c(1,5,9,15,21)] <- TRUE   # using letters a, e, i, o and u only
p.trn <- pda(y, X, prior = c(1,1), selected = selection)

larssnip/mpda documentation built on March 28, 2022, 3:37 p.m.