LdaPP: Robust Linear Discriminant Analysis by Projection Pursuit
In rrcov: Scalable Robust Estimators with High Breakdown Point

LdaPP

R Documentation

Robust Linear Discriminant Analysis by Projection Pursuit

Description

Performs robust linear discriminant analysis by the projection-pursuit approach proposed by Pires and Branco (2010) and returns the results as an object of class LdaPP. The function acts as the constructor for that class.

Usage

LdaPP(x, ...)
## S3 method for class 'formula'
LdaPP(formula, data, subset, na.action, ...)
## Default S3 method:
LdaPP(x, grouping, prior = proportions, tol = 1.0e-4,
                 method = c("huber", "mad", "sest", "class"),
                 optim = FALSE,
                 trace=FALSE, ...)

Arguments

`formula`	a formula of the form `y~x` that describes the response and the predictors. The formula can be more complicated, such as `y~log(x)+z` (see `formula` for more details). The response should be a factor, or any vector that can be coerced to one (such as a logical variable).
`data`	an optional data frame (or similar: see `model.frame`) containing the variables in the formula `formula`.
`subset`	an optional vector used to select rows (observations) of the data matrix `x`.
`na.action`	a function which indicates what should happen when the data contain `NA`s. The default is set by the `na.action` setting of `options`, and is `na.fail` if that is unset. The 'factory-fresh' default is `na.omit`.
`x`	a matrix or data frame containing the explanatory variables (training set).
`grouping`	grouping variable: a factor specifying the class for each observation.
`prior`	prior probabilities, default to the class proportions for the training set.
`tol`	tolerance
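`method`	method used to compute the univariate location and scale estimates (T,S): one of `"huber"`, `"mad"`, `"sest"` or `"class"` (see Details).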
`optim`	whether to perform the second approximation (see Details) using the Nelder and Mead simplex method (see function `optim()` from package `stats`). Default is `optim = FALSE`.
`trace`	whether to print intermediate results. Default is `trace = FALSE`.
`...`	arguments passed to or from other methods.

Details

Currently the algorithm is implemented only for binary classification, and in the following it is assumed that only two groups are present.

The PP algorithm searches for low-dimensional projections of higher-dimensional data in which a projection index is maximized. As in Fisher's original proposal, the squared standardized distance between the observations in the two groups is maximized. Instead of the sample univariate mean and standard deviation, robust alternatives (T,S) are used. These are selected through the argument method and can be one of

huber: the pair (T,S) are the robust M-estimates of location and scale
mad: (T,S) are the Median and the Median Absolute Deviation
sest: the pair (T,S) are the robust S-estimates of location and scale
class: (T,S) are the mean and the standard deviation.
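
As a rough illustration of the index described above, the following base-R sketch evaluates a squared standardized distance between two projected groups for the mad and class choices of (T,S). The helper `pp_index`, the simulated data and the way the two scale estimates are pooled (a size-weighted average of the squared scales) are assumptions made for illustration; they are not taken from the package's internal code.

```r
## Hypothetical helper (not part of rrcov): squared standardized distance
## between two groups after projecting onto the direction `a`.
pp_index <- function(a, x1, x2, method = c("mad", "class")) {
    method <- match.arg(method)
    a <- a / sqrt(sum(a^2))                 # rescale to a unit-length direction
    z1 <- drop(as.matrix(x1) %*% a)         # projected observations, group 1
    z2 <- drop(as.matrix(x2) %*% a)         # projected observations, group 2
    loc <- if (method == "mad") median else mean   # location estimate T
    scl <- if (method == "mad") mad else sd        # scale estimate S
    n1 <- length(z1); n2 <- length(z2)
    ## Assumption: pool the two squared scales, weighted by group size
    s2 <- (n1 * scl(z1)^2 + n2 * scl(z2)^2) / (n1 + n2)
    (loc(z1) - loc(z2))^2 / s2
}

## Simulated data: the second group is shifted along the first axis only
set.seed(1)
x1 <- matrix(rnorm(60), ncol = 2)
x2 <- matrix(rnorm(60), ncol = 2) + rep(c(4, 0), each = 30)

i_axis1 <- pp_index(c(1, 0), x1, x2, method = "mad")   # direction of the shift
i_axis2 <- pp_index(c(0, 1), x1, x2, method = "mad")   # orthogonal direction
c(axis1 = i_axis1, axis2 = i_axis2)
```

The index depends only on the direction of `a`, not on its length, which is why the search can be restricted to unit vectors.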

The first approximation A1 to the solution is obtained by investigating a finite number of candidate directions: the unit vectors defined by all pairs of points such that one belongs to the first group and the other to the second group. The solution found is stored in the slots raw.ldf and raw.ldfconst.
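
The search over candidate directions can be sketched in base R as follows. The function name `a1_direction` and the simulated data are hypothetical, and the index uses the median and MAD with a size-weighted pooled scale as an assumed form; the package's own implementation may differ in its details.

```r
## Hypothetical sketch of the A1 step (not the rrcov implementation).
a1_direction <- function(x1, x2) {
    x1 <- as.matrix(x1); x2 <- as.matrix(x2)
    n1 <- nrow(x1); n2 <- nrow(x2)
    ## Projection index with (T, S) = (median, MAD); the pooling is an assumption
    index <- function(a) {
        z1 <- drop(x1 %*% a); z2 <- drop(x2 %*% a)
        s2 <- (n1 * mad(z1)^2 + n2 * mad(z2)^2) / (n1 + n2)
        (median(z1) - median(z2))^2 / s2
    }
    best <- list(value = -Inf, direction = NULL)
    for (i in seq_len(n1)) {
        for (j in seq_len(n2)) {
            d <- x1[i, ] - x2[j, ]          # vector joining one point from each group
            len <- sqrt(sum(d^2))
            if (len == 0) next              # skip coincident points
            a <- d / len                    # candidate unit vector
            v <- index(a)
            if (is.finite(v) && v > best$value)
                best <- list(value = v, direction = a)
        }
    }
    best
}

## Simulated data: the second group is shifted along the first axis only
set.seed(1)
x1 <- matrix(rnorm(40), ncol = 2)
x2 <- matrix(rnorm(40), ncol = 2) + rep(c(4, 0), each = 20)
a1 <- a1_direction(x1, x2)
a1$direction    # best candidate unit vector (its sign is arbitrary)
a1$value        # index attained by that candidate
```

With n1 and n2 observations in the two groups, this examines n1 * n2 candidates, so the cost of the exhaustive search grows with the product of the group sizes.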

The second approximation A2 (optional) is performed by a numerical optimization algorithm using A1 as initial solution. The Nelder and Mead method implemented in the function optim is applied. Whether this refinement will be used is controlled by the argument optim. If optim=TRUE the result of the optimization is stored into the slots ldf and ldfconst. Otherwise these slots are set equal to raw.ldf and raw.ldfconst.
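
A minimal sketch of such a refinement, again in base R, is shown below. It is not the package's code: the `index` function (median and MAD with an assumed size-weighted pooled scale), the simulated data and the stand-in starting direction `a1` are all hypothetical. Only `optim()` with `method = "Nelder-Mead"` is the function named in the text above.

```r
## Hypothetical sketch of the A2 step (not the rrcov implementation).
set.seed(1)
x1 <- matrix(rnorm(40), ncol = 2)
x2 <- matrix(rnorm(40), ncol = 2) + rep(c(4, 0), each = 20)

## Projection index with (T, S) = (median, MAD); the pooling is an assumption
index <- function(a) {
    a <- a / sqrt(sum(a^2))                 # the index depends only on the direction
    z1 <- drop(x1 %*% a); z2 <- drop(x2 %*% a)
    s2 <- (nrow(x1) * mad(z1)^2 + nrow(x2) * mad(z2)^2) / (nrow(x1) + nrow(x2))
    (median(z1) - median(z2))^2 / s2
}

## Stand-in for the A1 solution: one of the candidate directions
a1 <- x1[1, ] - x2[1, ]
a1 <- a1 / sqrt(sum(a1^2))

## optim() minimizes, so the negated index is passed as the objective
res <- optim(par = a1, fn = function(a) -index(a), method = "Nelder-Mead")
a2 <- res$par / sqrt(sum(res$par^2))        # refined unit direction

c(A1 = index(a1), A2 = index(a2))           # index before and after refinement
```

Because the starting point is one vertex of the initial simplex, the refined index is never lower than the starting one, although Nelder-Mead may stop at a local maximum.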

Value

Returns an S4 object of class LdaPP (see LdaPP-class).

Warning

Still an experimental version! Only binary classification is supported.

Author(s)

Valentin Todorov valentin.todorov@chello.at and Ana Pires apires@math.ist.utl.pt

References

Pires, A. M. and Branco, J. A. (2010). Projection-pursuit approach to robust linear discriminant analysis. Journal of Multivariate Analysis, Academic Press, Inc., 101, 2464–2485.

Examples


##
## Function to plot a LDA separation line
##
lda.line <- function(lda, ...)
{
    ab <- lda@ldf[1,] - lda@ldf[2,]
    cc <- lda@ldfconst[1] - lda@ldfconst[2]
    abline(a=-cc/ab[2], b=-ab[1]/ab[2],...)
}

library(rrcov)    ## provides LdaPP(), Linda() and the pottery data
data(pottery)
x <- pottery[,c("MG", "CA")]
grp <- pottery$origin
col <- c(3,4)
gcol <- ifelse(grp == "Attic", col[1], col[2])
gpch <- ifelse(grp == "Attic", 16, 1)

##
## Reproduce Fig. 2 from Pires and Branco (2010)
##
plot(CA~MG, data=pottery, col=gcol, pch=gpch)

## Not run: 

ppc <- LdaPP(x, grp, method="class", optim=TRUE)
lda.line(ppc, col=1, lwd=2, lty=1)

pph <- LdaPP(x, grp, method="huber",optim=TRUE)
lda.line(pph, col=3, lty=3)

pps <- LdaPP(x, grp, method="sest", optim=TRUE)
lda.line(pps, col=4, lty=4)

ppm <- LdaPP(x, grp, method="mad", optim=TRUE)
lda.line(ppm, col=5, lty=5)

rlda <- Linda(x, grp, method="mcd")
lda.line(rlda, col=6, lty=1)

fsa <- Linda(x, grp, method="fsa")
lda.line(fsa, col=8, lty=6)

## Use the formula interface:
##
LdaPP(origin~MG+CA, data=pottery)       ## use the same two predictors
LdaPP(origin~., data=pottery)           ## use all predictor variables

##
## Predict method
data(pottery)
fit <- LdaPP(origin~., data = pottery)
predict(fit)

## End(Not run)

rrcov documentation built on June 8, 2025, 12:38 p.m.