# LdaPP: Robust Linear Discriminant Analysis by Projection Pursuit in rrcov: Scalable Robust Estimators with High Breakdown Point

## Description

Performs robust linear discriminant analysis by the projection-pursuit approach proposed by Pires and Branco (2010) and returns the results as an object of class `LdaPP` (this function is the constructor for that class).

## Usage

```r
LdaPP(x, ...)

## S3 method for class 'formula'
LdaPP(formula, data, subset, na.action, ...)

## Default S3 method:
LdaPP(x, grouping, prior = proportions, tol = 1.0e-4,
      method = c("huber", "mad", "sest", "class"),
      optim = FALSE, trace = FALSE, ...)
```

## Arguments

- `formula`: a formula of the form `y ~ x` describing the response and the predictors. The formula can be more complicated, such as `y ~ log(x) + z` (see `formula` for more details). The response should be a factor representing the response variable, or any vector that can be coerced to one (such as a logical variable).
- `data`: an optional data frame (or similar: see `model.frame`) containing the variables in `formula`.
- `subset`: an optional vector used to select rows (observations) of the data matrix `x`.
- `na.action`: a function which indicates what should happen when the data contain `NA`s. The default is set by the `na.action` setting of `options`, and is `na.fail` if that is unset. The factory-fresh default is `na.omit`.
- `x`: a matrix or data frame containing the explanatory variables (training set).
- `grouping`: grouping variable: a factor specifying the class for each observation.
- `prior`: prior probabilities; defaults to the class proportions in the training set.
- `tol`: tolerance.
- `method`: the method used to compute the univariate location and scale estimates `(T, S)`; one of `"huber"`, `"mad"`, `"sest"` or `"class"` (see Details).
- `optim`: whether to perform the second approximation (A2) using the Nelder and Mead simplex method (see function `optim()` from package `stats`). Default is `optim = FALSE`.
- `trace`: whether to print intermediate results. Default is `trace = FALSE`.
- `...`: arguments passed to or from other methods.

## Details

Currently the algorithm is implemented only for binary classification, and in what follows it is assumed that only two groups are present.

The PP algorithm searches for low-dimensional projections of higher-dimensional data that maximize a projection index. As in Fisher's original proposal, the index is the squared standardized distance between the projected observations of the two groups. The univariate location and scale estimates `(T, S)` in this distance need not be the sample mean and standard deviation: robust alternatives can be used instead. They are selected through the argument `method`, which can be one of:

- `huber`: `(T, S)` are the robust M-estimates of location and scale.
- `mad`: `(T, S)` are the median and the median absolute deviation (MAD).
- `sest`: `(T, S)` are the robust S-estimates of location and scale.
- `class`: `(T, S)` are the classical mean and standard deviation.
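To make the projection index concrete, the sketch below projects two groups onto a direction `a` and computes the squared standardized distance `(T(z1) - T(z2))^2 / S_pooled^2` of the projected samples `z1`, `z2`. It is a minimal base-R sketch, not the rrcov implementation: the helper `pp_index()` is hypothetical, only the `class` (mean, sd) and `mad` (median, MAD) choices are covered, and pooling the two group scales as a proportion-weighted average of variances is an assumption.

```r
## Hypothetical helper (not part of rrcov): squared standardized distance
## between two groups after projecting them onto direction `a`.
pp_index <- function(a, x1, x2, method = c("class", "mad")) {
    method <- match.arg(method)
    ## Choose the univariate location/scale pair (T, S)
    est <- switch(method,
        class = list(T = mean,   S = sd),
        mad   = list(T = median, S = mad))
    z1 <- drop(as.matrix(x1) %*% a)   # projected group 1
    z2 <- drop(as.matrix(x2) %*% a)   # projected group 2
    n1 <- length(z1)
    n2 <- length(z2)
    ## Assumption: pooled scale = proportion-weighted average of variances
    s2 <- (n1 * est$S(z1)^2 + n2 * est$S(z2)^2) / (n1 + n2)
    (est$T(z1) - est$T(z2))^2 / s2
}

set.seed(1)
x1 <- matrix(rnorm(40, mean = 0), ncol = 2)   # simulated group 1
x2 <- matrix(rnorm(40, mean = 2), ncol = 2)   # simulated group 2
a  <- c(1, 1) / sqrt(2)
pp_index(a, x1, x2, method = "class")
pp_index(a, x1, x2, method = "mad")
```

Because both `T` and `S` are equivariant under rescaling, the index depends only on the direction of `a`, not on its length; this is why the search can be restricted to unit vectors.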

The first approximation A1 to the solution is obtained by investigating a finite number of candidate directions: the unit vectors defined by all pairs of points such that one point belongs to the first group and the other to the second group. The resulting solution is stored in the slots `raw.ldf` and `raw.ldfconst`.
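The candidate search for A1 described above can be sketched as follows: for every between-group pair of points, take the unit vector along their difference, evaluate the index, and keep the best. This is a self-contained sketch under stated assumptions: `idx_mad()` and `a1_direction()` are hypothetical names, the index uses median/MAD with a proportion-weighted pooled scale, and the loop visits `n1 * n2` candidates, so it is only practical for small samples.

```r
## Hypothetical index: squared standardized distance using median/MAD
idx_mad <- function(a, x1, x2) {
    z1 <- drop(x1 %*% a)
    z2 <- drop(x2 %*% a)
    n1 <- length(z1)
    n2 <- length(z2)
    s2 <- (n1 * mad(z1)^2 + n2 * mad(z2)^2) / (n1 + n2)
    (median(z1) - median(z2))^2 / s2
}

## Hypothetical A1 search over all between-group pair directions
a1_direction <- function(x1, x2, index = idx_mad) {
    x1 <- as.matrix(x1)
    x2 <- as.matrix(x2)
    best <- NULL
    best_val <- -Inf
    for (i in seq_len(nrow(x1))) {
        for (j in seq_len(nrow(x2))) {
            d <- x1[i, ] - x2[j, ]
            len <- sqrt(sum(d^2))
            if (len == 0) next           # skip coincident points
            a <- d / len                 # unit candidate direction
            val <- index(a, x1, x2)
            if (is.finite(val) && val > best_val) {
                best_val <- val
                best <- a
            }
        }
    }
    list(direction = best, index = best_val)
}

set.seed(2)
x1 <- matrix(rnorm(30, mean = 0), ncol = 2)
x2 <- matrix(rnorm(30, mean = 3), ncol = 2)
res <- a1_direction(x1, x2)
res$direction
res$index
```

The `is.finite()` guard discards directions where both projected MADs are zero, which would otherwise give an infinite or undefined index.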

The second approximation A2 (optional) is obtained by a numerical optimization algorithm that uses A1 as its initial solution; the Nelder and Mead method implemented in the function `optim` is applied. Whether this refinement is used is controlled by the argument `optim`. If `optim = TRUE`, the result of the optimization is stored in the slots `ldf` and `ldfconst`; otherwise these slots are set equal to `raw.ldf` and `raw.ldfconst`.
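The refinement step can be illustrated with `stats::optim()`. In this sketch (an assumption, not the rrcov code) the direction is normalized inside the objective so the optimizer can move freely in the parameter space, the index is negated because `optim()` minimizes, and the starting vector `a0` is an arbitrary stand-in for the A1 solution.

```r
## Hypothetical index (median/MAD); `a` is normalized inside so that the
## value depends only on the direction, not on the length of `a`
idx_mad <- function(a, x1, x2) {
    a <- a / sqrt(sum(a^2))
    z1 <- drop(x1 %*% a)
    z2 <- drop(x2 %*% a)
    n1 <- length(z1)
    n2 <- length(z2)
    s2 <- (n1 * mad(z1)^2 + n2 * mad(z2)^2) / (n1 + n2)
    (median(z1) - median(z2))^2 / s2
}

set.seed(3)
x1 <- matrix(rnorm(40, mean = 0), ncol = 2)
x2 <- matrix(rnorm(40, mean = 2), ncol = 2)

a0 <- c(1, 0)   # stand-in for the A1 solution (assumption)

## optim() minimizes, so pass the negated index to Nelder-Mead
fit <- optim(a0, function(a) -idx_mad(a, x1, x2), method = "Nelder-Mead")
a2  <- fit$par / sqrt(sum(fit$par^2))   # refined unit direction

c(A1 = idx_mad(a0, x1, x2), A2 = -fit$value)
```

Nelder-Mead keeps the best vertex of its simplex, which contains the starting point, so the refined index is never worse than the starting value.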

## Value

Returns an S4 object of class `LdaPP` (see `LdaPP-class`).
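The `lda.line()` helper in the examples treats `ldf` as having one row per group and `ldfconst` as one constant per group, and draws the line where the two linear discriminant functions are equal. The sketch below applies the corresponding rule, assigning each observation to the group with the largest score `ldf[k, ] %*% x + ldfconst[k]`. The helper `classify_ldf()` and the toy coefficients are hypothetical; this is not the rrcov `predict()` code, which may differ in details.

```r
## Hypothetical helper: assign each row of `newx` to the group whose
## linear discriminant score ldf[k, ] %*% x + ldfconst[k] is largest
classify_ldf <- function(newx, ldf, ldfconst) {
    scores <- as.matrix(newx) %*% t(ldf)           # n x K score matrix
    scores <- sweep(scores, 2, ldfconst, "+")      # add group constants
    max.col(scores, ties.method = "first")         # index of winning group
}

## Made-up coefficients for two groups in two dimensions:
## the boundary (ldf[1, ] - ldf[2, ]) %*% x + (c1 - c2) = 0 is x1 = 0
ldf      <- rbind(c(1, 0), c(-1, 0))
ldfconst <- c(0, 0)
newx     <- rbind(c(2, 5), c(-3, 1), c(0.5, -2))
classify_ldf(newx, ldf, ldfconst)   # 1 2 1
```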

## Warning

Still an experimental version! Only binary classification is supported.

## Author(s)

Valentin Todorov and Ana Pires

## References

Pires, A. M. and Branco, J. A. (2010). Projection-pursuit approach to robust linear discriminant analysis. *Journal of Multivariate Analysis*, 101, 2464–2485. Academic Press.

## See Also

`Linda`, `LdaClassic`

## Examples

```r
library(rrcov)

##
## Function to plot a LDA separation line: the line where the two
## discriminant functions are equal, ab[1]*MG + ab[2]*CA + cc = 0,
## solved for CA
##
lda.line <- function(lda, ...)
{
    ab <- lda@ldf[1,] - lda@ldf[2,]
    cc <- lda@ldfconst[1] - lda@ldfconst[2]
    abline(a=-cc/ab[2], b=-ab[1]/ab[2], ...)
}

data(pottery)
x <- pottery[,c("MG", "CA")]
grp <- pottery$origin
col <- c(3,4)
gcol <- ifelse(grp == "Attic", col[1], col[2])
gpch <- ifelse(grp == "Attic", 16, 1)

##
## Reproduce Fig. 2 from Pires and Branco (2010)
##
plot(CA~MG, data=pottery, col=gcol, pch=gpch)

ppc <- LdaPP(x, grp, method="class", optim=TRUE)
lda.line(ppc, col=1, lwd=2, lty=1)

pph <- LdaPP(x, grp, method="huber", optim=TRUE)
lda.line(pph, col=3, lty=3)

pps <- LdaPP(x, grp, method="sest", optim=TRUE)
lda.line(pps, col=4, lty=4)

ppm <- LdaPP(x, grp, method="mad", optim=TRUE)
lda.line(ppm, col=5, lty=5)

rlda <- Linda(x, grp, method="mcd")
lda.line(rlda, col=6, lty=1)

fsa <- Linda(x, grp, method="fsa")
lda.line(fsa, col=8, lty=6)

## Use the formula interface:
##
LdaPP(origin~MG+CA, data=pottery)   ## use the same two predictors
LdaPP(origin~., data=pottery)       ## use all predictor variables

##
## Predict method
##
data(pottery)
fit <- LdaPP(origin~., data = pottery)
predict(fit)
```