Robust Linear Discriminant Analysis by Projection Pursuit

Description

Performs robust linear discriminant analysis by the projection-pursuit approach - proposed by Pires and Branco (2010) - and returns the results as an object of class LdaPP (aka constructor).

Usage

1
2
3
4
5
6
7
8
LdaPP(x, ...)
## S3 method for class 'formula'
LdaPP(formula, data, subset, na.action, ...)
## Default S3 method:
LdaPP(x, grouping, prior = proportions, tol = 1.0e-4,
                 method = c("huber", "mad", "sest", "class"),
                 optim = FALSE,
                 trace=FALSE, ...)

Arguments

formula

a formula of the form y~x, it describes the response and the predictors. The formula can be more complicated, such as y~log(x)+z etc (see formula for more details). The response should be a factor representing the response variable, or any vector that can be coerced to such (such as a logical variable).

data

an optional data frame (or similar: see model.frame) containing the variables in the formula formula.

subset

an optional vector used to select rows (observations) of the data matrix x.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The default is na.omit.

x

a matrix or data frame containing the explanatory variables (training set).

grouping

grouping variable: a factor specifying the class for each observation.

prior

prior probabilities, default to the class proportions for the training set.

tol

tolerance

method

method

optim

wheather to perform the approximation using the Nelder and Mead simplex method (see function optim() from package stats). Default is optim = FALSE

trace

whether to print intermediate results. Default is trace = FALSE.

...

arguments passed to or from other methods.

Details

Currently the algorithm is implemented only for binary classification and in the following will be assumed that only two groups are present.

The PP algorithm searches for low-dimensional projections of higher-dimensional data where a projection index is maximized. Similar to the original Fisher's proposal the squared standardized distance between the observations in the two groups is maximized. Instead of the sample univariate mean and standard deviation (T,S) robust alternatives are used. These are selected through the argument method and can be one of

huber

the pair (T,S) are the robust M-estimates of location and scale

mad

(T,S) are the Median and the Median Absolute Deviation

sest

the pair (T,S) are the robust S-estimates of location and scale

class

(T,S) are the mean and the standard deviation.

The first approximation A1 to the solution is obtained by investigating a finite number of candidate directions, the unit vectors defined by all pairs of points such that one belongs to the first group and the other to the second group. The found solution is stored in the slots raw.ldf and raw.ldfconst.

The second approximation A2 (optional) is performed by a numerical optimization algorithm using A1 as initial solution. The Nelder and Mead method implemented in the function optim is applied. Whether this refinement will be used is controlled by the argument optim. If optim=TRUE the result of the optimization is stored into the slots ldf and ldfconst. Otherwise these slots are set equal to raw.ldf and raw.ldfconst.

Value

Returns an S4 object of class LdaPP-class

Warning

Still an experimental version! Only binary classification is supported.

Author(s)

Valentin Todorov valentin.todorov@chello.at and Ana Pires apires@math.ist.utl.pt

References

Pires, A. M. and A. Branco, J. (2010) Projection-pursuit approach to robust linear discriminant analysis Journal Multivariate Analysis, Academic Press, Inc., 101, 2464–2485.

See Also

Linda, LdaClassic

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
##
## Function to plot a LDA separation line
##
lda.line <- function(lda, ...)
{
    ab <- lda@ldf[1,] - lda@ldf[2,]
    cc <- lda@ldfconst[1] - lda@ldfconst[2]
    abline(a=-cc/ab[2], b=-ab[1]/ab[2],...)
}

data(pottery)
x <- pottery[,c("MG", "CA")]
grp <- pottery$origin
col <- c(3,4)
gcol <- ifelse(grp == "Attic", col[1], col[2])
gpch <- ifelse(grp == "Attic", 16, 1)

##
## Reproduce Fig. 2. from Pires and branco (2010)
##
plot(CA~MG, data=pottery, col=gcol, pch=gpch)

ppc <- LdaPP(x, grp, method="class", optim=TRUE)
lda.line(ppc, col=1, lwd=2, lty=1)

pph <- LdaPP(x, grp, method="huber",optim=TRUE)
lda.line(pph, col=3, lty=3)

pps <- LdaPP(x, grp, method="sest", optim=TRUE)
lda.line(pps, col=4, lty=4)

ppm <- LdaPP(x, grp, method="mad", optim=TRUE)
lda.line(ppm, col=5, lty=5)

rlda <- Linda(x, grp, method="mcd")
lda.line(rlda, col=6, lty=1)

fsa <- Linda(x, grp, method="fsa")
lda.line(fsa, col=8, lty=6)

## Use the formula interface:
##
LdaPP(origin~MG+CA, data=pottery)       ## use the same two predictors
LdaPP(origin~., data=pottery)           ## use all predictor variables

##
## Predict method
data(pottery)
fit <- LdaPP(origin~., data = pottery)
predict(fit)

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.