loclda: Localized Linear Discriminant Analysis (LocLDA) In klaR: Classification and Visualization

Description

A localized version of Linear Discriminant Analysis.

Usage

```
loclda(x, ...)

## S3 method for class 'formula'
loclda(formula, data, ..., subset, na.action)

## Default S3 method:
loclda(x, grouping, weight.func = function(x) 1/exp(x), k = nrow(x),
    weighted.apriori = TRUE, ...)

## S3 method for class 'data.frame'
loclda(x, ...)

## S3 method for class 'matrix'
loclda(x, grouping, ..., subset, na.action)
```

Arguments

`formula` Formula of the form ‘`groups ~ x1 + x2 + ...`’.

`data` Data frame from which variables specified in `formula` are to be taken.

`x` Matrix or data frame containing the explanatory variables (required, if `formula` is not given).

`grouping` (required if no `formula` principal argument is given.) A factor specifying the class for each observation.

`weight.func` Function used to compute local weights. Must be finite over the interval [0,1]. See Details below.

`k` Number of nearest neighbours used to construct localized classification rules. See Details below.

`weighted.apriori` Logical: if `TRUE`, class prior probabilities are computed using local weights (see Details below). If `FALSE`, equal priors for all classes actually occurring in the train data are used.

`subset` An index vector specifying the cases to be used in the training sample.

`na.action` A function to specify the action to be taken if `NA`s are found. The default action is for the procedure to fail. An alternative is `na.omit` which leads to rejection of cases with missing values on any required variable.

`...` Further arguments to be passed to `loclda.default`.
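To make the requirement on `weight.func` concrete, the following sketch evaluates the default weighting function and two alternatives on the interval [0,1]. The names `default_w`, `tricube` and `uniform` are hypothetical and introduced only for illustration; they are not functions shipped with klaR.

```r
# Default weighting function as given in Usage
default_w <- function(x) 1 / exp(x)
# Hypothetical alternative: tricube kernel, which gives weight 0 at ratio 1
tricube <- function(x) (1 - x^3)^3
# Hypothetical alternative: constant weights (no down-weighting by distance)
uniform <- function(x) rep(1, length(x))

# Distance ratios ||x_i - x_s|| / d_k always lie in [0, 1]
u <- seq(0, 1, by = 0.25)
weights <- cbind(default = default_w(u),
                 tricube = tricube(u),
                 uniform = uniform(u))
rownames(weights) <- u
print(round(weights, 3))
```

All three functions are finite on [0,1], so under the stated requirement each should be usable as `weight.func`. Note that the tricube kernel assigns zero weight to the k-th nearest neighbour itself.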

Details

This is an approach to apply the concept of localization described by Tutz and Binder (2005) to Linear Discriminant Analysis. The function `loclda` generates an object of class `loclda` (see Value below). As localization makes it necessary to build an individual decision rule for each test observation, this rule construction has to be handled by `predict.loclda`. For convenience, the rule building procedure is still described here.

To classify a test observation x_s, only the `k` nearest neighbours of x_s within the train data are used. Each of these k train observations x_i, i=1,...,k, is assigned a weight w_i according to

w_i := K ( ||x_i - x_s|| / d_k ), i=1,...,k,

where K is the weighting function given by `weight.func`, ||x_i - x_s|| is the Euclidean distance between x_i and x_s, and d_k is the Euclidean distance from x_s to its k-th nearest neighbour. Using these weights, the weighted empirical mean mu_g_hat and the weighted empirical covariance matrix are computed for each class A_g, g=1,...,G. The estimated pooled (weighted) covariance matrix Sigma_hat is then calculated from the individual weighted empirical class covariance matrices. If `weighted.apriori` is `TRUE` (the default), prior class probabilities are estimated according to:

prior_g := [ Sum_{i=1,..,k} ( w_i * I(x_i in A_g) ) ] / [ Sum_{i=1,...,k} ( w_i ) ], g = 1,...,G,

where I is the indicator function. If `FALSE`, equal priors for all classes are used. In analogy to Linear Discriminant Analysis, the decision rule for x_s is

A_hat := argmax_{g in 1,...,G} (posterior_g),

where

posterior_g := prior_g * exp [ (-1/2) * t( x_s - mu_g_hat ) * Sigma_hat^(-1) * ( x_s - mu_g_hat ) ] .

If posterior_g < 1e-150 for all g in 1,...,G, posterior_g is set to 1/G for all g in 1,...,G and the test observation x_s is assigned to the class whose weighted mean has the smallest Euclidean distance to x_s.
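The rule construction described above can be sketched in base R for a single test observation. This is an illustrative reimplementation, not klaR's source code: the function name `local_lda_rule` is hypothetical, and the pooling step (summing the weighted within-class scatter and dividing by the total weight) is an assumption, since the text does not spell out how the class covariance matrices are combined. The sketch also assumes d_k > 0.

```r
# Illustrative sketch only; local_lda_rule is a hypothetical helper, not part of klaR.
local_lda_rule <- function(x_s, X, grouping, k = nrow(X),
                           weight.func = function(x) 1 / exp(x),
                           weighted.apriori = TRUE) {
  X <- as.matrix(X)
  grouping <- factor(grouping)
  d <- sqrt(rowSums(sweep(X, 2, x_s)^2))  # Euclidean distances to x_s
  nn <- order(d)[seq_len(k)]              # the k nearest neighbours
  d_k <- d[nn[k]]                         # distance to the k-th neighbour
  w <- weight.func(d[nn] / d_k)           # w_i = K(||x_i - x_s|| / d_k)
  Xk <- X[nn, , drop = FALSE]
  gk <- droplevels(grouping[nn])          # classes present among the neighbours
  classes <- levels(gk)
  G <- length(classes)
  p <- ncol(X)

  mu <- matrix(0, G, p, dimnames = list(classes, colnames(X)))
  Sigma <- matrix(0, p, p)
  prior <- numeric(G)
  for (g in seq_len(G)) {
    idx <- gk == classes[g]
    wg <- w[idx]
    Xg <- Xk[idx, , drop = FALSE]
    mu[g, ] <- colSums(Xg * wg) / sum(wg)            # weighted class mean
    centred <- sweep(Xg, 2, mu[g, ])
    Sigma <- Sigma + crossprod(centred * sqrt(wg))   # weighted class scatter
    prior[g] <- sum(wg) / sum(w)                     # weighted prior
  }
  Sigma <- Sigma / sum(w)  # ASSUMPTION: pooling by total weight
  if (!weighted.apriori) prior <- rep(1 / G, G)

  Sigma_inv <- solve(Sigma)
  maha <- apply(mu, 1, function(m) {
    delta <- x_s - m
    drop(t(delta) %*% Sigma_inv %*% delta)
  })
  posterior <- prior * exp(-0.5 * maha)

  if (all(posterior < 1e-150)) {
    # Fallback: equal posteriors, assign to the nearest weighted class mean
    posterior <- rep(1 / G, G)
    cls <- classes[which.min(sqrt(rowSums(sweep(mu, 2, x_s)^2)))]
  } else {
    cls <- classes[which.max(posterior)]
  }
  list(class = cls,
       posterior = setNames(posterior, classes),
       prior = setNames(prior, classes))
}

# Usage on the built-in iris data with an arbitrary test point
X <- as.matrix(iris[, 1:4])
res <- local_lda_rule(x_s = c(6.0, 2.9, 4.5, 1.5), X = X,
                      grouping = iris$Species, k = 50)
print(res$class)
print(res$prior)
```

Only classes that occur among the k nearest neighbours take part in the local rule here, and the posterior values are left unnormalised to match the formula above.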

Value

A list of class `loclda` containing the following components:

`call` The (matched) function call.

`learn` Matrix containing the values of the explanatory variables for all train observations.

`grouping` Factor specifying the class for each train observation.

`weight.func` Value of the argument `weight.func`.

`k` Value of the argument `k`.

`weighted.apriori` Value of the argument `weighted.apriori`.
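As a rough illustration of the returned object, the sketch below fits `loclda` to the built-in `iris` data and inspects some of the components listed above. It assumes klaR is installed (the code is skipped otherwise). It also assumes that `predict` on a `loclda` object returns a list with a `class` element, which this page does not itself state; see `predict.loclda` for the authoritative description.

```r
# Runs only if klaR is available
if (requireNamespace("klaR", quietly = TRUE)) {
  held_out <- c(1, 51, 101)          # one observation per species
  train <- iris[-held_out, ]
  test  <- iris[held_out, 1:4]

  # Arguments such as k are passed on to loclda.default via '...'
  fit <- klaR::loclda(Species ~ ., data = train, k = 25)

  print(class(fit))                  # object class
  print(fit$k)                       # stored value of the argument k
  print(dim(fit$learn))              # training matrix of explanatory variables

  # The local rules are built at prediction time, one per test observation
  pred <- predict(fit, test)
  print(pred$class)                  # ASSUMPTION: predicted classes are in $class
}
```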

Author(s)

Marc Zentgraf and Karsten Luebke

References

Tutz, G. and Binder, H. (2005): Localized classification. Statistics and Computing 15, 155-166.

See Also

`predict.loclda`, `lda`

Examples

```
benchB3("lda")$l1co.error
benchB3("loclda")$l1co.error
```

Example output

```
Loading required package: MASS

Error Rate in 1 th cycle:  0.667
Error Rate in 2 th cycle:  0.438
Error Rate in 3 th cycle:  0.294
Error Rate in 4 th cycle:  0.667
Error Rate in 5 th cycle:  0.344
Error Rate in 6 th cycle:  0.562
------------------------------------------
Mean Error Rate of method lda : 0.495
[1] 0.4952002

Error Rate in 1 th cycle:  0.667
Error Rate in 2 th cycle:  0.438
Error Rate in 3 th cycle:  0.118
Error Rate in 4 th cycle:  0.583
Error Rate in 5 th cycle:  0.281
Error Rate in 6 th cycle:  0.542
------------------------------------------
Mean Error Rate of method loclda : 0.438
[1] 0.4380106
```

klaR documentation built on March 19, 2018, 5:03 p.m.