dalda: Discriminant Adaptive Linear Discriminant Analysis



Description

A local version of Linear Discriminant Analysis that puts increased emphasis on a good model fit near the decision boundary.

Usage

dalda(x, ...)

## S3 method for class 'formula'
dalda(formula, data, weights = rep(1, nrow(data)), ...,
  subset, na.action)

## S3 method for class 'data.frame'
dalda(x, ...)

## S3 method for class 'matrix'
dalda(x, grouping, weights = rep(1, nrow(x)), ..., subset,
  na.action = na.fail)

## Default S3 method:
dalda(x, grouping, wf = c("biweight", "cauchy", "cosine",
  "epanechnikov", "exponential", "gaussian", "optcosine", "rectangular",
  "triangular"), bw, k, nn.only, itr = 3, weights, ...)

Arguments

x

(Required if no formula is given as principal argument.) A matrix or data.frame or Matrix containing the explanatory variables.

formula

A formula of the form groups ~ x1 + x2 + ..., that is, the response is the grouping factor and the right hand side specifies the (normally non-factor) discriminators.

data

A data.frame from which variables specified in formula are to be taken.

weights

Initial observation weights (defaults to a vector of 1s).

subset

An index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.)

na.action

A function to specify the action to be taken if NAs are found. By default, the na.action setting of options is used, falling back to na.fail if that is unset. An alternative is na.omit, which leads to rejection of cases with missing values on any required variable. (NOTE: If given, this argument must be named.)

grouping

(Required if no formula is given as principal argument.) A factor specifying the class membership for each observation.

wf

A window function which is used to calculate weights that are introduced into the fitting process. Either a character string or a function, e.g. wf = function(x) exp(-x). For details see the documentation for wfs.

bw

(Required only if wf is a string.) The bandwidth parameter of the window function. (See wfs.)

k

(Required only if wf is a string.) The number of nearest neighbors of the decision boundary to be used in the fitting process. (See wfs.)

nn.only

(Required only if wf is a string indicating a window function with infinite support and if k is specified.) Should only the k nearest neighbors or all observations receive positive weights? (See wfs.)

itr

Number of iterations for model fitting, defaults to 3. See also the Details section.

...

Further arguments to be passed to wlda.

Details

The idea of Hand and Vinciotti (2003) to put increased weight on observations near the decision boundary is generalized to the multiclass case and applied to Linear Discriminant Analysis (LDA). Since the decision boundary is not known in advance, an iterative procedure is required. First, an unweighted LDA is fitted to the data. Observation weights are then calculated from the differences between the two largest estimated posterior probabilities, and a weighted LDA (see wlda) is fitted using these weights. Weight calculation and model fitting alternate for the number of iterations given by the itr argument, which defaults to 3.
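The iterative procedure can be sketched in base R as follows. This is an illustration under stated assumptions, not the code used by dalda or wlda: the helpers fit_wlda, posterior_wlda and dalda_sketch are hypothetical names, the pooled covariance uses maximum-likelihood scaling, the Gaussian window is a plain stand-in for the generators documented in wfs, and the initial weights are used only for the first fit.

```r
# Hypothetical helper: weighted LDA estimates (not the wlda implementation).
fit_wlda <- function(x, g, w) {
  w <- w / sum(w)                                    # weights sum to 1
  lev <- levels(g)
  prior <- vapply(lev, function(l) sum(w[g == l]), numeric(1))
  means <- rowsum(x * w, g) / prior                  # weighted class means
  centered <- x - means[as.integer(g), , drop = FALSE]
  list(prior = prior, means = means,
       cov = crossprod(centered * sqrt(w)),          # pooled, ML scaling (assumption)
       lev = lev)
}

# Hypothetical helper: posterior probabilities under the fitted model.
posterior_wlda <- function(fit, x) {
  scores <- sapply(fit$lev, function(l) {
    log(fit$prior[[l]]) - 0.5 * mahalanobis(x, fit$means[l, ], fit$cov)
  })
  scores <- scores - apply(scores, 1, max)           # numerical stability
  post <- exp(scores)
  post / rowSums(post)
}

# Alternate weight calculation and weighted fitting itr times.
dalda_sketch <- function(x, g, wf, itr = 3, weights = rep(1, nrow(x))) {
  x <- as.matrix(x)
  all_weights <- list(weights / sum(weights))
  fit <- fit_wlda(x, g, weights)
  for (i in seq_len(itr)) {
    post <- posterior_wlda(fit, x)
    sorted <- t(apply(post, 1, sort, decreasing = TRUE))
    margin <- sorted[, 1] - sorted[, 2]              # small near the boundary
    w <- wf(margin)                                  # larger weight for small margins
    all_weights[[i + 1]] <- w / sum(w)
    fit <- fit_wlda(x, g, w)
  }
  list(fit = fit, weights = all_weights, posterior = posterior_wlda(fit, x))
}

# Stand-in Gaussian window with bandwidth 0.5 (assumption).
gauss_wf <- function(d) dnorm(d, sd = 0.5)

res <- dalda_sketch(iris[, c("Sepal.Length", "Sepal.Width")], iris$Species,
                    wf = gauss_wf, itr = 3)
length(res$weights)
```

The weights component of the result has itr + 1 entries, mirroring the weights component described in the Value section.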

The name of the window function (wf) can be specified as a character string. In this case the window function is generated internally in dalda. Currently supported are "biweight", "cauchy", "cosine", "epanechnikov", "exponential", "gaussian", "optcosine", "rectangular" and "triangular".

Moreover, it is possible to generate the window functions mentioned above in advance (see wfs) and pass them to dalda.

Any other function implementing a window function can also be used as the wf argument. This allows users to try their own window functions. See the help on wfs for details.
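As a minimal sketch of a user-defined window function, the following defines a decreasing function of a non-negative distance and, only if locClass is installed, passes it to dalda. The name my_wf and the decay rate of 5 are illustrative assumptions; the requirements a custom function must meet are described in the help on wfs.

```r
# Hypothetical custom window function: weight decays with distance.
my_wf <- function(d) exp(-5 * d)

# In dalda the distances are differences between the two largest posteriors,
# so they lie between 0 and 1.
d <- c(0, 0.1, 0.5, 1)
w <- my_wf(d)

# Guarded call: only runs when the locClass package is available.
if (requireNamespace("locClass", quietly = TRUE)) {
  fit <- locClass::dalda(Species ~ Sepal.Length + Sepal.Width,
                         data = iris, wf = my_wf)
}
```

Because wf is passed as a function here, bw, k and nn.only are not supplied; the Arguments section marks them as required only when wf is a string.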

If the predictor variables include factors, the formula interface must be used in order to get a correct model matrix.

Value

An object of class "dalda" inheriting from "wlda", a list containing the following components:

prior

Weighted class prior probabilities.

counts

The number of observations per class.

means

Weighted estimates of class means.

cov

Weighted estimate of the pooled class covariance matrix.

lev

The class labels (the levels of grouping).

N

The number of training observations.

weights

A list of length itr + 1, containing the initial observation weights (a vector of 1s if none were given) followed by the observation weights calculated in the individual iterations. The weights are scaled such that they sum up to 1.

method

The method used for scaling the pooled weighted covariance matrix.

itr

The number of iterations used.

wf

The window function used. Always a function, even if the input was a string.

bw

(Only if wf is a string or was generated by means of one of the functions documented in wfs.) The bandwidth used, NULL if bw was not specified.

k

(Only if wf is a string or was generated by means of one of the functions documented in wfs.) The number of nearest neighbors used, NULL if k was not specified.

nn.only

(Logical. Only if wf is a string or was generated by means of one of the functions documented in wfs and if k was specified.) TRUE if only the k nearest neighbors receive a positive weight, FALSE otherwise.

adaptive

(Logical.) TRUE if the bandwidth of wf is adaptive to the local density of data points, FALSE if the bandwidth is fixed.

call

The (matched) function call.

References

Hand, D. J., Vinciotti, V. (2003), Local versus global models for classification problems: Fitting models where it matters, The American Statistician, 57(2), 124–130.

See Also

predict.dalda, wlda for a weighted version of Linear Discriminant Analysis and dalr for discriminant adaptive logistic regression.

Examples

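library(locClass)

# Fit on two iris predictors with a Gaussian window of bandwidth 0.5
fit <- dalda(Species ~ Sepal.Length + Sepal.Width, data = iris,
    wf = "gaussian", bw = 0.5)
# Predict on the training data and compute the resubstitution error rate
pred <- predict(fit)
mean(pred$class != iris$Species)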

schiffner/locClass documentation built on May 29, 2019, 3:39 p.m.