wlda: Weighted Linear Discriminant Analysis
In locClass: Collection of Local Classification Methods


A version of Linear Discriminant Analysis that can deal with observation weights.

wlda(x, ...)

## S3 method for class 'formula'
wlda(formula, data,
    weights = rep(1, nrow(data)), ..., subset, na.action)

## S3 method for class 'data.frame'
wlda(x, ...)

## S3 method for class 'matrix'
wlda(x, grouping,
    weights = rep(1, nrow(x)), ..., subset,
    na.action = na.fail)

## Default S3 method:
wlda(x, grouping,
    weights = rep(1, nrow(x)),
    method = c("unbiased", "ML"), ...)

`formula`	A `formula` of the form `groups ~ x1 + x2 + ...`, that is, the response is the grouping `factor` and the right hand side specifies the (non-`factor`) discriminators.
`data`	A `data.frame` from which variables specified in `formula` are to be taken.
`x`	(Required if no `formula` is given as principal argument.) A `matrix` or `data.frame` or `Matrix` containing the explanatory variables.
`grouping`	(Required if no `formula` is given as principal argument.) A `factor` specifying the class membership for each observation.
`weights`	Observation weights to be used in the fitting process. All weights must be greater than or equal to zero.
`method`	Method for scaling the pooled weighted covariance matrix, either `"unbiased"` or maximum-likelihood (`"ML"`). Defaults to `"unbiased"`.
`...`	Further arguments.
`subset`	An index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.)
`na.action`	A function to specify the action to be taken if NAs are found. The default action is first the `na.action` setting of `options` and second `na.fail` if that is unset. An alternative is `na.omit`, which leads to rejection of cases with missing values on any required variable. (NOTE: If given, this argument must be named.)

The weighted estimates of the class means, the covariance matrix and the class priors are computed as follows.

Normalized weights, for an observation x_n in class g, i.e. y_n = g:

w_n^* = w_n / \sum_{m: y_m = g} w_m

Weighted class means:

\bar{x}_g = \sum_{n: y_n = g} w_n^* x_n

Weighted class covariance matrices:

S_g = \sum_{n: y_n = g} w_n^* (x_n - \bar{x}_g)(x_n - \bar{x}_g)'

Pooled weighted covariance matrix (p_g denotes the weighted prior probability of class g, defined below).

method = "ML":

S = \sum_g p_g S_g

method = "unbiased":

S = (\sum_g p_g S_g) / (1 - \sum_g p_g \sum_{n: y_n = g} (w_n^*)^2)

Weighted prior probabilities:

p_g = \sum_{n: y_n = g} w_n / \sum_n w_n
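
The formulas above can be evaluated directly in base R. The following is a minimal sketch, not the locClass implementation: `weighted_lda_estimates` is a hypothetical helper name, the `iris` data are used only for illustration, and the "unbiased" formula is read as the ML estimate divided by the scalar 1 - sum_g p_g sum_{n:y_n=g} w_n*^2.

```r
# Hypothetical helper (not part of locClass): evaluates the weighted
# prior, class mean and pooled covariance formulas given above.
weighted_lda_estimates <- function(x, y, w = rep(1, nrow(x)),
                                   method = c("unbiased", "ML")) {
  method <- match.arg(method)
  x <- as.matrix(x)
  y <- droplevels(as.factor(y))

  class_w <- vapply(split(w, y), sum, numeric(1))  # sum of weights per class
  w_star  <- w / unname(class_w[as.integer(y)])    # normalized weights w_n*
  prior   <- class_w / sum(w)                      # weighted priors p_g

  # Weighted class means: per-class column sums of w_n* x_n (one row per class)
  means <- rowsum(w_star * x, y)

  # ML estimate: sum_g p_g S_g
  cov <- matrix(0, ncol(x), ncol(x),
                dimnames = list(colnames(x), colnames(x)))
  for (g in levels(y)) {
    z <- sweep(x[y == g, , drop = FALSE], 2, means[g, ])  # center class g
    cov <- cov + prior[[g]] * crossprod(sqrt(w_star[y == g]) * z)
  }

  # Unbiased estimate: rescale the ML estimate by a scalar
  if (method == "unbiased") {
    sq <- vapply(split(w_star^2, y), sum, numeric(1))
    cov <- cov / (1 - sum(prior * sq))
  }
  list(prior = prior, means = means, cov = cov)
}

est <- weighted_lda_estimates(iris[, 1:4], iris$Species)
est$prior
est$cov
```

With equal weights, w_n* = 1/n_g and p_g = n_g/N, so the "unbiased" denominator becomes (N - G)/N for G classes and the result is the usual pooled within-class covariance matrix with divisor N - G. This gives a quick sanity check for the sketch.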

If the predictor variables include factors, the formula interface must be used in order to get a correct model matrix.
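
To illustrate what a model matrix does with factor predictors, the sketch below uses base R's `model.matrix` on a small made-up data frame (the variable names `group`, `x1` and `site` are invented for this example). It only shows the general mechanism of formula-based dummy coding; how `wlda` post-processes the matrix internally is not shown here.

```r
# Made-up data: one numeric and one factor predictor
df <- data.frame(
  group = factor(c("a", "b", "a", "b")),
  x1    = c(1.2, 0.7, 3.1, 2.4),
  site  = factor(c("north", "south", "east", "north"))
)

# A formula expands the factor into dummy (indicator) columns
mm <- model.matrix(group ~ ., data = df)
colnames(mm)
# "(Intercept)" "x1" "sitenorth" "sitesouth"

# Coercing the data frame directly would instead keep a single column
# holding the integer codes of the factor levels
codes <- data.matrix(df[, c("x1", "site")])
colnames(codes)
```

Integer codes impose an arbitrary ordering and spacing on the factor levels, which is why a plain matrix built this way is not a suitable substitute for the model matrix.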

An object of class "wlda", a list containing the following components:

`prior`	Weighted class prior probabilities.
`counts`	The number of observations per class.
`means`	Weighted estimates of class means.
`cov`	Weighted estimate of the pooled class covariance matrix.
`lev`	The class labels (levels of `grouping`).
`N`	The number of observations.
`weights`	The observation weights used in the fitting process.
`method`	The method used for scaling the pooled weighted covariance matrix.
`call`	The (matched) function call.

See also predict.wlda, and dalda, which is based on wlda.

library(locClass)
library(mlbench)
data(PimaIndiansDiabetes)

# fix the seed so that the random training sample is reproducible
set.seed(1)
train <- sample(nrow(PimaIndiansDiabetes), 500)

# weight each observation by the inverse frequency of its class
# (pos or neg) in the full data set:
ws <- as.numeric(1/table(PimaIndiansDiabetes$diabetes)
    [PimaIndiansDiabetes$diabetes])

# fit on the training sample, then predict the held-out observations
fit <- wlda(diabetes ~ ., data = PimaIndiansDiabetes, weights = ws,
    subset = train)
pred <- predict(fit, newdata = PimaIndiansDiabetes[-train,])

# misclassification rate on the held-out observations
mean(pred$class != PimaIndiansDiabetes$diabetes[-train])