rlda: Restricted Linear Discriminant Analysis

View source: R/rlda.R

rlda    R Documentation

Restricted Linear Discriminant Analysis

Description

Build linear classification rules that incorporate additional information expressed as inequality restrictions among the population means.

Usage

rlda(x, ...)

## S3 method for class 'matrix'
rlda(x, ...)

## S3 method for class 'data.frame'
rlda(x, grouping, ...)

## S3 method for class 'formula'
rlda(formula, data, ...)

## Default S3 method:
rlda(x, grouping, subset = NULL, resmatrix = NULL, restext = NULL,
     gamma = c(0, 1), prior = NULL, ...)

Arguments

formula

A formula of the form groups ~ x1 + x2 + ..., where the response is the grouping factor and the right-hand side specifies the (non-factor) discriminators.

data

Data frame from which variables specified in formula are to be taken.

x

(Required if no formula is given as the principal argument.) A data frame or matrix containing the explanatory variables.

grouping

(Required if no formula is given as the principal argument.) A numeric vector or factor with numeric levels specifying the class for each observation.

subset

An index vector specifying the cases to be used in the training sample.

resmatrix

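A matrix specifying the linear restrictions on the mean vectors: resmatrix %*% mu <= 0, where mu = c(mu1, mu2, ...) stacks the class mean vectors and mui is the mean vector of class i. resmatrix therefore has one row per restriction and (number of classes) x (number of explanatory variables) columns. If unspecified, restext is required and resmatrix is built from it.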

restext

(Required if no resmatrix argument is given.) A character string from which resmatrix is built. The first character must be either "s" (simple order) or "t" (tree order: mu1 >= mu2, mu1 >= mu3, ...). The second character must be either "<" (increasing componentwise order) or ">" (decreasing componentwise order). The remainder of the string must be a comma-separated list of variable numbers, from 1 to the number of explanatory variables, specifying the variables on which the restrictions hold. For example, "s<1,3" stands for mu11 <= mu21 <= mu31 <= ... and mu13 <= mu23 <= mu33 <= ..., where muij denotes the mean of variable j in class i.

gamma

A vector of values in the unit interval; each value determines one classification rule with additional information (see References).

prior

The prior probabilities of class membership. If unspecified, the class proportions for the training set are used. If present, the probabilities must be specified in the order of the factor levels.

...

Arguments passed to or from other methods.
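
A minimal base-R sketch of the restriction layout described under resmatrix and restext follows. The helpers restext_to_resmatrix() and satisfies_restrictions() are hypothetical, not dawai functions, and the row order of the generated matrix is an arbitrary choice. Reading "t<" as mu1 <= mui and "t>" as mu1 >= mui for every class i > 1 is an assumption about the tree order. The sketch relies only on the stated convention mu = c(mu1, mu2, ...), under which the mean of variable j in class i is element (i - 1) * p + j of mu, with p the number of explanatory variables.

```r
# Hypothetical helpers, not part of dawai.
# mu = c(mu1, ..., muk) stacks the k class mean vectors (each of length p),
# so the mean of variable j in class i is element (i - 1) * p + j of mu.

# Build a matrix A such that A %*% mu <= 0 encodes the order given in restext.
# Assumption: "t<" means mu1 <= mui and "t>" means mu1 >= mui for all i > 1.
restext_to_resmatrix <- function(restext, k, p) {
  type <- substr(restext, 1, 1)                          # "s" or "t"
  direction <- substr(restext, 2, 2)                     # "<" or ">"
  vars <- as.integer(strsplit(substring(restext, 3), ",")[[1]])
  stopifnot(type %in% c("s", "t"), direction %in% c("<", ">"), k > 1,
            !anyNA(vars), all(vars >= 1 & vars <= p))
  # Each row (a, b) of pairs means "class a <= class b" on a restricted variable
  pairs <- if (type == "s") cbind(1:(k - 1), 2:k) else cbind(1, 2:k)
  if (direction == ">") pairs <- pairs[, 2:1, drop = FALSE]  # reverse every pair
  A <- matrix(0, nrow = nrow(pairs) * length(vars), ncol = k * p)
  r <- 0
  for (j in vars) {
    for (m in seq_len(nrow(pairs))) {
      r <- r + 1
      A[r, (pairs[m, 1] - 1) * p + j] <- 1     # + mu_aj
      A[r, (pairs[m, 2] - 1) * p + j] <- -1    # - mu_bj, i.e. mu_aj - mu_bj <= 0
    }
  }
  A
}

# TRUE if a k x p matrix of class means (one class per row) satisfies A.
satisfies_restrictions <- function(A, means, tol = 1e-8) {
  mu <- as.vector(t(means))    # row-wise stacking gives c(mu1, mu2, ...)
  all(A %*% mu <= tol)
}

A <- restext_to_resmatrix("s<1,3", k = 3, p = 3)
dim(A)                                               # 4 9
means <- rbind(c(1, 5, 1), c(2, 0, 2), c(3, 9, 3))
satisfies_restrictions(A, means)                     # TRUE
```

Partial orders such as the one in the Examples section (mu11, mu21 >= mu31 >= mu41) are neither simple nor tree orders, so they have to be supplied through resmatrix.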

Details

Specifying prior affects both classification and error-rate estimation unless it is overridden in predict.rlda and err.est.rlda, respectively.
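
For intuition on how the prior enters a rule, consider the textbook normal-theory linear discriminant score with pooled covariance matrix S: an observation x is assigned to the class i that maximises x' S^-1 mui - (1/2) mui' S^-1 mui + log(prior[i]). Our reading of the references is that the restricted rules plug restricted mean estimates into a score of this form, but that is an assumption; the sketch below is a generic illustration with a hypothetical function name (lda_classify), not the code behind predict.rlda. It also computes the default prior as the training-set class proportions, in the order of the class labels.

```r
# Hypothetical helper, not predict.rlda: textbook linear discriminant rule
#   score_i(x) = x' S^-1 mu_i - mu_i' S^-1 mu_i / 2 + log(prior_i)
# Assumption: restricted rules use this score with restricted mean estimates.
# x: n x p matrix; means: k x p matrix (one class per row);
# S: p x p pooled covariance; prior: length-k vector of class probabilities.
lda_classify <- function(x, means, S, prior) {
  Sinv_mu <- solve(S, t(means))                          # p x k: columns S^-1 mu_i
  const <- -0.5 * colSums(t(means) * Sinv_mu) + log(prior)
  scores <- x %*% Sinv_mu +
    matrix(const, nrow(x), length(const), byrow = TRUE)  # n x k score matrix
  max.col(scores, ties.method = "first")                 # class index 1..k per row
}

# Default prior: class proportions in the training set, in label order 1..k
grouping <- factor(c(1, 1, 1, 2))
prior_default <- as.vector(table(grouping)) / length(grouping)   # 0.75 0.25

means <- rbind(c(0, 0), c(2, 0))
S <- diag(2)
x <- matrix(c(1, 0), nrow = 1)   # halfway between the two class means
lda_classify(x, means, S, prior = prior_default)   # 1: the prior favours class 1
lda_classify(x, means, S, prior = c(0.25, 0.75))   # 2: now it favours class 2
```

At the midpoint between the two means the distance terms cancel, so the prior alone decides the class, which is why changing prior changes both the classification and the estimated error.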

Value

An object of class 'rlda' containing the following components:

call

The (matched) function call.

trainset

Matrix with the training set used (first columns) and the class for each observation (last column).

restrictions

A character string describing the linear restrictions imposed on the mean vectors.

resmatrix

The matrix with the restrictions on the mean vectors used.

prior

Prior probabilities of class membership used.

counts

The number of observations in each class.

N

The total number of observations used.

samplemeans

Matrix with the sample means in rows.

samplevariances

Array with the sample covariance matrices of the classes.

gamma

Gamma values used.

spooled

Pooled covariance matrix.

estimatedmeans

Array with the estimated means for each classification rule.

apparent

Apparent error rate for each classification rule.
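
The shapes of the summary components above can be illustrated on a toy training set. This is a hypothetical sketch, not dawai's code: class_summaries() and apparent_error() are illustrative names, samplevariances is assumed to be a p x p x k array, spooled is assumed to be the usual unbiased pooled estimator sum_i (ni - 1) Si / (N - k), and the apparent error rate is taken as the percentage of training observations that a rule misclassifies. dawai may use different conventions.

```r
# Hypothetical helpers, not dawai code.
# x: n x p numeric matrix; g: class labels coded 1..k.
class_summaries <- function(x, g) {
  k <- max(g)
  p <- ncol(x)
  counts <- tabulate(g, nbins = k)                     # observations per class
  samplemeans <- do.call(rbind, lapply(seq_len(k), function(i)
    colMeans(x[g == i, , drop = FALSE])))              # k x p, one class per row
  samplevariances <- array(0, dim = c(p, p, k))        # assumed p x p x k layout
  for (i in seq_len(k)) samplevariances[, , i] <- cov(x[g == i, , drop = FALSE])
  # Assumed pooled estimator: sum_i (n_i - 1) S_i / (N - k)
  w <- (counts - 1) / (sum(counts) - k)
  spooled <- rowSums(sweep(samplevariances, 3, w, "*"), dims = 2)
  list(counts = counts, N = sum(counts), samplemeans = samplemeans,
       samplevariances = samplevariances, spooled = spooled)
}

# Apparent error rate, in percent: misclassified training observations.
apparent_error <- function(predicted, observed) 100 * mean(predicted != observed)

x <- rbind(c(1, 2), c(3, 4), c(5, 0), c(2, 2), c(4, 6), c(6, 4))
g <- c(1, 1, 1, 2, 2, 2)
s <- class_summaries(x, g)
s$spooled                                     # 4 * diag(2) for this toy data
apparent_error(c(1, 2, 2, 1), c(1, 2, 1, 1))  # 25
```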

Note

This function may be called giving either a formula and data frame, or a data frame and grouping factor, or a matrix and grouping factor as the first two arguments. All other arguments are optional.
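
As an illustration of how the calling forms relate, a formula call can be reduced to the (x, grouping) form with base R's model.frame(). The sketch is hypothetical: formula_to_xy() is not a dawai function and may differ from what the formula method does internally, and it assumes the right-hand side lists plain numeric variables without transformations.

```r
# Hypothetical sketch, not dawai's formula method: turn groups ~ x1 + x2 + ...
# plus a data frame into an explanatory matrix and a grouping vector.
formula_to_xy <- function(formula, data) {
  mf <- model.frame(formula, data, na.action = na.omit)  # drops rows with NAs
  list(x = as.matrix(mf[, -1, drop = FALSE]),            # right-hand side columns
       grouping = model.response(mf))                    # left-hand side
}

df <- data.frame(Class = factor(c(1, 2, 1, 2)),
                 a = c(1.0, NA, 3.0, 4.0), b = c(4, 5, 6, 7))
xy <- formula_to_xy(Class ~ a + b, df)
nrow(xy$x)      # 3: the row with a missing value of a is dropped
xy$grouping
```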

Classes must be identified, either in a column of data or in the grouping vector, by the natural numbers 1 to the number of classes. The number of classes must be greater than 1.

If there are missing values in data, x, or grouping, the corresponding observations are deleted.

To avoid singular covariance matrices, the number of observations in each class must be greater than or equal to the number of explanatory variables.
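
The requirements in this note can be checked before calling rlda. The helper below is hypothetical (prepare_training_set() is not a dawai function) and only mirrors the rules stated above: incomplete observations are dropped, class labels must be the natural numbers 1 to k with k > 1, and every class needs at least as many observations as there are explanatory variables. The last line shows one way to recode arbitrary labels to 1..k; it numbers classes in factor-level order, which must match the order assumed by any restrictions and priors.

```r
# Hypothetical helper, not part of dawai: applies the input rules in the Note.
# x: matrix or data frame of explanatory variables; grouping: class labels.
prepare_training_set <- function(x, grouping) {
  x <- as.matrix(x)
  keep <- complete.cases(x) & !is.na(grouping)       # drop incomplete observations
  x <- x[keep, , drop = FALSE]
  g <- as.numeric(as.character(grouping[keep]))      # levels "1", "2", ... -> numbers
  k <- length(unique(g))
  if (k < 2) stop("the number of classes must be greater than 1")
  if (!setequal(g, seq_len(k)))
    stop("classes must be coded as the natural numbers 1 to the number of classes")
  counts <- tabulate(g, nbins = k)
  if (any(counts < ncol(x)))
    stop("each class needs at least as many observations as explanatory variables")
  list(x = x, grouping = as.integer(g), counts = counts)
}

x <- rbind(c(1, 2), c(3, NA), c(2, 1), c(5, 5), c(6, 4))
out <- prepare_training_set(x, factor(c(1, 1, 1, 2, 2)))
out$counts   # 2 2: the row with a missing value was dropped

# Arbitrary labels can be recoded to 1..k in factor-level order:
as.integer(factor(c("van", "bus", "van")))   # 2 1 2
```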

Author(s)

David Conde

References

Conde, D., Fernandez, M. A., Rueda, C., and Salvador, B. (2012). Classification of samples into two or more ordered populations with application to a cancer trial. Statistics in Medicine, 31, 3773-3786.

Conde, D., Fernandez, M. A., Salvador, B., and Rueda, C. (2015). dawai: An R Package for Discriminant Analysis with Additional Information. Journal of Statistical Software, 66(10), 1-19. URL http://www.jstatsoft.org/v66/i10/.

Fernandez, M. A., Rueda, C., and Salvador, B. (2006). Incorporating additional information to normal linear discriminant rules. Journal of the American Statistical Association, 101, 569-577.

See Also

predict.rlda, err.est.rlda, rqda, predict.rqda, err.est.rqda

Examples

data(Vehicle2)
levels(Vehicle2$Class)
## "bus" "opel" "saab" "van"

data <- Vehicle2
levels(data$Class) <- c(4, 2, 1, 3)  
## classes ordered by increasing size
## 
## according to the variable definitions, we can
## consider the following restrictions on the mean vectors:
## mu11, mu21 >= mu31 >= mu41
## mu12, mu22 >= mu32 >= mu42
## 
## we have 6 restrictions, 3 predictors and 4 classes, so
## resmatrix must be a 6 x 12 matrix:

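A <- matrix(0, ncol = 12, nrow = 6)
## each row of A encodes one restriction; the mean of variable j in
## class i (muij) is column (i - 1) * 3 + j, so row 1 reads mu31 - mu11 <= 0
## -1 entries mark the larger mean of each pair, given as (row, column):
A[t(matrix(c(1, 1, 2, 2, 3, 4, 4, 5, 5, 7, 6, 8), nrow = 2))] <- -1
## +1 entries mark the smaller mean of each pair:
A[t(matrix(c(1, 7, 2, 8, 3, 7, 4, 8, 5, 10, 6, 11), nrow = 2))] <- 1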

set.seed(983)
values <- runif(dim(data)[1])
trainsubset <- values < 0.2
obj <- rlda(Class ~ Kurt.Maxis + Holl.Ra + Sc.Var.maxis,
            data, subset = trainsubset, gamma = c(0, 0.5, 1),
            resmatrix = A)
obj
## we can see that the apparent error rate of the restricted
## rules decreases with gamma:
##  gamma=0 gamma=0.5   gamma=1
## 42.30769  41.66667  41.02564
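
The index-pair construction of A above is compact but hard to audit. As a hypothetical alternative (pairs_to_resmatrix() is an illustrative name, not a dawai function), the same matrix can be assembled from (larger class, smaller class, variable) triples, using the convention that muij is element (i - 1) * p + j of mu = c(mu1, mu2, ...).

```r
# Hypothetical helper, not part of dawai: builds a restriction matrix from
# (larger class, smaller class, variable) triples, each meaning
# mu[smaller, variable] - mu[larger, variable] <= 0.
pairs_to_resmatrix <- function(triples, k, p) {
  A <- matrix(0, nrow = nrow(triples), ncol = k * p)
  for (r in seq_len(nrow(triples))) {
    big <- triples[r, 1]; small <- triples[r, 2]; j <- triples[r, 3]
    A[r, (small - 1) * p + j] <- 1     # + mu_small,j
    A[r, (big - 1) * p + j] <- -1      # - mu_big,j
  }
  A
}

# mu11, mu21 >= mu31 >= mu41 and mu12, mu22 >= mu32 >= mu42, in the same
# row order as the hand-built matrix A in the example
triples <- rbind(c(1, 3, 1), c(1, 3, 2), c(2, 3, 1), c(2, 3, 2),
                 c(3, 4, 1), c(3, 4, 2))
A2 <- pairs_to_resmatrix(triples, k = 4, p = 3)
```

With the example's A in the workspace, identical(A, A2) returns TRUE.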

dawai documentation built on Oct. 15, 2024, 5:06 p.m.