dasdod: DA using a SIMCA index

dasdod    R Documentation

Description

Discriminant analysis using a SIMCA index (for comprehensive reviews, see e.g. Vanden Branden & Hubert 2005, Daszykowski et al. 2007, Durante et al. 2011).

For each new observation to predict, the function calculates a "SIMCA" index (output index) between the new observation and each of the classes of the reference (= training) data. The final predicted class corresponds to the class for which the index is the lowest. The principle is described below.

A PCA is fitted for each class (= model representing the class). Two distances, referred to as SD and OD, are calculated for each pair of new observation and class. SD is the "score distance", i.e. the Mahalanobis distance between the projection of the new observation in the PCA score space and the center of this space. OD is the "orthogonal distance" (X-residual), i.e. the Euclidean distance between the new observation and its projection on the score space.
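The two distances can be illustrated with base R. The following is a minimal sketch for a single class, assuming a classical (non-robust) PCA computed with prcomp; the simulated matrices Xclass and Xnew and the number of components a are hypothetical, and this is not the code used internally by dasdod.

```r
# Sketch only: SD and OD of new observations relative to ONE class model.
# Xclass, Xnew and a are hypothetical illustration values.
set.seed(1)
Xclass <- matrix(rnorm(30 * 6), nrow = 30)   # training observations of one class
Xnew <- matrix(rnorm(4 * 6), nrow = 4)       # new observations to evaluate
a <- 2                                       # number of PCA components kept

pc <- prcomp(Xclass, center = TRUE, scale. = FALSE)
P <- pc$rotation[, 1:a, drop = FALSE]        # loadings (p x a)
lambda <- pc$sdev[1:a]^2                     # variances of the training scores

Xc <- sweep(Xnew, 2, pc$center)              # center with the class mean
Tnew <- Xc %*% P                             # scores of the new observations

# SD: Mahalanobis distance between the scores and the score-space center
SD <- sqrt(rowSums(sweep(Tnew^2, 2, lambda, "/")))

# OD: Euclidean norm of the X-residual left after projection on the score space
E <- Xc - Tnew %*% t(P)
OD <- sqrt(rowSums(E^2))

cbind(SD, OD)
```

Because the PCA scores are uncorrelated, the Mahalanobis distance reduces to a sum of squared scores, each divided by the variance of its component.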

The SIMCA index used in function dasdod is d = sqrt(theta * (SD / cutsd)^2 + (1 - theta) * (OD / cutod)^2), where cutsd and cutod are cutoffs for the SD and OD distributions in the class, and theta is a proportion. The function automatically varies theta from 0 to 1 (in steps of 0.1), and results are provided for each value of theta.
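As an illustration of the formula, the sketch below computes the index over the theta grid and picks, for each observation, the class with the lowest index. The SD, OD and cutoff values are made up for the example, and simca_index is a hypothetical helper, not a function of the package.

```r
# Sketch only: hypothetical SD/OD values for 3 new observations (rows)
# and 2 classes (columns), with made-up per-class cutoffs.
SD <- matrix(c(1.2, 3.5, 0.8,
               2.9, 1.1, 2.4), nrow = 3, dimnames = list(NULL, c("A", "B")))
OD <- matrix(c(0.5, 2.0, 0.9,
               1.8, 0.6, 0.7), nrow = 3, dimnames = list(NULL, c("A", "B")))
cutsd <- c(A = 2.0, B = 2.2)
cutod <- c(A = 1.0, B = 1.2)

# Standardized distances: each class column is divided by its own cutoff
sdstand <- sweep(SD, 2, cutsd, "/")
odstand <- sweep(OD, 2, cutod, "/")

# Hypothetical helper implementing d for a given theta
simca_index <- function(theta) {
  sqrt(theta * sdstand^2 + (1 - theta) * odstand^2)
}

# Predicted class (lowest index) for each observation and each theta
theta <- seq(0, 1, by = 0.1)
pred <- sapply(theta, function(th) {
  d <- simca_index(th)
  colnames(d)[apply(d, 1, which.min)]
})
colnames(pred) <- paste0("theta=", theta)
pred
```

With these values, the third observation is assigned to class B at theta = 0 (index based on OD only) and to class A at theta = 1 (index based on SD only), which shows why results are reported for each theta.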

The values SD / cutsd and OD / cutod are the "standardized" SD and OD, respectively. Test observations with a standardized SD or OD higher than 1 may be considered outliers for the class. In a soft classification context, observations with standardized SD or OD lower than 1 may be considered as belonging to the class.
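The threshold of 1 can be turned into simple flags. The sketch below uses made-up standardized distances; the membership rule (both standardized distances not exceeding 1) is one possible reading of the soft classification idea and is an assumption here, not something returned by dasdod.

```r
# Sketch only: hypothetical standardized SD and OD for 3 observations x 2 classes
obs <- paste0("obs", 1:3)
sdstand <- matrix(c(0.60, 1.75, 0.40,
                    1.32, 0.50, 1.09), nrow = 3, dimnames = list(obs, c("A", "B")))
odstand <- matrix(c(0.50, 2.00, 0.90,
                    1.50, 0.50, 0.58), nrow = 3, dimnames = list(obs, c("A", "B")))

# Outlier for a class: standardized SD or standardized OD above 1
outlier <- sdstand > 1 | odstand > 1

# Assumed soft rule: member of a class when neither distance exceeds 1
member <- !outlier

member
rowSums(member)   # number of classes accepting each observation
```

Under such a rule an observation can be accepted by several classes, or by none, which is the difference with the hard assignment to the class with the lowest index.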

The number of PCA components in each class is defined in argument ncomp. If there are not enough training observations in a given class, the number of components is automatically decreased (or the PCA is not computed at all if the number of observations is lower than nmin).

Usage


dasdod(Xr, Yr, Xu, Yu = NULL, 
  ncomp, nmin = 5, ...)

Arguments

Xr

An n x p matrix or data frame of reference (= training) observations.

Yr

A vector of length n, or an n x 1 matrix, of reference (= training) responses (class membership).

Xu

An m x p matrix or data frame of new (= test) observations to be predicted.

Yu

A vector of length m, or an m x 1 matrix, of the true responses (class membership). Defaults to NULL.

ncomp

Number of PCA components (i.e. scores) to be calculated for each class. A vector of the same length as the number of classes, or a single integer (in the latter case, the same number of scores is used for all the classes). The number of scores is automatically decreased if the class size is too low.

nmin

Minimal number of training observations in the class for implementing a PCA (defaults to nmin = 5). If a class has fewer than nmin training observations, the corresponding class level is not considered (NA is returned for the SIMCA index of this level).

...

Optional arguments to pass to function pca.

Value

A list of outputs (see examples), such as:

y

Responses for the test data.

fit

Predictions for the test data.

r

Residuals for the test data.

index

SIMCA index for the test data.

sdstand

Standardized SD for the test data.

odstand

Standardized OD for the test data.

cutsd

Cutoff for calculating standardized SD.

cutod

Cutoff for calculating standardized OD.

pvarcla

Percentage of X-variance explained by the PCA.

References

- Daszykowski, M., Kaczmarek, K., Stanimirova, I., Vander Heyden, Y., Walczak, B., 2007. Robust SIMCA-bounding influence of outliers. Chemometrics and Intelligent Laboratory Systems, 87, 95-103. https://doi.org/10.1016/j.chemolab.2006.10.003

- Durante, G., Bro, R., Cocchi, M., 2011. A classification tool for N-way array based on SIMCA methodology. Chemometrics and Intelligent Laboratory Systems, 106, 73-85.

- Vanden Branden, K., Hubert, M., 2005. Robust classification in high dimensions based on the SIMCA method. Chemometrics and Intelligent Laboratory Systems, 79, 10-21.

Examples


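## Load the example dataset (training and test sets)
data(datforages)

Xr <- datforages$Xr
yr <- datforages$yr

Xu <- datforages$Xu
yu <- datforages$yu

headm(Xr)
headm(Xu)

## Class frequencies in the training and test sets
table(yr)
table(yu)

## Preprocessing applied identically to both sets
Xr <- snv(Xr)
Xu <- snv(Xu)

## Fit one PCA per class and compute the SIMCA index for the test data
ncomp <- 15
fm <- dasdod(Xr, yr, Xu, yu, ncomp = ncomp)
names(fm)
head(fm$y)
head(fm$fit)
head(fm$r)
head(fm$index)

## Score distances, cutoffs and standardized values
head(fm$sd)
fm$cutsd
head(fm$sdstand)

## Orthogonal distances, cutoffs and standardized values
head(fm$od)
fm$cutod
head(fm$odstand)

fm$ncomp
fm$pvarcla
fm$ni

## Error summarized for each value of theta
err(fm, ~ theta)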


mlesnoff/rnirs documentation built on April 24, 2023, 4:17 a.m.