dasdod: DA using a SIMCA index
In mlesnoff/rnirs: Dimension reduction, Regression and Discrimination for Chemometrics

dasdod

R Documentation

DA using a SIMCA index

Description

Discriminant analysis (DA) using a SIMCA index (for comprehensive reviews, see e.g. Vanden Branden & Hubert 2005; Daszykowski et al. 2007; Durante et al. 2011).

For each new observation to predict, the function calculates a SIMCA index (output `index`) between the new observation and each class of the reference (= training) data. The predicted class is the class with the lowest index. The principle is described below.
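The decision rule above (lowest index wins) can be sketched in base R as follows. This is a minimal illustration, not the rnirs implementation: the helper `predict_from_index` and the index values are hypothetical, and the index is assumed to be stored as a matrix with one row per test observation and one column per class.

```r
# Hypothetical helper (not part of rnirs): returns, for each test
# observation (row), the class (column) with the lowest SIMCA index.
predict_from_index <- function(index) {
  cls <- colnames(index)
  # which.min() ignores NA values, so a class skipped because it had
  # fewer than nmin training observations (NA index) is never selected
  apply(index, 1, function(d) cls[which.min(d)])
}

# Hypothetical indexes for 3 test observations and 3 classes
index <- matrix(c(0.8, 1.6, 2.1,
                  1.9, 0.5, NA,
                  2.4, 1.3, 0.9),
                nrow = 3, byrow = TRUE,
                dimnames = list(NULL, c("A", "B", "C")))
predict_from_index(index)
```

In this toy example, the second observation is assigned to class B even though class C has no index, because `which.min` skips the `NA`.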

A PCA is fitted for each class (the PCA is the model representing the class). Two distances, referred to as SD and OD, are then calculated between each new observation and each class. SD is the "score distance", i.e. the Mahalanobis distance between the projection of the new observation onto the PCA score space and the center of this space. OD is the "orthogonal distance" (X-residual), i.e. the Euclidean distance between the new observation and its projection onto the score space.
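The two distances can be sketched in base R with `stats::prcomp`. This is a minimal sketch under stated assumptions, not the code of `dasdod` (which relies on the rnirs function `pca`): the helper `sd_od` is hypothetical, the class data are centered but not scaled, and the score variances use the usual `n - 1` denominator returned by `prcomp`.

```r
# Hypothetical helper (not part of rnirs): SD and OD of new observations
# with respect to the PCA model of one class, fitted with stats::prcomp().
sd_od <- function(Xclass, Xnew, ncomp) {
  fit <- prcomp(Xclass, center = TRUE, scale. = FALSE, rank. = ncomp)
  P <- fit$rotation                            # p x k loadings (k <= ncomp)
  k <- ncol(P)
  eig <- fit$sdev[seq_len(k)]^2                # variances of the k scores
  Xc <- sweep(as.matrix(Xnew), 2, fit$center)  # center with the class means
  Tnew <- Xc %*% P                             # projections in score space
  # SD: Mahalanobis distance between the scores and the center (origin)
  # of the score space; the score covariance matrix is diag(eig)
  SD <- sqrt(rowSums(sweep(Tnew^2, 2, eig, "/")))
  # OD: Euclidean norm of the X-residual (observation minus its projection)
  E <- Xc - Tnew %*% t(P)
  OD <- sqrt(rowSums(E^2))
  list(sd = SD, od = OD)
}

set.seed(1)
Xclass <- matrix(rnorm(30 * 5), nrow = 30)  # 30 training obs., 5 variables
Xnew <- matrix(rnorm(4 * 5), nrow = 4)      # 4 new observations
res <- sd_od(Xclass, Xnew, ncomp = 2)
res$sd
res$od
```

A quick sanity check: if all components are kept, every observation lies in the score space, so OD is zero.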

The SIMCA index used in function dasdod is d = sqrt(theta * (SD / cutsd)^2 + (1 - theta) * (OD / cutod)^2), where cutsd and cutod are cutoffs for the SD and OD distributions in the class, and theta is a weight between 0 and 1. With theta = 0 the index reduces to the standardized OD, and with theta = 1 to the standardized SD. The function automatically varies theta from 0 to 1 (step 0.1) and returns results for each value of theta.
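The index formula can be evaluated on the same theta grid with a few lines of base R. This is a minimal sketch: the helper `simca_index` is hypothetical, and the cutoffs `cutsd` and `cutod` are taken as given inputs, since how dasdod computes them is not described on this page.

```r
# Hypothetical helper (not part of rnirs): SIMCA index of dasdod for one
# class, for each observation (rows) and each value of theta (columns).
simca_index <- function(sd, od, cutsd, cutod,
                        theta = seq(0, 1, by = 0.1)) {
  sds <- sd / cutsd   # standardized SD
  ods <- od / cutod   # standardized OD
  out <- sapply(theta, function(th) sqrt(th * sds^2 + (1 - th) * ods^2))
  # force a matrix even when there is a single observation
  out <- matrix(out, nrow = length(sd))
  colnames(out) <- paste0("theta=", theta)
  out
}

# Two observations, with assumed cutoffs cutsd = 2 and cutod = 1
d <- simca_index(sd = c(1, 3), od = c(2, 0.5), cutsd = 2, cutod = 1)
d[, c("theta=0", "theta=0.5", "theta=1")]
```

The first column (theta = 0) equals the standardized OD and the last column (theta = 1) equals the standardized SD, as expected from the formula.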

The ratios SD / cutsd and OD / cutod are the "standardized" SD and OD, respectively. Test observations with a standardized SD or OD higher than 1 may be considered as outliers for the class. In a soft classification context, observations with both standardized SD and OD lower than 1 may be considered as belonging to the class.
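The outlier rule above translates into a logical membership matrix. This is a minimal sketch: the helper `class_membership` and the standardized values are hypothetical, the inputs are assumed to be matrices with one row per test observation and one column per class (as `sdstand` and `odstand` may be), and a value exactly equal to 1 is treated as inside the class, which is a choice made here.

```r
# Hypothetical helper (not part of rnirs): soft class membership from
# standardized SD and OD (rows = test observations, columns = classes).
# An observation belongs to a class when neither standardized distance
# exceeds 1; it may belong to several classes or to none.
class_membership <- function(sdstand, odstand) {
  sdstand <= 1 & odstand <= 1
}

cls <- c("A", "B")
sdstand <- matrix(c(0.4, 1.2, 0.9, 0.3), nrow = 2,
                  dimnames = list(NULL, cls))
odstand <- matrix(c(0.7, 0.5, 1.5, 0.8), nrow = 2,
                  dimnames = list(NULL, cls))
member <- class_membership(sdstand, odstand)
member
rowSums(member)  # number of classes accepting each observation
```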

The number of PCA components for each class is set by argument ncomp. If there are not enough training observations in a given class, the number of components is automatically decreased (and no PCA is fitted at all if the class size is lower than nmin).
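The handling of ncomp and nmin can be illustrated as follows. This is a hypothetical sketch: the exact reduction rule used by dasdod is not documented here, so the helper `adjust_ncomp` assumes that a single ncomp is recycled over classes, that the number of components is capped at the class size minus 1, and that classes with fewer than nmin observations get `NA` (no PCA).

```r
# Hypothetical helper (not part of rnirs); the capping rule is an assumption.
adjust_ncomp <- function(ncomp, ni, nmin = 5) {
  ncomp <- rep_len(ncomp, length(ni))  # same ncomp for all classes if scalar
  out <- pmin(ncomp, ni - 1)           # assumed cap: class size - 1
  out[ni < nmin] <- NA                 # class too small: no PCA
  names(out) <- names(ni)
  out
}

ni <- c(A = 40, B = 8, C = 3)  # hypothetical class sizes
adjust_ncomp(15, ni)
adjust_ncomp(c(3, 10, 2), ni)
```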

Usage


dasdod(Xr, Yr, Xu, Yu = NULL, 
  ncomp, nmin = 5, ...)

Arguments

`Xr`	An `n x p` matrix or data frame of reference (= training) observations.
`Yr`	A vector of length `n`, or an `n x 1` matrix, of reference (= training) responses (class membership).
`Xu`	An `m x p` matrix or data frame of new (= test) observations to be predicted.
`Yu`	A vector of length `m`, or an `m x 1` matrix, of the true responses (class membership). Defaults to `NULL`.
`ncomp`	Number of PCA components (i.e. scores) to compute for each class. Either a vector with one value per class, or a single integer (in which case the same number of components is used for all classes). The number of components is automatically decreased if the class size is too small.
`nmin`	Minimal number of training observations in a class required to fit a PCA (defaults to `nmin = 5`). If the class size is lower than `nmin`, the class is not considered (`NA` is returned for the SIMCA index of this class).
`...`	Optional arguments passed to function `pca`.

Value

A list of outputs (see examples), such as:

`y`	Responses for the test data.
`fit`	Predictions for the test data.
`r`	Residuals for the test data.
`index`	SIMCA index for the test data.
`sdstand`	Standardized SD for the test data.
`odstand`	Standardized OD for the test data.
`cutsd`	Cutoff for calculating standardized SD.
`cutod`	Cutoff for calculating standardized OD.
`pvarcla`	Percentage of `X`-variance explained by the PCA of each class.

References

- Daszykowski, M., Kaczmarek, K., Stanimirova, I., Vander Heyden, Y., Walczak, B., 2007. Robust SIMCA-bounding influence of outliers. Chemometrics and Intelligent Laboratory Systems, 87, 95-103. https://doi.org/10.1016/j.chemolab.2006.10.003

- Durante, G., Bro, R., Cocchi, M., 2011. A classification tool for N-way array based on SIMCA methodology. Chemometrics and Intelligent Laboratory Systems, 106, 73-85.

- Vanden Branden, K., Hubert, M., 2005. Robust classification in high dimensions based on the SIMCA Method. Chemometrics and Intelligent Laboratory Systems, 79, 10-21.

Examples


library(rnirs)

data(datforages)

Xr <- datforages$Xr
yr <- datforages$yr

Xu <- datforages$Xu
yu <- datforages$yu

headm(Xr)
headm(Xu)

table(yr)
table(yu)

# SNV (standard normal variate) preprocessing of the spectra
Xr <- snv(Xr)
Xu <- snv(Xu)

ncomp <- 15
fm <- dasdod(Xr, yr, Xu, yu, ncomp = ncomp)
names(fm)
head(fm$y)
head(fm$fit)
head(fm$r)
head(fm$index)
head(fm$sd)
fm$cutsd
head(fm$sdstand)
head(fm$od)
fm$cutod
head(fm$odstand)
fm$ncomp
fm$pvarcla
fm$ni

# Classification error rates for each value of theta
err(fm, ~ theta)

mlesnoff/rnirs documentation built on April 24, 2023, 4:17 a.m.
