dasdod | R Documentation |
Discriminant analysis using a SIMCA index (for comprehensive reviews, see e.g.: Vanden Branden & Hubert 2005, Daszykowski et al. 2007, Durante et al. 2011).
For each new observation to predict, the function calculates a "SIMCA" index (output index) between the new observation and each of the classes of the reference (= training) data. The final predicted class corresponds to the class for which the index is the lowest. The principle is described below.
A PCA is implemented for each class (= model representing the class). Two distances, referred to as SD and OD, are calculated for each new observation to predict and each class. SD is the "score distance", i.e. the Mahalanobis distance between the projection of the new observation in the PCA score space and the center of this space. OD is the "orthogonal distance" (X-residual), i.e. the Euclidean distance between the new observation and its projection on the score space.
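The two distances can be illustrated with a minimal base R sketch. This is not the code of dasdod: it assumes a PCA fitted with prcomp on simulated data standing in for a single class, and the names Xclass, xnew, P, lambda and tnew are hypothetical.

```r
# Simulated data standing in for one class of the training set (hypothetical)
set.seed(1)
Xclass <- matrix(rnorm(30 * 6), nrow = 30, ncol = 6)
xnew <- rnorm(6)   # one new observation to predict
ncomp <- 2         # number of PCA components kept for the class

# PCA model of the class
pca <- prcomp(Xclass, center = TRUE, scale. = FALSE)
P <- pca$rotation[, 1:ncomp, drop = FALSE]   # loadings
lambda <- pca$sdev[1:ncomp]^2                # variances of the scores

# Projection of the centered new observation in the score space
xc <- xnew - pca$center
tnew <- drop(xc %*% P)

# SD: Mahalanobis distance between the scores and the center of the score space
SD <- sqrt(sum(tnew^2 / lambda))

# OD: Euclidean distance between the observation and its projection
xhat <- drop(P %*% tnew)
OD <- sqrt(sum((xc - xhat)^2))

c(SD = SD, OD = OD)
```

Because the loadings are orthonormal, the squared norm of the centered observation splits into the squared norm of its scores plus OD^2, which is a convenient sanity check for this kind of sketch.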
The SIMCA index used in function dasdod is d = sqrt(theta * (SD / cutsd)^2 + (1 - theta) * (OD / cutod)^2), where cutsd and cutod are cutoffs for the SD and OD distributions in the class, and theta is a proportion. Proportion theta is automatically varied from 0 to 1 (with step = 0.1) in the function, and results are provided for each value of theta.
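As a hedged illustration of the formula and of the "lowest index" rule, the sketch below computes the index of one new observation relative to three classes over the grid of theta values. All SD, OD and cutoff values are made up for the example, and the code only mirrors the formula given above, not the internals of dasdod.

```r
# Hypothetical SD, OD and cutoffs of one new observation relative to three classes
SD    <- c(A = 1.8, B = 3.0, C = 2.4)
OD    <- c(A = 0.6, B = 0.3, C = 0.9)
cutsd <- c(A = 2.5, B = 2.5, C = 2.0)
cutod <- c(A = 0.4, B = 0.5, C = 0.6)

# Grid of proportions, from 0 to 1 with step 0.1
theta <- seq(0, 1, by = 0.1)

# SIMCA index: one row per value of theta, one column per class
index <- t(sapply(theta, function(th)
  sqrt(th * (SD / cutsd)^2 + (1 - th) * (OD / cutod)^2)))
rownames(index) <- theta

# Predicted class for each theta = class with the lowest index
pred <- colnames(index)[apply(index, 1, which.min)]
data.frame(theta, pred)
```

With theta = 0 the index reduces to the standardized OD, and with theta = 1 to the standardized SD, so the predicted class can change along the grid.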
The values SD / cutsd and OD / cutod are the "standardized" SD and OD, respectively. Test observations with a standardized SD or OD higher than 1 may be considered as outliers for the class. In a soft classification context, observations with standardized SD or OD lower than 1 may be considered as belonging to the class.
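A small sketch of this rule for a single class, using made-up standardized distances. The variable names follow the outputs sdstand and odstand, but the values are hypothetical, and treating every non-outlier as a class member is only one possible reading of the soft classification rule.

```r
# Hypothetical standardized distances of three test observations to one class
sdstand <- c(obs1 = 0.7, obs2 = 1.3, obs3 = 0.9)
odstand <- c(obs1 = 0.5, obs2 = 0.8, obs3 = 1.6)

# Outlier for the class: standardized SD or OD higher than 1
outlier <- sdstand > 1 | odstand > 1

# Assumption: membership is taken here as "not flagged as an outlier"
member <- !outlier

data.frame(sdstand, odstand, outlier, member)
```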
The number of PCA components in each class is defined in argument ncomp. If there are not enough training observations in a given class, the number of components is automatically decreased (and the PCA is not performed at all if the number of training observations in the class is lower than nmin).
dasdod(Xr, Yr, Xu, Yu = NULL,
ncomp, nmin = 5, ...)
Xr |
A |
Yr |
A vector of length |
Xu |
A |
Yu |
A vector of length |
ncomp |
Number of PCA components (i.e. scores) to be calculated for each class: either a vector of the same length as the number of classes, or a single integer (in which case the same number of scores is used for all the classes). The number of scores is automatically decreased if the class size is too small. |
nmin |
Minimal number of training observations in the class for implementing a PCA (default to |
... |
Optional arguments to pass in function |
A list of outputs (see examples), such as:
y |
Responses for the test data. |
fit |
Predictions for the test data. |
r |
Residuals for the test data. |
index |
SIMCA index for the test data. |
sdstand |
Standardized SD for the test data. |
odstand |
Standardized OD for the test data. |
cutsd |
Cutoff for calculating standardized SD. |
cutod |
Cutoff for calculating standardized OD. |
pvarcla |
Percentage of |
- Daszykowski, M., Kaczmarek, K., Stanimirova, I., Vander Heyden, Y., Walczak, B., 2007. Robust SIMCA-bounding influence of outliers. Chemometrics and Intelligent Laboratory Systems, 87, 95-103. https://doi.org/10.1016/j.chemolab.2006.10.003
- Durante, G., Bro, R., Cocchi, M., 2011. A classification tool for N-way array based on SIMCA methodology. Chemometrics and Intelligent Laboratory Systems, 106, 73-85.
- Vanden Branden, K., Hubert, M., 2005. Robust classification in high dimensions based on the SIMCA method. Chemometrics and Intelligent Laboratory Systems, 79, 10-21.
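## Reference (training) and test data
data(datforages)
Xr <- datforages$Xr
yr <- datforages$yr
Xu <- datforages$Xu
yu <- datforages$yu
headm(Xr)
headm(Xu)
table(yr)
table(yu)
## Preprocessing of the spectra
Xr <- snv(Xr)
Xu <- snv(Xu)
## Same number of PCA components for all the classes
ncomp <- 15
fm <- dasdod(Xr, yr, Xu, yu, ncomp = ncomp)
names(fm)
## Responses, predictions and residuals for the test data
head(fm$y)
head(fm$fit)
head(fm$r)
## SIMCA index
head(fm$index)
## SD, its cutoff and the standardized SD
head(fm$sd)
fm$cutsd
head(fm$sdstand)
## OD, its cutoff and the standardized OD
head(fm$od)
fm$cutod
head(fm$odstand)
fm$ncomp
fm$pvarcla
fm$ni
## Results summarized for each value of theta
err(fm, ~ theta)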