ddsimca: Data Driven SIMCA

ddsimca R Documentation

Data Driven SIMCA

Description

ddsimca develops a DD-SIMCA (Data Driven SIMCA) model for one-class classification.

Usage

ddsimca(
  x,
  classname,
  ncomp = min(nrow(x) - 1, ncol(x) - 1, 20),
  center = TRUE,
  scale = FALSE,
  pcv = list("ven", 10),
  alpha = 0.05,
  gamma = 0.01,
  exclrows = NULL,
  exclcols = NULL,
  prep = NULL,
  do.round = TRUE,
  ...
)

Arguments

x

a numerical matrix with data values.

classname

short text (up to 20 characters) with the class name.

ncomp

maximum number of components to calculate.

center

logical, whether to mean-center the data.

scale

logical, whether to standardize the data.

pcv

Procrustes cross-validation settings (see details).

alpha

significance level for making predictions (can also be adjusted when the model is applied to data).

gamma

significance level for outlier detection (can also be adjusted when the model is applied to data).

exclrows

rows to be excluded from calculations (numbers, names or a vector with logical values).

exclcols

columns to be excluded from calculations (numbers, names or a vector with logical values).

prep

optional list of preprocessing methods created using the 'prep' function.

do.round

logical, whether to round the degrees of freedom (DoF) for distances.

...

any other parameters suitable for the pca method.
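
As a small illustration of two of the arguments above, the following base R sketch evaluates the default ncomp expression from the Usage section for a 20 x 4 matrix and shows the three forms (numbers, names, logical vector) in which exclcols can point at the same columns. It does not call ddsimca itself; the variable names are illustrative only.

```r
# First 20 rows of the iris measurements, as a numeric matrix
x <- as.matrix(iris[1:20, 1:4])

# Default from the Usage signature: min(nrow(x) - 1, ncol(x) - 1, 20)
ncomp_default <- min(nrow(x) - 1, ncol(x) - 1, 20)
print(ncomp_default)  # 3, limited here by the number of columns

# Three equivalent ways to refer to the first two columns
by_number  <- c(1, 2)
by_name    <- c("Sepal.Length", "Sepal.Width")
by_logical <- colnames(x) %in% by_name
print(which(by_logical))
```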

Details

DD-SIMCA is based on a PCA model with additional functionality, so the ddsimca class inherits most of the functionality of the pca class.

To make a decision, DD-SIMCA uses the score and orthogonal distances to the PCA model. It combines the two distances into a joint full distance and uses the chi-square distribution to find a critical value, which is employed as the decision rule. More details about DD-SIMCA can be found in [1] (open access).
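
The decision rule described above can be sketched in base R, without mdatools. This is a minimal illustration, not the package's implementation: it assumes classical method-of-moments estimates for the scaling factors (h0, q0) and degrees of freedom (Nh, Nq), a full distance of the form f = Nh * h / h0 + Nq * q / q0, and the outlier threshold in the form commonly given in the DD-SIMCA literature. The package may use other (for example robust) estimators, and all variable names here are hypothetical.

```r
# Calibration set: first 20 setosa rows, as in the Examples section
X <- as.matrix(iris[1:20, 1:4])
n <- nrow(X)
A <- 1         # number of components (illustrative choice)
alpha <- 0.05  # significance level for predictions
gamma <- 0.01  # significance level for outliers

# PCA with mean centering and no scaling
p <- prcomp(X, center = TRUE, scale. = FALSE)
scores   <- p$x[, 1:A, drop = FALSE]
loadings <- p$rotation[, 1:A, drop = FALSE]
resid    <- sweep(X, 2, p$center) - scores %*% t(loadings)

# Score distance h and orthogonal distance q for every object
h <- rowSums(sweep(scores^2, 2, colSums(scores^2), "/"))
q <- rowSums(resid^2)

# Assumption: moment estimates of scale and DoF, rounded to integers >= 1
h0 <- mean(h); Nh <- max(1, round(2 * h0^2 / var(h)))
q0 <- mean(q); Nq <- max(1, round(2 * q0^2 / var(q)))

# Full distance and chi-square based thresholds
f      <- Nh * h / h0 + Nq * q / q0
f_crit <- qchisq(1 - alpha, df = Nh + Nq)
f_out  <- qchisq((1 - gamma)^(1 / n), df = Nh + Nq)

is_member  <- f <= f_crit  # accepted as class members
is_outlier <- f > f_out    # flagged as outliers
print(data.frame(f = round(f, 2), is_member, is_outlier))
```

With these estimates the mean of f over the calibration set equals Nh + Nq, which is the expected value of the reference chi-square distribution.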

Procrustes cross-validation (PCV) is used to generate a validation set in order to find the optimal model complexity (number of components). The PCV settings are similar to the ones used for conventional cross-validation. The best way is to set the 'pcv' value to a list, for example: pcv = list('ven', nseg) for systematic splits or pcv = list('rand', nseg) for random splits. If full (leave-one-out) cross-validation is required, use pcv = list('loo').
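
To make the three split types more concrete, the base R sketch below assigns 20 rows to segments in a systematic, random and leave-one-out fashion. This only illustrates the general idea of such splits; the way mdatools forms its segments internally is an assumption here and may differ in detail. The variable names are illustrative.

```r
n <- 20     # number of calibration rows
nseg <- 10  # number of segments, as in pcv = list("ven", 10)

# Systematic ("ven") splits: rows are dealt to segments in turn
ven <- rep_len(seq_len(nseg), n)

# Random ("rand") splits: same segment sizes, rows assigned at random
set.seed(42)
rand <- sample(ven)

# Full cross-validation ("loo"): every row forms its own segment
loo <- seq_len(n)

print(rbind(ven = ven, rand = rand, loo = loo))
```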

Value

Returns an object of ddsimca class with the following fields:

classname

a short text with class name.

calres

an object of class simcares with classification results for the calibration data.

pvres

an object of class simcares with classification results for the Procrustes validation set.

Fields, inherited from pca class:

ncomp

number of components included in the model.

ncomp.selected

selected (optimal) number of components.

loadings

matrix with loading values (nvar x ncomp).

eigenvals

vector with eigenvalues for all existing components.

expvar

vector with explained variance for each component (in percent).

cumexpvar

vector with cumulative explained variance for each component (in percent).

info

information about the model, provided by the user when building the model.

Author(s)

Sergey Kucheryavskiy (svkucheryavski@gmail.com)

References

1. Kucheryavskiy S, Rodionova O, Pomerantsev A. A comprehensive tutorial on Data-Driven SIMCA: Theory and implementation in web. Journal of Chemometrics. 2024; 38(7):e3556. doi:10.1002/cem.3556

2. Kucheryavskiy S, Rodionova O, Pomerantsev A. Procrustes cross-validation of multivariate regression models. Analytica Chimica Acta. 2023; 1255:341096. doi:10.1016/j.aca.2023.341096

See Also

Methods for ddsimca objects:

print.ddsimca shows information about the object.
summary.ddsimca shows summary statistics for the model.
plot.ddsimca makes an overview of the DD-SIMCA model with four plots.
predict.ddsimca applies the DD-SIMCA model to new data.

Methods, inherited from classmodel class:

plotPredictions.classmodel shows plot with predicted values.
plotSensitivity.classmodel shows sensitivity plot.
plotSpecificity.classmodel shows specificity plot.
plotMisclassified.classmodel shows misclassified ratio plot.

Methods, inherited from pca class:

selectCompNum.pca sets the optimal number of components in the model.
plotScores.pca shows scores plot.
plotLoadings.pca shows loadings plot.
plotVariance.pca shows explained variance plot.
plotCumVariance.pca shows cumulative explained variance plot.

Examples

## make a DD-SIMCA model for the Iris setosa class with Procrustes cross-validation
library(mdatools)

data = iris[, 1:4]
class = iris[, 5]

# take first 20 objects of setosa as calibration set
se = data[1:20, ]

# make DD-SIMCA model (PCV with 10 systematic segments) and select one component
model = ddsimca(se, "setosa", pcv = list("ven", 10))
model = selectCompNum(model, 1)

# show information, summary and plot overview
print(model)
summary(model)
plot(model)



mdatools documentation built on March 6, 2026, 5:08 p.m.