CSimca: Classification in high dimensions based on the (classical)...


Description

CSimca performs the (classical) SIMCA method. This method classifies a data matrix x with a known group structure. To reduce the dimension within each group, a PCA is performed. A classification rule is then developed to determine the assignment of new observations.

Usage

CSimca(x, ...)
## Default S3 method:
CSimca(x, grouping, prior=proportions, k, kmax = ncol(x), 
    tol = 1.0e-4, trace=FALSE, ...)
## S3 method for class 'formula'
CSimca(formula, data = NULL, ..., subset, na.action)

Arguments

formula

a formula of the form y~x, describing the response and the predictors. The formula can be more complicated, such as y~log(x)+z etc. (see formula for more details). The response should be a factor representing the response variable, or any vector that can be coerced to one (such as a logical variable).

data

an optional data frame (or similar: see model.frame) containing the variables in formula.

subset

an optional vector used to select rows (observations) of the data matrix x.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The 'factory-fresh' default is na.omit.

x

a matrix or data frame containing the explanatory variables (training set).

grouping

grouping variable: a factor specifying the class for each observation.

prior

prior probabilities; defaults to the class proportions of the training set.

tol

tolerance

k

number of principal components to compute. If k is missing, or k = 0, the algorithm determines the number of components itself by choosing k such that l_k/l_1 >= 10.E-3 and Σ_{j=1}^k l_j/Σ_{j=1}^r l_j >= 0.8, where l_1 >= l_2 >= ... >= l_r are the eigenvalues of the group covariance matrix and r is its rank. It is preferable to inspect the scree plot to choose the number of components and then rerun with that k. Default is k=0.

kmax

maximal number of principal components to compute. Default is kmax = ncol(x), as given in Usage; if k is provided, kmax normally does not need to be specified.

trace

whether to print intermediate results. Default is trace = FALSE.

...

arguments passed to or from other methods.
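The automatic choice of k described under argument k can be sketched in base R as follows. This is one reading of the rule, assumed for illustration: among the components whose eigenvalue ratio l_k/l_1 is at least 10.E-3 (which R evaluates as 0.01), it picks the smallest k whose cumulative proportion of variance reaches 0.8, and falls back to the largest such component if 0.8 is never reached. choose_k() is a hypothetical helper, not a function of rrcovHD, and the package's own tie-breaking may differ.

```r
# Hypothetical helper (not part of rrcovHD): one reading of the automatic
# rule for choosing the number of principal components k.
choose_k <- function(x, kmax = ncol(x), ratio_tol = 0.01, cum_tol = 0.8) {
  # ratio_tol = 0.01 is the value of 10.E-3 quoted in the text
  # Eigenvalues l_1 >= l_2 >= ... of the sample covariance matrix
  l <- eigen(cov(x), symmetric = TRUE, only.values = TRUE)$values
  l <- pmax(l, 0)                      # clip tiny negative round-off
  r <- sum(l > 1e-12 * l[1])           # numerical rank r
  l <- l[seq_len(r)]
  cum_prop <- cumsum(l) / sum(l)       # sum_{j<=k} l_j / sum_{j<=r} l_j
  # Components passing the eigenvalue-ratio condition, capped at kmax
  eligible <- which(l / l[1] >= ratio_tol & seq_len(r) <= kmax)
  # Smallest eligible k that reaches the cumulative-variance threshold
  hit <- eligible[cum_prop[eligible] >= cum_tol]
  if (length(hit) > 0) min(hit) else max(eligible)
}
```

For instance, with two uncorrelated columns of equal variance, the first component explains only half of the variance, so the sketch returns k = 2; if one column dominates the variance, it returns k = 1.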

Details

CSimca, serving as a constructor for objects of class CSimca-class, is a generic function with "formula" and "default" methods.

SIMCA is a two-phase procedure: PCA is first performed on each group separately for dimension reduction, and classification rules are then built in the lower-dimensional space (note that the dimension retained in each group can be different). In the original SIMCA, new observations are classified by means of their deviations from the different PCA models. Here (and also in the robust versions implemented in this package), the classification rules are obtained using two popular distances arising from PCA: orthogonal distances (OD) and score distances (SD). For the definition of these distances, of the cutoff values, and of the standardization of the distances, see Vanden Branden and Hubert (2005) and Todorov and Filzmoser (2009).
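The two phases can be illustrated with the following base-R sketch, written under stated assumptions; it is not the rrcovHD implementation. pca_distances(), simca_fit() and simca_predict() are hypothetical helpers. The sketch uses the same k in every group (CSimca allows a different k per group), computes SD as the Mahalanobis-type distance within the score space and OD as the norm of the residual outside the PCA subspace, standardizes each by a cutoff (a chi-square quantile for SD and a Wilson-Hilferty style cutoff based on OD^(2/3) for OD, both assumed here), and assigns a new observation to the group with the smallest equal-weight combination of the two standardized distances. The exact cutoffs and combination used by CSimca are given in the cited papers.

```r
# Hypothetical helper: score distance (SD) and orthogonal distance (OD) of
# the rows of x to a prcomp() model truncated to its first k components.
pca_distances <- function(x, pc, k) {
  P  <- pc$rotation[, seq_len(k), drop = FALSE]    # p x k loading matrix
  l  <- pc$sdev[seq_len(k)]^2                       # eigenvalues l_1, ..., l_k
  xc <- sweep(as.matrix(x), 2, pc$center)           # centre with the group mean
  scores <- xc %*% P                                # n x k scores
  score_dist <- sqrt(rowSums(sweep(scores^2, 2, l, "/")))
  orth_dist  <- sqrt(rowSums((xc - scores %*% t(P))^2))
  list(sd = score_dist, od = orth_dist)
}

# Phase 1 (hypothetical): one classical PCA per group, plus cutoffs used to
# standardize SD and OD. Assumed cutoffs: sqrt(chi^2_{k, 0.975}) for SD and
# (mean + sd * z_{0.975})^(3/2) computed on OD^(2/3) for OD.
# Requires k < ncol(x); otherwise OD is zero and the OD cutoff degenerates.
simca_fit <- function(x, grouping, k) {
  groups <- split(as.data.frame(x), grouping)
  lapply(groups, function(xg) {
    pc   <- prcomp(xg)
    od23 <- pca_distances(xg, pc, k)$od^(2/3)
    list(pc   = pc,
         c_sd = sqrt(qchisq(0.975, df = k)),
         c_od = (mean(od23) + sd(od23) * qnorm(0.975))^(3/2))
  })
}

# Phase 2 (hypothetical): assign each new row to the group with the smallest
# combined standardized distance; gamma = 0.5 weights OD and SD equally.
simca_predict <- function(models, newx, k, gamma = 0.5) {
  D <- sapply(models, function(m) {
    d <- pca_distances(newx, m$pc, k)
    gamma * d$od / m$c_od + (1 - gamma) * d$sd / m$c_sd
  })
  D <- matrix(D, nrow = nrow(as.matrix(newx)))     # n x G, also for n = 1
  factor(names(models)[apply(D, 1, which.min)], levels = names(models))
}
```

For example, simca_predict(simca_fit(x, g, k = 2), x, k = 2) returns a factor with the predicted group of each training row; with well-separated groups it should reproduce g.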

Value

An S4 object of class CSimca-class, which is a subclass of the virtual class Simca-class.

Author(s)

Valentin Todorov valentin.todorov@chello.at

References

Vanden Branden K & Hubert M (2005), Robust classification in high dimensions based on the SIMCA method. Chemometrics and Intelligent Laboratory Systems, 79, 10–21.

Todorov V & Filzmoser P (2009), An Object Oriented Framework for Robust Multivariate Analysis. Journal of Statistical Software, 32(3), 1–47, doi: 10.18637/jss.v032.i03.

Todorov V & Filzmoser P (2014), Software Tools for Robust Analysis of High-Dimensional Data. Austrian Journal of Statistics, 43(4), 255–266, doi: 10.17713/ajs.v43i4.44.

Examples

library(rrcovHD)
data(pottery)
dim(pottery)        # 27 observations in 2 classes: 6 variables plus the grouping variable origin
head(pottery)

## Build the SIMCA model. Use RSimca for a robust version
cs <- CSimca(origin~., data=pottery)
cs
summary(cs)


## draw a random sample of 5 observations from the pottery data set;
##  these will be the "new" data to be predicted
smpl <- sample(1:nrow(pottery), 5)
test <- pottery[smpl, -7]          # extract the test sample. Remove the last (grouping) variable
print(test)


## predict new data
pr <- predict(cs, newdata=test)

pr@classification 

Example output

Loading required package: rrcov
Loading required package: robustbase
Scalable Robust Estimators with High Breakdown Point (version 1.4-7)

Robust Multivariate Methods for High Dimensional Data (version 0.2-5)

[1] 27  7
    SI   AL   FE  MG   CA   TI origin
1 55.8 14.0 10.2 4.9  5.0 0.88  Attic
2 51.2 12.5 10.1 4.4  4.8 0.86  Attic
3 57.1 14.0  8.3 6.4 11.2 0.75  Attic
4 53.8 13.1  9.3 4.9  6.6 0.81  Attic
5 59.4 14.8  9.8 5.5  5.4 0.89  Attic
6 56.2 14.0  9.9 4.9  5.4 0.89  Attic
Call:
CSimca(origin ~ ., data = pottery)

Prior Probabilities of Groups:
    Attic  Eritrean 
0.4814815 0.5185185 

Pca objects for Groups:

Call:
PcaClassic(x = class, k = k[i], trace = trace)
Importance of components:
                          PC1    PC2
Standard deviation     4.5958 2.1338
Proportion of Variance 0.8227 0.1773
Cumulative Proportion  0.8227 1.0000

Call:
PcaClassic(x = class, k = k[i], trace = trace)
Importance of components:
                          PC1    PC2
Standard deviation     4.0203 2.4797
Proportion of Variance 0.7244 0.2756
Cumulative Proportion  0.7244 1.0000

Call:
CSimca(formula = origin ~ ., data = pottery)

Prior Probabilities of Groups:
    Attic  Eritrean 
0.4814815 0.5185185 

Pca objects for Groups:

Call:
PcaClassic(x = class, k = k[i], trace = trace)
Importance of components:
                          PC1    PC2
Standard deviation     4.5958 2.1338
Proportion of Variance 0.8227 0.1773
Cumulative Proportion  0.8227 1.0000

Call:
PcaClassic(x = class, k = k[i], trace = trace)
Importance of components:
                          PC1    PC2
Standard deviation     4.0203 2.4797
Proportion of Variance 0.7244 0.2756
Cumulative Proportion  0.7244 1.0000
     SI   AL  FE  MG   CA   TI
12 69.9 13.1 9.8 4.4  4.4 0.89
5  59.4 14.8 9.8 5.5  5.4 0.89
8  52.9 13.4 9.9 7.5  8.5 0.89
24 54.1 13.6 8.1 4.4 12.8 0.71
15 50.8 15.5 9.5 2.3  4.9 0.85
[1] Attic    Attic    Attic    Eritrean Eritrean
Levels: Attic Eritrean
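The prior probabilities printed above follow the default prior = proportions: 0.4814815 and 0.5185185 are 13/27 and 14/27, i.e. 13 Attic and 14 Eritrean observations among the 27. The base-R sketch below reproduces this arithmetic; the grouping factor is a hypothetical stand-in constructed by hand from those counts rather than read from the pottery data.

```r
# Hypothetical stand-in for pottery$origin with the class sizes implied by
# the printed priors (13 Attic, 14 Eritrean; 27 observations in total).
grouping <- factor(rep(c("Attic", "Eritrean"), times = c(13, 14)))

# Default priors: relative class frequencies in the training set.
prior <- as.vector(prop.table(table(grouping)))
names(prior) <- levels(grouping)
print(round(prior, 7))
```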

rrcovHD documentation built on April 23, 2021, 9:08 a.m.
