Cross-validation for PCA
Description
Internal cross-validation can be used to estimate the level of structure in a data set and to optimise the choice of the number of principal components.
Usage
Q2(object, originalData = completeObs(object), fold = 5, nruncv = 1,
type = c("krzanowski", "impute"), verbose = interactive(),
variables = 1:nVar(object), ...)

Arguments
object 
A pcaRes object.
originalData 
The matrix (or ExpressionSet) that was used to obtain the pcaRes object.
fold 
The number of groups to divide the data into.
nruncv 
The number of times to repeat the whole cross-validation.
type 
The type of cross-validation to perform, either "krzanowski" or "impute".
verbose 

variables 
Indices of the variables to use during the cross-validation calculation. Other variables are kept as they are and do not contribute to the total sum of squares.
... 
Further arguments passed to the 
Details
This method calculates Q^2 for a PCA model. This is the cross-validated version of R^2 and can be interpreted as the fraction of variance that can be predicted independently by the PCA model. A poor (low) Q^2 indicates that the PCA model only describes noise and that the model is unrelated to the true data structure. The definition of Q^2 is:
Q^2 = 1 - \frac{\sum_i^k \sum_j^n (x - \hat{x})^2}{\sum_i^k \sum_j^n x^2}
for the matrix x, which has n rows and k columns. For a given number of PCs, x is estimated as \hat{x} = TP' (T are scores and P are loadings). Although this defines the leave-one-out cross-validation, this is not what is performed if fold is less than the number of rows and/or columns. In 'impute' type CV, diagonal rows of elements in the matrix are deleted and then re-estimated. In 'krzanowski' type CV, rows are sequentially left out to build fold PCA models which give the loadings. Then, columns are sequentially left out to build fold models for scores. By combining scores and loadings from different models, we can estimate completely left-out values. The two types may seem similar but can give very different results; krzanowski typically yields more stable and reliable results for estimating data structure, whereas impute is better for evaluating missing value imputation performance. Note that since Krzanowski CV operates on a reduced matrix, it is not possible to estimate Q2 for all components, and the result vector may therefore be shorter than nPcs(object).
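The formula above can be illustrated with a minimal base R sketch. It is not the implementation used by Q2: it relies on prcomp from the stats package, and the helper q2_formula is a hypothetical name introduced only for this illustration. Because the scores and loadings here are fitted on the same values they reconstruct, the result is the ordinary R^2; Q2 instead obtains \hat{x} from models that never saw the left-out values.

```r
# Illustration of the formula only (assumption: base R prcomp, not pcaMethods).
data(iris)

# Column-centred data matrix x with n rows and k columns
x <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)

# x is already centred, so prcomp is told not to centre again
fit <- prcomp(x, center = FALSE)

# Hypothetical helper: 1 - sum((x - xhat)^2) / sum(x^2),
# where xhat = T P' is built from the first `a` components
q2_formula <- function(x, scores, loadings, a) {
  idx <- seq_len(a)
  xhat <- scores[, idx, drop = FALSE] %*% t(loadings[, idx, drop = FALSE])
  1 - sum((x - xhat)^2) / sum(x^2)
}

# One value per number of components
vals <- sapply(1:4, function(a) q2_formula(x, fit$x, fit$rotation, a))
print(round(vals, 3))
```

In this non-cross-validated form the value can only increase as components are added and reaches 1 once all components are used. A cross-validated Q^2 may instead level off or drop when further components fit only noise, which is what makes it useful for choosing the number of components.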
Value
A matrix or vector with Q^2 estimates.
Author(s)
Henning Redestig, Ondrej Mikula
References
Krzanowski, WJ. Cross-validation in principal component analysis. Biometrics. 1987(43):3, 575-584
Examples
library(pcaMethods)
data(iris)
x <- iris[,1:4]
pcIr <- pca(x, nPcs=3)
q2 <- Q2(pcIr, x)
barplot(q2, main="Krzanowski CV", xlab="Number of PCs", ylab=expression(Q^2))
## q2 for a single variable
Q2(pcIr, x, variables=2)
pcIr <- pca(x, nPcs=3, method="nipals")
q2 <- Q2(pcIr, x, type="impute")
barplot(q2, main="Imputation CV", xlab="Number of PCs", ylab=expression(Q^2))
