ordPCA | R Documentation |
This function performs nonlinear principal components analysis for variables measured on an ordinal scale, using a second-order difference penalty.
ordPCA(H, p, lambda = c(1), maxit = 100, crit = 1e-7, qstart = NULL,
Ks = apply(H,2,max), constr = rep(FALSE, ncol(H)), trace = FALSE,
CV = FALSE, k = 5, CVfit = FALSE)
H |
a matrix or data frame of integers 1,2,... giving the observed levels of the ordinal variables; provides the data for the principal components analysis. |
p |
the number of principal components to be extracted. |
lambda |
a numeric value or a vector (in decreasing order) defining the amount of shrinkage; defaults to 1. |
maxit |
the maximum number of iterations; defaults to 100. |
crit |
convergence tolerance; defaults to 1e-7. |
qstart |
optional list of quantifications for the initial linear PCA. |
Ks |
a vector containing the highest level of each variable. |
constr |
a logical vector specifying whether monotonicity constraints should be applied to the variables. |
trace |
logical; if |
CV |
a logical value indicating whether k-fold cross-validation should be performed in order to evaluate the performance and/or select an optimal smoothing parameter. |
k |
the number of folds to be specified; only if |
CVfit |
logical; to be specified only if |
In order to respect the ordinal scale of the data, principal components analysis is not applied to the data matrix H itself, but to newly constructed variables obtained by assigning numerical values (the quantifications) to the categories via penalized optimal scaling/scoring.
The calculation is done by alternately cycling through data scoring and PCA until convergence.
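The alternating scheme can be illustrated with a deliberately simplified base R sketch. It is not the package's implementation: the toy data, the penalized least-squares form of the scoring step, and the VAF-based stopping rule are all assumptions made for illustration, and monotonicity constraints are left out.

```r
set.seed(1)
n <- 300; m <- 4; K <- 5; p <- 1; lambda <- 1

# Toy ordinal data (hypothetical): one latent trait cut into K levels
z <- rnorm(n)
H <- sapply(1:m, function(j)
  findInterval(z + rnorm(n, sd = 0.5), c(-1, -0.3, 0.3, 1)) + 1)

D  <- diff(diag(K), differences = 2)        # second-order difference matrix
qs <- replicate(m, 1:K, simplify = FALSE)   # start from linear scores 1,...,K

vaf_old <- 0
for (it in 1:100) {
  # Scoring: replace each level by its current quantification, then standardize
  Q  <- scale(sapply(1:m, function(j) qs[[j]][H[, j]]))
  # PCA on the scored data and rank-p reconstruction
  pc   <- prcomp(Q)
  vaf  <- sum(pc$sdev[1:p]^2) / sum(pc$sdev^2)
  Xhat <- pc$x[, 1:p, drop = FALSE] %*% t(pc$rotation[, 1:p, drop = FALSE])
  if (abs(vaf - vaf_old) < 1e-7) break      # assumed stopping rule
  vaf_old <- vaf
  # Penalized update of the quantifications (assumed least-squares form)
  for (j in 1:m) {
    Z <- outer(H[, j], 1:K, "==") * 1       # n x K indicator matrix
    qs[[j]] <- drop(solve(crossprod(Z) + lambda * crossprod(D),
                          crossprod(Z, Xhat[, j])))
  }
}
```

After the loop, qs holds one vector of K quantifications per variable, and vaf is the proportion of variance accounted for by the first p components in the last iteration.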
The penalty parameter controls the amount of shrinkage: for lambda = 0, purely nonlinear PCA via standard optimal scaling is obtained. As lambda becomes very large, the quantifications are shrunk towards linearity, i.e., usual PCA is applied to levels 1,2,..., ignoring the ordinal scale level of the variables.
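As a rough illustration of the shrinkage effect, the following base R sketch builds a second-order difference matrix and solves a penalized least-squares problem for a single variable. The category means ybar, the category counts w and the exact form of the criterion are assumptions for illustration, not taken from the package.

```r
K <- 5
D <- diff(diag(K), differences = 2)   # (K-2) x K second-order difference matrix

# Penalty on a vector of quantifications q
penalty <- function(q, lambda) lambda * sum((D %*% q)^2)
penalty(1:K, lambda = 1)              # equally spaced scores are not penalized

# Hypothetical category means and counts for one variable
ybar <- c(-1.2, -0.9, 0.4, 0.5, 1.3)
w    <- rep(20, K)

# Penalized least squares: minimize sum(w * (ybar - q)^2) + lambda * ||D q||^2
fit_q <- function(lambda) drop(solve(diag(w) + lambda * crossprod(D), w * ybar))

q_free   <- fit_q(0)      # lambda = 0: quantifications equal the category means
q_linear <- fit_q(1e6)    # very large lambda: quantifications close to a line
```

With lambda = 0 the solution reproduces ybar, while for very large lambda the second differences of the solution are close to zero, which corresponds to the linear limit described above.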
Note that optimization starts with the first component of lambda. Thus, if lambda is not in decreasing order, the vector will be sorted internally, and the corresponding results will be ordered accordingly.
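Because results follow the sorted vector, it can help to sort lambda before the call so that positions in the output match positions in the input. A minimal base R sketch (how the function sorts internally is not shown here and is not assumed):

```r
lambda <- c(0.5, 5, 1e-4)                 # not in decreasing order

ord           <- order(lambda, decreasing = TRUE)
lambda_sorted <- lambda[ord]              # 5, 0.5, 1e-04

# Position of each original value within the sorted vector, e.g. to look up
# the result column that belongs to lambda[1] = 0.5
pos <- match(lambda, lambda_sorted)
```

Passing lambda_sorted to the function keeps the correspondence between values and results explicit.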
In case of cross-validation, for each lambda the proportion of variance accounted for (VAF) is given for both the training and test data (see below).
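To make the notion of training and test VAF concrete, here is a base R sketch of k-fold cross-validation for ordinary PCA on a numeric matrix. The toy matrix Q stands in for already scaled data, and the definition of the test VAF as the share of test variation reproduced by the training loadings is an assumption; the package may compute it differently.

```r
set.seed(42)
n <- 100; m <- 6; p <- 2; k <- 5

# Toy numeric data (hypothetical); the shared term induces correlation
Q <- matrix(rnorm(n * m), n, m) + rnorm(n)

folds <- sample(rep(1:k, length.out = n))   # random fold assignment

# Share of total variation in X reproduced by projecting onto loadings V
vaf <- function(X, V) {
  Xhat <- X %*% V %*% t(V)
  1 - sum((X - Xhat)^2) / sum(X^2)
}

VAFtrain <- VAFtest <- numeric(k)
for (f in 1:k) {
  train <- Q[folds != f, , drop = FALSE]
  test  <- Q[folds == f, , drop = FALSE]
  pc <- prcomp(train, center = TRUE, scale. = TRUE)
  V  <- pc$rotation[, 1:p, drop = FALSE]
  # Standardize both parts with the training means and standard deviations
  VAFtrain[f] <- vaf(scale(train, pc$center, pc$scale), V)
  VAFtest[f]  <- vaf(scale(test,  pc$center, pc$scale), V)
}
```

Averaging VAFtest over the folds gives a single out-of-sample figure per setting, which is how the examples further down pick an optimal lambda.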
A list with components:
qs |
a list of quantifications, if |
Q |
data matrix after scaling, if |
X |
matrix of factor values resulting from |
A |
loadings matrix as a result from |
iter |
number of iterations used. |
pca |
object of class |
trace |
vector of VAF values in each iteration, if |
VAFtrain |
matrix with columns corresponding to |
VAFtest |
VAF matrix for the test data within cross-validation. |
If cross-validation is desired, the PCA results are stored in a list called fit, with each list entry corresponding to a certain fold. Within such a list entry, all sub-entries can be accessed as described above.
However, VAF values are stored in VAFtrain or VAFtest and can be accessed directly.
Aisouda Hoshiyar, Jan Gertheiss
Hoshiyar, A. (2020). Analyzing Likert-type data using penalized non-linear principal components analysis, in: Proceedings of the 35th International Workshop on Statistical Modelling, Vol. I, 337-340.
Hoshiyar, A., H.A.L. Kiers, and J. Gertheiss (2021). Penalized non-linear principal components analysis for ordinal variables with an application to international classification of functioning core sets, British Journal of Mathematical and Statistical Psychology, 76, 353-371.
Linting, M., J.J. Meulman, A.J. van der Kooij, and P.J.F. Groenen (2007). Nonlinear principal components analysis: Introduction and application, Psychological Methods, 12, 336-358.
prcomp
## Not run:
## load ICF data
data(ICFCoreSetCWP)
# adequate coding to get levels 1,..., max
H <- ICFCoreSetCWP[, 1:67] + matrix(c(rep(1, 50), rep(5, 16), 1),
                                    nrow(ICFCoreSetCWP), 67,
                                    byrow = TRUE)
xnames <- colnames(H)
# nonlinear PCA
icf_pca1 <- ordPCA(H, p = 2, lambda = c(5, 0.5, 0.0001), maxit = 1000,
                   Ks = c(rep(5, 50), rep(9, 16), 5),
                   constr = c(rep(TRUE, 50), rep(FALSE, 16), TRUE))
# estimated quantifications
icf_pca1$qs[[55]]
plot(1:9, icf_pca1$qs[[55]][, 1], type = "b",
     xlab = "category", ylab = "quantification", col = 1, main = xnames[55],
     ylim = range(c(icf_pca1$qs[[55]][, 1], icf_pca1$qs[[55]][, 2],
                    icf_pca1$qs[[55]][, 3])))
lines(icf_pca1$qs[[55]][,2], type = "b", col = 2, lty = 2, pch = 2, lwd=2)
lines(icf_pca1$qs[[55]][,3], type = "b", col = 3, lty = 3, pch = 3, lwd=2)
# compare VAF
icf_pca2 <- ordPCA(H, p = 2, lambda = c(5, 0.5, 0.0001), maxit = 1000,
                   Ks = c(rep(5, 50), rep(9, 16), 5),
                   constr = c(rep(TRUE, 50), rep(FALSE, 16), TRUE),
                   CV = TRUE, k = 5)
icf_pca2$VAFtest
## load ehd data
library(psy)
data(ehd)
# recoding to get levels 1,..., max
H <- ehd + 1
# nonlinear PCA
ehd1 <- ordPCA(H, p = 5, lambda = 0.5, maxit = 100,
               constr = rep(TRUE, ncol(H)),
               CV = FALSE)
# resulting PCA on the scaled variables
summary(ehd1$pca)
# plot quantifications
oldpar <- par(mfrow = c(4,5))
for (j in seq_along(ehd1$qs)) {
  plot(1:5, ehd1$qs[[j]], type = "b", xlab = "level", ylab = "quantification",
       main = colnames(H)[j])
}
par(oldpar)
# include cross-validation
lambda <- 10^seq(4,-4, by = -0.1)
set.seed(456)
cvResult <- ordPCA(H, p = 5, lambda = lambda, maxit = 100,
                   constr = rep(TRUE, ncol(H)),
                   CV = TRUE, k = 5, CVfit = FALSE)
# optimal lambda
lambda[which.max(apply(cvResult$VAFtest,2,mean))]
## End(Not run)