ordPCA: Penalized nonlinear PCA for ordinal variables


ordPCA    R Documentation

Penalized nonlinear PCA for ordinal variables

Description

This function performs nonlinear principal components analysis for variables measured on an ordinal scale, using a second-order difference penalty on the category quantifications.

Usage

ordPCA(H, p, lambda = c(1), maxit = 100, crit = 1e-7, qstart = NULL, 
       Ks = apply(H,2,max), constr = rep(FALSE, ncol(H)), trace = FALSE,
       CV = FALSE, k = 5, CVfit = FALSE)

Arguments

H

a matrix or data frame of integers 1,2,... giving the observed levels of the ordinal variables; provides the data for the principal components analysis.

p

the number of principal components to be extracted.

lambda

a numeric value or a vector (in decreasing order) defining the amount of shrinkage; defaults to 1.

maxit

the maximum number of iterations; defaults to 100.

crit

convergence tolerance; defaults to 1e-7.

qstart

optional list of quantifications for the initial linear PCA.

Ks

a vector containing the highest level of each variable.

constr

a logical vector specifying whether monotonicity constraints should be applied to the variables.

trace

logical; if TRUE, tracing information on the progress of the optimization is produced in terms of VAF in each iteration.

CV

a logical value indicating whether k-fold cross-validation should be performed in order to evaluate the performance and/or select an optimal smoothing parameter.

k

the number of folds; only used if CV = TRUE. Defaults to 5.

CVfit

logical; only used if CV = TRUE. If FALSE (recommended), only VAF values are returned; if TRUE, lists of matrices of PCA results are also produced and stored. If CVfit = TRUE and lambda is a vector of length > 5, an additional yes/no dialog appears.
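
As an illustration of the coding expected for H, the following sketch recodes 0-based item scores to levels 1,2,... and derives Ks in the same way as the default shown under Usage. The data frame raw and its values are made up for this example and are not part of the package.

```r
# Hypothetical raw scores coded 0, 1, 2, ... (illustrative values only)
raw <- data.frame(item1 = c(0, 2, 1, 4),
                  item2 = c(1, 0, 3, 2))

# Shift so that the lowest possible level is 1, as required for H
H <- as.matrix(raw) + 1

# Highest observed level per variable; same expression as the default for Ks
Ks <- apply(H, 2, max)

# One logical per variable; FALSE means no monotonicity constraint
constr <- rep(FALSE, ncol(H))

Ks
```

If the highest category of a variable is never observed, apply(H, 2, max) is smaller than the true number of levels, so passing Ks explicitly (as done for the ICF data in the Examples) may be preferable.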

Details

In order to respect the ordinal scale of the data, principal components analysis is not applied to data matrix H itself, but to newly constructed variables by assigning numerical values – the quantifications – to the categories via penalized, optimal scaling/scoring. The calculation is done by alternately cycling through data scoring and PCA until convergence.
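
The scoring step described above can be sketched as a simple lookup: each observed level is replaced by the quantification of its category, and PCA is then run on the scored matrix. The quantifications in qs below are hypothetical numbers chosen for illustration, and the use of scale. = TRUE in prcomp is an assumption of this sketch, not a statement about the internals of ordPCA.

```r
# Two ordinal variables with 3 and 4 levels (toy data)
H <- cbind(x1 = c(1, 2, 3, 3, 2),
           x2 = c(1, 1, 2, 4, 3))

# Hypothetical quantifications: one numeric vector per variable,
# with one entry per category
qs <- list(c(-1, 0.2, 1),
           c(-1.5, -0.5, 0.5, 1.5))

# Replace each level by the quantification of its category
Q <- sapply(seq_len(ncol(H)), function(j) qs[[j]][H[, j]])
colnames(Q) <- colnames(H)

# Ordinary PCA on the scored variables (scaling is an assumption here)
pca <- prcomp(Q, scale. = TRUE)
Q
```

In ordPCA itself the quantifications are not fixed in advance but are updated in turn with the PCA step until convergence.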

The penalty parameter controls the amount of shrinkage: For lambda = 0, purely nonlinear PCA via standard, optimal scaling is obtained. As lambda becomes very large, the quantifications are shrunk towards linearity, i.e., usual PCA is applied to levels 1,2,..., treating them as numeric values and ignoring the ordinal scale level of the variables.
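
Why a second-order difference penalty shrinks towards linearity can be seen from a small sketch: second differences of equally spaced values are zero, so linear quantifications are not penalized at all. The helper penalty and the vectors below are illustrative assumptions; the exact form of the penalty used inside ordPCA may differ in scaling or other details.

```r
K <- 5

# Second-order difference matrix with K - 2 rows and K columns;
# each row has the pattern (1, -2, 1)
D2 <- diff(diag(K), differences = 2)

# Sum of squared second differences of a quantification vector
# (hypothetical helper, not part of ordPens)
penalty <- function(q) sum((D2 %*% q)^2)

q_linear <- 1:K                      # equally spaced, i.e. linear in the level
q_curved <- c(1, 1.2, 2.5, 4.6, 5)   # made-up nonlinear quantifications

penalty(q_linear)   # zero: linear quantifications are not penalized
penalty(q_curved)   # positive: deviations from linearity are penalized
```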

Note that optimization starts with the first component of lambda. Thus, if lambda is not in decreasing order, the vector will be sorted internally and the corresponding results will be ordered accordingly.
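
To keep track of which result belongs to which penalty value, it may help to sort lambda in decreasing order before the call, so that the ordering of the input and of the results agree. A minimal sketch with arbitrary values:

```r
# Penalty values in arbitrary order (illustrative values)
lambda <- c(0.0001, 5, 0.5)

# Sort in decreasing order, matching the order used for the results
lambda_sorted <- sort(lambda, decreasing = TRUE)
lambda_sorted
```

Passing lambda_sorted instead of lambda means that, for example, the first column of each element of qs refers to lambda_sorted[1].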

In case of cross-validation, for each lambda the proportion of variance accounted for (VAF) is given for both the training and test data (see below).

Value

A list with components:

qs

a list of quantifications, if lambda is specified as a single value. Otherwise, a list of matrices, each column corresponding to a certain lambda value.

Q

data matrix after scaling, if lambda is scalar. Otherwise, a list of matrices with each list entry corresponding to a certain lambda value.

X

matrix of factor values resulting from prcomp, if lambda is scalar. Otherwise, list of matrices.

A

loadings matrix as a result from prcomp, if lambda is scalar. Otherwise, list of matrices.

iter

number of iterations used.

pca

object of class "prcomp" returned by prcomp.

trace

vector of VAF values in each iteration, if lambda is specified as a single value. Otherwise, a list of vectors, each entry corresponding to a certain lambda value.

VAFtrain

matrix with columns corresponding to lambda and rows corresponding to the folds k. Contains corresponding proportions of variance accounted for (VAF) on the training data within cross-validation. VAF here is defined in terms of the proportion of variance explained by the first p PCs.

VAFtest

VAF matrix for the test data within cross-validation.

If cross-validation is requested (CV = TRUE) with CVfit = TRUE, the PCA results are stored in a list called fit, with each list entry corresponding to a certain fold. Within such a list entry, all sub-entries can be accessed as described above. The VAF values, however, are always stored in VAFtrain and VAFtest and can be accessed directly.
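
The VAF definition given above (proportion of variance explained by the first p PCs) can be computed from any object of class "prcomp". The helper vaf and the simulated matrix below are assumptions made for illustration; they are not part of ordPens and do not reproduce how VAFtrain and VAFtest are computed on held-out folds.

```r
# Proportion of variance accounted for by the first p components
# (hypothetical helper)
vaf <- function(pca, p) {
  sum(pca$sdev[seq_len(p)]^2) / sum(pca$sdev^2)
}

# Simulated scored data: 40 observations, 5 variables (illustrative only)
set.seed(1)
Q <- matrix(sample(1:5, 200, replace = TRUE), nrow = 40, ncol = 5)

pca <- prcomp(Q, scale. = TRUE)
vaf(pca, p = 2)
```

Applied to the pca component of a fit with a single lambda, the same helper would give the VAF of the final solution on the full data.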

Author(s)

Aisouda Hoshiyar, Jan Gertheiss

References

Hoshiyar, A. (2020). Analyzing Likert-type data using penalized non-linear principal components analysis, in: Proceedings of the 35th International Workshop on Statistical Modelling, Vol. I, 337-340.

Hoshiyar, A., H.A.L. Kiers, and J. Gertheiss (2023). Penalized non-linear principal components analysis for ordinal variables with an application to international classification of functioning core sets, British Journal of Mathematical and Statistical Psychology, 76, 353-371.

Linting, M., J.J. Meulman, P.J.F. Groenen, and A.J. van der Kooij (2007). Nonlinear principal components analysis: Introduction and application, Psychological Methods, 12, 336-358.

See Also

prcomp

Examples

## Not run: 
## load ICF data 
data(ICFCoreSetCWP)

# recode to get levels 1,..., max (add 1 to items 1-50 and 67, add 5 to items 51-66)
H <- ICFCoreSetCWP[, 1:67] + matrix(c(rep(1, 50), rep(5, 16), 1),
                                    nrow(ICFCoreSetCWP), 67,
                                    byrow = TRUE)
xnames <- colnames(H)                                    
                                    
# nonlinear PCA
icf_pca1 <- ordPCA(H, p = 2, lambda = c(5, 0.5, 0.0001), maxit = 1000, 
                   Ks = c(rep(5, 50), rep(9, 16), 5), 
                   constr = c(rep(TRUE, 50), rep(FALSE, 16), TRUE))

# estimated quantifications 
icf_pca1$qs[[55]]

# columns of qs[[55]] correspond to lambda = 5, 0.5, 0.0001
plot(1:9, icf_pca1$qs[[55]][, 1], type = "b",
     xlab = "category", ylab = "quantification", col = 1, main = xnames[55],
     ylim = range(icf_pca1$qs[[55]]))
lines(icf_pca1$qs[[55]][, 2], type = "b", col = 2, lty = 2, pch = 2, lwd = 2)
lines(icf_pca1$qs[[55]][, 3], type = "b", col = 3, lty = 3, pch = 3, lwd = 2)

# compare VAF 
icf_pca2 <- ordPCA(H, p = 2, lambda = c(5, 0.5, 0.0001), maxit = 1000, 
                   Ks = c(rep(5, 50), rep(9, 16), 5), 
                   constr = c(rep(TRUE, 50), rep(FALSE, 16), TRUE),
                   CV = TRUE, k = 5)
icf_pca2$VAFtest

## load ehd data 
library(psy)
data(ehd)

# recoding to get levels 1,..., max 
H <- ehd + 1

# nonlinear PCA
ehd1 <- ordPCA(H, p = 5, lambda = 0.5, maxit = 100,
               constr = rep(TRUE,ncol(H)),
               CV = FALSE)

# resulting PCA on the scaled variables
summary(ehd1$pca)

# plot quantifications
oldpar <- par(mfrow = c(4,5))
for (j in seq_along(ehd1$qs))
  plot(1:5, ehd1$qs[[j]], type = "b", xlab = "level", ylab = "quantification",
       main = colnames(H)[j])
par(oldpar)

# include cross-validation
lambda <- 10^seq(4,-4, by = -0.1)
set.seed(456)
cvResult <- ordPCA(H, p = 5, lambda = lambda, maxit = 100,
                    constr = rep(TRUE,ncol(H)),
                    CV = TRUE, k = 5, CVfit = FALSE)
# optimal lambda: largest mean test VAF across folds
lambda[which.max(colMeans(cvResult$VAFtest))]

## End(Not run)

ordPens documentation built on Oct. 10, 2023, 5:07 p.m.