knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.extra = 'style="border:0; margin: auto"',
  fig.width = 5, fig.height = 5,
  out.width = "400px", out.height = "400px"
)
The ordPens R package offers selection and/or smoothing/fusing of ordinally scaled independent variables using a group lasso or generalized ridge penalty. In addition, it provides nonlinear principal components analysis for ordinal variables, using a second-order difference penalty.
For nonlinear PCA, performance evaluation, and selection of an optimal penalty parameter, use ordPCA(). Smoothing and selection of ordinal predictors is done by the function ordSelect(); smoothing only, by ordSmooth(); fusion and selection of ordinal predictors, by ordFusion(). For ANOVA with ordinal factors, use ordAOV(). To test for genes that are differentially expressed between ordinal levels, use ordGene(). See also vignette("ordPens", package = "ordPens").
library(ordPens)
ordPens offers, inter alia, nonlinear principal components analysis for ordinal variables.
To explore the tools available for ordinal PCA, we’ll use some publicly available data sets. The ehd data from the R package psy (Falissard, 2009) contains 269 observations of 20 ordinally scaled variables forming a polydimensional rating scale of depressive mood (Jouvent et al., 1988); it is documented in ?ehd.
To ensure adequate coding (ordinal levels 1, 2, ...), we need to add 1 to the observed values.
library(psy)
data(ehd)
H <- ehd + 1
head(H)
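After recoding, each variable takes values in 1, ..., 5; the observed maxima are what is passed as Ks in the calls below (a quick check, not part of the original analysis):

# number of observed levels per variable, used as the Ks argument below
apply(H, 2, max)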
The wrapper function ordPCA() accepts either a matrix or a data frame of integers 1, 2, ..., giving the observed levels of the ordinal variables, as its first argument. The variables are by no means required to have the same number of levels; the numbers of levels are specified by the argument Ks. Essentially, the algorithm converts the ordinal data to an interval scale by finding optimal scores (known as optimal scoring/scaling in the context of nonlinear PCA, cf. Linting et al., 2007). Optimization, i.e. finding optimal quantifications/scores and principal components with corresponding loadings, is done via alternating least squares (ALS). In order to take the ordinal character of the data into account, penalized ALS is applied, where the amount of penalization is controlled by the smoothing parameter lambda $\in \mathbb{R}_0^{+}$ using a second-order difference penalty. In addition, the function allows for both non-monotone effects and constraints enforcing monotonicity.
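To make the penalty concrete: for a single variable with quantifications $q_1, \dots, q_K$, the second-order difference penalty takes the form (up to the precise scaling used in the package)

$$\lambda \sum_{k=2}^{K-1} \left( q_{k+1} - 2 q_k + q_{k-1} \right)^2,$$

which vanishes for equidistant (linear) scores and grows with their curvature; a large $\lambda$ thus pushes the quantifications towards linearity, i.e. towards standard linear PCA on the category numbers.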
Let’s start with the most basic example with a univariate penalty, i.e. lambda being a scalar. We extract 2 principal components, specified by the argument p.
ehd_pca1 <- ordPCA(H, p = 2, lambda = 0.5, maxit = 100, crit = 1e-7,
                   qstart = NULL, Ks = apply(H, 2, max),
                   constr = rep(TRUE, ncol(H)),
                   CV = FALSE, k = 5, CVfit = FALSE)
Extract the typical PCA summary (in alignment with stats::prcomp()):
summary(ehd_pca1$pca)
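The returned pca object also carries the component standard deviations (the sdev element, as used for the scree plots further below), so the proportion of variance accounted for by the extracted components can be computed directly; a small usage sketch:

# proportion of variance accounted for by the first two components
ehd_pca1$pca$sdev[1:2]^2 / sum(ehd_pca1$pca$sdev^2)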
Estimated quantifications:
ehd_pca1$qs
We can visualize the estimates, e.g. for variable e9, by selecting the corresponding list entry:
plot(1:5, ehd_pca1$qs[[9]], type = "b", xlab = "category", ylab = "quantification",
     col = 2, main = colnames(H)[9], bty = "n")
We can apply the function for different values of the penalty parameter. If we specify lambda as a (decreasing) vector, the output components will be lists/matrices with one entry/column per lambda value. Note that optimization starts with the first component of lambda. Thus, if lambda is not in decreasing order, the vector is sorted internally, and the results are ordered accordingly.
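For example, a lambda vector passed in arbitrary order is treated as if it had been given in decreasing order:

# the internal ordering corresponds to:
sort(c(0.5, 5, 0), decreasing = TRUE)
#> [1] 5.0 0.5 0.0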
ehd_pca2 <- ordPCA(H, p = 2, lambda = c(5, 0.5, 0), maxit = 100, crit = 1e-7,
                   qstart = NULL, Ks = apply(H, 2, max),
                   constr = rep(TRUE, ncol(H)), CV = FALSE)
ehd_pca2$pca: a list of length corresponding to lambda, e.g. for $\lambda = 0$ (the third entry, since lambda is sorted in decreasing order):
summary(ehd_pca2$pca[[3]])
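To inspect the fits for all penalty values at once, the list can simply be traversed (a small usage sketch):

# PCA summaries for all three lambda values
lapply(ehd_pca2$pca, summary)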
ehd_pca2$qs: a list of matrices (of dimension Ks x length(lambda)) with entries corresponding to the variables, e.g. for the ninth variable (e9):
ehd_pca2$qs[[9]]
With an increasing penalty parameter, quantifications become increasingly linear:
# columns 1, 2, 3 of qs[[9]] correspond to the sorted lambda values 5, 0.5, 0
plot(ehd_pca2$qs[[9]][,3], type = "b", xlab = "category", ylab = "quantification",
     col = 1, ylim = range(ehd_pca2$qs[[9]]), main = colnames(H)[9], bty = "n")
lines(ehd_pca2$qs[[9]][,2], type = "b", col = 2, lty = 2, pch = 2, lwd = 2)
lines(ehd_pca2$qs[[9]][,1], type = "b", col = 3, lty = 3, pch = 3, lwd = 2)
Visualizing some variables:
par(mar = c(4.1, 4.1, 3.1, 1.1))
for(j in c(1, 9, 12, 13, 15, 19)){
  plot(ehd_pca2$qs[[j]][,3], type = "b", main = colnames(H)[j], xlab = "category",
       ylab = "quantification", lwd = 2, bty = "n")
  lines(ehd_pca2$qs[[j]][,2], type = "b", col = 2, lty = 2, pch = 2, lwd = 2)
  lines(ehd_pca2$qs[[j]][,1], type = "b", col = 3, lty = 3, pch = 3, lwd = 2)
}
We can also compare different amounts of penalization for performance evaluation by means of the variance accounted for (VAF): set CV = TRUE and specify the number of folds k. The resulting output is a matrix with columns corresponding to lambda and rows corresponding to the folds. Use CVfit = TRUE to perform both k-fold cross-validation and estimation (which, however, can be time-consuming if lambda has many elements).
ehd_pca3 <- ordPCA(H, p = 2, lambda = c(5, 0.5, 0.001), maxit = 100, crit = 1e-7,
                   qstart = NULL, Ks = apply(H, 2, max), constr = rep(TRUE, ncol(H)),
                   CV = TRUE, k = 5)
ehd_pca3$VAFtest
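Averaging the VAF over folds yields one value per lambda; the maximizer indicates the preferred amount of penalization (a quick sketch using the output above, anticipating the selection rule applied below):

# mean cross-validated VAF per lambda value (rows = folds, columns = lambda)
apply(ehd_pca3$VAFtest, 2, mean)
# position of the lambda value with maximal mean VAF
which.max(apply(ehd_pca3$VAFtest, 2, mean))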
For selecting the right amount of penalization, however, k-fold cross-validation should be performed over a fine grid of (many) sensible $\lambda$ values. Due to the time-consuming computation and the undesirably high dimension of the output, we recommend keeping the default CVfit = FALSE. The function then stores only the VAF values for the training and validation sets.
In addition, whether monotonicity constraints are imposed can be controlled per variable by the logical argument constr. For the ehd data, the assumption of monotonic effects seems reasonable.
lambda <- 10^seq(4, -4, by = -0.1)
set.seed(456)
ehd_CV_p2 <- ordPCA(H, p = 2, lambda = lambda, maxit = 100, crit = 1e-7,
                    Ks = apply(H, 2, max), qstart = NULL,
                    constr = rep(TRUE, ncol(H)), CV = TRUE, k = 5, CVfit = FALSE)
lam_p2 <- lambda[which.max(apply(ehd_CV_p2$VAFtest, 2, mean))]
ehd_CV_p2$VAFtest
The optimal $\lambda$ can then be determined by the value maximizing the cross-validated VAF on the validation set, averaged over folds:
plot(log10(lambda), apply(ehd_CV_p2$VAFtrain, 2, mean), type = "l",
     xlab = expression(log[10](lambda)), ylab = "proportion of variance explained",
     main = "training data", cex.axis = 1.2, cex.lab = 1.2, cex.main = 1.4)
plot(log10(lambda), apply(ehd_CV_p2$VAFtest, 2, mean), type = "l",
     xlab = expression(log[10](lambda)), ylab = "proportion of variance explained",
     main = "validation data", cex.axis = 1.2, cex.lab = 1.2, cex.main = 1.4)
abline(v = log10(lambda)[which.max(apply(ehd_CV_p2$VAFtest, 2, mean))])
Note that the choice of $\lambda$ relies on a fixed number of components p, which must be specified before $\lambda$ is selected. One strategy for selecting an appropriate number of components is to use the elbow of the scree plot for standard linear PCA as an initial guess. To make sure that the choice is valid, we can then look at the scree plots obtained when extracting different numbers of components p in that area (here: p=1, p=2, p=3, p=4), inserting the respective optimal $\lambda$ value:
# evaluate model with optimal lambda for p=2
ehd_pca_p2 <- ordPCA(H, p = 2, lambda = lam_p2, Ks = apply(H, 2, max),
                     constr = rep(TRUE, ncol(H)))

# evaluate optimal lambda & model for p=1, p=3, p=4
set.seed(456)
ehd_CV_p1 <- ordPCA(H, p = 1, lambda = lambda, constr = rep(TRUE, ncol(H)), CV = TRUE, k = 5)
ehd_CV_p3 <- ordPCA(H, p = 3, lambda = lambda, constr = rep(TRUE, ncol(H)), CV = TRUE, k = 5)
ehd_CV_p4 <- ordPCA(H, p = 4, lambda = lambda, constr = rep(TRUE, ncol(H)), CV = TRUE, k = 5)

lam_p1 <- lambda[which.max(apply(ehd_CV_p1$VAFtest, 2, mean))]
lam_p3 <- lambda[which.max(apply(ehd_CV_p3$VAFtest, 2, mean))]
lam_p4 <- lambda[which.max(apply(ehd_CV_p4$VAFtest, 2, mean))]

# each model uses its own number of components and the corresponding optimal lambda
ehd_pca_p1 <- ordPCA(H, p = 1, lambda = lam_p1, Ks = apply(H, 2, max), constr = rep(TRUE, ncol(H)))
ehd_pca_p3 <- ordPCA(H, p = 3, lambda = lam_p3, Ks = apply(H, 2, max), constr = rep(TRUE, ncol(H)))
ehd_pca_p4 <- ordPCA(H, p = 4, lambda = lam_p4, Ks = apply(H, 2, max), constr = rep(TRUE, ncol(H)))

# scree plots: penalized nonlinear PCA for p = 1,...,4 vs. standard linear PCA
plot(ehd_pca_p1$pca$sdev[1:10]^2, bty = "n", xaxt = "n", type = "o", main = NULL,
     xlab = "", pch = 19, ylab = "Variances",
     ylim = range(c(ehd_pca_p1$pca$sdev^2, prcomp(H, scale. = TRUE)$sdev^2)), col = 6)
lines(1:10, ehd_pca_p2$pca$sdev[1:10]^2, col = 2, type = "o", pch = 19)
lines(1:10, ehd_pca_p3$pca$sdev[1:10]^2, col = 3, type = "o", pch = 19)
lines(1:10, ehd_pca_p4$pca$sdev[1:10]^2, col = 4, type = "o", pch = 19)
lines(1:10, prcomp(H, scale. = TRUE)$sdev[1:10]^2, col = 1, type = "o", pch = 19)
legend(8, 5, legend = c("p=1", "p=2", "p=3", "p=4", "std"), col = c(6, 2:4, 1),
       lty = 1, bty = "n")
axis(1, at = 1:10, labels = 1:10)
As can be seen for the different values of p, the first two components are by far the most relevant for the ehd data. This is also confirmed by the scree plot for standard linear PCA.
The International Classification of Functioning, Disability and Health (ICF) core set data for chronic widespread pain (CWP), available in the ordPens package, consists of 420 observations of 67 ordinally scaled variables, each associated with one of the following four types: 'body functions', 'body structures', 'activities and participation', and 'environmental factors'. The environmental factors are measured on a nine-point Likert scale ranging from −4 'complete barrier' to +4 'complete facilitator'; all remaining variables are evaluated on a five-point Likert scale ranging from 0 'no problem' to 4 'complete problem'. For details, see Cieza et al. (2004) and Gertheiss et al. (2011), or type ?ICFCoreSetCWP.
Again, we need to make sure that the ordinal design matrix is coded adequately:
data(ICFCoreSetCWP)
H <- ICFCoreSetCWP[, 1:67] + matrix(c(rep(1, 50), rep(5, 16), 1),
                                    nrow(ICFCoreSetCWP), 67, byrow = TRUE)
head(H)
xnames <- colnames(H)
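As a quick sanity check (not part of the original analysis), the recoded values should now lie in 1, ..., Ks, with Ks = 9 for the sixteen environmental factors (columns 51 to 66) and Ks = 5 otherwise:

range(H[, 1:50])   # five-point items: values should lie within 1..5
range(H[, 51:66])  # environmental factors: values should lie within 1..9
range(H[, 67])     # last variable: five-point, values within 1..5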
Up to this point, we have assumed the quantifications to increase or decrease monotonically, as in the previous example: the variables of the ehd data are supposed to have a consistent, negative association with depressive mood. Monotonicity, however, is not always a reasonable assumption. For the environmental factors from the ICF, in particular, non-monotone transformations could also make sense, while for the other variables monotonicity seems reasonable.
We can simply specify for which variables a monotonicity constraint is to be applied via the logical argument constr. Indeed, non-monotonicity is detected for the environmental factors (prefix 'e'):
# without monotonicity constraints for the environmental factors
icf_pca1 <- ordPCA(H, p = 2, lambda = c(5, 0.5, 0.001), maxit = 100, crit = 1e-7,
                   qstart = NULL, Ks = c(rep(5, 50), rep(9, 16), 5),
                   constr = c(rep(TRUE, 50), rep(FALSE, 16), TRUE),
                   CV = FALSE, k = 5, CVfit = FALSE)

# with monotonicity constraints on all variables
icf_pca1C <- ordPCA(H, p = 2, lambda = c(5, 0.5, 0.001), maxit = 100, crit = 1e-7,
                    qstart = NULL, Ks = c(rep(5, 50), rep(9, 16), 5),
                    constr = rep(TRUE, ncol(H)),
                    CV = FALSE, k = 5, CVfit = FALSE)

# quantifications of the first environmental factor, unconstrained (left)
plot(icf_pca1$qs[[51]][,3], type = "b", xlab = "category", ylab = "quantification",
     col = 1, ylim = range(icf_pca1$qs[[51]]), main = xnames[51], bty = "n", xaxt = "n")
lines(icf_pca1$qs[[51]][,2], type = "b", col = 2, lty = 2, pch = 2, lwd = 2)
lines(icf_pca1$qs[[51]][,1], type = "b", col = 3, lty = 3, pch = 3, lwd = 2)
axis(1, at = 1:length(icf_pca1$qs[[51]][,1]), labels = -4:4)

# the same variable with monotonicity constraint (right)
plot(icf_pca1C$qs[[51]][,3], type = "b", xlab = "category", ylab = "quantification",
     col = 1, ylim = range(icf_pca1C$qs[[51]]), main = xnames[51], bty = "n", xaxt = "n")
lines(icf_pca1C$qs[[51]][,2], type = "b", col = 2, lty = 2, pch = 2, lwd = 2)
lines(icf_pca1C$qs[[51]][,1], type = "b", col = 3, lty = 3, pch = 3, lwd = 2)
axis(1, at = 1:length(icf_pca1C$qs[[51]][,1]), labels = -4:4)
It can be seen that monotonicity constraints (right) on the environmental factors clearly distort valuable information contained in the negative categories/barriers. Thus, in the preferred model, monotonicity constraints are only applied to the variables corresponding to 'body functions', 'body structures', and 'activities and participation'.
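A penalty parameter for this preferred model could then be cross-validated exactly as for the ehd data above. The following is a minimal sketch, not run in the original text; the object names icf_CV and lam_icf are illustrative, and a grid this fine can be time-consuming:

# cross-validate lambda with constraints only on the non-environmental variables
lambda <- 10^seq(4, -4, by = -0.1)
set.seed(456)
icf_CV <- ordPCA(H, p = 2, lambda = lambda, Ks = c(rep(5, 50), rep(9, 16), 5),
                 constr = c(rep(TRUE, 50), rep(FALSE, 16), TRUE),
                 CV = TRUE, k = 5)
# optimal lambda: maximal mean VAF on the validation folds
lam_icf <- lambda[which.max(apply(icf_CV$VAFtest, 2, mean))]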
Visualizing some variables:
wx <- which(xnames == "b265" | xnames == "d450" | xnames == "d455" |
            xnames == "e1101" | xnames == "e460" | xnames == "e325")
xmain <- c()
xmain[wx] <- list("Touch function", "Walking", "Moving around", "Drugs",
                  "Societal attitudes",
                  paste(c("Acquaintances,colleagues,", "peers,community members")))
par(mar = c(4.1, 4.1, 3.1, 1.1))
for (j in wx){
  plot(icf_pca1$qs[[j]][,3], type = "b", main = xmain[j], xlab = "category",
       bty = "n", ylim = range(icf_pca1$qs[[j]]), ylab = "quantification",
       xaxt = "n", cex.main = 1)
  lines(icf_pca1$qs[[j]][,2], type = "b", col = 2, lty = 2, pch = 2, lwd = 2)
  lines(icf_pca1$qs[[j]][,1], type = "b", col = 3, lty = 3, pch = 3, lwd = 2)
  axis(1, at = 1:length(icf_pca1$qs[[j]][,1]),
       labels = ((1:length(icf_pca1$qs[[j]][,1])) - c(rep(1, 50), rep(5, 16), 1)[j]))
}
Cieza, A., G. Stucki, M. Weigl, L. Kullmann, T. Stoll, L. Kamen, N. Kostanjsek, and N. Walsh (2004). ICF Core Sets for chronic widespread pain. Journal of Rehabilitation Medicine 36 (Suppl. 44), 63–68.
Falissard, B. (2009). psy: Various procedures used in psychometry. R package version 1.0.
Gertheiss, J., S. Hogger, C. Oberhauser, and G. Tutz (2011). Selection of ordinally scaled independent variables with applications to international classification of functioning core sets. Applied Statistics 60, 377–395.
Hoshiyar, A. (2020). Analyzing Likert-type data using penalized non-linear principal components analysis. In: Proceedings of the 35th International Workshop on Statistical Modelling I, 337–340.
Hoshiyar, A., H.A.L. Kiers, and J. Gertheiss (2021). Penalized non-linear principal components analysis for ordinal variables with an application to international classification of functioning core sets. British Journal of Mathematical and Statistical Psychology 76, 353–371.
Jouvent, R., C. Vindreau, M. Montreuil, C. Bungener, and D. Widlöcher (1988). La clinique polydimensionnelle de l'humeur dépressive: Nouvelle version de l'échelle EHD. Psychiatrie et Psychobiologie 3, 245–254.
Linting, M., J.J. Meulman, A.J. van der Kooij, and P.J.F. Groenen (2007). Nonlinear principal components analysis: Introduction and application. Psychological Methods 12, 336–358.