# Introduction to ordPens: `ordPCA`

```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.extra = 'style="border:0; margin: auto"',
  fig.width = 5,
  fig.height = 5,
  out.width = "400px",
  out.height = "400px"
)
```

## Primary functions

The `ordPens` R package offers selection, smoothing, and/or fusion of ordinally scaled independent variables using a group lasso or generalized ridge penalty. In addition, it provides nonlinear principal components analysis for ordinal variables, using a second-order difference penalty.

For nonlinear PCA, performance evaluation, and selection of an optimal penalty parameter, use `ordPCA()`. Smoothing and selection of ordinal predictors is done by the function `ordSelect()`; smoothing only, by `ordSmooth()`; fusion and selection of ordinal predictors, by `ordFusion()`. For ANOVA with ordinal factors, use `ordAOV()`. To test for genes that are differentially expressed between ordinal levels, use `ordGene()`. See also `vignette("ordPens", package = "ordPens")`.
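For orientation, the core interfaces can be sketched as follows. The calls are abbreviated and shown as comments only; see the respective help pages (e.g. `?ordSmooth`) for the full argument lists:

```r
# Abbreviated, illustrative calls (not run); x: ordinal predictors, y: response
# ordSmooth(x, y, lambda = lambda)   # smoothing of ordinal predictors
# ordSelect(x, y, lambda = lambda)   # smoothing and selection
# ordFusion(x, y, lambda = lambda)   # fusion and selection
# ordAOV(x, y)                       # ANOVA with ordinal factors
```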

```r
library(ordPens)
```

## Penalized nonlinear Principal Components Analysis

### EHD data example

`ordPens` offers, inter alia, nonlinear principal components analysis for ordinal variables. To explore the tools available for ordinal PCA, we'll use some publicly available data sets. The `ehd` data from the R package `psy` (Falissard, 2009) contain 269 observations of 20 ordinally scaled variables forming a polydimensional rating scale of depressive mood (Jouvent et al., 1988) and are documented in `?ehd`.

To ensure adequate coding (categories 1, 2, ...), we need to add 1 to the observed values:

```r
library(psy)
data(ehd)
H <- ehd + 1
head(H)
```

The wrapper function `ordPCA()` accepts both matrices and data frames of integers 1, 2, ..., giving the observed levels of the ordinal variables, as its first argument. The columns need not all have the same number of levels; the number of levels of each variable can be specified via the argument `Ks`. Essentially, the algorithm converts the ordinal data to an interval scale by finding optimal scores (known as optimal scoring/scaling in the context of nonlinear PCA; cf. Linting et al., 2007). Optimization, i.e. finding optimal quantifications/scores and principal components with corresponding loadings, is done via alternating least squares (ALS).
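To illustrate the alternating scheme, a deliberately simplified toy version (unpenalized and unconstrained; the function `toy_als()` and all of its details are hypothetical and not `ordPCA()`'s actual implementation) might look as follows:

```r
# Toy sketch of optimal scoring via alternating least squares:
# (1) given quantifications, compute a rank-p PCA fit;
# (2) given the fit, re-quantify each variable by category means.
toy_als <- function(H, p = 2, maxit = 50) {
  X <- scale(as.matrix(H))                 # start: observed levels as scores
  for (it in seq_len(maxit)) {
    pc   <- prcomp(X)                      # step (1): PCA for fixed scores
    Xhat <- pc$x[, 1:p, drop = FALSE] %*% t(pc$rotation[, 1:p, drop = FALSE])
    for (j in seq_len(ncol(X))) {          # step (2): update quantifications
      X[, j] <- ave(Xhat[, j], H[, j])     # category means of the rank-p fit
      X[, j] <- scale(X[, j])              # re-standardize the scores
    }
  }
  list(scores = X, pca = prcomp(X))
}
```

`ordPCA()` replaces step (2) by a penalized (and, optionally, constrained) least squares problem.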

In order to take the ordinal character of the data into account, penalized ALS is applied, where the amount of penalization is controlled by the smoothing parameter `lambda` $\in \mathbb{R}_0^{+}$ and a second-order difference penalty is used. In addition, the function allows for both non-monotone effects and the incorporation of constraints enforcing monotonicity.
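Schematically, for a single variable with quantifications $q_1, \dots, q_K$ of its $K$ ordered categories, a second-order difference penalty has the form

$$\lambda \sum_{k=2}^{K-1} \left(q_{k-1} - 2\,q_k + q_{k+1}\right)^2,$$

so that large values of $\lambda$ shrink the quantifications toward a linear function of the category numbers (see the package documentation for the exact formulation used).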

#### One-dimensional estimation

Let's start with the most basic example: a univariate penalty, i.e. `lambda` being a scalar. We extract two principal components, as specified by the argument `p`.

```r
ehd_pca1 <- ordPCA(H, p = 2, lambda = 0.5, maxit = 100, crit = 1e-7,
                   qstart = NULL, Ks = apply(H, 2, max), constr = rep(TRUE, ncol(H)),
                   CV = FALSE, k = 5, CVfit = FALSE)
```

Extract the typical PCA summary (in alignment with `base::prcomp()`):

```r
summary(ehd_pca1$pca)
```

Estimated quantifications:

```r
ehd_pca1$qs
```

We can visualize the estimates, e.g. for variable `e9`, by selecting the corresponding list entry:

```r
plot(1:5, ehd_pca1$qs[[9]], type = "b", xlab = "category", ylab = "quantification",
     col = 2, main = colnames(H)[9], bty = "n")
```

#### Comparison of different penalty parameter values

We can also apply the function for several values of the penalty parameter at once. If we specify `lambda` as a (decreasing) vector, the output will be lists of matrices. Note that optimization starts with the first component of `lambda`; thus, if `lambda` is not in decreasing order, the vector is sorted internally, and the results are ordered accordingly.

```r
ehd_pca2 <- ordPCA(H, p = 2, lambda = c(5, 0.5, 0), maxit = 100, crit = 1e-7,
                   qstart = NULL, Ks = apply(H, 2, max), constr = rep(TRUE, ncol(H)),
                   CV = FALSE)
```

`ehd_pca2$pca`: a list of length corresponding to `lambda`, e.g. for $\lambda=0$ (the third entry, since results are ordered by decreasing `lambda`):

```r
summary(ehd_pca2$pca[[3]])
```

`ehd_pca2$qs`: a list of matrices (of dimension `Ks` x `length(lambda)`) with entries corresponding to the variables, e.g. for the ninth variable (`e9`):

```r
ehd_pca2$qs[[9]]
```

With an increasing penalty parameter, quantifications become increasingly linear:

```r
plot(ehd_pca2$qs[[9]][,3], type = "b", xlab = "category", ylab = "quantification", col = 1,
     ylim = range(ehd_pca2$qs[[9]]), main = colnames(H)[9], bty = "n")      # lambda = 0
lines(ehd_pca2$qs[[9]][,2], type = "b", col = 2, lty = 2, pch = 2, lwd = 2) # lambda = 0.5
lines(ehd_pca2$qs[[9]][,1], type = "b", col = 3, lty = 3, pch = 3, lwd = 2) # lambda = 5
```

Visualizing some variables:

```r
par(mar = c(4.1, 4.1, 3.1, 1.1))

for(j in c(1, 9, 12, 13, 15, 19)){
  plot(ehd_pca2$qs[[j]][,3], type = "b", main = colnames(H)[j], xlab = "category",
       ylab = "quantification", lwd = 2, bty = "n")
  lines(ehd_pca2$qs[[j]][,2], type = "b", col = 2, lty = 2, pch = 2, lwd = 2)
  lines(ehd_pca2$qs[[j]][,1], type = "b", col = 3, lty = 3, pch = 3, lwd = 2)
}
```

#### Comparison of VAF for different penalties

We can also compare different amounts of penalization in terms of the variance accounted for (VAF) by setting `CV = TRUE` and specifying the number of folds `k`. The resulting output is a matrix with columns corresponding to `lambda` and rows corresponding to the `k` folds. Use `CVfit = TRUE` to perform both k-fold cross-validation and estimation (which, however, can be time-consuming if `lambda` has many elements).

```r
ehd_pca3 <- ordPCA(H, p = 2, lambda = c(5, 0.5, 0.001), maxit = 100, crit = 1e-7,
                   qstart = NULL, Ks = apply(H, 2, max), constr = rep(TRUE, ncol(H)),
                   CV = TRUE, k = 5)
```

```r
ehd_pca3$VAFtest
```
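Averaging over folds gives one cross-validated VAF value per `lambda`, e.g.:

```r
# mean VAF on the validation folds, one value per lambda
colMeans(ehd_pca3$VAFtest)
```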

#### k-fold cross-validation

For selecting the right amount of penalization, however, k-fold cross-validation should be performed over a fine grid of (many) sensible values of $\lambda$. Due to time-consuming computation and undesirably high-dimensional output, we recommend keeping the default `CVfit = FALSE`; the function then only stores VAF values for the training and validation sets.
As noted above, monotonicity constraints can be switched on or off per variable via the logical argument `constr`. For the `ehd` data, the assumption of monotonic effects seems reasonable.

```r
lambda <- 10^seq(4, -4, by = -0.1)
set.seed(456)
ehd_CV_p2 <- ordPCA(H, p = 2, lambda = lambda, maxit = 100, crit = 1e-7, Ks = apply(H, 2, max),
                    qstart = NULL, constr = rep(TRUE, ncol(H)), CV = TRUE, k = 5, CVfit = FALSE)

lam_p2 <- lambda[which.max(apply(ehd_CV_p2$VAFtest, 2, mean))]
ehd_CV_p2$VAFtest
```

The optimal $\lambda$ can then be determined as the value maximizing the cross-validated VAF on the validation set, averaged over folds:

```r
plot(log10(lambda), apply(ehd_CV_p2$VAFtrain, 2, mean), type = "l",
     xlab = expression(log[10](lambda)), ylab = "proportion of variance explained",
     main = "training data", cex.axis = 1.2, cex.lab = 1.2, cex.main = 1.4)

plot(log10(lambda), apply(ehd_CV_p2$VAFtest, 2, mean), type = "l",
     xlab = expression(log[10](lambda)), ylab = "proportion of variance explained",
     main = "validation data", cex.axis = 1.2, cex.lab = 1.2, cex.main = 1.4)
abline(v = log10(lambda)[which.max(apply(ehd_CV_p2$VAFtest, 2, mean))])
```
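The selected value itself, stored above as `lam_p2`, can be inspected directly:

```r
lam_p2          # optimal lambda for p = 2
log10(lam_p2)   # its position on the log10 grid used above
```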

#### Selecting the number of components

Note that the choice of $\lambda$ relies on a fixed number of components `p`, which must be specified before $\lambda$ is selected. One strategy for selecting an appropriate number of components is to use the elbow of the scree plot from standard linear PCA as an initial guess. To validate that choice, we can then compare the scree plots obtained when extracting different numbers of components in that area (here: `p` = 1, 2, 3, 4), each fitted with its respective optimal $\lambda$ value:

```r
# evaluate model with optimal lambda for p=2
ehd_pca_p2 <- ordPCA(H, p = 2, lambda = lam_p2, Ks = apply(H, 2, max), constr = rep(TRUE, ncol(H)))

# evaluate optimal lambda & model for p=1, p=3, p=4
set.seed(456)
ehd_CV_p1 <- ordPCA(H, p = 1, lambda = lambda, constr = rep(TRUE, ncol(H)), CV = TRUE, k = 5)
ehd_CV_p3 <- ordPCA(H, p = 3, lambda = lambda, constr = rep(TRUE, ncol(H)), CV = TRUE, k = 5)
ehd_CV_p4 <- ordPCA(H, p = 4, lambda = lambda, constr = rep(TRUE, ncol(H)), CV = TRUE, k = 5)

lam_p1 <- lambda[which.max(apply(ehd_CV_p1$VAFtest, 2, mean))]
lam_p3 <- lambda[which.max(apply(ehd_CV_p3$VAFtest, 2, mean))]
lam_p4 <- lambda[which.max(apply(ehd_CV_p4$VAFtest, 2, mean))]

ehd_pca_p1 <- ordPCA(H, p = 1, lambda = lam_p1, Ks = apply(H, 2, max), constr = rep(TRUE, ncol(H)))
ehd_pca_p3 <- ordPCA(H, p = 3, lambda = lam_p3, Ks = apply(H, 2, max), constr = rep(TRUE, ncol(H)))
ehd_pca_p4 <- ordPCA(H, p = 4, lambda = lam_p4, Ks = apply(H, 2, max), constr = rep(TRUE, ncol(H)))
```

```r
plot(ehd_pca_p1$pca$sdev[1:10]^2, bty = "n", xaxt = "n", type = "o", main = NULL, xlab = "",
     pch = 19, ylab = "Variances", col = 6,
     ylim = range(c(ehd_pca_p1$pca$sdev^2, prcomp(H, scale = TRUE)$sdev^2)))
lines(1:10, ehd_pca_p2$pca$sdev[1:10]^2, col = 2, type = "o", pch = 19)
lines(1:10, ehd_pca_p3$pca$sdev[1:10]^2, col = 3, type = "o", pch = 19)
lines(1:10, ehd_pca_p4$pca$sdev[1:10]^2, col = 4, type = "o", pch = 19)
lines(1:10, prcomp(H, scale = TRUE)$sdev[1:10]^2, col = 1, type = "o", pch = 19)
legend(8, 5, legend = c("p=1", "p=2", "p=3", "p=4", "std"), col = c(6, 2:4, 1), lty = 1, bty = "n")
axis(1, at = 1:10, labels = 1:10)
```

As can be seen across the different `p` values, for the `ehd` data the first two components are by far the most relevant. This is also confirmed by the scree plot for standard linear PCA.
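For instance, using base R's `stats::screeplot()` (a minimal sketch, not part of the original analysis):

```r
# scree plot of standard linear PCA on the scaled data
screeplot(prcomp(H, scale = TRUE), type = "lines", main = "standard linear PCA")
```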


### ICF data example
#### Monotonicity assumptions
The International Classification of Functioning, Disability and Health (ICF) core set data for chronic widespread pain (CWP), available in the `ordPens` package, consist of 420 observations of 67 ordinally scaled variables, each one associated with one of the following four types: 'body functions', 'body structures', 'activities and participation', and 'environmental factors'.
The latter are measured on a nine-point Likert scale ranging from −4 'complete barrier' to +4 'complete facilitator'. All remaining factors are evaluated on a five-point Likert scale ranging from 0 'no problem' to 4 'complete problem'.
For details, see Cieza et al. (2004) and Gertheiss et al. (2011), or type `?ICFCoreSetCWP`.

Again, we need to make sure that the ordinal design matrix is coded adequately:
```r
data(ICFCoreSetCWP)
H <- ICFCoreSetCWP[, 1:67] + matrix(c(rep(1, 50), rep(5, 16), 1),
                                    nrow(ICFCoreSetCWP), 67, byrow = TRUE)
head(H)
xnames <- colnames(H)
```

To this point, we have assumed the quantifications to increase or decrease monotonically, since in the previous example the affected variables of the `ehd` data are supposed to have a consistent, negative association with depressive mood.

Monotonicity, however, is not always a reasonable assumption. For the environmental factors from the ICF in particular, non-monotone transformations can also make sense, while for the other variables monotonicity seems reasonable.

We can simply specify, via the logical argument `constr`, for which variables a monotonicity constraint is to be applied. Indeed, non-monotonicity is detected for the environmental factors (prefix 'e'):

```r
# constraints only where monotonicity is plausible (not for the 'e' variables)
icf_pca1 <- ordPCA(H, p = 2, lambda = c(5, 0.5, 0.001), maxit = 100, crit = 1e-7, qstart = NULL,
                   Ks = c(rep(5, 50), rep(9, 16), 5),
                   constr = c(rep(TRUE, 50), rep(FALSE, 16), TRUE),
                   CV = FALSE, k = 5, CVfit = FALSE)

# for comparison: constraints on all variables
icf_pca1C <- ordPCA(H, p = 2, lambda = c(5, 0.5, 0.001), maxit = 100, crit = 1e-7, qstart = NULL,
                    Ks = c(rep(5, 50), rep(9, 16), 5), constr = rep(TRUE, ncol(H)),
                    CV = FALSE, k = 5, CVfit = FALSE)
```

```r
# first plot: environmental factor e1101 without monotonicity constraint
plot(icf_pca1$qs[[51]][,3], type = "b", xlab = "category", ylab = "quantification", col = 1,
     ylim = range(icf_pca1$qs[[51]]), main = xnames[51], bty = "n", xaxt = "n")
lines(icf_pca1$qs[[51]][,2], type = "b", col = 2, lty = 2, pch = 2, lwd = 2)
lines(icf_pca1$qs[[51]][,1], type = "b", col = 3, lty = 3, pch = 3, lwd = 2)
axis(1, at = 1:length(icf_pca1$qs[[51]][,1]), labels = -4:4)

# second plot: the same variable with monotonicity constraint
plot(icf_pca1C$qs[[51]][,3], type = "b", xlab = "category", ylab = "quantification", col = 1,
     ylim = range(icf_pca1C$qs[[51]]), main = xnames[51], bty = "n", xaxt = "n")
lines(icf_pca1C$qs[[51]][,2], type = "b", col = 2, lty = 2, pch = 2, lwd = 2)
lines(icf_pca1C$qs[[51]][,1], type = "b", col = 3, lty = 3, pch = 3, lwd = 2)
axis(1, at = 1:length(icf_pca1C$qs[[51]][,1]), labels = -4:4)
```

It can be seen that monotonicity constraints on the environmental factors (second plot) clearly distort valuable information contained in the negative categories/barriers.

Thus, in the preferred model, monotonicity constraints are only applied to the variables corresponding to 'body functions', 'body structures', and 'activities and participation'.
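To back this up numerically, one could also cross-validate both variants and compare the mean VAF on the validation folds. A sketch under the same settings as above (the object names `icf_cv`/`icf_cvC` and the single `lambda` value are illustrative choices, not part of the original analysis):

```r
Ks_icf     <- c(rep(5, 50), rep(9, 16), 5)
constr_icf <- c(rep(TRUE, 50), rep(FALSE, 16), TRUE)

set.seed(456)   # same RNG state, so both fits use the same folds
icf_cv  <- ordPCA(H, p = 2, lambda = 0.5, Ks = Ks_icf,
                  constr = constr_icf, CV = TRUE, k = 5)
set.seed(456)
icf_cvC <- ordPCA(H, p = 2, lambda = 0.5, Ks = Ks_icf,
                  constr = rep(TRUE, ncol(H)), CV = TRUE, k = 5)

colMeans(icf_cv$VAFtest)   # partially constrained (preferred)
colMeans(icf_cvC$VAFtest)  # fully constrained
```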

Visualizing some variables:

```r
wx <- which(xnames == "b265" | xnames == "d450" | xnames == "d455" | xnames == "e1101"
            | xnames == "e460" | xnames == "e325")
xmain <- c()
xmain[wx] <- list("Touch function",
                  "Walking",
                  "Moving around",
                  "Drugs",
                  "Societal attitudes",
                  paste(c("Acquaintances,colleagues,", "peers,community members")))

par(mar = c(4.1, 4.1, 3.1, 1.1))
for (j in wx){
  plot(icf_pca1$qs[[j]][,3], type = "b", main = xmain[j], xlab = "category", bty = "n",
       ylim = range(icf_pca1$qs[[j]]), ylab = "quantification", xaxt = "n", cex.main = 1)
  lines(icf_pca1$qs[[j]][,2], type = "b", col = 2, lty = 2, pch = 2, lwd = 2)
  lines(icf_pca1$qs[[j]][,1], type = "b", col = 3, lty = 3, pch = 3, lwd = 2)
  axis(1, at = 1:length(icf_pca1$qs[[j]][,1]),
       labels = ((1:length(icf_pca1$qs[[j]][,1])) - c(rep(1, 50), rep(5, 16), 1)[j]))
}
```

## References

Cieza, A., Stucki, G., Weigl, M., Kullmann, L., Stoll, T., Kamen, L., Kostanjsek, N., & Walsh, N. (2004). ICF Core Sets for chronic widespread pain. Journal of Rehabilitation Medicine, 36 (Suppl. 44), 63-68.

Falissard, B. (2009). psy: Various procedures used in psychometry. R package.

Gertheiss, J., Hogger, S., Oberhauser, C., & Tutz, G. (2011). Selection of ordinally scaled independent variables with applications to international classification of functioning core sets. Journal of the Royal Statistical Society, Series C (Applied Statistics), 60, 377-395.

Jouvent, R., Vindreau, C., Montreuil, M., Bungener, C., & Widlocher, D. (1988). La clinique polydimensionnelle de l'humeur depressive: Nouvelle version de l'echelle EHD. Psychiatrie & Psychobiologie, 3, 245-253.

Linting, M., Meulman, J. J., Groenen, P. J. F., & van der Kooij, A. J. (2007). Nonlinear principal components analysis: Introduction and application. Psychological Methods, 12, 336-358.