Ordinal and Mixed PCA
In Gifi: Multivariate Analysis with Optimal Scaling

knitr::opts_chunk$set(collapse = TRUE, comment = "#>")

This vignette shows applications of principal component analysis (PCA) on ordinal and mixed input data which in Gifi slang is called princals.

Ordinal PCA in a Nutshell

We use a dataset from @Kennett+Salini:2012 on customer satisfaction that is included in the Gifi package.

library("Gifi")
ABC6 <- ABC[,6:11]            ## we use 6 items from the dataset
head(ABC6)

We have a sample size of r nrow(ABC6). Each item is scaled on a 5-point rating scale (1 ... very low, 5 ... very high). A more detailed description can be found in the corresponding ?ABC help file. We are now going to fit an PCA with all items taken at an ordinal scale level:

fit_ordpca <- princals(ABC6, ndim = 2, levels = "ordinal")  
fit_ordpca

We obtain a loss value of r round(fit_ordpca$f,3). The results are prepared in a way so that they are close to the classical PCA output involving eigenvalues, loadings, and variance accounted for (VAF):

summary(fit_ordpca)

We can plot the loadings, or combine them with the PC scores (which Gifi calls object scores) in a biplot.

plot(fit_ordpca, plot.type = "loadplot")

plot(fit_ordpca, plot.type = "biplot", col.loadings = "coral3", col.scores = "lightgrey")
abline(h = 0, v = 0, lty = 2)

Similar to standard PCA, one can create a scree plot (plot.type = "screeplot") to assess the number of dimensions needed.

If the user wants to have a closer insight into the optimal scaling transformations under the hood, transformation plots can be produced:

plot(fit_ordpca, plot.type = "transplot")

These plots nicely show the transformations of the original 1-5 scores into the optimally scaled category quantifications. Since in the princals() fit above we defined levels = "ordinal", a monotone regression transformation is applied resulting in these step functions. The data frame with the quantifications on the first dimension can be extracted as follows:

head(fit_ordpca$scoremat)

This data frame can replace the original AB6 object for subsequent analyses.

Standard PCA with Gifi

One can also mimick a standard PCA using princals() by setting the levels argument accordingly:

fit_metpca <- princals(ABC6, ndim = 2, levels = "metric")
fit_metpca

The results are highly similar to the classical prcomp(ABC6, scale = TRUE) fit, and the same plots as above can be produced. Here we look again at the transformation plots, showing the linear transformation (simply a z-standardization) induced by the metric measurement level assumption.

plot(fit_metpca, plot.type = "transplot")

One can compare the loss value of r round(fit_ordpca$f, 3) from the ordinal solution with r round(fit_metpca$f, 3) from the metric solution and, in conjunction with the transformation plots, judge whether these Likert items can be treated as metric.

Mixed PCA

To demonstrate PCA with mixed input scale levels, we use a subset of the data based on the Wilson-Patterson conservatism scale from MPsychoR package. For illustration purposes we create age groups so that we have an ordinal variable in the dataset.

data("WilPat2")
WilPat2$Age <- cut(WilPat2$Age, breaks = c(17, 20, 23, 30, 40, 100), labels = 1:5)      
head(WilPat2)

The first 6 items are Wilson-Patterson items (1 = "approve", 0 = "disapprove", and 2 = "don't know") which we declare as "nominal". This is followed by Country which is also "nominal", and two self-reported liberalism/conservatism and left/right identification which enter as "metric". Finally we use an "ordinal" transformation for age. The corresponding levelvec object is then shipped to princals().

levelvec <- c(rep("nominal", 6), "nominal", "metric", "metric", "nominal", "ordinal")    
wen_prin <- princals(WilPat2, levels = levelvec)                                          ## fit 2D princals
wen_prin
summary(wen_prin)

Let us look at the transformation plots of some variables:

plot(wen_prin, plot.type = "transplot", var.subset = c(1, 8, 9, 11))

Let's produce the joint plot as the main output which combines the loadings for the non-nominal variables with the category quantifications of the nominal variables. For aesthetic reasons we re-scale the loading arrows by a factor of 0.08.

plot(wen_prin, plot.type = "jointplot", expand = 0.08)

In case we are interested in the object scores, we can ask for an object plot:

plot(wen_prin, "objplot", cex.scores = 0.6)

A biplot could also be produced that combines the loadings plot with the object plot.

Note that princals is a restricted version of homals: in princals, the category quantifications on the second dimension are linearly related to the ones from dimension 1, whereas in homals they are unrestricted. In the multiple correspondence analysis vignette we call homals() on the same data. Whether to use princals or homals on datasets like these, depends on what we are mainly interest in: category quantifications of the nominal variables (homals) or loadings of the non-nominal variables (princals). This said, as shown in Sec. 8.3.3 in @Mair:2018b, using the concept of copies one can obtain the same results in princals as with a homals fit. Eventually, it's all the same (Gifi) monster.

Further technical details can be found in vignette("gifi_theory"). Elaborations on optimal scaling, further applied aspects, and examples are given in @Mair:2018b and in the excellent paper by @Linting+Meulman:2007.