knitr::opts_chunk$set(echo = TRUE,eval=TRUE,fig.align="center",fig.width = 7,fig.height = 7)
The datatable housing is used.
library(PCAmixdata) data(gironde) head(gironde$housing)
The PCAmix function is applied to this datatable.
split <- splitmix(gironde$housing) X1 <- split$X.quanti X2 <- split$X.quali res.pcamix <- PCAmix(X.quanti=X1, X.quali=X2,rename.level=TRUE, graph=FALSE)
The object res.pcamix is an object of class PCAmix and contains many numerical results summarized in the S3 print method.
res.pcamix
The variance of the principal components is obtained.
res.pcamix$eig
Here are some graphical outputs obtained with the method plot of the objects of class PCAmix.
?plot.PCAmix par(mfrow=c(2,2)) plot(res.pcamix,choice="ind",coloring.ind=X2$houses,label=FALSE, posleg="bottomright", main="Observations") plot(res.pcamix,choice="levels",xlim=c(-1.5,2.5), main="Levels") plot(res.pcamix,choice="cor",main="Numerical variables") plot(res.pcamix,choice="sqload",coloring.var=T, leg=TRUE, posleg="topright", main="All variables")
It is also possible to select some observations to plot.
sort(res.pcamix$ind$coord[,2])[1:10] plot(res.pcamix,choice="ind",lim.contrib.plot = 2, posleg="bottomright", main="Observations",cex=0.8)
We select randomly 100 observations to build a test sample. The remaining observations are in a train sample used to perform PCAmix.
set.seed(10) test <- sample(1:nrow(gironde$housing),100) train.pcamix <- PCAmix(X1[-test,],X2[-test,],ndim=2,graph=FALSE)
The scores of the 100 observations are obtained with the S3 method predict of the objects of class PCAmix.
?predict.PCAmix pred <- predict(train.pcamix,X1[test,],X2[test,]) head(pred)
It is then possible to plot these observation as supplementary on the principal component map.
plot(train.pcamix,axes=c(1,2),label=FALSE,main="Observations map") points(pred,col=2,pch=16) legend("bottomright",legend = c("train","test"),fill=1:2,col=1:2)
We choose as supplementary variables: * the numerical variable building of the datatable environment * the categorical variable doctor of the datatable services And we use the S3 method supvar of the objects of class PCAmix to obtain the coordinates of this supplmentary numerical on the correlation circle of PCAmix as well as the coordinates of the levels of this supplementary categorical variable.
?supvar.PCAmix X1sup <- gironde$environment[,1,drop=FALSE] X2sup <- gironde$services[,7,drop=FALSE] res.sup <- supvar(res.pcamix,X1sup,X2sup,rename.level=TRUE) res.sup$quanti.sup$coord[,1:2,drop=FALSE] res.sup$levels.sup$coord[,1:2]
The S3 method plot plots automatically these supplementary variables on all the graphical outputs.
?plot.PCAmix #par(mfrow=c(2,1)) plot(res.sup,choice="cor",main="Numerical variables") plot(res.sup,choice="levels",main="Levels",xlim=c(-2,2.5))
We apply here the function PCArot to perform varimax-type rotation to the result of PCAmix applied to the datatable housing. The 10 first observations are removed to illustrate later the prediction of new observations.
library(PCAmixdata) data(gironde) #Create the datatable housing without the ten first observations train <- gironde$housing[-c(1:10), ] #Split the datatable split <- splitmix(train) X1 <- split$X.quanti X2 <- split$X.quali res.pcamix <- PCAmix(X.quanti=X1, X.quali=X2,rename.level=TRUE, graph=FALSE)
In order to choose the number of dimension to rotate, we look at the inertia of the principal components again.
res.pcamix$eig
We decide to rotate 3 principal components that summarize 84% of the inertia.
res.pcarot <- PCArot(res.pcamix,dim=3,graph=FALSE) res.pcarot$eig #variance of the rotated PCs sum(res.pcarot$eig[,2]) #unchanged explained inertia
The object res.pcarot contains many numerical results summarized in the S3 print method.
res.pcarot
We can look at the squared loadings before and after rotation.
round(res.pcamix$sqload[,1:3],digit=2) round(res.pcarot$sqload,digit=2)
It is also possible to plot all the standard maps of PCAmix before and after rotation. For instance we can plot the observations and variables before and after rotation in the dimensions 1 and 3.
par(mfrow=c(2,2)) plot(res.pcamix,choice="ind",label=FALSE,axes=c(1,3), main="Observations before rotation") plot(res.pcarot,choice="ind",label=FALSE,axes=c(1,3), main="Observations after rotation") plot(res.pcamix,choice="sqload", coloring.var=TRUE, leg=TRUE,axes=c(1,3), posleg="topleft", main="Variables before rotation", xlim=c(0,1), ylim=c(0,1)) plot(res.pcarot,choice="sqload", coloring.var=TRUE, leg=TRUE,axes=c(1,3), posleg="topright", main="Variables after rotation", xlim=c(0,1), ylim=c(0,1))
It also possible to plot the numerical variables (on the correlation circle) or the levels of the categorical variables.
par(mfrow=c(2,2)) plot(res.pcamix,choice="cor", coloring.var=TRUE, leg=TRUE,axes=c(1,3), posleg="topright", main="Before rotation") plot(res.pcarot,choice="cor", coloring.var=TRUE, leg=TRUE,axes=c(1,3), posleg="topright", main="After rotation") plot(res.pcamix,choice="levels", leg=TRUE,axes=c(1,3), posleg="topright", main="Before rotation",xlim=c(-1.5,2.5)) plot(res.pcarot,choice="levels", leg=TRUE,axes=c(1,3), posleg="topright", main="After rotation",xlim=c(-1.5,2.5))
It is possible, exactly as in PCAmix, to predict the scores of new observations on the principal components after rotation.
test <- gironde$housing[1:10, ] splitnew <- splitmix(test) X1new <- splitnew$X.quanti X2new <- splitnew$X.quali pred.rot <- predict(res.pcarot, X.quanti=X1new, X.quali=X2new) pred.rot
An of course these new observations can be plot as supplementary observations on the principal component map of the observations after rotation.
plot(res.pcarot,axes=c(1,3),label=FALSE,main="Observations map after rotation") points(pred.rot[,c(1,3)],col=2,pch=16) legend("topright",legend = c("train","test"),fill=1:2,col=1:2)
We use here the 4 mixed datatables available in the dataset gironde. This dataset is a list of 4 datatables (one datatable by group).
library(PCAmixdata) data(gironde) names(gironde)
Then we apply the function MFAmix to the four datatables concatenated in a single datatable.
#Concatenation of the 4 datatables dat <- cbind(gironde$employment,gironde$housing,gironde$services,gironde$environment) index <- c(rep(1,9),rep(2,5),rep(3,9),rep(4,4)) names <- c("employment","housing","services","environment") res.mfamix<-MFAmix(data=dat,groups=index, name.groups=names,ndim=3,rename.level=TRUE,graph=FALSE)
The object res.mfamix is an object of class MFAmix and contains many numerical results summarized in the S3 print method.
class(res.mfamix)
res.mfamix
The graphical outputs of MFAmix are quite similar to those of PCAmix with some specifity introduced by the knowedge of groups of variables. They are obtained with the method plot of the objects of class MFAmix.
?plot.MFAmix par(mfrow=c(2,2)) plot(res.mfamix, choice="cor",coloring.var="groups",leg=TRUE, main="(a) Numerical variables") plot(res.mfamix,choice="ind", partial=c("SAINTE-FOY-LA-GRANDE"), label=TRUE, posleg="topright", main="(b) Observations") plot(res.mfamix,choice="sqload",coloring.var="groups", posleg="topright",main="(c) All variables") plot(res.mfamix, choice="groups", coloring.var="groups", main="(c) Groups")
Because the correlation circle can be difficult to read with the superposition of the names of some variables, it can be useful to look at the numerical values the most correlated (in absloute value) to the 2 first principal components of PCAmix.
coord.var <- res.mfamix$quanti$coord[,1:2] subset(coord.var,abs(coord.var[,1])>0.5,1) subset(coord.var,abs(coord.var[,2 ])>0.5,2)
In order to have details on the way the variables of the group services (in blue on the map of all the variables) interpret the first principal component, we look at the map the levels of the categorical variables;
plot(res.mfamix, choice="levels", coloring.var="groups", posleg="bottomleft", main="(c) Levels",xlim=c(-2,4))
We want to put the municipality SAINTE-FOY-LA-GRANDE as supplementary observation in the MFAmix analysis.
sel <- which(rownames(dat)=="SAINTE-FOY-LA-GRANDE") res.mfamix<- MFAmix(data=dat[-sel,],groups=index, name.groups=names,rename.level=TRUE,graph=FALSE) pred <- predict(res.mfamix,dat[sel,,drop=FALSE]) pred[,1:2,drop=FALSE]
It is then possible to plot SAINTE-FOY-LA-GRANDE as an illustrative municipality.
plot(res.mfamix,axes=c(1,2),label=FALSE,main="Observations map", ylim=c(-5,1.5),cex=0.8) points(pred[,1:2,drop=FALSE],col=2,pch=16) text(pred,"SAINTE-FOY-LA-GRANDE",col=2,cex=0.7,pos=2) selplot <- which(res.mfamix$ind$coord[,1]>4.2) text(res.mfamix$ind$coord[selplot,1:2],rownames(dat[-sel,])[selplot],col=1,cex=0.7,pos=2) legend("topright",legend = c("active","illustrative"),fill=1:2,col=1:2)
Let us for instance apply MFAmix with three groups (employment , services, environment) and add the group housing in supplementary.
dat <- cbind(gironde$employment,gironde$services,gironde$environment) names <- c("employment","services","environment") mfa <-MFAmix(data=dat,groups=c(rep(1,9),rep(2,9),rep(3,4)), name.groups=names, rename.level=TRUE,graph=FALSE) mfa.sup <- supvar(mfa,data.sup=gironde$housing, groups.sup=rep(1,5), name.groups.sup="housing.sup",rename.level=TRUE)
The group housing can then plotted as supplementary on the maps of MFAmix.
#par(mfrow=c(1,2)) plot(mfa.sup,choice="groups",coloring.var="groups") plot(mfa.sup,choice="cor",coloring.var = "groups")
Chavent, M., Kuentz-Simonet, V., Labenne, A., Saracco, J., Multivariate analysis of mixed data: The PCAmixdata R package, arXiv preprint.
Chavent, M., Kuentz, V., Saracco, J. (2011). Orthogonal Rotation in PCAMIX. Advances in Classification and Data Analysis, Vol. 6, pp. 131-146.
Kiers, H.A.L., (1991). Simple structure in Component Analysis Techniques for mixtures of qualitative and quantitative variables, Psychometrika, 56, 197-212.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.