PCAmix: Principal component analysis of mixed data

PCAmix    R Documentation

Principal component analysis of mixed data

Description

Performs principal component analysis of a set of individuals (observations) described by a mixture of qualitative and quantitative variables. PCAmix includes ordinary principal component analysis (PCA) and multiple correspondence analysis (MCA) as special cases.

Usage

PCAmix(
  X.quanti = NULL,
  X.quali = NULL,
  ndim = 5,
  rename.level = FALSE,
  weight.col.quanti = NULL,
  weight.col.quali = NULL,
  graph = TRUE
)

Arguments

X.quanti

a numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).

X.quali

a categorical matrix of data, or an object that can be coerced to such a matrix (such as a character vector, a factor or a data frame with all factor columns).

ndim

number of dimensions kept in the results (by default 5).

rename.level

boolean, if TRUE all the levels of the qualitative variables are renamed as follows: "variable_name=level_name". This prevents levels of different variables from having identical names.
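The renaming pattern can be illustrated with a small base R sketch. The data frame and its variable names (smoker, owner) are hypothetical, and the loop only mimics the documented "variable_name=level_name" pattern; it is not taken from the PCAmix source.

```r
# Hypothetical toy data: two qualitative variables sharing the level names "yes"/"no"
X.quali <- data.frame(
  smoker = factor(c("yes", "no", "no")),
  owner  = factor(c("no", "yes", "yes"))
)

# Prefix every level with its variable name, following the documented
# "variable_name=level_name" pattern (illustration only, not PCAmix internals)
renamed <- X.quali
for (v in names(renamed)) {
  levels(renamed[[v]]) <- paste(v, levels(renamed[[v]]), sep = "=")
}

levels(renamed$smoker)
levels(renamed$owner)
```

After renaming, no level name is shared between the two variables, so the levels can be told apart in the output and on the level map.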

weight.col.quanti

vector of weights for the quantitative variables.

weight.col.quali

vector of weights for the qualitative variables.

graph

boolean, if TRUE the following graphics are displayed for the first two dimensions of PCAmix: component map of the individuals, plot of the squared loadings of all the variables (quantitative and qualitative), plot of the correlation circle (if quantitative variables are available), component map of the levels (if qualitative variables are available).

Details

If X.quali is not specified (i.e. NULL), only quantitative variables are available and standard PCA is performed. If X.quanti is NULL, only qualitative variables are available and standard MCA is performed.

Missing values are replaced by the variable mean for quantitative variables and by zeros in the indicator matrix for qualitative variables.
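As a rough illustration of this rule, the following base R sketch applies both replacements to toy vectors. The data and object names (x.quanti, x.quali, G) are hypothetical, and the code is a sketch of the described behavior, not the package's own implementation.

```r
# Hypothetical toy data with missing values
x.quanti <- c(2, NA, 4, 6)
x.quali  <- factor(c("a", NA, "b", "a"))

# Quantitative variable: replace NA by the mean of the observed values
x.filled <- ifelse(is.na(x.quanti), mean(x.quanti, na.rm = TRUE), x.quanti)

# Qualitative variable: build the indicator matrix, one column per level;
# an observation with a missing value gets a row of zeros
G <- sapply(levels(x.quali),
            function(l) as.numeric(!is.na(x.quali) & x.quali == l))

x.filled
G
```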

PCAmix returns the squared loadings in sqload. Squared loadings for a qualitative variable are correlation ratios between the variable and the principal components. For a quantitative variable, squared loadings are the squared correlations between the variable and the principal components.
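The two definitions can be made concrete with a short base R sketch. The component scores and variables below are made-up values chosen for illustration, and the object names are hypothetical; the code only shows what a squared correlation and a correlation ratio are, under the assumption that the component is available as a plain numeric vector.

```r
# Hypothetical toy data: scores on one component, one quantitative and
# one qualitative variable observed on the same five individuals
scores <- c(-1.5, -0.5, 0.0, 0.5, 1.5)
x      <- c(1, 2, 2, 4, 5)
g      <- factor(c("a", "a", "b", "b", "b"))

# Quantitative variable: squared Pearson correlation with the component
sq.cor <- cor(x, scores)^2

# Qualitative variable: correlation ratio, i.e. the between-level sum of
# squares of the scores divided by their total sum of squares
level.means <- as.vector(tapply(scores, g, mean))
n.levels    <- as.vector(table(g))
between     <- sum(n.levels * (level.means - mean(scores))^2)
total       <- sum((scores - mean(scores))^2)
cor.ratio   <- between / total

sq.cor
cor.ratio
```

Both quantities lie between 0 and 1, which is why quantitative and qualitative variables can be shown together on the same squared loadings plot.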

Note that when all the p variables are qualitative, the factor coordinates (scores) of the n observations are equal to the factor coordinates (scores) of standard MCA multiplied by the square root of p, and the eigenvalues are equal to the usual eigenvalues of MCA multiplied by p. When all the variables are quantitative, PCAmix gives exactly the same results as standard PCA.

Value

eig

a matrix containing the eigenvalues, the percentages of variance and the cumulative percentages of variance.
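The three columns are related in the usual way, as the base R sketch below suggests. The eigenvalues are made up, the column names are an assumption for illustration, and the sketch assumes the percentages are taken relative to the sum of all eigenvalues.

```r
# Hypothetical eigenvalues (assumed here to be the complete set)
eigenvalues <- c(2.5, 1.5, 0.7, 0.3)

# Percentage of variance carried by each dimension, and its running total
pct <- 100 * eigenvalues / sum(eigenvalues)
eig <- cbind(Eigenvalue = eigenvalues,
             Proportion = pct,
             Cumulative = cumsum(pct))
eig
```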

ind

a list containing the results for the individuals (observations):

  • $coord: factor coordinates (scores) of the individuals,

  • $contrib: absolute contributions of the individuals,

  • $contrib.pct: relative contributions of the individuals (in percentage),

  • $cos2: squared cosines of the individuals.

quanti

a list containing the results for the quantitative variables:

  • $coord: factor coordinates (scores) of the quantitative variables,

  • $contrib: absolute contributions of the quantitative variables,

  • $contrib.pct: relative contributions of the quantitative variables (in percentage),

  • $cos2: squared cosines of the quantitative variables.

levels

a list containing the results for the levels of the qualitative variables:

  • $coord: factor coordinates (scores) of the levels,

  • $contrib: absolute contributions of the levels,

  • $contrib.pct: relative contributions of the levels (in percentage),

  • $cos2: squared cosines of the levels.

quali

a list containing the results for the qualitative variables:

  • $contrib: absolute contributions of the qualitative variables (sum of absolute contributions of the levels of the qualitative variable),

  • $contrib.pct: relative contributions (in percentage) of the qualitative variables (sum of relative contributions of the levels of the qualitative variable).

sqload

a matrix of dimension (p, ndim) containing the squared loadings of the quantitative and qualitative variables.

coef

the coefficients of the linear combinations used to construct the principal components of PCAmix, and to predict coordinates (scores) of new observations in the function predict.PCAmix.

M

the vector of the weights of the columns used in the Generalized Singular Value Decomposition.

Author(s)

Marie Chavent <marie.chavent@u-bordeaux.fr>, Amaury Labenne.

References

Chavent M., Kuentz-Simonet V., Labenne A., Saracco J., Multivariate analysis of mixed data: The PCAmixdata R package, arXiv:1411.4911 [stat.CO].

See Also

print.PCAmix, summary.PCAmix, predict.PCAmix, plot.PCAmix

Examples

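#PCAMIX:
library(PCAmixdata)
data(wine)
str(wine)
# Split the mixed data set once into its quantitative and qualitative parts
split <- splitmix(wine)
X.quanti <- split$X.quanti
X.quali <- split$X.quali
# The first call displays the default graphics; the second suppresses them
pca <- PCAmix(X.quanti[,1:27], X.quali, ndim=4)
pca <- PCAmix(X.quanti[,1:27], X.quali, ndim=4, graph=FALSE)
pca$eig
pca$ind$coord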

#PCA:
data(decathlon)
# Column 13 is set aside to color the individuals in the score plot
quali <- decathlon[,13]
# The first call displays the default graphics; the second suppresses them
pca <- PCAmix(decathlon[,1:10])
pca <- PCAmix(decathlon[,1:10], graph=FALSE)
plot(pca, choice="ind", coloring.ind=quali, cex=0.8,
     posleg="topright", main="Scores")
plot(pca, choice="sqload", main="Squared correlations")
plot(pca, choice="cor", main="Correlation circle")
pca$quanti$coord

#MCA
data(flower)
# Only qualitative variables are supplied, so standard MCA is performed
mca <- PCAmix(X.quali=flower[,1:4], rename.level=TRUE, graph=FALSE)
plot(mca, choice="ind", main="Scores")
plot(mca, choice="sqload", main="Correlation ratios")
plot(mca, choice="levels", main="Levels")
mca$levels$coord

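#Missing values
data(vnf)
# Missing values are handled as described in Details
PCAmix(X.quali=vnf, rename.level=TRUE)
# Same analysis restricted to the complete cases
vnf2 <- na.omit(vnf)
PCAmix(X.quali=vnf2, rename.level=TRUE)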

chavent/PCAmixdata documentation built on Dec. 15, 2022, 5:56 p.m.