sparsePCAmix: Sparse principal component analysis of mixed data

sparsePCAmix    R Documentation

Sparse principal component analysis of mixed data

Description

Performs sparse principal component analysis of a set of individuals (observations) described by a mixture of qualitative and quantitative variables. sparsePCAmix includes ordinary sparse principal component analysis (PCA) and sparse multiple correspondence analysis (MCA) as special cases.

Usage

sparsePCAmix(
  X.quanti = NULL,
  X.quali = NULL,
  m = 2,
  lambda,
  block = 1,
  mu = 1/1:m,
  groupsize = FALSE,
  rename.level = FALSE
)

Arguments

X.quanti

a numeric matrix of data.

X.quali

a categorical matrix of data.

m

number of sparse components.

lambda

a vector of length m of reduced sparsity parameters, one per component. Each parameter is expressed relative to the theoretical upper bound and takes a value between 0 and 1.

block

either 0 or 1. block=0 means that deflation is used if more than one component needs to be computed. Otherwise, a block algorithm is used that computes the m components at once. By default, block=1.

mu

a vector of length m with the mu parameters (required for the block algorithm only). By default, mu_j=1/j.

groupsize

a logical value indicating whether the size of the groups should be taken into account.

rename.level

a logical value. If TRUE, all the levels of the qualitative variables are renamed as "variable_name=level_name". This prevents levels of different variables from having identical names.
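
The argument descriptions above can be illustrated with a minimal sketch. It assumes the built-in iris data as a stand-in for mixed data, and the lambda values are arbitrary illustrations, not recommendations. The call itself is shown as not run, since it needs the package installed and the input types accepted in practice may differ from what is assumed here.

```r
# Assumption: iris is used only as a convenient mixed data set.
X.quanti <- as.matrix(iris[, 1:4])               # numeric matrix
X.quali  <- as.matrix(iris[, 5, drop = FALSE])   # categorical (character) matrix

m      <- 2             # number of sparse components
lambda <- c(0.3, 0.5)   # illustrative values, one per component, each in [0, 1]
mu     <- 1/1:m         # the documented default, i.e. mu_j = 1/j

# Basic checks that mirror the argument descriptions.
stopifnot(length(lambda) == m, all(lambda >= 0 & lambda <= 1))
stopifnot(length(mu) == m)

## Not run:
# res <- sparsePCAmix(X.quanti = X.quanti, X.quali = X.quali,
#                     m = m, lambda = lambda, block = 1, mu = mu)
## End(Not run)
```

Note that 1/1:m evaluates as 1/(1:m) because the colon operator binds more tightly than division in R.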

Details

The pre-processed data matrix Z is the concatenation of X.quanti, standardized (centered and scaled by its standard deviations), and the centered indicator matrix of X.quali. The principal components (the scores) are given by the matrix F=ZMV, where M=diag(w) is the diagonal metric of the weights of the columns of Z and V is the matrix of sparse loadings. In sparse PCA, the loadings are not necessarily orthogonal and the principal components can be correlated, so the definition of the variance explained by each principal component must be modified. Here the pev (proportion of explained variance) of the PCs is calculated with the 'optimal variance' (optVar) definition of 'explained variance'.
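
The construction of Z and the product F=ZMV described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal code: the toy data, the unit weights w, and the loading matrix V are all hypothetical, and scale() divides by n - 1 whereas the package may use a different convention.

```r
# Toy mixed data: two numeric columns and one factor (illustrative only).
X.quanti <- cbind(height = c(1.60, 1.75, 1.82, 1.68),
                  weight = c(55, 72, 80, 63))
X.quali  <- data.frame(group = factor(c("a", "b", "a", "c")))

# Quantitative block: centered and scaled by standard deviations.
# Assumption: scale() uses the sample standard deviation (divisor n - 1).
Z1 <- scale(X.quanti, center = TRUE, scale = TRUE)

# Qualitative block: indicator (dummy) matrix, then column-centered.
G  <- model.matrix(~ group - 1, data = X.quali)
Z2 <- scale(G, center = TRUE, scale = FALSE)

# Pre-processed matrix: the two blocks side by side.
Z <- cbind(Z1, Z2)

# Hypothetical weights and sparse loadings, only to illustrate F = Z M V.
w <- rep(1, ncol(Z))
V <- matrix(0, nrow = ncol(Z), ncol = 2)
V[1, 1] <- 1   # component 1 loads only on 'height'
V[2, 2] <- 1   # component 2 loads only on 'weight'

scores <- Z %*% diag(w) %*% V   # n x m matrix of principal components
```

With these placeholder values each score column simply reproduces one standardized variable. Real weights and loadings would come from the fitted object (the w and V entries of the returned value).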

Value

V

the p times m matrix that contains the m sparse loading vectors.

scores

the n times m matrix that contains the m principal components.

pev

the proportion of variance (calculated with 'optVar') explained by the components.

varsel

a list with the names of the variables selected in each dimension.

degsp

the degree of sparsity of each component (number of selected variables).

Z

the pre-processed data matrix.

w

the vector of the weights of the columns of Z.
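
To show how varsel and degsp relate to the loading matrix V, here is a minimal sketch for the purely quantitative case. It assumes that a variable counts as selected when its loading is nonzero. The matrix V below is hypothetical and is not produced by the package. For qualitative variables the levels form groups, so counting nonzero rows would not directly give the number of selected variables.

```r
# Hypothetical 4 x 2 sparse loading matrix (filled column by column).
V <- matrix(c(0.8, 0.6, 0,   0,
              0,   0,   1.0, 0),
            nrow = 4, ncol = 2,
            dimnames = list(c("x1", "x2", "x3", "x4"), c("dim1", "dim2")))

# Names of the variables with a nonzero loading in each dimension.
varsel <- lapply(seq_len(ncol(V)), function(j) rownames(V)[V[, j] != 0])

# Number of selected variables per component.
degsp <- colSums(V != 0)
```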

References

  • M. Chavent and G. Chavent, Group-sparse block PCA and explained variance, arXiv:1705.00461

  • M. Chavent, V. Kuentz-Simonet, A. Labenne, J. Saracco. Multivariate Analysis of Mixed Data: The R Package PCAmixdata. Electronic Journal of Applied Statistical Analysis. ⟨hal-01662595⟩


chavent/sparsePCA documentation built on Feb. 2, 2023, 1:12 p.m.