PCAmix: Principal component analysis of mixed data


View source: R/PCAmix.R

Description

Performs principal component analysis of a set of individuals (observations) described by a mixture of qualitative and quantitative variables. PCAmix includes ordinary principal component analysis (PCA) and multiple correspondence analysis (MCA) as special cases.

Usage

PCAmix(X.quanti = NULL, X.quali = NULL, ndim = 5, rename.level = FALSE,
  weight.col.quanti = NULL, weight.col.quali = NULL, graph = TRUE)

Arguments

X.quanti

a numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).

X.quali

a categorical matrix of data, or an object that can be coerced to such a matrix (such as a character vector, a factor or a data frame with all factor columns).

ndim

number of dimensions kept in the results (by default 5).

rename.level

boolean, if TRUE all the levels of the qualitative variables are renamed as follows: "variable_name=level_name". This prevents levels of different variables from having identical names.

weight.col.quanti

vector of weights for the quantitative variables.

weight.col.quali

vector of weights for the qualitative variables.

graph

boolean, if TRUE the following graphics are displayed for the first two dimensions of PCAmix: component map of the individuals, plot of the squared loadings of all the variables (quantitative and qualitative), plot of the correlation circle (if quantitative variables are available), component map of the levels (if qualitative variables are available).

Details

If X.quali is not specified (i.e. NULL), only quantitative variables are available and standard PCA is performed. If X.quanti is NULL, only qualitative variables are available and standard MCA is performed.

Missing values are replaced by means for quantitative variables and by zeros in the indicator matrix for qualitative variables.
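
The replacement rule described above can be illustrated in base R. This is only a minimal sketch of the idea, not the package's internal code; the vectors x and f are made-up toy data.

```r
# Hypothetical toy data (not from the package).
x <- c(1.2, NA, 3.4, 2.0)
f <- factor(c("a", NA, "b", "a"))

# Quantitative variable: replace each NA by the mean of the observed values.
x_filled <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)

# Qualitative variable: build a 0/1 indicator matrix with one column per level.
# A missing observation gets 0 in every column of its row.
G <- sapply(levels(f), function(l) as.integer(!is.na(f) & f == l))

x_filled
G
```

In this sketch the second observation is missing in both variables, so it receives the mean of x and an all-zero row in G.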

PCAmix returns the squared loadings in sqload. Squared loadings for a qualitative variable are correlation ratios between the variable and the principal components. For a quantitative variable, squared loadings are the squared correlations between the variable and the principal components.
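
The two quantities named above can be computed by hand in base R. The sketch below uses simulated data, with score standing in for one principal component; all object names are illustrative and nothing here calls the package.

```r
set.seed(1)

# Hypothetical stand-in for the scores of one principal component.
score <- rnorm(30)

# A quantitative and a qualitative variable observed on the same 30 individuals.
x_num <- score + rnorm(30)
x_fac <- factor(sample(c("a", "b", "c"), 30, replace = TRUE))

# Quantitative variable: squared correlation with the component.
sq_cor <- cor(x_num, score)^2

# Qualitative variable: correlation ratio, i.e. the between-level sum of squares
# of the component divided by its total sum of squares.
level_means <- ave(score, x_fac)  # mean of score within each level, per individual
eta2 <- sum((level_means - mean(score))^2) / sum((score - mean(score))^2)

sq_cor
eta2
```

Both values lie between 0 and 1. The correlation ratio computed this way coincides with the R-squared of a one-way analysis of variance of score on x_fac.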

Note that when all p variables are qualitative, the factor coordinates (scores) of the n observations are equal to the factor coordinates (scores) of standard MCA times the square root of p, and the eigenvalues are equal to the usual eigenvalues of MCA times p. When all the variables are quantitative, PCAmix gives exactly the same results as standard PCA.
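
The quantitative-only case can be checked against base R. The sketch below assumes that "standard PCA" means PCA on standardized variables (the correlation matrix), which is what prcomp computes with scale. = TRUE, and that the PCAmixdata package and its decathlon data are available. It compares eigenvalues only, since scores from different implementations may differ in sign or scaling convention.

```r
library(PCAmixdata)  # assumed to be installed

data(decathlon)
X <- decathlon[, 1:10]  # the ten quantitative event results

res_mix <- PCAmix(X.quanti = X, ndim = 5, graph = FALSE)
res_pca <- prcomp(X, center = TRUE, scale. = TRUE)

# prcomp stores the standard deviations of the components;
# squaring them gives the eigenvalues.
eig_mix <- unname(res_mix$eig[, "Eigenvalue"])
eig_pca <- res_pca$sdev^2

# Under the assumption above, this comparison should hold up to numerical tolerance.
all.equal(eig_mix, eig_pca)
```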

Value

eig

a matrix containing the eigenvalues, the percentages of variance and the cumulative percentages of variance.

ind

a list containing the results for the individuals (observations):

  • $coord: factor coordinates (scores) of the individuals,

  • $contrib: absolute contributions of the individuals,

  • $contrib.pct: relative contributions of the individuals (in percentage),

  • $cos2: squared cosines of the individuals.

quanti

a list containing the results for the quantitative variables:

  • $coord: factor coordinates (scores) of the quantitative variables,

  • $contrib: absolute contributions of the quantitative variables,

  • $contrib.pct: relative contributions of the quantitative variables (in percentage),

  • $cos2: squared cosines of the quantitative variables.

levels

a list containing the results for the levels of the qualitative variables:

  • $coord: factor coordinates (scores) of the levels,

  • $contrib: absolute contributions of the levels,

  • $contrib.pct: relative contributions of the levels (in percentage),

  • $cos2: squared cosines of the levels.

quali

a list containing the results for the qualitative variables:

  • $contrib: absolute contributions of the qualitative variables (sum of absolute contributions of the levels of the qualitative variable),

  • $contrib.pct: relative contributions (in percentage) of the qualitative variables (sum of relative contributions of the levels of the qualitative variable).

sqload

a matrix of dimension (p, ndim) containing the squared loadings of the quantitative and qualitative variables.

coef

the coefficients of the linear combinations used to construct the principal components of PCAmix, and to predict coordinates (scores) of new observations in the function predict.PCAmix.

M

the vector of the weights of the columns used in the Generalized Singular Value Decomposition.
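
As a hedged illustration of how the coef element relates to predict.PCAmix, the sketch below fits PCAmix on part of the decathlon data and scores the remaining rows. The train/new split is arbitrary and purely illustrative, and the call assumes that predict accepts the new quantitative data through an X.quanti argument, mirroring PCAmix itself.

```r
library(PCAmixdata)  # assumed to be installed

data(decathlon)
train <- decathlon[1:35, 1:10]   # arbitrary split, for illustration only
new   <- decathlon[36:41, 1:10]

fit <- PCAmix(X.quanti = train, ndim = 3, graph = FALSE)

# coef holds the coefficients of the linear combinations defining the components.
str(fit$coef)

# Scores of the held-out rows on the components fitted from the training rows.
new_scores <- predict(fit, X.quanti = new)
dim(new_scores)
```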

Author(s)

Marie Chavent, Amaury Labenne.

References

Chavent M., Kuentz-Simonet V., Labenne A., Saracco J., Multivariate analysis of mixed data: The PCAmixdata R package, arXiv:1411.4911 [stat.CO].

See Also

print.PCAmix, summary.PCAmix, predict.PCAmix, plot.PCAmix

Examples

library(PCAmixdata)

#PCAmix:
data(wine)
str(wine)
X.quanti <- splitmix(wine)$X.quanti
X.quali <- splitmix(wine)$X.quali
pca<-PCAmix(X.quanti[,1:27],X.quali,ndim=4)
pca<-PCAmix(X.quanti[,1:27],X.quali,ndim=4,graph=FALSE)
pca$eig
pca$ind$coord

#PCA:
data(decathlon)
quali<-decathlon[,13]
pca<-PCAmix(decathlon[,1:10])
pca<-PCAmix(decathlon[,1:10], graph=FALSE)
plot(pca,choice="ind",coloring.ind=quali,cex=0.8,
     posleg="topright",main="Scores")
plot(pca, choice="sqload",main="Squared correlations")
plot(pca, choice="cor",main="Correlation circle")
pca$quanti$coord

#MCA
data(flower)
mca <- PCAmix(X.quali=flower[,1:4],rename.level=TRUE)
mca <- PCAmix(X.quali=flower[,1:4],rename.level=TRUE,graph=FALSE)
plot(mca,choice="ind",main="Scores")
plot(mca,choice="sqload",main="Correlation ratios")
plot(mca,choice="levels",main="Levels")
mca$levels$coord

#Missing values
data(vnf)
PCAmix(X.quali=vnf,rename.level=TRUE)
vnf2<-na.omit(vnf)
PCAmix(X.quali=vnf2,rename.level=TRUE)

Example output

'data.frame':	21 obs. of  31 variables:
 $ Label                        : Factor w/ 3 levels "Saumur","Bourgueuil",..: 1 1 2 3 1 2 2 1 3 1 ...
 $ Soil                         : Factor w/ 4 levels "Reference","Env1",..: 2 2 2 3 1 1 1 2 2 3 ...
 $ Odor.Intensity.before.shaking: num  3.07 2.96 2.86 2.81 3.61 ...
 $ Aroma.quality.before.shaking : num  3 2.82 2.93 2.59 3.43 ...
 $ Fruity.before.shaking        : num  2.71 2.38 2.56 2.42 3.15 ...
 $ Flower.before.shaking        : num  2.28 2.28 1.96 1.91 2.15 ...
 $ Spice.before.shaking         : num  1.96 1.68 2.08 2.16 2.04 ...
 $ Visual.intensity             : num  4.32 3.22 3.54 2.89 4.39 ...
 $ Nuance                       : num  4 3 3.39 2.79 4.04 ...
 $ Surface.feeling              : num  3.27 2.81 3 2.54 3.38 ...
 $ Odor.Intensity               : num  3.41 3.37 3.25 3.16 3.54 ...
 $ Quality.of.odour             : num  3.31 3 2.93 2.88 3.36 ...
 $ Fruity                       : num  2.88 2.56 2.77 2.39 3.16 ...
 $ Flower                       : num  2.32 2.44 2.19 2.08 2.23 ...
 $ Spice                        : num  1.84 1.74 2.25 2.17 2.15 ...
 $ Plante                       : num  2 2 1.75 2.3 1.76 ...
 $ Phenolic                     : num  1.65 1.38 1.25 1.48 1.6 ...
 $ Aroma.intensity              : num  3.26 2.96 3.08 2.54 3.62 ...
 $ Aroma.persistency            : num  2.96 2.81 2.8 2.58 3.3 ...
 $ Aroma.quality                : num  3.2 2.93 3.08 2.48 3.46 ...
 $ Attack.intensity             : num  2.96 3.04 3.22 2.7 3.46 ...
 $ Acidity                      : num  2.11 2.11 2.18 3.18 2.57 ...
 $ Astringency                  : num  2.43 2.18 2.25 2.19 2.54 ...
 $ Alcohol                      : num  2.5 2.65 2.64 2.5 2.79 ...
 $ Balance                      : num  3.25 2.93 3.32 2.33 3.46 ...
 $ Smooth                       : num  2.73 2.5 2.68 1.68 3.04 ...
 $ Bitterness                   : num  1.93 1.93 2 1.96 2.07 ...
 $ Intensity                    : num  2.86 2.89 3.07 2.46 3.64 ...
 $ Harmony                      : num  3.14 2.96 3.14 2.04 3.64 ...
 $ Overall.quality              : num  3.39 3.21 3.54 2.46 3.74 ...
 $ Typical                      : num  3.25 3.04 3.18 2.25 3.44 ...
        Eigenvalue  Proportion Cumulative
dim 1  14.12902429 44.15320091   44.15320
dim 2   6.11105247 19.09703896   63.25024
dim 3   2.57327334  8.04147918   71.29172
dim 4   2.05569005  6.42403142   77.71575
dim 5   1.43368245  4.48025766   82.19601
dim 6   1.17497297  3.67179052   85.86780
dim 7   0.95790392  2.99344975   88.86125
dim 8   0.83822418  2.61945055   91.48070
dim 9   0.58804867  1.83765210   93.31835
dim 10  0.55460417  1.73313802   95.05149
dim 11  0.42551449  1.32973278   96.38122
dim 12  0.39132634  1.22289482   97.60412
dim 13  0.24992995  0.78103110   98.38515
dim 14  0.15837938  0.49493555   98.88008
dim 15  0.11421382  0.35691817   99.23700
dim 16  0.07888685  0.24652141   99.48352
dim 17  0.06783051  0.21197034   99.69549
dim 18  0.04394092  0.13731539   99.83281
dim 19  0.03908925  0.12215391   99.95496
dim 20  0.01441199  0.04503746  100.00000
            dim 1      dim 2      dim 3       dim 4
2EL    0.06765317 -1.5372270 -2.1571106  1.78969962
1CHA  -3.70997680 -2.3772495 -2.9183958  1.88942732
1FON  -2.11621804 -1.8569122 -1.3538537 -2.50993120
1VAU  -8.91364804  1.7662603  2.2384946  0.81703320
1DAM   5.11561083 -0.3544404  0.9146535 -0.03635384
2BOU   2.23167507 -1.1765494 -0.2330929 -1.95693522
1BOI   3.24726041 -1.0334993  0.9464923 -2.02091592
3EL    0.61102462  1.4896856 -3.4745080  1.52051796
DOM1  -0.03860967 -1.0164668  0.7044161 -0.20252734
1TUR  -2.20591571 -0.1243452  1.3565436  1.70486202
4EL    1.52871082  0.9831932  1.1175582  0.90170617
PER1   2.24714045  1.2068170  0.8283587  0.80207738
2DAM   3.81409923 -1.2436689 -0.4411221  0.93890169
1POY   3.68689983 -0.8233545 -0.7957766 -0.29407315
1ING   2.18059161 -1.1324451 -0.8229520 -1.25638574
1BEN   0.56192298 -2.1682644  0.8975797 -1.62779072
2BEA   2.82110323 -1.2996458  2.5143776  1.43765655
1ROC  -0.96916159 -0.8097306  1.8215856  1.65655478
2ING -10.12748228 -1.9726921 -0.1491841 -1.80619822
T1    -0.09316728  6.4453926  0.8119644 -0.90597131
T2     0.06048716  7.0351424 -1.8060285 -0.84135403
                  dim 1      dim 2       dim 3       dim 4       dim 5
100m        -0.77471983  0.1871420 -0.18440714 -0.03781826  0.30219639
Long.jump    0.74189974 -0.3454213  0.18221105  0.10178564  0.03667805
Shot.put     0.62250255  0.5983033 -0.02337844  0.19059161  0.11115082
High.jump    0.57194530  0.3502936 -0.25951193 -0.13559420  0.55543957
400m        -0.67960994  0.5694378  0.13146970  0.02930198 -0.08769157
110m.hurdle -0.74624532  0.2287933 -0.09263738  0.29083103  0.16432095
Discus       0.55246652  0.6063134  0.04295225 -0.25967143 -0.10482712
Pole.vault   0.05034151 -0.1803569  0.69175665  0.55153397  0.32995932
Javeline     0.27711085  0.3169891 -0.38965541  0.71227728 -0.30512892
1500m       -0.05807706  0.4742238  0.78214280 -0.16108904 -0.15356189
          dim 1      dim 2      dim 3         dim 4      dim 5
V1=0 -0.5783569  0.4636381 -0.5525394  5.504873e-18  0.3829725
V1=1  0.4626855 -0.3709105  0.4420315 -5.513840e-17 -0.3063780
V2=0 -0.8955155 -0.4783046  0.1597782  1.445949e-16 -0.2207127
V2=1  0.7164124  0.3826436 -0.1278225 -3.624478e-17  0.1765701
V3=0  0.2400409 -0.5459218 -0.2931677 -9.968060e-17 -0.2872777
V3=1 -0.3772072  0.8578772  0.4606921  7.912212e-16  0.4514363
V4=1  1.3223938  0.3990959  1.8307268 -1.224745e+00  0.1659961
V4=2 -1.2169127 -0.6567002  0.8661456  6.123724e-01  0.3489608
V4=3  0.6368547 -1.2921318 -0.9872631 -6.123724e-01  1.1119938
V4=4 -0.4637872  0.6986760 -0.5578865 -6.123724e-01 -0.7414822
V4=5  1.0316741  0.5043161 -0.2723093  1.837117e+00 -0.2049745

Call:
PCAmix(X.quali = vnf, rename.level = TRUE)

Method = Multiple Correspondence Analysis (MCA)


     name     
[1,] "$eig"   
[2,] "$ind"   
[3,] "$quanti"
[4,] "$levels"
[5,] "$quali" 
[6,] "$sqload"
[7,] "$coef"  
     description                                                               
[1,] "eigenvalues of the principal components (PC) "                           
[2,] "results for the individuals (coord,contrib,cos2)"                        
[3,] "results for the quantitative variables (coord,contrib,cos2)"             
[4,] "results for the levels of the qualitative variables (coord,contrib,cos2)"
[5,] "results for the qualitative variables (contrib,relative contrib)"        
[6,] "squared loadings"                                                        
[7,] "coef of the linear combinations defining the PC"                         

Call:
PCAmix(X.quali = vnf2, rename.level = TRUE)

Method = Multiple Correspondence Analysis (MCA)


     name     
[1,] "$eig"   
[2,] "$ind"   
[3,] "$quanti"
[4,] "$levels"
[5,] "$quali" 
[6,] "$sqload"
[7,] "$coef"  
     description                                                               
[1,] "eigenvalues of the principal components (PC) "                           
[2,] "results for the individuals (coord,contrib,cos2)"                        
[3,] "results for the quantitative variables (coord,contrib,cos2)"             
[4,] "results for the levels of the qualitative variables (coord,contrib,cos2)"
[5,] "results for the qualitative variables (contrib,relative contrib)"        
[6,] "squared loadings"                                                        
[7,] "coef of the linear combinations defining the PC"                         

PCAmixdata documentation built on Nov. 17, 2017, 7:38 a.m.