HistPCA: HistPCA


Description

Performs a PCA of multiple tables of histogram variables.

Usage

HistPCA(Variable = list, score = NULL, t = 1.1, axes = c(1, 2), 
        Row.names = NULL, xlim = NULL, ylim = NULL, xlegend = NULL, ylegend = NULL,
        Col.names = NULL, transformation = 1, method = "hypercube", proc = 0,
        plot3d.table = NULL, axes2 = c(1, 2, 3), ggplot = 1)

Arguments

Variable

List of data frames, one per histogram variable. Each data frame represents one histogram variable, and each of its columns holds one histogram bin.

score

List of bin scores, one element per histogram variable. By default, the scores are the ranks of the histogram bins.

t

Real number used to transform each histogram into an interval via Chebyshev's inequality. By default, t = 1.1.

axes

A length-2 vector specifying the components to plot.

Row.names

Row names to use for the individuals. Default is NULL.

xlim

range for the plotted "x" values, defaulting to the range of the finite values of "x".

ylim

range for the plotted "y" values, defaulting to the range of the finite values of "y".

xlegend

Controls the placement of the plot legend along the x direction. Default is NULL.

ylegend

Controls the placement of the plot legend along the y direction. Default is NULL.

Col.names

Column names to use for the variables. Default is NULL.

transformation

Type of transformation applied to the data. If transformation = 2, the angular transformation is used.

method

Method used: either method = 'hypercube' or method = 'longueur'.

proc

Option valid only when method = 'longueur'. If proc = 1, Procrustes analysis is used.

plot3d.table

Controls the 3D scatter plot (scatterplot3d). If plot3d.table = 1, the 3D scatter plot is displayed.

axes2

A length-3 vector specifying the components to plot in the 3D scatter plot.

ggplot

Details

See Examples
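
The roles of the score and t arguments can be illustrated with a small base R sketch. It assumes the interval takes the usual Chebyshev form, mean plus or minus t standard deviations; the page does not show the package's internal computation, so this is an illustration under that assumption rather than the HistPCA implementation. The frequencies and the variable names (freq, score, interval, lower_bound) are hypothetical.

```r
# Hypothetical histogram with 5 bins; relative frequencies sum to 1.
freq  <- c(0.10, 0.20, 0.40, 0.20, 0.10)
score <- seq_along(freq)   # rank-style scores 1..5, as in the documented default
t     <- 1.1               # same value as the default of HistPCA's `t`

# Mean and standard deviation of the scores, weighted by the bin frequencies.
m <- sum(freq * score)
s <- sqrt(sum(freq * (score - m)^2))

# Assumed interval form: [m - t * s, m + t * s].
interval <- c(lower = m - t * s, upper = m + t * s)

# Chebyshev's inequality: at least 1 - 1/t^2 of the mass lies in the interval.
lower_bound <- 1 - 1 / t^2
```

A larger t widens the interval and raises the guaranteed share of the histogram's mass that it contains.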

Value

Correlation

Correlations between the histogram means and the principal components

Tableaumean

Table of histogram means, with one row per individual and one column per histogram variable

VecteurPropre

Eigenvectors of the PCA of the histogram means

PourCentageComposante

a matrix containing all the eigenvalues, the percentage of variance and the cumulative percentage of variance

PCinterval

Data frame containing the coordinates of the individuals on the principal axes
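
As a sketch of how the three columns of PourCentageComposante relate to each other, the percentages can be recomputed from the eigenvalues in base R. The eigenvalues below are copied from the example output on this page; the variable names (pct, cum_pct, summary_table) are illustrative and are not part of the package.

```r
# Eigenvalues taken from the $PourCentageComposante example output.
eigenvalue <- c(17.0370073, 7.6204727, 3.8902740, 0.4888048, 0.1767567)

# Share of total variance carried by each component, in percent.
pct <- 100 * eigenvalue / sum(eigenvalue)

# Running total of the percentages.
cum_pct <- cumsum(pct)

# Same three-column layout as the returned matrix.
summary_table <- cbind(eigenvalue, pct, cum_pct)
```

In the example, the first two components together carry about 84% of the variance, which is why the default axes = c(1, 2) already gives a useful plot.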

Author(s)

Brahim Brahim <brahim.brahim@bigdatavisualizations.com> and Sun Makosso-Kallyth <makosso.sun@gmail.com>

References

Billard, L. and E. Diday (2006). Symbolic Data Analysis: conceptual statistics and data Mining. Berlin: Wiley series in computational statistics.

Diday, E., Rodriguez O. and Winberg S. (2000). Generalization of the Principal Components Analysis to Histogram Data, 4th European Conference on Principles and Practice of Knowledge Discovery in Data Bases, September 12-16, 2000, Lyon, France.

Donoho, D., Ramos, E. (1982). Primdata: Data Sets for Use With PRIM-H. Version for second (15-18, Aug, 1983) Exposition of Statistical Graphics Technology, by American Statistical Association.

Le-Rademacher J., Billard L. (2013). Principal component histograms from interval-valued observations, Computational Statistics, v.28 n.5, p.2117-2138.

Makosso-Kallyth S. and Diday E. (2012). Adaptation of interval PCA to symbolic histogram variables, Advances in Data Analysis and Classification July, Volume 6, Issue 2, pp 147-159.

Examples

library(GraphPCA)

# Load the movies data and drop rows with missing values
data(movies)
ab = movies
ab = na.omit(ab)

# Build one subset per genre and tag each with a genre factor
Action = subset(ab, Action == 1)
Action$genre = as.factor("Action")

Drama = subset(ab, Drama == 1)
Drama$genre = as.factor("Drama")

Animation = subset(ab, Animation == 1)
Animation$genre = as.factor("Animation")

Comedy = subset(ab, Comedy == 1)
Comedy$genre = as.factor("Comedy")

Documentary = subset(ab, Documentary == 1)
Documentary$genre = as.factor("Documentary")

Romance = subset(ab, Romance == 1)
Romance$genre = as.factor("Romance")

Short = subset(ab, Short == 1)
Short$genre = as.factor("Short")

# Stack the subsets; the genre factor added above is column 25
ab = rbind(Action, Drama, Animation, Comedy, Documentary, Romance, Short)

# One histogram variable per column 3 to 7, grouped by genre, with k = 5
Hist1 = PrepHistogram(X = sapply(ab[, 3], unlist), Z = ab[, 25], k = 5)$Vhistogram
Hist2 = PrepHistogram(X = sapply(ab[, 4], unlist), Z = ab[, 25], k = 5)$Vhistogram
Hist3 = PrepHistogram(X = sapply(ab[, 5], unlist), Z = ab[, 25], k = 5)$Vhistogram
Hist4 = PrepHistogram(X = sapply(ab[, 6], unlist), Z = ab[, 25], k = 5)$Vhistogram
Hist5 = PrepHistogram(X = sapply(ab[, 7], unlist), Z = ab[, 25], k = 5)$Vhistogram

# Ridit scores for the bins of each histogram variable
ss1 = Ridi(Hist1)$Ridit
ss2 = Ridi(Hist2)$Ridit
ss3 = Ridi(Hist3)$Ridit
ss4 = Ridi(Hist4)$Ridit
ss5 = Ridi(Hist5)$Ridit

# Run the PCA and print the result
HistPCA(list(Hist1, Hist2, Hist3, Hist4, Hist5), score = list(ss1, ss2, ss3, ss4, ss5))

# Store the result, then visualize the principal-component intervals
res_pca = HistPCA(list(Hist1, Hist2, Hist3, Hist4, Hist5), score = list(ss1, ss2, ss3, ss4, ss5))

Visu(res_pca$PCinterval)

Example output

dev.new(): using pdf(file="Rplots1.pdf")
$Correlation
           Composante 1 Composante 2 Composante 3 Composante 4 Composante 5
Variable 1   -0.5099416   -0.8404558    0.1744720   0.05321523  -0.01792261
Variable 2   -0.8964693    0.3773781   -0.1639273   0.15665688  -0.05015003
Variable 3    0.7038555   -0.1734192   -0.2734637  -0.35609178  -0.52242657
Variable 4   -0.9289358    0.2128625    0.2691358  -0.13842813   0.01309501
Variable 5    0.5361541    0.3119180    0.7760655   0.10674359  -0.03967453

$VecteurPropre
     VecteurPropre 1 VecteurPropre 2 VecteurPropre 3 VecteurPropre 4
[1,]      -0.3517065     -0.86672330       0.2518212       0.2166830
[2,]      -0.5863505      0.36906619      -0.2243777       0.6049232
[3,]       0.1251775     -0.04611537      -0.1017769      -0.3738813
[4,]      -0.6625360      0.22700148       0.4017000      -0.5828772
[5,]       0.2790562      0.24274385       0.8452924       0.3279992
     VecteurPropre 5
[1,]     -0.12135838
[2,]     -0.32203402
[3,]     -0.91217226
[4,]      0.09169345
[5,]     -0.20273212

$Tableaumean
           [,1]       [,2]       [,3]       [,4]       [,5]
[1,]  0.8589656  4.8334938 -1.2980342  6.4569054  0.8820262
[2,]  3.9674694 -0.9520108  0.1997447  1.2762980 -2.2291313
[3,] -2.3470024  3.1952925  0.2271465  0.3074271 -3.5771625
[4,] -1.7619329 -1.1216316 -0.7631946 -1.6726714  0.8781062
[5,] -1.7185055 -2.7609073  0.7498192 -2.1782000  2.6901585
[6,]  4.2544378 -0.4661359 -0.0372088 -1.4127500 -0.8770593
[7,] -3.2534320 -2.7281007  0.9217272 -2.7770091  2.2330622

$PourCentageComposante
       eigenvalue percentage of variance cumulative percentage of variance
comp 1 17.0370073             58.3193212                          58.31932
comp 2  7.6204727             26.0856140                          84.40494
comp 3  3.8902740             13.3167835                          97.72172
comp 4  0.4888048              1.6732262                          99.39494
comp 5  0.1767567              0.6050552                         100.00000

$PCinterval
               PCMin.1    PCMax.1     PCMin.2   PCMax.2     PCMin.3    PCMax.3
Action      -10.113984 -4.5470300  1.39507050  4.163102  1.03254495  4.1738482
Drama        -2.845231 -1.7144001 -5.14551528 -2.955786 -0.74352695  0.3851163
Animation    -3.080693 -1.3624714  1.68626325  3.122643 -5.15027823 -3.3124228
Comedy        2.293623  2.7765038  0.80464185  1.158964 -0.26174764  0.1737415
Documentary   3.789631  5.2323098  0.01871686  1.170279  0.57833480  2.4404718
Romance      -1.219747  0.1469407 -5.60454015 -3.178133 -0.78202294  0.5237466
Short         4.658957  5.9855912  1.01321658  2.351077 -0.03976013  0.9819545
               PCMin.4    PCMax.4    PCMin.5   PCMax.5
Action      -2.5006393  2.7427340 -1.0334247 0.9063838
Drama       -1.7200808 -0.8118580 -0.1290639 0.5527385
Animation   -0.8653339  0.8391873 -0.7037315 0.3077986
Comedy       0.2438441  0.7322409  0.7836489 1.0959520
Documentary -0.9371000  0.5953836 -0.8374283 0.1746056
Romance      0.6293318  1.7498372 -0.6727035 0.1047213
Short       -1.1149256  0.4173790 -1.1190396 0.5695426

dev.new(): using pdf(file="Rplots2.pdf")
dev.new(): using pdf(file="Rplots3.pdf")

GraphPCA documentation built on May 2, 2019, 1:08 p.m.