# CauRuimet: Robust estimation of within-group variance-covariance in PTAk: Principal Tensor Analysis on k Modes

## Description

Gives a robust estimate of an unknown within-group covariance, aiming either at dense groups or at sparse groups (outliers), depending on the choice of local variance and weighting function.

## Usage

```r
CauRuimet(Z, ker = 1, m0 = 1, withingroup = TRUE,
          loc = substitute(apply(Z, 2, mean, trim = .1)),
          matrixmethod = TRUE, Nrandom = 3000)
```

## Arguments

| Argument | Description |
|---|---|
| Z | a matrix, with observations in rows |
| ker | either numeric or a function: if numeric, the weighting function is $e^{-\mathrm{ker}\,t}$; otherwise `ker=function(t){return(expression)}` must be a positive decreasing function |
| m0 | a neighbourhood graph or another proximity matrix; its Hadamard (elementwise) product with the kernel proximities is used |
| withingroup | logical; if TRUE, the aim is a robust estimate for dense groups; if FALSE, a robust estimate for outliers |
| loc | a vector of locations, or an expression (e.g. using mean or median) that computes one; the default is the 10% trimmed mean of each column |
| matrixmethod | if TRUE (used only when withingroup=TRUE), matrix computations replace the double loop suggested by the formula under Details |
| Nrandom | if Nrandom is smaller than the number of rows of Z (dim(Z)[1]), only a random sample of Nrandom rows of Z (and of m0, if applicable) is used |
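How ker and m0 combine into pairwise weights can be sketched in base R. This is a minimal sketch, assuming that a numeric ker becomes the function $t \mapsto e^{-\mathrm{ker}\,t}$ and that the resulting weights are multiplied elementwise by m0; the helpers make_kernel and pair_weights are hypothetical and not part of PTAk.

```r
# Hypothetical helper (not part of PTAk): turn a `ker` argument into a weighting function.
# A numeric ker gives t -> exp(-ker * t); a function is returned unchanged.
make_kernel <- function(ker) {
  if (is.numeric(ker)) function(t) exp(-ker * t) else ker
}

# Hypothetical helper: pairwise weights m0_ij * ker(d2_ij), an elementwise (Hadamard) product.
# d2 is an n x n matrix of squared distances; m0 is 1 or an n x n proximity matrix.
pair_weights <- function(d2, ker, m0 = 1) {
  m0 * make_kernel(ker)(d2)
}

# Three points on a line at 0, 1 and 3: squared Euclidean distances
d2 <- as.matrix(dist(c(0, 1, 3)))^2
w_exp  <- pair_weights(d2, ker = 1)                      # weights exp(-t)
w_flat <- pair_weights(d2, ker = function(t) 0 * t + 1)  # constant 1, keeps the matrix shape
m0 <- matrix(c(0, 1, 0,
               1, 0, 1,
               0, 1, 0), nrow = 3)                       # neighbourhood graph 1-2-3
w_graph <- pair_weights(d2, ker = 1, m0 = m0)            # non-neighbours get weight 0
```

Closer pairs get larger weights under the exponential kernel, and m0 switches off pairs that are not neighbours regardless of their distance.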

## Details

When withingroup is TRUE, the local variance (local as defined by the weighting) is returned, aiming at finding dense groups:

$$W_l=\frac{\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} m0_{ij}\,\mathrm{ker}\big(d^2_{S^-}(Z_i,Z_j)\big)\,(Z_i-Z_j)'(Z_i-Z_j)}{\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} m0_{ij}\,\mathrm{ker}\big(d^2_{S^-}(Z_i,Z_j)\big)}$$

where $d^2_{S^-}(\cdot,\cdot)$ is the squared Euclidean distance in the metric $S^-$, the inverse of a robust sample covariance matrix (computed around loc instead of the mean). When withingroup is FALSE, the robust total weighted variance $W_o$ is returned or, if m0 is not 1, the global weighted variance $W_g$:

$$W_o=\frac{\sum_{i=1}^{n}\mathrm{ker}\big(d^2_{S^-}(Z_i,\tilde{Z})\big)\,(Z_i-\tilde{Z})'(Z_i-\tilde{Z})}{\sum_{i=1}^{n}\mathrm{ker}\big(d^2_{S^-}(Z_i,\tilde{Z})\big)}$$

$$W_g=\frac{\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} m0_{ij}\,\mathrm{ker}\big(d^2_{S^-}(Z_i,Z_j)\big)\,(Z_i-\tilde{Z})'(Z_j-\tilde{Z})}{\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} m0_{ij}\,\mathrm{ker}\big(d^2_{S^-}(Z_i,Z_j)\big)}$$

where $\tilde{Z}$ is the vector loc.
If m0 is a neighbourhood graph and ker is the function returning 1 (so that no distance-based proximity is used), the function returns (when withingroup=TRUE) the local variance-covariance matrix as defined in Lebart (1969).
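The formulas above can be checked numerically with a direct base-R transcription. This is a minimal sketch under stated assumptions, not PTAk's implementation: the function name cauruimet_sketch is hypothetical, ker must be numeric (weight $e^{-\mathrm{ker}\,t}$), m0 is 1 or a symmetric matrix, Nrandom subsampling is ignored, and the robust covariance is assumed to be $S=\frac{1}{n}\sum_i (Z_i-\tilde{Z})'(Z_i-\tilde{Z})$. It computes $W_l$, $W_g$ and $W_o$ with the loops exactly as written, and also $W_l$ through the identity $\sum_{i<j} w_{ij}(Z_i-Z_j)'(Z_i-Z_j) = Z'(D-W)Z$, where $W$ holds the pairwise weights (zero diagonal) and $D$ is the diagonal matrix of its row sums; this is one way a matrix computation can replace the double loop, in the spirit of the matrixmethod argument.

```r
# Hypothetical sketch of the Details formulas in base R; not PTAk's code.
# Z: n x p numeric matrix; ker: numeric rate of the weight exp(-ker * t);
# m0: 1 or a symmetric n x n proximity matrix; loc: location vector (Z tilde).
cauruimet_sketch <- function(Z, ker = 1, m0 = 1,
                             loc = apply(Z, 2, mean, trim = 0.1)) {
  n <- nrow(Z); p <- ncol(Z)
  if (length(m0) == 1) m0 <- matrix(m0, n, n)
  k <- function(t) exp(-ker * t)            # decreasing weighting function
  Zc <- sweep(Z, 2, loc)                    # rows Z_i - Z tilde
  Sinv <- solve(crossprod(Zc) / n)          # S^-: inverse covariance around loc (1/n assumed)
  d2 <- function(x) drop(x %*% Sinv %*% x)  # squared distance in the metric S^-

  # Double loop over pairs i < j, as the W_l and W_g formulas are written
  num_l <- matrix(0, p, p); num_g <- matrix(0, p, p); den <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      dij <- Z[i, ] - Z[j, ]
      w <- m0[i, j] * k(d2(dij))            # m0_ij * ker(d^2(Z_i, Z_j))
      num_l <- num_l + w * outer(dij, dij)  # (Z_i - Z_j)'(Z_i - Z_j)
      num_g <- num_g + w * outer(Zc[i, ], Zc[j, ])
      den <- den + w
    }
  }

  # W_o: one weight ker(d^2(Z_i, Z tilde)) per row
  wo <- apply(Zc, 1, function(x) k(d2(x)))
  W_o <- crossprod(Zc, Zc * wo) / sum(wo)

  # Matrix form of W_l: all pairwise squared distances at once, then Z'(D - W)Z
  G <- Zc %*% Sinv %*% t(Zc)
  W <- m0 * k(outer(diag(G), diag(G), "+") - 2 * G)
  diag(W) <- 0
  W_l_matrix <- t(Zc) %*% (diag(rowSums(W)) - W) %*% Zc / (sum(W) / 2)

  list(W_l = num_l / den, W_g = num_g / den, W_o = W_o,
       W_l_matrix = W_l_matrix)
}

res <- cauruimet_sketch(as.matrix(iris[, 1:4]), ker = 1)
```

Both versions of $W_l$ should agree to rounding error, and the matrix form avoids the $O(n^2)$ interpreted loop at the cost of an $n \times n$ weight matrix. Setting ker = 0 in this sketch makes every weight equal to 1, which, combined with a neighbourhood graph as m0, corresponds to the Lebart (1969) case described above, up to any scaling PTAk may apply.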

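## Value

A matrix: the estimated variance-covariance matrix ($W_l$, $W_o$ or $W_g$ as described in Details).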

## Note

As mentioned by Caussinus and Ruiz, a good strategy to reveal dense groups with generalised PCA is to reveal outliers first using the metric $W_o^{-1}$, and to remove them before using the metric $W_l^{-1}$. On theoretical grounds, for the decreasing weighting function $e^{-\mathrm{ker}\,t}$ they recommend a ker of at least 1 when withingroup=TRUE, and something fairly small, say in the interval [0.05, 0.3], otherwise.
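To see what these recommendations mean, the sketch below evaluates the weight $e^{-\mathrm{ker}\,t}$ at a few squared distances $t$ for ker = 1 (the suggested lower bound when withingroup=TRUE) and ker = 0.15 (a value inside the suggested interval otherwise). The distances are arbitrary illustrative values, not data; for example, at $t = 4$ the weight is about 0.018 with ker = 1 but about 0.55 with ker = 0.15.

```r
# Weight exp(-ker * t) as a function of the squared distance t = d^2_{S^-}
weight <- function(t, ker) exp(-ker * t)

d2 <- c(0, 1, 4, 9, 16)                 # illustrative squared distances
tab <- rbind(
  "ker = 1 (withingroup = TRUE)"     = weight(d2, 1),
  "ker = 0.15 (withingroup = FALSE)" = weight(d2, 0.15)
)
colnames(tab) <- paste0("d2=", d2)
print(round(tab, 3))
```

With ker = 1 the weights collapse quickly, so essentially only close pairs contribute, which suits dense groups; with a small ker most observations keep a substantial weight and only those very far from loc are strongly downweighted, which suits a robust total variance for spotting outliers.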

## Author(s)

Didier G. Leibovici

## References

Caussinus, H and Ruiz, A (1990) Interesting Projections of Multidimensional Data by Means of Generalized Principal Components Analysis. COMPSTAT90, Physica-Verlag, Heidelberg, 121-126.

Faraj, A (1994) Interpretation tools for Generalized Discriminant Analysis. In: New Approaches in Classification and Data Analysis, Springer-Verlag, Heidelberg, 286-291.

Lebart, L (1969) Analyse statistique de la contiguïté. Publication de l'Institut de Statistiques Universitaire de Paris, XVIII, 81-112.

Leibovici, D (2008) Spatio-temporal Multiway Decomposition using Principal Tensor Analysis on k-modes: the R package PTAk. To be submitted soon to the Journal of Statistical Software.

## See Also

SVDgen
## Examples

```r
library(PTAk)                                     # CauRuimet, Powmat and SVDgen
data(iris)
iris2 <- as.matrix(iris[, 1:4])
dimnames(iris2)[[1]] <- as.character(iris[, 5])   # label rows by species
# robust within-group covariance, then its inverse as the metric
D2 <- CauRuimet(iris2, ker = 1, withingroup = TRUE)
D2 <- Powmat(D2, (-1))
# centre the columns and run a generalised SVD with column metric D2
iris2 <- sweep(iris2, 2, apply(iris2, 2, mean))
res <- SVDgen(iris2, D2 = D2, D1 = 1)
plot(res, nb1 = 1, nb2 = 2, cex = 0.5)
summary(res, testvar = 0)
# the same in a demo function
# source(paste(R.home(), "/library/PTAk/demo/CauRuimet.R", sep = ""))
# demo.CauRuimet(ker = 4, withingroup = TRUE, openX11s = FALSE)
# demo.CauRuimet(ker = 0.15, withingroup = FALSE, openX11s = FALSE)
```