# Function to undertake a Robust Exploratory Multivariate Data Analysis

### Description

The function carries out a robust Principal Components Analysis (PCA) and estimates the Mahalanobis distances for a non-compositional dataset and places them in an object to be saved and post-processed for display and further manipulation. For closed compositional, geochemical, data use function `gx.robmva.closed`

. Robust procedures are used, ‘MCD’, ‘MVE’ or user supplied weights, for classical procedures see `gx.mva`

. For results display see `gx.rqpca.screeplot`

, `gx.rqpca.loadplot`

, `gx.rqpca.plot`

, `gx.rqpca.print`

, `gx.md.plot`

and `gx.md.print`

. For Kaiser varimax rotation see `gx.rotate`

.

### Usage

1 | ```
gx.robmva(xx, proc = "mcd", wts = NULL, main = deparse(substitute(xx)))
``` |

### Arguments

`xx` |
a |

`proc` |
by default |

`wts` |
by default |

`main` |
by default the name of the object |

### Details

If `main`

is undefined the name of the matrix object passed to the function is used to identify the object. This is the recommended procedure as it helps to track the progression of a data analysis. Alternate plot titles are best defined when the saved object is passed to `gx.rqpca.screeplot`

, `gx.rqpca.loadplot`

, `gx.rqpca.plot`

or `gx.md.plot`

for display. If no plot title is required set `main = " "`

, or if a user defined plot title is required it may be defined, e.g., `main = "Plot Title Text"`

.

The variances of the robust Principal Component scores are displayed, in a non-robust PCA these decrease with increasing component rank. However, in a robust PCA this may not be the case, and lower-order scores with high variances are often worthy of further inspection.

### Value

The following are returned as an object to be saved for subsequent display, etc.:

`main` |
by default (recommended) the input data matrix name. |

`input` |
the data matrix name, |

`proc` |
the robust procedure used, the value of |

`n` |
the total number of individuals (observations, cases or samples) in the input data matrix. |

`nc` |
the number of individuals remaining in the ‘core’ data subset following the robust estimation, i.e. the sum of those individuals with |

`p` |
the number of variables on which the multivariate operations were based. |

`ifilr` |
flag for |

`matnames` |
the row numbers or identifiers and column headings of the input matrix. |

`wts` |
the vector of weights for the |

`mean` |
the length |

`cov` |
the |

`sd` |
the length |

`snd` |
the |

`r` |
the |

`eigenvalues` |
the vector of |

`econtrib` |
the vector of |

`eigenvectors` |
the |

`rload` |
the |

`rcr` |
the |

`rqscore` |
the |

`vcontrib` |
a vector of |

`pvcontrib` |
the vector of |

`cpvcontrib` |
the vector of |

`md` |
the vector of |

`ppm` |
the vector of |

`epm` |
the vector of |

`nr` |
the number of PCs that have been rotated. At this stage of a data analysis |

### Note

Any less than detection limit values represented by negative values, or zeros or other numeric codes representing blanks in the data, must be removed prior to executing this function, see `ltdl.fix.df`

.

Any rows in the data matrix with `NA`

s are removed prior to computions. In the instance of a compositional data opening transformation `NA`

s have to be removed prior to undertaking the transformation, see `na.omit`

, `where.na`

and `remove.na`

. When that procedure is followed the opening transformations may be executed on calling the function, see Examples below.

Passing a set of weights from an investigation with `gx.md.gait`

or on the basis of some prior knowledge permits the use of a `clr`

transformation. In this instance a Moore-Penrose inverse is computed and used for the estimation of Mahalanobis distances. See example below. With refrerence to weights based on prior knowledge, the weights are not necessarily constrained to be ‘0’ or ‘1’, intermediate values may be employed.

Executing a `clr`

transformation leads to both collinearity and singularity such that neither a PCA can be undertaken or Mahalanobis distances be estimated. The function fails - do not use with a `clr`

transformation.

Executing a `ilr`

transformation permits the estimation of both Principal Components and Mahalanobis distances and associated probabilities through the use of `(p-1)`

synthetic variables. However, in that instance the loadings of the `(p-1)`

synthetic variables will be plotted by `gx.rqpca.plot`

rather than the loadings for the elements.

Warnings are generated when the number of individuals (observations, cases or samples) falls below 5*`p`

, and additional warnings when the number of individuals falls below 3*`p`

. At these low ratios of individuals to variables the shape of the `p`

-space hyperellipsoid is difficult to reliably define, and therefor the results may lack stability. These limits 5*`p`

and 3*`p`

are generous, the latter especially so; many statisticians would argue that the number of individuals should not fall below 9*`p`

, see Garrett (1993).

### Author(s)

Robert G. Garrett

### References

Garrett, R.G., 1990. A robust multivariate allocation procedure with applications to geochemical data. In Proc. Colloquium on Statistical Applications in the Earth Sciences (Eds F.P. Agterberg & G.F. Bonham-Carter). Geological Survey of Canada Paper 89-9, pp. 309-318.

Garrett, R.G., 1993. Another cry from the heart. Explore - Assoc. Exploration Geochemists Newsletter, 81:9-14.

Grunsky, E.C., 2001. A program for computing RQ-mode principal components analysis for S-Plus and R. Computers & Geosciences, 27(2):229-235.

Reimann, C., Filzmoser, P., Garrett, R. and Dutter, R., 2008. Statistical Data Analysis Explained: Applied Environmental Statistics with R. John Wiley & Sons, Ltd., 362 p.

### See Also

`ltdl.fix.df`

, `remove.na`

, `na.omit`

, `gx.rqpca.screeplot`

, `gx.rqpca.loadplot`

, `gx.rqpca.plot`

, `gx.rqpca.print`

, `gx.md.plot`

, `gx.md.print`

, `gx.robmva.closed`

, `gx.rotate`

### Examples

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | ```
## Generate a population of synthetic bivariate normal data
grp1 <- mvrnorm(100, mu = c(40, 30), Sigma = matrix(c(6, 3, 3, 2), 2, 2))
grp1 <- cbind(grp1, rep(1, 100))
## Generate a set of six (6) outliers
anom <- matrix(c(43, 34, 50, 37, 47, 30, 27, 29, 35, 33, 32, 25),6, 2)
anom <- cbind(anom, rep(2, 6))
## Bind the test data together and display the test data
test.robmva.mat <- rbind(grp1, anom)
test.robmva <- as.data.frame(test.robmva.mat)
dimnames(test.robmva)[[2]] <- c("x","y","grp")
attach(test.robmva)
xyplot.tags(x, y, dimnames(test.robmva)[[1]], cex = 0.75)
## Generate gx.robmva saved object
save.rob <- gx.robmva(as.matrix(test.robmva[, c(1:2)]))
## Display saved object with alternate main titles
gx.rqpca.screeplot(save.rob, main = "Bivariate synthetic data")
gx.rqpca.plot(save.rob, cex.lab = 0.8, rowids = TRUE,
main = "Bivariate synthetic data")
gx.md.plot(save.rob, cex.main = 0.9, cex.lab = 0.8, cex.axis = 0.8,
main = "Bivariate synthetic data")
gx.md.print(save.rob, pcut = 0.05)
## Clean-up and detach test data
rm(grp1)
rm(anom)
rm(test.robmva.mat)
rm(test.robmva)
rm(save.rob)
detach(test.robmva)
``` |