# cov.rob: Resistant Estimation of Multivariate Location and Scatter In MASS: Support Functions and Datasets for Venables and Ripley's MASS

 cov.rob R Documentation

## Resistant Estimation of Multivariate Location and Scatter

### Description

Compute a multivariate location and scale estimate with a high breakdown point – this can be thought of as estimating the mean and covariance of the "good" part of the data. `cov.mve` and `cov.mcd` are compatibility wrappers.

### Usage

```
cov.rob(x, cor = FALSE, quantile.used = floor((n + p + 1)/2),
        method = c("mve", "mcd", "classical"),
        nsamp = "best", seed)

cov.mve(...)
cov.mcd(...)
```

### Arguments

| Argument | Description |
|---|---|
| `x` | a matrix or data frame. |
| `cor` | should the returned result include a correlation matrix? |
| `quantile.used` | the minimum number of the data points regarded as "good" points. |
| `method` | the method to be used – minimum volume ellipsoid, minimum covariance determinant or classical product-moment. Using `cov.mve` or `cov.mcd` forces `mve` or `mcd` respectively. |
| `nsamp` | the number of samples or `"best"` or `"exact"` or `"sample"`. If `"sample"` the number chosen is `min(5*p, 3000)`, taken from Rousseeuw and Hubert (1997). If `"best"` exhaustive enumeration is done up to 5000 samples; if `"exact"` exhaustive enumeration will be attempted. |
| `seed` | the seed to be used for random sampling: see `RNGkind`. The current value of `.Random.seed` will be preserved if it is set. |
| `...` | arguments to `cov.rob` other than `method`. |

### Details

For method `"mve"`, an approximate search is made for a subset of size `quantile.used` with an enclosing ellipsoid of smallest volume; in method `"mcd"` it is the volume of the Gaussian confidence ellipsoid, equivalently the determinant of the classical covariance matrix, that is minimized. The mean of the subset provides a first estimate of the location, and the rescaled covariance matrix a first estimate of scatter. The Mahalanobis distances of all the points from the location estimate for this covariance matrix are calculated, and those points within the 97.5% point under Gaussian assumptions are declared to be "good". The final estimates are the mean and rescaled covariance of the "good" points.
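
The reweighting step described above can be illustrated with a simplified base-R sketch. This is not the code used inside `cov.rob`: the initial subset is simply assumed (the first 30 rows stand in for the result of the MVE/MCD search), the data are simulated, and the rescaling and finite-sample correction are omitted.

```r
# Simplified illustration of the reweighting step (not MASS's actual code).
set.seed(1)
p <- 2
clean    <- matrix(rnorm(p * 50), ncol = p)            # 50 well-behaved points
outliers <- matrix(rnorm(p * 5, mean = 10), ncol = p)  # 5 gross outliers
x <- rbind(clean, outliers)

# Assumption: pretend the search selected the first 30 rows as the subset.
subset_idx <- 1:30
loc0 <- colMeans(x[subset_idx, , drop = FALSE])
cov0 <- var(x[subset_idx, , drop = FALSE])

# Squared Mahalanobis distances of all points from the initial estimate.
d2 <- mahalanobis(x, center = loc0, cov = cov0)

# Points inside the 97.5% chi-squared quantile (p degrees of freedom)
# are declared "good".
good <- d2 < qchisq(0.975, df = p)

# Final estimates: mean and covariance of the good points (rescaling omitted).
center_final <- colMeans(x[good, , drop = FALSE])
cov_final    <- var(x[good, , drop = FALSE])
```

In this toy setting the five shifted points fall far outside the cutoff, so they do not contribute to `center_final` or `cov_final`.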

The rescaling is by the appropriate percentile under Gaussian data; in addition the first covariance matrix has an ad hoc finite-sample correction given by Marazzi.

For method `"mve"` the search is made over ellipsoids determined by the covariance matrix of `p` of the data points. For method `"mcd"` an additional improvement step suggested by Rousseeuw and van Driessen (1999) is used, in which once a subset of size `quantile.used` is selected, an ellipsoid based on its covariance is tested (as this will have no larger a determinant, and may be smaller).
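
The improvement step for `"mcd"` is in the spirit of the concentration step of Rousseeuw and van Driessen (1999). A minimal sketch of one such step follows, assuming simulated data and a random starting subset; the helper `c_step` is hypothetical and is not a function exported by MASS.

```r
# One concentration step: refit on the h points closest (in Mahalanobis
# distance) to the current subset's mean and covariance.
set.seed(42)
n <- 40; p <- 3
x <- matrix(rnorm(n * p), ncol = p)   # simulated data, for illustration only
h <- floor((n + p + 1) / 2)           # default value of quantile.used

c_step <- function(x, idx) {          # hypothetical helper
  sub <- x[idx, , drop = FALSE]
  d2  <- mahalanobis(x, center = colMeans(sub), cov = var(sub))
  sort(order(d2)[seq_len(length(idx))])  # indices of the closest points
}

start   <- sample(n, h)               # arbitrary starting subset of size h
refined <- c_step(x, start)

det_before <- det(var(x[start, , drop = FALSE]))
det_after  <- det(var(x[refined, , drop = FALSE]))
c(before = det_before, after = det_after)
```

The determinant of the refined subset's covariance is never larger than that of the starting subset, which is why testing the refined ellipsoid can only help the MCD criterion.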

There is a hard limit on the allowed number of samples, 2^31 - 1. However, practical limits are likely to be much lower, and one might check the number of samples needed for exhaustive enumeration, `choose(NROW(x), NCOL(x) + 1)`, before attempting it.
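
As a quick check before requesting exhaustive enumeration, the number of candidate samples can be counted with base R's `choose()`, which returns the count without generating the combinations. The sketch below uses the built-in `stackloss` data purely as an example; the 5000 threshold is the one quoted for `nsamp = "best"` in the Arguments section.

```r
x <- stackloss                               # 21 observations, 4 variables

# Number of samples of size p + 1 that exhaustive enumeration would visit.
n_samples <- choose(NROW(x), NCOL(x) + 1)
n_samples

# Would nsamp = "best" enumerate exhaustively (up to 5000 samples)?
exhaustive_ok <- n_samples <= 5000

# Is the count below the hard limit of 2^31 - 1?
within_hard_limit <- n_samples <= .Machine$integer.max
```

For `stackloss` the count is 20349, which exceeds 5000 but is far below the hard limit.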

### Value

A list with components

| Component | Description |
|---|---|
| `center` | the final estimate of location. |
| `cov` | the final estimate of scatter. |
| `cor` | (only if `cor = TRUE`) the estimate of the correlation matrix. |
| `sing` | message giving number of singular samples out of total. |
| `crit` | the value of the criterion on log scale. For MCD this is the determinant, and for MVE it is proportional to the volume. |
| `best` | the subset used. For MVE the best sample, for MCD the best set of size `quantile.used`. |
| `n.obs` | total number of observations. |
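
A short sketch of how the returned list might be inspected, assuming the MASS package is installed and using the built-in `stackloss` data as an example. Because the search is random, the RNG state is fixed with `set.seed()` beforehand; the `seed` argument is left unset.

```r
library(MASS)

set.seed(123)                       # the subset search uses random sampling
fit <- cov.rob(stackloss, cor = TRUE, method = "mcd")

fit$center                          # robust location estimate
fit$cov                             # robust scatter estimate
fit$cor                             # present because cor = TRUE
fit$crit                            # criterion value on log scale
fit$best                            # indices of the subset used
fit$n.obs                           # number of observations
```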

### References

P. J. Rousseeuw and A. M. Leroy (1987) Robust Regression and Outlier Detection. Wiley.

A. Marazzi (1993) Algorithms, Routines and S Functions for Robust Statistics. Wadsworth and Brooks/Cole.

P. J. Rousseeuw and B. C. van Zomeren (1990) Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association 85, 633–639.

P. J. Rousseeuw and K. van Driessen (1999) A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212–223.

P. Rousseeuw and M. Hubert (1997) Recent developments in PROGRESS. In L1-Statistical Procedures and Related Topics ed Y. Dodge, IMS Lecture Notes volume 31, pp. 201–214.

### See Also

`lqs`

### Examples

```
set.seed(123)