# cov.rob: Resistant Estimation of Multivariate Location and Scatter (package MASS: Support Functions and Datasets for Venables and Ripley's MASS)

## Description

Compute a multivariate location and scale estimate with a high breakdown point; this can be thought of as estimating the mean and covariance of the "good" part of the data. `cov.mve` and `cov.mcd` are compatibility wrappers.
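As a hedged illustration of what "resistant" means here, the following R sketch appends one gross outlier to the built-in `stack.x` data and compares how far the classical and the MCD estimates of location move. The outlier value of 1000 is an arbitrary choice for illustration, and `nsamp = "exact"` is used so that the result does not depend on the random seed.

```r
library(MASS)

# Append one gross outlier (an arbitrary value, for illustration only).
x_out <- rbind(stack.x, c(1000, 1000, 1000))

# Classical centers with and without the outlier.
cls_old <- colMeans(stack.x)
cls_new <- colMeans(x_out)

# MCD centers; exhaustive enumeration keeps the result deterministic.
rob_old <- cov.rob(stack.x, method = "mcd", nsamp = "exact")$center
rob_new <- cov.rob(x_out,   method = "mcd", nsamp = "exact")$center

# How far each estimate of location moved.
shift <- rbind(classical = cls_new - cls_old,
               robust    = rob_new - rob_old)
round(shift, 2)
```

The classical center moves by more than 40 units in every coordinate, whereas the robust center should stay within the range of the original observations, since the outlier is not among the "good" points.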

## Usage

```r
cov.rob(x, cor = FALSE, quantile.used = floor((n + p + 1)/2),
        method = c("mve", "mcd", "classical"),
        nsamp = "best", seed)

cov.mve(...)
cov.mcd(...)
```

## Arguments

- `x`: a matrix or data frame.
- `cor`: should the returned result include a correlation matrix?
- `quantile.used`: the minimum number of the data points regarded as "good" points.
- `method`: the method to be used: minimum volume ellipsoid, minimum covariance determinant or classical product-moment. Using `cov.mve` or `cov.mcd` forces `"mve"` or `"mcd"` respectively.
- `nsamp`: the number of samples, or one of `"best"`, `"exact"` or `"sample"`. If `"sample"`, the number chosen is `min(500 * (p + 1), 3000)`, taken from Rousseeuw and Hubert (1997). If `"best"`, exhaustive enumeration is done when it needs up to 5000 samples, and random sampling as for `"sample"` is used otherwise; if `"exact"`, exhaustive enumeration will be attempted.
- `seed`: the seed to be used for random sampling: see `RNGkind`. The current value of `.Random.seed` will be preserved if it is set.
- `...`: arguments to `cov.rob` other than `method`.
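The following R sketch exercises a few of these arguments on the built-in `stack.x` data. It assumes, as stated above, that `cov.mcd` only forces `method = "mcd"`, and that `method = "classical"` gives the ordinary sample mean and covariance (`colMeans` and `var`); `nsamp = "exact"` keeps the MCD fits deterministic.

```r
library(MASS)

# The wrapper only fixes `method`, so with deterministic exhaustive
# enumeration it should reproduce cov.rob() exactly.
a <- cov.mcd(stack.x, nsamp = "exact")
b <- cov.rob(stack.x, method = "mcd", nsamp = "exact")
all.equal(a, b)

# method = "classical": the ordinary sample mean and covariance.
cl <- cov.rob(stack.x, method = "classical")
all.equal(cl$center, colMeans(stack.x))
all.equal(cl$cov, var(stack.x))

# cor = TRUE adds a correlation matrix derived from the returned `cov`.
fit <- cov.rob(stack.x, cor = TRUE, method = "mcd", nsamp = "exact")
round(fit$cor, 3)
```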

## Details

For method `"mve"`, an approximate search is made for a subset of size `quantile.used` with an enclosing ellipsoid of smallest volume; for method `"mcd"` it is the volume of the Gaussian confidence ellipsoid, equivalently the determinant of the classical covariance matrix, that is minimized. The mean of the subset provides a first estimate of the location, and the rescaled covariance matrix a first estimate of scatter. The Mahalanobis distances of all the points from the location estimate, relative to this covariance matrix, are then calculated, and the points whose squared distances lie within the 97.5% point of their Gaussian reference distribution (chi-squared on `p` degrees of freedom) are declared to be "good". The final estimates are the mean and rescaled covariance of the "good" points.
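The reweighting step just described can be sketched in R as follows. `reweight_step` is a hypothetical helper, not part of MASS, and it deliberately omits the rescaling MASS applies to the initial covariance. The coordinatewise median/MAD starting estimate is likewise an assumption, standing in for the MVE or MCD subset estimate.

```r
# Hypothetical helper (not in MASS): flag points whose squared Mahalanobis
# distance is below the Gaussian 97.5% point, then refit on those points.
reweight_step <- function(x, center0, cov0, level = 0.975) {
  x <- as.matrix(x)
  p <- ncol(x)
  d2 <- mahalanobis(x, center0, cov0)   # squared Mahalanobis distances
  good <- d2 < qchisq(level, df = p)    # chi-squared(p) cut-off
  list(center = colMeans(x[good, , drop = FALSE]),
       cov    = cov(x[good, , drop = FALSE]),
       good   = which(good))
}

set.seed(1)
x <- rbind(matrix(rnorm(200), ncol = 2),             # 100 clean points
           matrix(rnorm(10, mean = 10), ncol = 2))   # 5 gross outliers

# Assumed stand-in for the initial robust estimate: medians and MAD variances.
center0 <- apply(x, 2, median)
cov0    <- diag(apply(x, 2, mad)^2)

rw <- reweight_step(x, center0, cov0)
any(101:105 %in% rw$good)   # FALSE: no outlier passes the cut-off
```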

The rescaling is by the appropriate percentile under Gaussian data; in addition, the first covariance matrix has an ad hoc finite-sample correction given by Marazzi (1993).
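Why a rescaling is needed can be seen in a small simulation: the covariance of the `h` most central points of Gaussian data is much too small. Multiplying it by the ratio of the `h`-th smallest squared distance to the `h/n` quantile of chi-squared on `p` degrees of freedom restores roughly the right scale. This is a sketch of that general idea only; it does not reproduce the exact factor MASS uses, nor Marazzi's finite-sample correction.

```r
set.seed(42)
n <- 1000; p <- 2
x <- matrix(rnorm(n * p), ncol = p)   # Gaussian data, true variances 1
h <- floor((n + p + 1) / 2)           # same default as quantile.used

# Covariance of the h points closest to the mean: biased towards zero.
d_all <- mahalanobis(x, colMeans(x), cov(x))
inner <- order(d_all)[seq_len(h)]
S_h   <- cov(x[inner, ])

# Consistency factor: match the h-th smallest squared distance to the
# corresponding chi-squared(p) quantile.
d_h     <- mahalanobis(x, colMeans(x[inner, ]), S_h)
scale_k <- sort(d_h)[h] / qchisq(h / n, df = p)
S_resc  <- scale_k * S_h

round(diag(S_h), 2)                   # well below the true variances
round(diag(S_resc), 2)                # close to 1 again
```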

For method `"mve"`, the search is made over ellipsoids determined by the covariance matrix of `p + 1` of the data points (the covariance matrix of only `p` points would be singular). For method `"mcd"`, an additional improvement step suggested by Rousseeuw and van Driessen (1999) is used: once a subset of size `quantile.used` is selected, an ellipsoid based on its covariance is tested, as this will have no larger a determinant and may have a smaller one.
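The MCD improvement step is usually called a concentration step (C-step). A minimal sketch, using a hypothetical helper `c_step` rather than MASS's compiled code: refit the mean and covariance on the current subset, then keep the `h` points with the smallest squared Mahalanobis distances. Rousseeuw and van Driessen (1999) show that the determinant cannot increase, which the last two lines let you check.

```r
# Hypothetical helper (not in MASS): one concentration step.
c_step <- function(x, subset, h) {
  m  <- colMeans(x[subset, , drop = FALSE])
  S  <- cov(x[subset, , drop = FALSE])
  d2 <- mahalanobis(x, m, S)          # distances of all points
  sort(order(d2)[seq_len(h)])         # the h most central points
}

x <- as.matrix(stack.x)
n <- nrow(x); p <- ncol(x)
h <- floor((n + p + 1) / 2)           # default quantile.used

set.seed(2)
s0 <- sort(sample(n, h))              # an arbitrary starting h-subset
s1 <- c_step(x, s0, h)

det(cov(x[s0, ]))
det(cov(x[s1, ]))                     # never larger than the line above
```

Iterating `c_step` until the subset stops changing gives a local minimum of the determinant from that start.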

There is a hard limit on the allowed number of samples, 2^31 - 1. However, practical limits are likely to be much lower, and one might check the number of samples needed for exhaustive enumeration, `choose(NROW(x), NCOL(x) + 1)`, before attempting it.
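Before requesting `nsamp = "exact"`, the count can be checked directly. `n_subsets` is a hypothetical convenience wrapper around `choose()`; the two data sets are the ones used in the examples.

```r
# Hypothetical helper: number of (p + 1)-point subsets visited by
# exhaustive enumeration.
n_subsets <- function(x) choose(NROW(x), NCOL(x) + 1)

n_subsets(stackloss)   # 21 rows, 4 columns: 20349
n_subsets(stack.x)     # 21 rows, 3 columns: 5985

# The hard limit above is R's largest integer.
.Machine$integer.max == 2^31 - 1
```

Both counts exceed 5000, which fits the example output: with the default `nsamp = "best"`, `cov.rob(stackloss)` reports 2500 random samples, while the second example asks for `nsamp = "exact"` to enumerate all 5985 subsets.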

## Value

A list with components:

- `center`: the final estimate of location.
- `cov`: the final estimate of scatter.
- `cor`: (only if `cor = TRUE`) the estimate of the correlation matrix.
- `msg`: message giving the number of singular samples out of the total.
- `crit`: the value of the criterion on log scale. For MCD this is the determinant, and for MVE it is proportional to the volume.
- `best`: the subset used. For MVE the best sample, for MCD the best set of size `quantile.used`.
- `n.obs`: total number of observations.
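A short R sketch of inspecting these components for the MCD fit used in the examples; with `cor = TRUE` the extra `cor` component is returned. For these data `quantile.used` defaults to `floor((21 + 3 + 1)/2)`, that is 12, matching the length of `best` in the example output.

```r
library(MASS)
fit <- cov.rob(stack.x, method = "mcd", nsamp = "exact", cor = TRUE)

names(fit)          # the components listed above
fit$msg             # singular samples out of the total
fit$crit            # criterion on the log scale
length(fit$best)    # size of the best MCD subset
fit$n.obs           # total number of observations
```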

## References

P. J. Rousseeuw and A. M. Leroy (1987) Robust Regression and Outlier Detection. Wiley.

A. Marazzi (1993) Algorithms, Routines and S Functions for Robust Statistics. Wadsworth and Brooks/Cole.

P. J. Rousseeuw and B. C. van Zomeren (1990) Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association 85, 633–639.

P. J. Rousseeuw and K. van Driessen (1999) A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212–223.

P. Rousseeuw and M. Hubert (1997) Recent developments in PROGRESS. In L1-Statistical Procedures and Related Topics, ed. Y. Dodge, IMS Lecture Notes volume 31, pp. 201–214.

## See Also

`lqs`

## Examples

```r
library(MASS)
set.seed(123)
cov.rob(stackloss)
cov.rob(stack.x, method = "mcd", nsamp = "exact")
```

### Example output

```
$center
  Air.Flow Water.Temp Acid.Conc. stack.loss
   56.3750    20.0000    85.4375    13.0625

$cov
            Air.Flow Water.Temp Acid.Conc. stack.loss
Air.Flow   23.050000   6.666667  16.625000  19.308333
Water.Temp  6.666667   5.733333   5.333333   7.733333
Acid.Conc. 16.625000   5.333333  34.395833  13.837500
stack.loss 19.308333   7.733333  13.837500  18.462500

$msg
[1] "20 singular samples of size 5 out of 2500"

$crit
[1] 19.89056

$best
[1]  5  6  7  8  9 10 11 12 15 16 18 19 20

$n.obs
[1] 21

$center
  Air.Flow Water.Temp Acid.Conc.
  56.70588   20.23529   85.52941

$cov
            Air.Flow Water.Temp Acid.Conc.
Air.Flow   23.470588   7.573529  16.102941
Water.Temp  7.573529   6.316176   5.367647
Acid.Conc. 16.102941   5.367647  32.389706

$msg
[1] "266 singular samples of size 4 out of 5985"

$crit
[1] 5.472581

$best
[1]  4  5  6  7  8  9 10 11 12 13 14 20

$n.obs
[1] 21
```

MASS documentation built on May 3, 2021, 5:08 p.m.