cov.mrcd: Minimum Regularized Covariance Determinant Estimator
In abnormally-distributed/cvreg: Cross Validation and Robust Estimation Utilities

This implements the minimum regularized covariance determinant (MRCD) estimator. It is similar to the implementation in the rrcov package, but has a few tweaks to speed up the algorithm and improve estimation, and it is simpler to use.

The minimum covariance determinant (MCD) estimator is a robust estimator of the covariance that finds the subset containing a 1 - delta proportion of the data whose covariance matrix has the smallest determinant. It was first introduced in the paper on least median of squares regression and developed into a workable algorithm some years later (Rousseeuw, 1984; Rousseeuw & Van Driessen, 1999). Because the number of possible subsets is enormous, heuristics are used to find good candidate subsets and reduce computation time; selecting the subsets deterministically also keeps the results consistent from run to run (Hubert et al., 2012; 2018).

The regularized version implemented here, from Boudt et al. (2019), further regularizes the robust covariance matrix in a manner similar to the covShrink function. The method of calculating the regularization parameter in Boudt et al. (2019) has been replaced by the faster method of Schaefer & Strimmer (2005), which uses an analytic formula for the optimal parameter value.
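The subset search described above can be illustrated with the concentration step (C-step) of Rousseeuw & Van Driessen (1999): fit a mean and covariance to the current subset, then keep the rows closest to that fit. The following is a minimal base-R sketch under stated assumptions, not the code used by cov.mrcd. The simulated data, the starting subset built from coordinate-wise medians and MADs, and the fixed shrinkage weight `lambda` are all illustrative choices; in particular, the function itself is described as computing the regularization parameter analytically rather than fixing it.

```r
# Illustrative C-steps for a plain MCD search, followed by a simple shrinkage step.
set.seed(1)
n <- 100
p <- 2
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
x[1:10, ] <- x[1:10, ] + 8            # shift ten rows so they act as outliers

h <- ceiling(0.8 * n)                 # subset size: 80% of the rows

# Deterministic starting subset: rows closest to the coordinate-wise median,
# scaled by the coordinate-wise MAD (an assumption made for this sketch).
med <- apply(x, 2, median)
scl <- apply(x, 2, mad)
subset_idx <- sort(order(mahalanobis(x, med, diag(scl^2)))[1:h])

dets <- numeric(0)                    # determinant recorded at each step
for (step in 1:20) {
  mu <- colMeans(x[subset_idx, ])
  S <- cov(x[subset_idx, ])
  dets <- c(dets, det(S))
  # Keep the h rows with the smallest distances to the current fit.
  new_idx <- sort(order(mahalanobis(x, mu, S))[1:h])
  if (identical(new_idx, subset_idx)) break   # subset is stable: converged
  subset_idx <- new_idx
}

# Shrink the subset covariance toward its own diagonal.
# lambda is fixed by hand here purely for illustration.
lambda <- 0.1
S_reg <- (1 - lambda) * S + lambda * diag(diag(S))

print(dets)
print(S_reg)
```

Each C-step cannot increase the determinant, so `dets` is non-increasing, and the shifted rows should drop out of the final subset.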

The MRCD method requires choosing a scale estimator. Several options are offered here:

- "tau" is the tau-scale defined in Yohai and Zamar (1988).
- "pb" is the percentage bend estimator (Shoemaker & Hettmansperger, 1982).
- "bisq" is Tukey's bisquare estimator.
- "huber" is Huber's psi estimator (Huber, 1964).
- "mopt" is the modified optimal estimator.
- "mad" is the (adjusted) median absolute deviation.
- "Qn" and "Sn" are two alternatives to the median-based measures of scale (Rousseeuw & Croux, 1993).
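To see why a robust scale estimator matters here, the sketch below compares the classical standard deviation with the adjusted median absolute deviation on a contaminated sample, using only base R. The simulated data are an assumption made for illustration. The other estimators in the list are not reproduced; as far as I know, the robustbase package provides `Qn()`, `Sn()` and `scaleTau2()` for readers who want to experiment with those.

```r
# Compare a classical and a robust scale estimate on contaminated data.
set.seed(42)
clean <- rnorm(200, mean = 0, sd = 2)       # true scale is 2
contaminated <- c(clean, rep(50, 10))       # add ten gross outliers

sd_est <- sd(contaminated)                  # inflated by the outliers
mad_est <- mad(contaminated)                # adjusted MAD (constant = 1.4826)

print(c(sd = sd_est, mad = mad_est))
```

The standard deviation is pulled far above the true scale by the ten outliers, while the MAD stays close to it.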

cov.mrcd(
  x,
  kappa = 0.8,
  alpha = NULL,
  method = c("tau", "pb", "bisq", "huber", "mopt", "Qn", "Sn"),
  opts = list(b = 0.1, eff = 0.9)
)
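A hypothetical call might look like the following. It assumes the cvreg package is installed and that `cov.mrcd` is exported with the signature shown above; the simulated data are illustrative only, and the guard simply skips the call when the package is not available.

```r
# Simulated data: 200 rows, 3 columns, with the first 15 rows shifted.
set.seed(123)
x <- matrix(rnorm(200 * 3), ncol = 3)
x[1:15, ] <- x[1:15, ] + 6

# Only run the estimator if cvreg is installed (assumption: cov.mrcd is exported).
if (requireNamespace("cvreg", quietly = TRUE)) {
  fit <- cvreg::cov.mrcd(x, kappa = 0.8, method = "tau")
  print(fit$center)     # robust location estimate
  print(fit$cov)        # robust covariance estimate
  print(fit$outliers)   # indices of rows flagged as outliers
}
```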

`x`	a data frame or matrix of numeric covariates
`kappa`	the proportion of the data to use in each subset. defaults to 0.80. must be > 0.50.
`alpha`	a custom value for the regularization parameter. suggested to leave as NULL unless numerical problems are encountered.
`method`	a scale estimator. the tau-scale estimator is the default.
`opts`	list of options for the various scale estimators. "b" determines the percentage bend coefficient for "pb"; "eff" determines the efficiency of the "huber", "bisq" and "mopt" scale estimators.

a covRobust object containing the following elements:

center: multivariate mean of cleaned data set after applying casewise weights.
cov: covariance matrix of cleaned data set after applying casewise weights.
dist: the Mahalanobis distances used in calculating the weights.
outliers: the indices of the outliers identified.
weights: the weights for downweighting outliers.
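The relationship between these elements can be sketched in base R. This is an illustration under assumptions, not the function's actual reweighting rule: the robust fit is replaced by a mean and covariance computed from rows known to be clean, the weights are hard 0/1 weights, the distances are taken on the unsquared scale, and the cutoff is the 0.975 chi-squared quantile, which is a common convention that the source does not confirm.

```r
# Sketch: distances -> weights -> outliers -> reweighted center and covariance.
set.seed(7)
x <- matrix(rnorm(150 * 2), ncol = 2)
x[1:5, ] <- x[1:5, ] + 10                   # five planted outliers

# Stand-in for a robust initial fit: use the rows known to be clean.
clean_rows <- 6:150
center0 <- colMeans(x[clean_rows, ])
cov0 <- cov(x[clean_rows, ])

# Mahalanobis distances of every row from the initial fit.
dist <- sqrt(mahalanobis(x, center0, cov0))

# Hard 0/1 weights from a chi-squared cutoff (assumed convention).
cutoff <- sqrt(qchisq(0.975, df = ncol(x)))
weights <- as.numeric(dist <= cutoff)
outliers <- which(weights == 0)

# Mean and covariance of the cleaned data set.
center <- colMeans(x[weights == 1, ])
cov_clean <- cov(x[weights == 1, ])

print(outliers)
print(center)
print(cov_clean)
```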

Boudt, K.; Rousseeuw, P.J.; Vanduffel, S. (2019) The minimum regularized covariance determinant estimator. Stat Comput. doi: 10.1007/s11222-019-09869-x

Huber, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Statist. 35, 73–101.

Hubert, M.; Rousseeuw, P.J.; Verdonck, T. (2012) A Deterministic Algorithm for Robust Location and Scatter, Journal of Computational and Graphical Statistics, 21:3, 618-637, DOI: 10.1080/10618600.2012.672100

Hubert, M.; Debruyne, M.; Rousseeuw, P.J. (2018) Minimum covariance determinant and extensions. WIREs Comput Stat. 10. doi: 10.1002/wics.1421

Rousseeuw, P.J. (1984) Least median of squares regression. J Am Stat Assoc 79:871–880.

Rousseeuw, P.J.; Van Driessen, K. (1999) A fast algorithm for the Minimum Covariance Determinant estimator. Technometrics, 41:212–223.

Schaefer, J.; Strimmer, K. (2005) A shrinkage approach to large-scale covariance estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.