covMcd  R Documentation 
Compute the Minimum Covariance Determinant (MCD) estimator, a robust multivariate location and scale estimate with a high breakdown point, via the ‘Fast MCD’ or ‘Deterministic MCD’ (“DetMcd”) algorithm.
covMcd(x, cor = FALSE, raw.only = FALSE, alpha =, nsamp =, nmini =, kmini =, scalefn =, maxcsteps =, initHsets = NULL, save.hsets = FALSE, names = TRUE, seed =, tolSolve =, trace =, use.correction =, wgtFUN =, control = rrcov.control())
x 
a matrix or data frame. 
cor 
should the returned result include a correlation matrix?
Default is cor = FALSE.
raw.only 
should only the “raw” estimate be returned, i.e., no (re)weighting step be performed; default is false. 
alpha 
numeric parameter controlling the size of the subsets
over which the determinant is minimized; roughly 
nsamp 
number of subsets used for initial estimates or For 
nmini, kmini 
for n >= 2 n_0,
n_0 := nmini, the algorithm splits the data into
maximally 
scalefn 
for the deterministic MCD: 
maxcsteps 
maximal number of concentration steps in the deterministic MCD; should not be reached. 
initHsets 
NULL or a K x h integer matrix of initial
subsets of observations of size h (specified by the indices in

save.hsets 
(for deterministic MCD) logical indicating if the
initial subsets should be returned as 
names 
logical; if true (as by default), several parts of the
result have a 
seed 
initial seed for random generator, like

tolSolve 
numeric tolerance to be used for inversion
( 
trace 
logical (or integer) indicating if intermediate results
should be printed; defaults to 
use.correction 
whether to use finite sample correction
factors; defaults to 
wgtFUN 
a character string or 
control 
a list with estimation options; this includes those
above provided in the function specification, see

The minimum covariance determinant estimator of location and scatter
implemented in covMcd() is similar to the R function cov.mcd() in MASS.
The MCD method looks for the h (> n/2)
(h = h(α,n,p) = h.alpha.n(alpha,n,p)) observations (out of n)
whose classical covariance matrix has the lowest possible determinant.
The raw MCD estimate of location is then the average of these h points,
whereas the raw MCD estimate of scatter is their covariance matrix,
multiplied by a consistency factor (.MCDcons(p, h/n)) and (if
use.correction is true) a finite sample correction factor
(.MCDcnp2(p, n, alpha)), to make it consistent at the
normal model and unbiased at small samples. Both rescaling factors
(consistency and finite sample) are returned in the length-2 vector
raw.cnp2.
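To make the raw estimate concrete, the following minimal base R sketch computes the location, the scatter and the log-determinant criterion for one fixed subset of h observations. It is an illustration only: the helper name raw_from_subset and the simulated data are assumptions, no consistency or finite sample factors are applied, and covMcd() additionally searches over many candidate subsets.

```r
## Hypothetical helper (not part of robustbase): classical estimates
## on a fixed subset 'idx' of the rows of x, plus log(det(cov)).
raw_from_subset <- function(x, idx) {
  xs <- x[idx, , drop = FALSE]
  S <- cov(xs)
  list(center = colMeans(xs),
       cov    = S,
       crit   = as.numeric(determinant(S, logarithm = TRUE)$modulus))
}

set.seed(1)
x <- matrix(rnorm(60 * 3), ncol = 3)
x[1:10, ] <- x[1:10, ] + 8            # contaminate the first 10 rows

clean <- raw_from_subset(x, 11:45)    # 35 uncontaminated rows
mixed <- raw_from_subset(x, 1:35)     # 35 rows, 10 of them outliers

## The MCD criterion prefers the subset with the smaller log-determinant
clean$crit < mixed$crit
```

Working with the logarithm of the determinant, as in the crit component described under the return value, avoids numerical under- or overflow.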
The implementation of covMcd
uses the Fast MCD algorithm of
Rousseeuw and Van Driessen (1999) to approximate the minimum
covariance determinant estimator.
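The core building block of Fast MCD in Rousseeuw and Van Driessen (1999) is the concentration step (C-step): refit on the h observations closest, in Mahalanobis distance, to the current fit. The sketch below, in base R, iterates C-steps from a single random start. It is not the implementation used by covMcd(): the helper names c_step and log_det, the simulated data, and the single start of size h are assumptions for illustration, and the subsampling and data-splitting strategy of the real algorithm is omitted.

```r
## One C-step: keep the h rows with smallest Mahalanobis distance
## with respect to the mean and covariance of the current subset.
c_step <- function(x, idx, h) {
  xs <- x[idx, , drop = FALSE]
  d2 <- mahalanobis(x, colMeans(xs), cov(xs))
  sort(order(d2)[seq_len(h)])
}

## log(det(cov)) of the rows in 'idx'
log_det <- function(x, idx)
  as.numeric(determinant(cov(x[idx, , drop = FALSE]),
                         logarithm = TRUE)$modulus)

set.seed(2)
n <- 60; p <- 3; h <- 35
x <- matrix(rnorm(n * p), ncol = p)
x[1:10, ] <- x[1:10, ] + 8            # 10 shifted outliers

idx  <- sort(sample(n, h))            # arbitrary random start
crit <- log_det(x, idx)
repeat {
  new_idx <- c_step(x, idx, h)
  if (identical(new_idx, idx)) break  # converged: subset unchanged
  idx  <- new_idx
  crit <- c(crit, log_det(x, idx))
}

## The criterion never increases from one C-step to the next
all(diff(crit) <= 1e-12)
```

A single start can converge to a local minimum, which is why practical algorithms run C-steps from many initial subsets and keep the best result.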
Based on these raw MCD estimates (unless argument raw.only is
true), a reweighting step is performed, i.e., V <- cov.wt(x,w),
where w are weights determined by “outlyingness” with
respect to the scaled raw MCD. Again, a consistency factor and
(if use.correction is true) a finite sample correction factor
(.MCDcnp2.rew(p, n, alpha)) are applied.
The reweighted covariance is typically considerably more efficient
than the raw one, see Pison et al. (2002).
The two rescaling factors for the reweighted estimates are returned in
cnp2. Details for the computation of the finite sample
correction factors can be found in Pison et al. (2002).
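The reweighting step can be sketched in base R as follows. The hard-rejection rule with a 0.975 chi-square quantile cutoff is an assumption chosen for illustration, as is the stand-in for the raw fit (computed here from rows known to be clean); the actual weights used by covMcd() are determined by wgtFUN, and the consistency and finite sample factors are left out.

```r
set.seed(3)
n <- 60; p <- 3
x <- matrix(rnorm(n * p), ncol = p)
x[1:10, ] <- x[1:10, ] + 8            # 10 shifted outliers

## Stand-in for a raw robust fit (assumption: we simply use the
## rows known to be clean instead of running an MCD search).
raw_center <- colMeans(x[11:60, ])
raw_cov    <- cov(x[11:60, ])

## Squared Mahalanobis distances with respect to the raw fit
d2 <- mahalanobis(x, raw_center, raw_cov)

## 0/1 weights: keep observations below the (assumed) cutoff
w <- as.numeric(d2 <= qchisq(0.975, df = p))

## Reweighted location and scatter
V <- cov.wt(x, wt = w)
V$center
V$cov
```

With 0/1 weights, cov.wt() reduces to the classical mean and covariance of the retained observations, so the reweighted estimate uses all points that look regular rather than only the h points of the raw subset.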
An object of class "mcd" which is basically a list with components
center 
the final estimate of location. 
cov 
the final estimate of scatter. 
cor 
the (final) estimate of the correlation matrix (only if

crit 
the value of the criterion, i.e., the logarithm of the determinant. Before November 2014, it contained the determinant itself, which can under- or overflow relatively easily. 
best 
the best subset found and used for computing the raw
estimates, with 
mah 
Mahalanobis distances of the observations using the final estimate of the location and scatter. 
mcd.wt 
weights of the observations using the final estimate of the location and scatter. 
cnp2 
a vector of length two containing the consistency correction factor and the finite sample correction factor of the final estimate of the covariance matrix. 
raw.center 
the raw (not reweighted) estimate of location. 
raw.cov 
the raw (not reweighted) estimate of scatter. 
raw.mah 
Mahalanobis distances of the observations based on the raw estimate of the location and scatter. 
raw.weights 
weights of the observations based on the raw estimate of the location and scatter. 
raw.cnp2 
a vector of length two containing the consistency correction factor and the finite sample correction factor of the raw estimate of the covariance matrix. 
X 
the input data as numeric matrix, without 
n.obs 
total number of observations. 
alpha 
the size of the subsets over which the determinant is minimized (the default is (n+p+1)/2). 
quan 
the number of observations, h, on which the MCD is
based. If 
method 
character string naming the method (Minimum Covariance
Determinant), starting with 
iBest 
(for the deterministic MCD) contains indices from 1:6 denoting which of the (six) initial subsets lead to the best set found. 
n.csteps 
(for the deterministic MCD) for each of the initial subsets, the number of C-steps executed until convergence. 
call 
the call used (see 
Valentin Todorov valentin.todorov@chello.at, based on work written for S-plus by Peter Rousseeuw and Katrien van Driessen from the University of Antwerp.
Visibility of (formerly internal) tuning parameters, notably wgtFUN(): Martin Maechler
Rousseeuw, P. J. and Leroy, A. M. (1987) Robust Regression and Outlier Detection. Wiley.
Rousseeuw, P. J. and van Driessen, K. (1999) A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212–223.
Pison, G., Van Aelst, S., and Willems, G. (2002) Small sample corrections for LTS and MCD. Metrika 55, 111–123.
Hubert, M., Rousseeuw, P. J. and Verdonck, T. (2012) A deterministic algorithm for robust location and scatter. Journal of Computational and Graphical Statistics 21, 618–637.
cov.mcd from package MASS;
covOGK as cheaper alternative for larger dimensions.
BACON and covNNC, from package robustX.
data(hbk)
hbk.x <- data.matrix(hbk[, 1:3])
set.seed(17)
(cH <- covMcd(hbk.x))
cH0 <- covMcd(hbk.x, nsamp = "deterministic")
with(cH0, stopifnot(quan == 39,
     iBest == c(1:4,6), # 5 out of 6 gave the same
     identical(raw.weights, mcd.wt),
     identical(which(mcd.wt == 0), 1:14),
     all.equal(crit, -1.045500594135)))

## the following three statements are equivalent
c1 <- covMcd(hbk.x, alpha = 0.75)
c2 <- covMcd(hbk.x, control = rrcov.control(alpha = 0.75))
## direct specification overrides control one:
c3 <- covMcd(hbk.x, alpha = 0.75,
             control = rrcov.control(alpha=0.95))
c1

## Martin's smooth reweighting:

## List of experimental pre-specified wgtFUN() creators:
## Cutoffs may depend on (n, p, control$beta) :
str(.wgtFUN.covMcd)

cMM <- covMcd(hbk.x, wgtFUN = "sm1.adaptive")

## compare all components except the stored call
ina <- which(names(cH) == "call")
all.equal(cMM[-ina], cH[-ina]) # *some* differences, not huge (same 'best'):
stopifnot(all.equal(cMM[-ina], cH[-ina], tol = 0.2))