covRob: Robust Covariance/Correlation Matrix Estimation


covRob R Documentation

Robust Covariance/Correlation Matrix Estimation

Description

Compute robust estimates of multivariate location and scatter.

Usage

covRob(data, corr = FALSE, distance = TRUE, na.action = na.fail,
       estim = "auto", control = covRob.control(estim, ...), ...)

Arguments

data

a numeric matrix or data frame containing the data.

corr

a logical flag. If corr = TRUE then a robust estimate of the correlation matrix is computed instead of the covariance matrix.

distance

a logical flag. If distance = TRUE the squared Mahalanobis distances are computed.

na.action

a function to filter missing data. The default, na.fail, produces an error if missing values are present. An alternative is na.omit, which deletes observations that contain one or more missing values.

estim

a character string specifying the robust estimator to be used. The choices are: "mcd" for the Fast MCD algorithm of Rousseeuw and Van Driessen, "weighted" for the Reweighted MCD, "donostah" for the Donoho-Stahel projection-based estimator, "M" for the constrained M estimator provided by Rocke, "pairwiseQC" for the Orthogonalized Quadrant Correlation pairwise estimator, and "pairwiseGK" for the Orthogonalized Gnanadesikan-Kettenring pairwise estimator. The default "auto" selects from "donostah", "mcd", and "pairwiseQC" with the goal of producing a good estimate in a reasonable amount of time.

control

a list of control parameters to be used in the numerical algorithms. See covRob.control for the possible control parameters and their default settings. This argument is ignored when estim = "auto".

...

control parameters may be passed directly when estim != "auto".
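The effect of the two na.action choices named above can be seen without any robust estimator. The following is a minimal base R sketch; the data frame df is made up purely for illustration.

```r
# Toy data frame with missing values (hypothetical, for illustration only).
df <- data.frame(a = c(1, 2, NA, 4), b = c(10, NA, 30, 40))

# na.omit drops every row that contains at least one NA.
cleaned <- na.omit(df)
nrow(cleaned)  # 2

# na.fail signals an error when any NA is present; try() captures it here.
failed <- inherits(try(na.fail(df), silent = TRUE), "try-error")
failed  # TRUE
```

With the default na.action = na.fail, a call on data like df would therefore stop with an error rather than silently discard rows.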

Details

When estim = "auto", the covRob function selects a robust covariance estimator that is likely to provide a good estimate in a reasonable amount of time. Presently this selection is based on the problem size. The Donoho-Stahel estimator is used if there are either fewer than 1000 observations and fewer than 10 variables, or fewer than 5000 observations and fewer than 5 variables. Otherwise, if there are fewer than 50000 observations and fewer than 20 variables, the MCD is used. For larger problems, the Orthogonalized Quadrant Correlation estimator is used.
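The size-based rule described above can be written out as a small function. This is only a sketch of the rule as stated in the text; choose_estimator is a hypothetical helper name and is not part of the package.

```r
# Hypothetical helper that mirrors the documented selection rule.
# n: number of observations, p: number of variables.
choose_estimator <- function(n, p) {
  if ((n < 1000 && p < 10) || (n < 5000 && p < 5)) {
    "donostah"
  } else if (n < 50000 && p < 20) {
    "mcd"
  } else {
    "pairwiseQC"
  }
}

choose_estimator(nrow(stackloss), ncol(stackloss))  # 21 x 4: "donostah"
choose_estimator(20000, 15)                         # "mcd"
choose_estimator(100000, 30)                        # "pairwiseQC"
```

Note that the thresholds are strict: a data set with exactly 1000 observations and 10 variables falls through to the MCD branch.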

The MCD and Reweighted-MCD estimates (estim = "mcd" and estim = "weighted" respectively) are computed using the covMcd function in the robustbase package. By default, covMcd returns the reweighted estimate; the actual MCD estimate is contained in the components of the output list prefixed with raw.
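The distinction between the reweighted and the raw MCD estimate can be inspected directly on the covMcd output. The sketch below assumes the robustbase package is installed and does nothing otherwise; it calls covMcd itself rather than covRob.

```r
# Runs only if robustbase is available (assumption: it is installed).
if (requireNamespace("robustbase", quietly = TRUE)) {
  fit <- robustbase::covMcd(stackloss)

  # Reweighted estimate, returned by default.
  print(fit$cov)

  # Raw MCD estimate, stored in the components prefixed with "raw".
  print(fit$raw.cov)
  print(fit$raw.center)
}
```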

The M estimate (estim = "M") is computed using the CovMest function in the rrcov package. For historical reasons the Robust Library uses the MCD to compute the initial estimate.

The Donoho-Stahel (estim = "donostah") estimator is computed using the CovSde function provided in the rrcov package.

The pairwise estimators (estim = "pairwiseGK" and estim = "pairwiseQC") are computed using the CovOgk function in the rrcov package.

Value

an object of class "covRob" with components:

call

an image of the call that produced the object with all the arguments named.

cov

a numeric matrix containing the final robust estimate of the covariance/correlation matrix.

center

a numeric vector containing the final robust estimate of the location vector.

dist

a numeric vector containing the squared Mahalanobis distances computed using robust estimates of covariance and location contained in cov and center. If distance = FALSE this element will be missing.

raw.cov

a numeric matrix containing the initial robust estimate of the covariance/correlation matrix. If there is no initial robust estimate then this element is set to NA.

raw.center

a numeric vector containing the initial robust estimate of the location vector. If there is no initial robust estimate then this element is set to NA.

raw.dist

a numeric vector containing the squared Mahalanobis distances computed using the initial robust estimates of covariance and location contained in raw.cov and raw.center. If distance = FALSE or if there is no initial robust estimate then this element is set to NA.

corr

a logical flag. If corr = TRUE then cov and raw.cov contain robust estimates of the correlation matrix of data.

estim

a character string containing the name of the robust estimator.

control

a list containing the control parameters used by the robust estimator.
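To make the relationship between the center, cov and dist components concrete, the sketch below computes squared Mahalanobis distances from a location vector and a scatter matrix in base R. It uses the classical mean and covariance purely as stand-ins for the robust estimates, so the numbers are not what covRob would return.

```r
x <- as.matrix(stackloss)

# Classical estimates used as stand-ins for the robust center and cov.
center_hat <- colMeans(x)
cov_hat <- cov(x)

# Squared Mahalanobis distance of each observation from center_hat.
d2 <- mahalanobis(x, center_hat, cov_hat)
length(d2)  # 21, one distance per observation

# A correlation matrix is the covariance matrix rescaled to unit diagonal.
corr_hat <- cov2cor(cov_hat)
diag(corr_hat)
```

Substituting the center and cov components of a fitted object for center_hat and cov_hat would give distances based on the robust estimates instead.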

Note

In version 0.3-8 of the Robust Library, all of the functions originally contributed by the S-Plus Robust Library have been replaced by dependencies on the robustbase and rrcov packages. Computed results may differ from earlier versions of the Robust Library. In particular, the MCD estimators are now adjusted by a small sample size correction factor. Additionally, a bug was fixed where the final MCD covariance estimate produced with estim = "mcd" was not rescaled for consistency.

References

R. A. Maronna and V. J. Yohai (1995) The Behavior of the Stahel-Donoho Robust Multivariate Estimator. Journal of the American Statistical Association 90 (429), 330–341.

P. J. Rousseeuw and K. Van Driessen (1999) A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212–223.

D. L. Woodruff and D. M. Rocke (1994) Computable robust estimation of multivariate location and shape in high dimension using compound estimators. Journal of the American Statistical Association 89, 888–896.

R. A. Maronna and R. H. Zamar (2002) Robust estimates of location and dispersion of high-dimensional datasets. Technometrics 44 (4), 307–317.

See Also

CovSde, covMcd, CovOgk, CovMest, covRob.control, covClassic.

Examples

  library(robust)    # provides covRob
  data(stackloss)
  covRob(stackloss)
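A slightly longer sketch follows, showing how the arguments and returned components documented above fit together. It assumes the robust package is installed and is skipped otherwise; the object names fit_auto, fit_mcd and fit_cor are arbitrary.

```r
if (requireNamespace("robust", quietly = TRUE)) {
  library(robust)

  # Default: the estimator is chosen automatically from the problem size.
  fit_auto <- covRob(stackloss)
  print(fit_auto$estim)

  # Request a specific estimator and inspect location and distances.
  fit_mcd <- covRob(stackloss, estim = "mcd")
  print(fit_mcd$center)
  print(head(fit_mcd$dist))

  # Robust correlation matrix, skipping the distance computation.
  fit_cor <- covRob(stackloss, corr = TRUE, distance = FALSE)
  print(fit_cor$cov)
}
```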

robust documentation built on Sept. 11, 2024, 5:16 p.m.