cellMCD: cellWise minimum covariance determinant estimator
In cellWise: Analyzing Data with Cellwise Outliers

Description

The cellwise minimum covariance determinant estimator computes cellwise robust estimates of the center and covariance matrix of a data set X. The algorithm guarantees a monotone decrease of an objective function, which is based on observed Gaussian log-likelihood. By default, it starts by calling checkDataSet to clean the data.

Usage

cellMCD(X, alpha = 0.75, quant = 0.99,
        crit = 1e-4, noCits = 100, lmin = 1e-4,
        checkPars = list())

Arguments

`X`	`X` is the input data, and must be an `n` by `d` matrix or a data frame.
`alpha`	In each column, at least `n*alpha` cells must remain unflagged. Defaults to `0.75` (75%), and should not be set (much) lower.
`quant`	Determines the cutoff value to flag cells. Defaults to `0.99`.
`crit`	The iteration stops when successive covariance matrices (of the standardized data) differ by less than `crit`. Defaults to `1e-4`.
`noCits`	The maximal number of C-steps used. Defaults to `100`.
`lmin`	A lower bound on the eigenvalues of the estimated covariance matrix of the standardized data. Defaults to `1e-4`. Should not be smaller than `1e-6`.
`checkPars`	Optional list of parameters used in the call to `checkDataSet`. The options are:
	`coreOnly`	If `TRUE`, skip the execution of `checkDataSet`. Defaults to `FALSE`.
	`numDiscrete`	A column that takes on `numDiscrete` or fewer values will be considered discrete and not retained in the cleaned data. Defaults to `5`.
	`fracNA`	Only retain columns and rows with fewer NAs than this fraction. Defaults to `0.5`.
	`precScale`	Only consider columns whose scale is larger than `precScale`. Here scale is measured by the median absolute deviation. Defaults to `1e-12`.
	`silent`	Whether the function's progress messages should be suppressed. Defaults to `FALSE`.
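
The arguments above can be combined as in the following sketch. The simulated data, the planted outliers and the object names (`fit`, `unflagged`) are illustrative assumptions and are not part of the package examples. The call sets only arguments documented above and then counts the unflagged cells per column, which by the description of `alpha` should be at least `n*alpha`.

```r
library(cellWise)

set.seed(42)
n <- 200
X <- matrix(rnorm(n * 3), ncol = 3)   # illustrative clean Gaussian data
X[1:4, 2] <- X[1:4, 2] + 8            # a few planted outlying cells
colnames(X) <- c("A", "B", "C")

alpha <- 0.75
fit <- cellMCD(X, alpha = alpha, quant = 0.99,
               checkPars = list(silent = TRUE,     # suppress progress messages
                                numDiscrete = 5,   # documented default
                                fracNA = 0.5))     # documented default

# W has zeros for flagged cells, so column sums count the unflagged cells
unflagged <- colSums(fit$W)
unflagged
```

Entries left out of `checkPars` are expected to fall back to their documented defaults.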

Details

The matrix raw.S in the output is the raw estimate of scatter produced by cellMCD. The final S is obtained from raw.S by rescaling such that its diagonal entries equal the squares of the univariate scales in locsca$scale. This reduces the bias at Gaussian data, which matters mainly for large sample sizes.
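
The rescaling described above can be illustrated in base R. This is a minimal sketch and not the package's internal code: the matrix `raw.S` and the vector `scales` (standing in for locsca$scale) hold made-up values, and it is assumed that the rescaling multiplies rows and columns by common factors, which leaves the correlations unchanged.

```r
# Hypothetical raw scatter estimate (illustrative values only)
raw.S <- matrix(c(0.90, 0.40, 0.30,
                  0.40, 1.10, 0.50,
                  0.30, 0.50, 0.80), nrow = 3, byrow = TRUE)

# Hypothetical univariate scales, standing in for locsca$scale
scales <- c(1.0, 1.2, 0.9)

# Factor per variable so that the rescaled diagonal equals scales^2
fac <- scales / sqrt(diag(raw.S))
S   <- raw.S * outer(fac, fac)

diag(S)                                  # matches scales^2
all.equal(cov2cor(S), cov2cor(raw.S))    # correlations are preserved
```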

Value

A list with components:

mu
the cellMCD estimate of location.
S
the cellMCD estimate of scatter, after bias correction (see Details).
W
the cellMCD estimate of W, a binary matrix in which flagged (outlying) cells are 0 and all other cells are 1.
preds
predictions (=conditional expectations) of the flagged cells, given the clean cells in the same row.
csds
conditional standard deviations of the flagged cells, given the clean cells in the same row.
Ximp
imputed data matrix.
Zres
matrix of cellwise standardized residuals.
raw.S
the raw cellMCD estimate of scatter, without bias correction.
locsca
list containing the robust locations and scales used to standardize the data before running the algorithm. The results mu, S, preds and Ximp are returned in the original location and scale of the data.
nosteps
number of steps the algorithm took to converge.
X
the data on which the algorithm was executed.
quant
the cutoff used to flag the cells.
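
The components preds and csds are described above as conditional expectations and conditional standard deviations of the flagged cells given the clean cells in the same row. The base R sketch below works this out for a single row using the standard Gaussian conditioning formulas. The values of `mu`, `S`, `x` and `w` are made up for illustration, and the code is not taken from the package.

```r
mu <- c(0, 0, 0)
S  <- diag(3) * 0.5 + 0.5        # unit variances, correlations 0.5
x  <- c(5.2, 0.3, -0.1)          # one row of data (hypothetical)
w  <- c(0, 1, 1)                 # 0 = flagged cell, 1 = clean cell

flagged <- which(w == 0)
clean   <- which(w == 1)

# Regression coefficients of the flagged cells on the clean cells
B <- S[flagged, clean, drop = FALSE] %*% solve(S[clean, clean, drop = FALSE])

# Conditional expectation and conditional standard deviation
pred <- mu[flagged] + B %*% (x[clean] - mu[clean])
cvar <- S[flagged, flagged, drop = FALSE] - B %*% S[clean, flagged, drop = FALSE]
csd  <- sqrt(diag(cvar))

# Imputed row: flagged cells replaced by their predictions
ximp <- x
ximp[flagged] <- pred
ximp
```

In this sketch the clean cells of the row are left untouched, and only the flagged cell is replaced.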

Author(s)

J. Raymaekers and P.J. Rousseeuw

References

J. Raymaekers and P.J. Rousseeuw (2022). The cellwise MCD estimator, Journal of the American Statistical Association, to appear. doi:10.1080/01621459.2023.2267777 (open access)

Examples

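library(cellWise)

# Simulate 1000 observations from a trivariate Gaussian with correlations 0.5
mu    <- rep(0, 3)
Sigma <- diag(3) * 0.5 + 0.5
set.seed(123)
X <- MASS::mvrnorm(1000, mu, Sigma)
# Plant cellwise outliers in rows 1 to 10 and in row 12
X[1:5, 1]  <- X[1:5, 1] + 5
X[6:10, 2] <- X[6:10, 2] - 10
X[12, 1:2] <- c(-4, 8)
colnames(X) <- c("X1", "X2", "X3")
# Fit cellMCD and inspect the estimates, flags, imputations and residuals
cellMCD.out <- cellMCD(X)
cellMCD.out$mu
cov2cor(cellMCD.out$S)
cellMCD.out$W[1:15, ]
cellMCD.out$Ximp[1:15, ]
cellMap(cellMCD.out$Zres[1:15, ])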

# For more examples, we refer to the vignette:
## Not run: 
vignette("cellMCD_examples")

## End(Not run)

cellWise documentation built on Oct. 25, 2023, 5:07 p.m.