cov.comed: Co-Median Robust Covariance Matrix
In abnormally-distributed/cvreg: Cross Validation and Robust Estimation Utilities


The co-median matrix is an alternative to the covariance matrix. To understand how it works, first consider the definition of the median absolute deviation, MAD(x) = med(|x - med(x)|). The MAD is usually scaled by a factor of 1.4826 so that it is a consistent robust estimator of the standard deviation under normality. This function also offers the option of replacing the sample median with the Harrell-Davis estimator of the median, which can improve accuracy in small samples (Harrell & Davis, 1982).
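As an illustration of these definitions, here is a minimal Python sketch (the package itself is written in R, and this is not its code). It computes the scaled MAD and a Harrell-Davis estimate of the median using only the standard library. The helper names `mad`, `_beta_cdf`, and `hd_median` are hypothetical, and the regularized incomplete beta function needed for the Harrell-Davis weights is approximated numerically with Simpson's rule rather than computed exactly.

```python
import math
from statistics import median


def mad(x, scale=1.4826):
    """Median absolute deviation, med(|x - med(x)|), times a consistency constant.

    scale=1.4826 makes it estimate the standard deviation under normality;
    scale=1.0 gives the raw MAD.
    """
    m = median(x)
    return scale * median(abs(v - m) for v in x)


def _beta_cdf(t, a, b, steps=2000):
    """Regularized incomplete beta function I_t(a, b) via composite Simpson's rule.

    A numerical approximation, adequate for the a, b >= 1 used below.
    """
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def pdf(u):
        # Beta(a, b) density; at u = 0 it is 0 unless a == 1.
        if u <= 0.0:
            return math.exp(log_norm) if a == 1 else 0.0
        return math.exp(log_norm + (a - 1) * math.log(u) + (b - 1) * math.log1p(-u))

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return total * h / 3


def hd_median(x):
    """Harrell-Davis estimate of the median: a weighted sum of all order statistics."""
    xs = sorted(x)
    n = len(xs)
    a = b = (n + 1) / 2  # Beta parameters q(n + 1) and (1 - q)(n + 1) with q = 0.5
    cdf = [_beta_cdf(i / n, a, b) for i in range(n + 1)]
    # weight of the i-th order statistic: I_{i/n}(a, b) - I_{(i-1)/n}(a, b)
    return sum((cdf[i + 1] - cdf[i]) * xs[i] for i in range(n))
```

Unlike the sample median, which uses only the middle order statistic(s), the Harrell-Davis estimate gives every order statistic a weight, which is where its small-sample smoothing comes from.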

The co-median is defined as com(x, y) = med((x - med(x))(y - med(y))), and its standardized form, analogous to the correlation coefficient, is δ = com(x, y)/(MAD(x) MAD(y)). Unlike the correlation coefficient, δ is not guaranteed to lie within the interval [-1, 1]; however, it typically deviates from this interval only for non-normally distributed random variables, and it is a smooth function of the correlation coefficient (Falk, 1997; Falk, 1998).
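The co-median and δ can be sketched the same way in Python. The names `comedian`, `raw_mad`, and `comedian_corr` are hypothetical, and the sketch assumes unscaled MADs in the denominator of δ, which gives δ(x, x) = 1 for odd sample sizes.

```python
from statistics import median


def comedian(x, y):
    """com(x, y) = med((x - med(x)) * (y - med(y)))."""
    mx, my = median(x), median(y)
    return median((a - mx) * (b - my) for a, b in zip(x, y))


def raw_mad(x):
    """Unscaled MAD, med(|x - med(x)|)."""
    m = median(x)
    return median(abs(v - m) for v in x)


def comedian_corr(x, y):
    """delta = com(x, y) / (MAD(x) * MAD(y)), using unscaled MADs (an assumption)."""
    return comedian(x, y) / (raw_mad(x) * raw_mad(y))


x = [1, 2, 3, 4, 10]
print(comedian_corr(x, x))                # the extreme value 10 does not affect delta
print(comedian_corr(x, [-v for v in x]))  # perfectly reversed ordering
```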

A disadvantage of the median absolute deviation is that it can collapse to zero when half or more of the values in a vector are identical. If a column with MAD = 0 is detected, the function stops with an error. A further disadvantage of the co-median matrix is that it is not guaranteed to be positive semidefinite, even when n > p. To work around this problem, the function implements the iterative algorithm proposed by Sajesh and Srinivasan (2012), described below.
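A quick way to see the collapse: in [1, 1, 1, 2, 9] the median is 1, so three of the five absolute deviations are 0 and the MAD is 0. A hypothetical guard that mirrors the error behavior described above might look like this in Python. The name `check_mads` and the error message text are illustrative, not the package's.

```python
from statistics import median


def check_mads(columns):
    """Raise ValueError if any column has a median absolute deviation of zero.

    `columns` is a list of numeric sequences, one per variable.
    """
    for j, col in enumerate(columns):
        m = median(col)
        if median(abs(v - m) for v in col) == 0:
            raise ValueError(f"column {j} has MAD = 0")
```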

1. Let δ(X) be the co-median correlation matrix of X. Compute the eigenvalues and eigenvectors of δ(X); let E denote the matrix of eigenvectors and Λ the diagonal matrix of eigenvalues.

2. Let Q = DE, where D is the diagonal matrix of column MADs, and let Q⁻¹ be the inverse of Q. Scores are then obtained by projecting each observation, z_i = Q⁻¹x_i (equivalently, Z = X(Q⁻¹)'), and the squared MADs of the columns of Z are stored in a diagonal matrix Γ. Denote the vector of column medians of Z by γ.

3. The resulting robust estimates of scatter and location are then Ω = QΓQ' and μ = Qγ, respectively.

4. Optional step: repeat the steps above one or two more times, substituting Ω for δ and Γ for the sample MADs in D.

cov.comed(x, method = c("med", "hd", "aad"), iter = 1)

`x`	a data frame or matrix containing numeric variables
`method`	one of "med", "hd", or "aad". "med" uses the usual median and MAD; "hd" replaces the median with the Harrell-Davis estimate of the median; and "aad" uses the average absolute deviation in place of the median absolute deviation. If "aad" is used, the corresponding consistency constant, sqrt(pi/2), is applied instead of 1.4826. "aad" is preferable only when some columns of the data have a median absolute deviation of zero.
`iter`	number of refinement iterations
`alpha`	the probability level of the chi-squared quantile used as the cutoff for declaring an outlier in the final reweighted estimate. Must be greater than 0.50.
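For the "aad" option, here is a minimal Python sketch of the scale estimate. It assumes deviations are taken from the median (the text does not say whether the center is the mean or the median) and that the average absolute deviation is multiplied by sqrt(pi/2); `aad` is a hypothetical name. Unlike the MAD, it stays positive for [1, 1, 1, 2, 9].

```python
import math
from statistics import median


def aad(x):
    """Average absolute deviation from the median, scaled by sqrt(pi/2)."""
    m = median(x)
    return math.sqrt(math.pi / 2) * sum(abs(v - m) for v in x) / len(x)


print(aad([1, 1, 1, 2, 9]))  # positive, although the MAD of this vector is 0
```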

a covRobust object containing the following elements:

center: multivariate mean of the cleaned data set, after discarding the outliers identified by the Mahalanobis distances based on the co-median matrix.
cov: covariance matrix of the cleaned data set, after discarding the outliers identified by the Mahalanobis distances based on the co-median matrix.
medians: estimated multivariate median
com: estimated co-median matrix
delta: the initial raw co-median correlation matrix
dist: the Mahalanobis distances based on the cleaned covariance matrix
distL1: the Mahalanobis distances based on the co-median matrix
outliers: the indices of the outliers identified by the co-median-based Mahalanobis distances; these are the points removed to obtain the cleaned covariance matrix.
weights: the weights used to downweight outliers. Here they are binary, with 0 marking an outlier and 1 otherwise.
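The reweighting step behind `center`, `cov`, `weights`, and `outliers` can be sketched in Python as below. Assumptions, none confirmed by the text: squared Mahalanobis distances computed from the co-median location and scatter are compared with the chi-squared quantile at probability `alpha` with p degrees of freedom; that quantile is approximated with the Wilson-Hilferty formula rather than computed exactly; and the cleaned covariance uses the n - 1 divisor. The names `chi2_quantile`, `_solve`, and `reweight` are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_quantile(prob, df):
    """Wilson-Hilferty approximation to the chi-squared quantile (not exact)."""
    z = NormalDist().inv_cdf(prob)
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * math.sqrt(h)) ** 3


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def reweight(X, mu, omega, alpha):
    """Give weight 0 to rows whose squared Mahalanobis distance from (mu, omega)
    exceeds the chi-squared cutoff, then return (center, cov, weights) of the rest."""
    p = len(mu)
    cutoff = chi2_quantile(alpha, p)
    weights = []
    for row in X:
        diff = [row[j] - mu[j] for j in range(p)]
        d2 = sum(u * v for u, v in zip(diff, _solve(omega, diff)))
        weights.append(0 if d2 > cutoff else 1)
    kept = [row for row, w in zip(X, weights) if w == 1]
    k = len(kept)
    center = [sum(r[j] for r in kept) / k for j in range(p)]
    cov = [[sum((r[i] - center[i]) * (r[j] - center[j]) for r in kept) / (k - 1)
            for j in range(p)] for i in range(p)]
    return center, cov, weights
```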

Falk, M. (1997). On MAD and comedians. Annals of the Institute of Statistical Mathematics, 49, 615-644.

Falk, M. (1998). A Note on the Comedian for Elliptical Distributions. Journal of Multivariate Analysis, 67(2), 306-317. doi:10.1006/jmva.1998.1775

Harrell, F. E., & Davis, C. E. (1982). A new distribution-free quantile estimator. Biometrika, 69, 635-640.

Sajesh, T. A., & Srinivasan, M. R. (2012). Outlier detection for high dimensional data using the Comedian approach. Journal of Statistical Computation and Simulation, 82(5), 745-757. doi:10.1080/00949655.2011.552504