cov.comed: Co-Median Robust Covariance Matrix


View source: R/covariance.R

Description

The co-median matrix is a robust alternative to the covariance matrix. To understand how it works, first consider the definition of the median absolute deviation, MAD(x) = med(|x - med(x)|). The MAD is usually scaled by a factor of 1.4826 so that it is a consistent robust estimator of the standard deviation for normally distributed data. This function also offers the option of replacing the sample median with the Harrell-Davis estimator of the median, which can improve accuracy in smaller samples (Harrell & Davis, 1982).
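As a minimal sketch of the MAD definition above, the following base R code computes the unscaled and scaled MAD by hand and compares the result with `stats::mad()`. The helper names `mad_raw` and `mad_scaled` are illustrative only and are not part of this package.

```r
# Unscaled MAD: median of the absolute deviations from the median
mad_raw <- function(x) median(abs(x - median(x)))

# Scaled MAD: multiplied by 1.4826 for consistency with the
# standard deviation under normality
mad_scaled <- function(x, constant = 1.4826) constant * mad_raw(x)

x <- c(2.1, 3.4, 1.9, 5.6, 4.4, 3.8, 2.7, 40)  # last value is an outlier

mad_raw(x)
mad_scaled(x)
sd(x)  # inflated by the outlier, unlike the MAD

# stats::mad() uses the same constant by default
all.equal(mad_scaled(x), mad(x))
```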

The co-median is defined by com(x,y) = med((x - med(x)) * (y - med(y))), and its standardized form, analogous to the correlation coefficient, is δ = com(x,y)/(MAD(x) * MAD(y)). Unlike the correlation coefficient, δ is not guaranteed to lie within the interval [-1, 1], but it typically leaves this interval only for non-normally distributed random variables, and it is a smooth function of the correlation coefficient (Falk, 1997; Falk, 1998).
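The two definitions above can be sketched in a few lines of base R. This is an illustration under assumptions, not the package's implementation: the helper names are hypothetical, and the unscaled MAD is used in the denominator so that δ(x, x) equals 1 when the sample size is odd.

```r
# Co-median of two vectors
comedian <- function(x, y) median((x - median(x)) * (y - median(y)))

# Unscaled MAD (assumption: delta is standardized with the raw MAD)
raw_mad <- function(x) median(abs(x - median(x)))

# Standardized co-median, analogous to a correlation coefficient
delta <- function(x, y) comedian(x, y) / (raw_mad(x) * raw_mad(y))

set.seed(1)
n <- 201                      # odd n, so each median is an observed value
x <- rnorm(n)
y <- 0.8 * x + rnorm(n, sd = 0.6)

delta(x, y)   # robust analogue of the correlation
cor(x, y)     # Pearson correlation, for comparison
delta(x, x)   # 1 up to floating-point error
```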

A disadvantage of the median absolute deviation is that it collapses to zero when more than half of the values in a vector are the same. When a column with MAD = 0 is detected, the function returns an error message. A disadvantage of the co-median matrix itself is that it is not guaranteed to be positive-semidefinite, even when n > p. To get around this problem, this function implements an iterative algorithm proposed by Sajesh and Srinivasan (2012), described below.

1. Let δ(X) be the co-median correlation matrix of X. Compute the eigenvalues and eigenvectors of δ(X), and let E denote the eigenvectors, and Λ the diagonal matrix of eigenvalues.

2. Let Q = DE, where D is a diagonal matrix of MADs. Let invQ be the inverse of Q. Scores are then obtained as Z = XinvQ, whose squared-MADs are stored in a diagonal matrix, Γ. Furthermore, denote the vector of column medians of Z as γ.

3. The resulting robust estimates for location and scatter are then respectively defined as Ω = QΓQ' and mu = Qγ.

4. Optional Step: Reiterate the above steps one or two times, but substituting Ω for δ and Γ for the sample MADs in D in the re-iterated steps.
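A single pass of steps 1 to 3 can be sketched in base R as follows. This is a rough illustration rather than the package's code: `comed_matrix` and `comed_step` are hypothetical helpers, the scores are computed row-wise as z_i = invQ x_i (one reading of step 2), and the optional re-iteration of step 4 is omitted.

```r
# Pairwise co-median matrix of the columns of X
comed_matrix <- function(X) {
  Xc <- sweep(X, 2, apply(X, 2, median))  # centre each column at its median
  p <- ncol(X)
  C <- matrix(0, p, p)
  for (j in seq_len(p)) {
    for (k in j:p) {
      C[j, k] <- C[k, j] <- median(Xc[, j] * Xc[, k])
    }
  }
  C
}

# One pass of steps 1-3 (hypothetical helper, not the package's function)
comed_step <- function(X) {
  X <- as.matrix(X)
  m_raw <- apply(X, 2, mad, constant = 1)        # unscaled MADs
  delta <- comed_matrix(X) / outer(m_raw, m_raw) # step 1: co-median correlation
  E <- eigen(delta, symmetric = TRUE)$vectors
  d <- apply(X, 2, mad)                          # scaled MADs
  Q <- diag(d, nrow = length(d)) %*% E           # step 2: Q = DE
  Z <- X %*% t(solve(Q))                         # each row is invQ %*% x_i
  Gamma <- diag(apply(Z, 2, mad)^2, nrow = ncol(Z))
  gamma <- apply(Z, 2, median)
  list(center = drop(Q %*% gamma),               # step 3: mu = Q gamma
       cov = Q %*% Gamma %*% t(Q))               # step 3: Omega = Q Gamma Q'
}

set.seed(42)
X <- matrix(rnorm(300), ncol = 3)
X[, 2] <- X[, 2] + 0.7 * X[, 1]   # induce some correlation
X[1:5, ] <- X[1:5, ] + 10         # a few gross outliers

fit <- comed_step(X)
fit$center
fit$cov
min(eigen(fit$cov, symmetric = TRUE)$values)  # positive, so Omega is positive-definite
```

Because Γ has strictly positive diagonal entries whenever no column of Z has a zero MAD, and Q is invertible, QΓQ' is positive-definite by construction, which is the point of the correction.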

Usage

cov.comed(x, method = c("med", "hd", "aad"), iter = 1)

Arguments

x

a data frame or matrix containing numeric variables

method

one of "med", "hd", or "aad". "med" uses the typical median and MAD. "hd" uses the Harrell-Davis estimate of the median in place of the sample median, and "aad" uses the average absolute deviation in lieu of the median absolute deviation. If option "aad" is used, the appropriate consistency constant, sqrt(pi/2), is used instead of 1.4826. The only time "aad" is preferable is when there are columns in the data with a median absolute deviation of zero.
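To illustrate the quantities behind the "hd" and "aad" options, here is a minimal base R sketch. The helper names are hypothetical, the Harrell-Davis median is written out from its Beta-weight definition, and centring the average absolute deviation on the median is an assumption, since the centre is not specified above.

```r
# Harrell-Davis median: weighted average of the order statistics with
# weights from a Beta((n + 1) / 2, (n + 1) / 2) distribution
hd_median <- function(x) {
  n <- length(x)
  a <- (n + 1) / 2
  w <- pbeta((1:n) / n, a, a) - pbeta((0:(n - 1)) / n, a, a)
  sum(w * sort(x))
}

# Average absolute deviation scaled by sqrt(pi / 2)
# (assumption: deviations are taken from the median)
aad_scaled <- function(x) sqrt(pi / 2) * mean(abs(x - median(x)))

hd_median(c(1, 2, 3, 4, 5))  # symmetric data, so this matches the median

x <- c(0, 0, 0, 0, 0, 0, 1, 2, 5, 9)  # more than half of the values tie
mad(x)         # 0: the MAD collapses
aad_scaled(x)  # still positive
```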

iter

number of refinement iterations

alpha

the chi-squared quantile for declaring an outlier in the final reweighted estimate. Must be > 0.50.

Value

a covRobust object containing the following elements:

References

Falk, M. (1997). On MAD and comedians. Annals of the Institute of Statistical Mathematics, 49, 615-644.

Falk, M. (1998). A Note on the Comedian for Elliptical Distributions. Journal of Multivariate Analysis, 67(2), 306-317. doi:10.1006/jmva.1998.1775

Harrell, F. E., & Davis, C. E. (1982). A new distribution-free quantile estimator. Biometrika, 69, 635-640.

Sajesh, T. A., & Srinivasan, M. R. (2012). Outlier detection for high dimensional data using the Comedian approach. Journal of Statistical Computation and Simulation, 82(5), 745-757. doi:10.1080/00949655.2011.552504

Examples


abnormally-distributed/cvreg documentation built on May 3, 2020, 3:45 p.m.