The co-median matrix is an alternative to the covariance matrix. To understand
how this works, first consider the definition of the median absolute deviation, MAD(x) = med(|x - med(x)|).
The MAD is usually scaled by a factor of 1.4826 to make it a consistent robust estimator of the
standard deviation for normally distributed data. An option is also provided to replace the sample median with
the Harrell-Davis estimator of the median, which can improve accuracy in small samples (Harrell & Davis, 1982).
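As a rough illustration, the Harrell-Davis estimator is a weighted average of all order statistics, with weights taken from a Beta distribution. The sketch follows the published formula; the function name hd_quantile is hypothetical, and this is not necessarily how the package computes the estimate.

```r
# Hypothetical helper: Harrell-Davis quantile estimate (q = 0.5 gives the median).
hd_quantile <- function(x, q = 0.5) {
  x <- sort(x)
  n <- length(x)
  a <- (n + 1) * q
  b <- (n + 1) * (1 - q)
  # Weight of the i-th order statistic: Beta(a, b) mass on ((i - 1)/n, i/n].
  i <- seq_len(n)
  w <- pbeta(i / n, a, b) - pbeta((i - 1) / n, a, b)
  sum(w * x)
}

x <- c(2.1, 3.4, 1.9, 8.7, 2.8, 3.0, 2.5)
hd_quantile(x)   # a smooth alternative to median(x)
```

Because every observation receives some weight, the estimate changes smoothly with the data, which is the usual argument for it in small samples.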
The co-median is defined by com(x,y) = med((x - med(x)) * (y - med(y))), and its standardized form, analogous
to the correlation coefficient, is δ = com(x,y)/(MAD(x) * MAD(y)). Note that, unlike the correlation
coefficient, δ is not guaranteed to lie within the interval [-1, 1], but it
typically only deviates from this interval for non-normally distributed random variables and is a
smooth function of the correlation coefficient (Falk, 1997; Falk, 1998).
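The two definitions translate almost directly into base R. This is a minimal sketch with hypothetical helper names (comedian, comedian_cor), not the package's internal code. It assumes the raw, unscaled MAD in the denominator, so that the standardized co-median of a variable with itself is 1 for odd sample sizes.

```r
# Hypothetical helpers for illustration; not the package's internal code.
comedian <- function(x, y) {
  median((x - median(x)) * (y - median(y)))
}

comedian_cor <- function(x, y) {
  # constant = 1 gives the raw MAD (stats::mad() scales by 1.4826 by default).
  comedian(x, y) / (mad(x, constant = 1) * mad(y, constant = 1))
}

x <- c(1, 2, 3, 4, 100)   # one gross outlier
y <- 2 * x
comedian(x, y)       # 2: the outlier does not dominate the median of products
comedian_cor(x, y)   # 1: y is an exact linear function of x
```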
A disadvantage of the median absolute deviation is that it can collapse to zero when more than half of the values
in a vector are the same. When a column with MAD=0 is detected, the function returns an error message.
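A quick pre-check for such columns can be sketched as follows. The helper name zero_mad_columns and the toy data are hypothetical; the package's own check may differ.

```r
# Hypothetical pre-check: which columns have a MAD of exactly zero?
zero_mad_columns <- function(X) {
  mads <- apply(as.matrix(X), 2, mad)
  which(mads == 0)
}

X <- cbind(a = c(5, 5, 5, 5, 1, 9, 12),   # four of seven values are identical
           b = c(1, 2, 3, 4, 5, 6, 7))
zero_mad_columns(X)   # flags column "a"
```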
Another disadvantage of the co-median matrix is that it is not guaranteed to be positive-semidefinite
even when n > p. To get around this problem, this function implements an iterative algorithm proposed by
Sajesh and Srinivasan (2012), described below.
1. Let δ(X) be the co-median correlation matrix of X. Compute the eigenvalues and eigenvectors
of δ(X), and let E denote the eigenvectors, and Λ the diagonal matrix of
eigenvalues.
2. Let Q = DE, where D is a diagonal matrix of MADs. Let
invQ be the inverse of Q. Scores are then obtained as Z = XinvQ,
whose squared-MADs are stored in a diagonal matrix, Γ. Furthermore, denote the
vector of column medians of Z as γ.
3. The resulting robust estimates for location and scatter are then respectively defined as
Ω = QΓQ' and mu = Qγ.
4. Optional step: repeat the steps above one or two times, substituting Ω for δ and Γ for the sample MADs in D.
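Steps 1 to 3 can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the helper names are hypothetical, δ is built from raw MADs, D and Γ use MADs scaled by 1.4826, the scores are computed row-wise as z_i = invQ x_i (in matrix form, X times the transpose of invQ), and the optional re-iteration is omitted.

```r
# Illustrative sketch only; helper names are hypothetical.
comed_cor_matrix <- function(X) {
  Xc <- sweep(X, 2, apply(X, 2, median))   # center columns at their medians
  s  <- apply(X, 2, mad, constant = 1)     # raw (unscaled) MADs
  p  <- ncol(X)
  delta <- matrix(NA_real_, p, p)
  for (i in seq_len(p)) {
    for (j in seq_len(p)) {
      delta[i, j] <- median(Xc[, i] * Xc[, j]) / (s[i] * s[j])
    }
  }
  delta
}

comed_step <- function(X) {
  p <- ncol(X)
  delta <- comed_cor_matrix(X)
  E <- eigen(delta, symmetric = TRUE)$vectors        # step 1
  D <- diag(apply(X, 2, mad), nrow = p)              # scaled MADs
  Q <- D %*% E                                       # step 2
  Z <- X %*% t(solve(Q))                             # row-wise z_i = invQ x_i
  Gamma <- diag(apply(Z, 2, mad)^2, nrow = p)        # squared MADs of scores
  gamma <- apply(Z, 2, median)
  list(cov = Q %*% Gamma %*% t(Q),                   # step 3: Omega
       center = drop(Q %*% gamma))                   # step 3: mu
}

set.seed(42)
X <- matrix(rnorm(300), 100, 3)
X[, 2] <- X[, 2] + 0.5 * X[, 1]
fit <- comed_step(X)
min(eigen(fit$cov, symmetric = TRUE)$values)   # non-negative by construction
```

Because Ω has the form QΓQ' with a non-negative diagonal Γ, it is positive semidefinite regardless of whether δ(X) was.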
x: a data frame or matrix containing numeric variables.
method: one of "med", "hd", or "aad". "med" uses the usual median and MAD. "hd" uses the Harrell-Davis estimate of the median in place of the sample median. "aad" uses the average absolute deviation in lieu of the median absolute deviation; with this option the appropriate consistency constant, sqrt(pi/2), is used instead of 1.4826. "aad" is preferable only when the data contain columns with a median absolute deviation of zero.
iter: number of refinement iterations.
alpha: the chi-squared quantile for declaring an outlier in the final reweighted estimate. Must be > 0.50.
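To see why "aad" helps with zero-MAD columns, compare the two scale estimates on a vector dominated by one repeated value. The helper aad is hypothetical; it takes deviations about the median, which is an assumption, since the text does not say whether the package centers at the mean or the median.

```r
# Hypothetical helper; deviations are taken about the median (an assumption).
aad <- function(x, constant = sqrt(pi / 2)) {
  constant * mean(abs(x - median(x)))
}

x <- c(5, 5, 5, 5, 1, 9, 12)   # more than half of the values are identical
mad(x)   # 0: the MAD collapses
aad(x)   # strictly positive, so standardization remains possible
```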
A covRobust object containing the following elements:
center: multivariate mean of the cleaned data set after discarding outliers identified by the Mahalanobis distances of the co-median matrix.
cov: covariance matrix of the cleaned data set after discarding outliers identified by the Mahalanobis distances of the co-median matrix.
medians: estimated multivariate median.
com: estimated co-median matrix.
delta: the initial raw co-median correlation matrix.
dist: the Mahalanobis distances based on the cleaned covariance matrix.
distL1: the Mahalanobis distances based on the co-median matrix.
outliers: the indices of the outliers identified by the Mahalanobis distances based on the co-median matrix; these are the points removed to obtain the cleaned covariance matrix.
weights: the weights for downweighting outliers. Here they are binary, with 0 marking an outlier and 1 otherwise.
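The relationship between these elements can be sketched with a plain hard-rejection step. This is an assumption-laden illustration, not the package's code: the helper reweight, its default alpha of 0.975, and the simple chi-squared cutoff on squared distances are all hypothetical choices, and the published method may calibrate the cutoff differently.

```r
# Hypothetical helper: hard-rejection reweighting from an initial robust fit.
reweight <- function(X, center, scatter, alpha = 0.975) {
  d2 <- mahalanobis(X, center, scatter)    # squared robust distances
  cutoff <- qchisq(alpha, df = ncol(X))
  w <- as.integer(d2 <= cutoff)            # 1 = keep, 0 = outlier
  clean <- X[w == 1, , drop = FALSE]
  list(center = colMeans(clean),           # mean of the cleaned data
       cov = cov(clean),                   # covariance of the cleaned data
       outliers = which(w == 0),
       weights = w)
}

set.seed(1)
X <- rbind(matrix(rnorm(200), 100, 2), c(50, 50))   # row 101 is a planted outlier
start_center  <- apply(X, 2, median)
start_scatter <- diag(apply(X, 2, mad)^2)           # stand-in for the co-median matrix
fit <- reweight(X, start_center, start_scatter)
fit$outliers   # includes row 101
```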
Falk, M. (1997). On MAD and comedians. Annals of the Institute of Statistical Mathematics, 49, 615-644.
Falk, M. (1998). A Note on the Comedian for Elliptical Distributions. Journal of Multivariate Analysis, 67(2), 306-317. doi:10.1006/jmva.1998.1775
Harrell, F. E., & Davis, C. E. (1982). A new distribution-free quantile estimator. Biometrika, 69, 635-640.
Sajesh, T. A., & Srinivasan, M. R. (2012). Outlier detection for high dimensional data using the Comedian approach. Journal of Statistical Computation and Simulation, 82(5), 745-757. doi:10.1080/00949655.2011.552504
cov.comed(x)