Using a robust ρ function in multivariate estimation has a major drawback. As the number of variables p (the columns of the data matrix) increases, the observations (the rows) receive increasingly uniform weights, so the robust multivariate estimate converges to the standard estimator. Higher efficiency relative to the MLE would be welcome if it came from a larger number of observations, but when it results from increasing dimensionality it means that robustness is lost. Worse, severe outliers may receive larger weights than inliers and consequently bias the estimate. David Rocke (1996) proposed a modified, "translated" bisquare ρ function that adapts to the dimensionality of the data set, preventing the asymptotic relative efficiency (ARE) from rising arbitrarily close to one while preserving the robustness and unbiasedness of the estimate.
The algorithm is initialized using the BACON (Blocked Adaptive Computationally-Efficient Outlier Nominators) algorithm of Billor, Hadi, and Velleman (2000).
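The weighting behaviour described above can be illustrated with a small sketch. It is not this package's implementation: the function names, the offset `m`, the band width `c`, and the cutoff rule `c = 2 * median distance` are all assumptions chosen for illustration. The translated weight follows the form usually given for Rocke's translated biweight (full weight up to an offset, bisquare decay over a band, zero beyond); how `m` and `c` would be derived from p is not shown.

```python
import math
import random
import statistics


def bisquare_weight(d, c):
    """Standard bisquare weight: decays smoothly from 1 at d = 0 to 0 at d >= c."""
    if d >= c:
        return 0.0
    return (1.0 - (d / c) ** 2) ** 2


def translated_bisquare_weight(d, m, c):
    """Translated bisquare weight: 1 below m, bisquare decay on [m, m + c], 0 beyond.

    m and c are hypothetical inputs here; their choice is not covered by this sketch.
    """
    if d < m:
        return 1.0
    if d > m + c:
        return 0.0
    return (1.0 - ((d - m) / c) ** 2) ** 2


def weight_spread(p, n=500, seed=1):
    """Standard deviation of bisquare weights for n standard-normal points in p dimensions."""
    rng = random.Random(seed)
    # Distance of each simulated point from the origin (the true center).
    dists = [
        math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(p)))
        for _ in range(n)
    ]
    # Illustrative cutoff only (an assumption), not a tuned constant.
    c = 2.0 * statistics.median(dists)
    weights = [bisquare_weight(d, c) for d in dists]
    return statistics.pstdev(weights)
```

Comparing `weight_spread(2)` with `weight_spread(100)` shows the weights bunching together as p grows, because the distances concentrate around a common value in high dimension.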
X: a data frame or matrix of numeric covariates.
phi: the factor determining the minimum number of observations not declared outliers after the initial estimation stage. At least phi * p observations will not be declared outliers, where p is the number of variables; put differently, at most n - phi * p observations will be declared outliers in the initial step. The default is 2. This has only an indirect effect on the observations finally declared outliers.
q: a tuning parameter. Defaults to 2. Changing it is not recommended unless you understand its effect.
maxit: maximum number of iterations.
maxsteps: limit on the number of steps in the line search section of the algorithm.
tol: numeric tolerance for convergence.
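As a worked illustration of the bound implied by phi, the arithmetic can be written out as below. The function name is hypothetical and the code only restates the rule given for phi; it is not part of the package.

```python
def initial_outlier_bounds(n, p, phi=2):
    """Return (minimum observations retained, maximum observations flagged)
    for the initial stage, following the rule stated for phi."""
    min_retained = phi * p       # at least phi * p points are not declared outliers
    max_flagged = n - phi * p    # so at most n - phi * p can be declared outliers
    return min_retained, max_flagged


# With 100 observations on 5 variables and the default phi = 2:
print(initial_outlier_bounds(100, 5))  # (10, 90)
```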
a covRobust object containing the following elements:
center: multivariate mean of cleaned data set after applying casewise weights.
cov: covariance matrix of cleaned data set after applying casewise weights.
dist: the Mahalanobis distances used in calculating the weights.
outliers: the indices of the outliers identified.
weights: the weights for downweighting outliers.
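To make the relationship between weights, center, and cov concrete, here is a minimal sketch of a casewise-weighted mean and covariance. The function name is hypothetical, and dividing by the sum of the weights is an assumption; the package may scale the covariance differently.

```python
def weighted_center_cov(rows, weights):
    """Casewise-weighted mean vector and covariance matrix.

    rows    : list of observations, each a list of p numbers
    weights : one non-negative weight per observation
    Normalising by the sum of the weights is an assumption of this sketch.
    """
    total = sum(weights)
    p = len(rows[0])
    center = [
        sum(w * r[j] for r, w in zip(rows, weights)) / total
        for j in range(p)
    ]
    cov = [
        [
            sum(w * (r[j] - center[j]) * (r[k] - center[k])
                for r, w in zip(rows, weights)) / total
            for k in range(p)
        ]
        for j in range(p)
    ]
    return center, cov
```

An observation given weight 0 drops out of both estimates entirely, which is how a flagged outlier stops influencing center and cov.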
Rocke, D. M. (1996). Robustness properties of S-estimators of multivariate location and shape in high dimension. Annals of Statistics, 24, 1327–1345.
Rocke, D. M., & Woodruff, D. L. (1996). Identification of outliers in multivariate data. Journal of the American Statistical Association, 91, 1047–1061.
Billor, N., Hadi, A. S., & Velleman, P. F. (2000). BACON: Blocked Adaptive Computationally-Efficient Outlier Nominators. Computational Statistics and Data Analysis, 34, 279–298. doi: 10.1016/S0167-9473(99)00101-2