Transformed rank correlations for multivariate outlier detection

Description

TRC starts from bivariate Spearman correlations and obtains a positive definite covariance matrix by back-transforming robust univariate medians and mads of the eigenspace. TRC can cope with missing values through regression imputation, using a robust regression on the best predictor, and it takes sampling weights into account.

Usage

TRC(data, weights, overlap = 3, mincor = 0, robust.regression = "rank", 
gamma = 0.5, prob.quantile = 0.75, alpha = 0.05, md.type = "m", monitor = FALSE)

Arguments

data

a data frame or matrix with the data

weights

sampling weights

overlap

minimum number of jointly observed values for calculating the rank correlation

mincor

minimal absolute correlation to impute

robust.regression

type of regression: "irls" is iteratively reweighted least squares M-estimator, "rank" is based on the rank correlations

gamma

minimal number of jointly observed values to impute

prob.quantile

if a mad is 0, this quantile of the absolute deviations is tried instead

alpha

the (1-alpha) quantile of the F-distribution is used for the cut-off

md.type

Type of Mahalanobis distance when missing values occur: "m" marginal (default), "c" conditional

monitor

if TRUE verbose output
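
The roles of overlap and mincor in choosing a predictor for imputation can be illustrated in base R. This is only a sketch: best_predictor is a hypothetical helper, and the selection rule (largest absolute Spearman correlation among pairs with at least overlap jointly observed values, accepted only if it reaches mincor) is an assumption read off the argument descriptions, not the package's own code.

```r
# Hypothetical helper: pick the column that best predicts column `target`.
best_predictor <- function(x, target, overlap = 3, mincor = 0) {
  x <- as.matrix(x)
  others <- setdiff(seq_len(ncol(x)), target)
  score <- vapply(others, function(j) {
    ok <- complete.cases(x[, c(target, j), drop = FALSE])
    # Too few jointly observed values: no rank correlation for this pair
    if (sum(ok) < overlap) return(NA_real_)
    abs(cor(x[ok, target], x[ok, j], method = "spearman"))
  }, numeric(1))
  # No usable predictor, or the best one is too weakly correlated
  if (all(is.na(score)) || max(score, na.rm = TRUE) < mincor) return(NA_integer_)
  others[which.max(score)]
}

# Toy data: column 2 is a monotone function of column 1, column 3 is not
z <- 1:20
x <- cbind(z, z^3, rep(c(1, -1), 10) * z)
x[1:3, 1] <- NA                      # missing values in the target column
bp <- best_predictor(x, target = 1)
```

Raising overlap above the number of jointly observed rows, or mincor above the best available correlation, makes the helper return NA, which corresponds to "do not impute" in this sketch.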

Details

TRC is similar to a one-step OGK estimator where the starting covariances are obtained from rank correlations and an ad hoc missing value imputation plus weighting is provided.
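
The construction described above can be sketched in base R for complete, unweighted data. This is a minimal illustration under stated assumptions, not the package's implementation: trc_sketch is a hypothetical name, the map 2 * sin(pi * r / 6) from Spearman to Pearson scale is the usual consistency transform under normality and is assumed here, and imputation and sampling weights are left out.

```r
# Hypothetical one-step sketch: rank correlations -> eigenspace -> back-transform.
trc_sketch <- function(x) {
  x <- as.matrix(x)
  s <- apply(x, 2, mad)                               # componentwise mads
  # Spearman correlations mapped to the Pearson scale (assumed transform)
  r <- 2 * sin(pi * cor(x, method = "spearman") / 6)
  d <- diag(s, nrow = length(s))
  start_cov <- d %*% r %*% d                          # need not be positive definite
  e <- eigen(start_cov, symmetric = TRUE)$vectors     # orthogonal basis
  z <- x %*% e                                        # data in the eigenspace
  z_med <- apply(z, 2, median)                        # robust univariate location
  z_mad <- apply(z, 2, mad)                           # robust univariate scale
  # Squared mads act as eigenvalues, so the scatter is positive definite
  # whenever every mad in the eigenspace is positive
  list(center  = drop(e %*% z_med),
       scatter = e %*% diag(z_mad^2, nrow = length(z_mad)) %*% t(e))
}

set.seed(42)
x <- matrix(rnorm(600), ncol = 3)
est <- trc_sketch(x)
```

Because the eigenvectors are orthogonal, replacing the eigenvalues by squared mads of the projected data is what guarantees positive definiteness, even when the matrix of pairwise rank correlations is not itself positive definite.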

Value

TRC returns a list whose first component output is a sublist with the following components:

sample.size

number of observations

number.of.variables

number of variables

number.of.missing.items

number of missing values

significance.level

1-alpha

computation.time

elapsed computation time

medians

componentwise medians

mads

componentwise mads

center

location estimate

scatter

covariance estimate

robust.regression

input parameter

md.type

input parameter

cutpoint

The default threshold MD-value for the cut-off of outliers

The further components returned by TRC are:

outind

Indicator of outliers

dist

Mahalanobis distances (with missing values)
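
How dist, cutpoint and outind relate can be sketched in base R. The rescaling used below (ratio of F quantiles times the median distance) is an assumption made for illustration; the documentation only states that the (1-alpha) quantile of the F-distribution is used. flag_outliers is a hypothetical helper, and the center and scatter are simple stand-ins (medians and squared mads), not TRC estimates.

```r
# Hypothetical helper: turn distances into a cut-off and an outlier indicator.
flag_outliers <- function(dist, p, alpha = 0.05) {
  n <- length(dist)
  # Assumption: scale the F quantile so the median distance matches the F median
  cutpoint <- qf(1 - alpha, p, n - p) / qf(0.5, p, n - p) *
    median(dist, na.rm = TRUE)
  list(cutpoint = cutpoint, outind = dist > cutpoint)
}

set.seed(1)
x <- matrix(rnorm(300), ncol = 3)
x[1, ] <- c(10, 10, 10)                      # one planted outlier
center  <- apply(x, 2, median)               # stand-in location estimate
scatter <- diag(apply(x, 2, mad)^2)          # stand-in (diagonal) scatter
d <- mahalanobis(x, center, scatter)         # squared Mahalanobis distances
flagged <- flag_outliers(d, p = ncol(x))
which(flagged$outind)
```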

Author(s)

Beat Hulliger

References

Béguin, C., and Hulliger, B. (2004). Multivariate outlier detection in incomplete survey data: The epidemic algorithm and transformed rank correlations. Journal of the Royal Statistical Society, Series A, 167(2), 275-294.

Examples
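
A possible call on simulated data is sketched here. It assumes that TRC is the function exported by the modi package and uses only the arguments and return components documented on this page; the data, the planted outliers and the equal weights are made up for illustration. The call is guarded so the code also runs when the package is not installed.

```r
set.seed(123)
x <- matrix(rnorm(400), ncol = 4, dimnames = list(NULL, paste0("v", 1:4)))
x[1:3, ] <- x[1:3, ] + 8           # three shifted rows as planted outliers
x[10, 2] <- NA                     # one missing item
w <- rep(1, nrow(x))               # equal sampling weights (assumption)

if (requireNamespace("modi", quietly = TRUE)) {
  res <- modi::TRC(x, weights = w)
  print(which(as.logical(res$outind)))   # rows flagged as outliers
  print(summary(res$dist))               # Mahalanobis distances
}
```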
