# frame_distance: Residual-robust distance plot of quantile regression model

In quokar: Quantile Regression Outlier Diagnostics with K Left Out Analysis

## Description

Computes the data for plotting the standardized residuals from a quantile regression fit against the robust MCD distance. This display is used to diagnose both vertical outliers and horizontal leverage points. `frame_distance` only works for linear quantile regression models; for non-linear models, use `frame_distance_implement`.
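A point's position relative to the two cutoffs (one on the residual axis, one on the robust distance axis) tells which kind of unusual observation it is. The sketch below uses a hypothetical helper, `classify_points`, that is not part of quokar; the cutoff values are arbitrary illustrations, and the labels follow the usual regular / vertical outlier / good leverage / bad leverage terminology of robust diagnostic plots.

```r
# Hypothetical helper (not part of quokar): label each observation by
# which side of the two cutoffs it falls on in the residual-robust distance plot.
classify_points <- function(residuals, rd, cutoff_resid, cutoff_rd) {
  big_resid <- abs(residuals) > cutoff_resid  # vertical direction
  big_rd <- rd > cutoff_rd                    # horizontal (leverage) direction
  ifelse(big_resid & big_rd, "bad leverage point",
         ifelse(big_resid, "vertical outlier",
                ifelse(big_rd, "good leverage point", "regular")))
}

# Illustrative values only; the cutoffs 2.5 and 3 are assumptions
labels <- classify_points(residuals = c(0.5, 4, 0.2, 5),
                          rd = c(1, 1.5, 6, 7),
                          cutoff_resid = 2.5, cutoff_rd = 3)
labels
```

A vertical outlier is unusual only in the response, a good leverage point is unusual only in the predictors, and a bad leverage point is unusual in both.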

## Usage

```r
frame_distance(object, tau)
```

## Arguments

- `object`: a quantile regression model.
- `tau`: a single quantile or a vector of quantiles.

## Details

The generalized MCD algorithm is based on the fast-MCD algorithm formulated by Rousseeuw and Van Driessen (1999), which is similar to the algorithm for least trimmed squares (LTS). The canonical Mahalanobis distance is defined as

$$MD(x_i)=\left[(x_i-\bar{x})^{T}\bar{C}(X)^{-1}(x_i-\bar{x})\right]^{1/2}$$

where $\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i$ and $\bar{C}(X)=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})^{T}$ are the empirical multivariate location and scatter, respectively. Here $x_i=(x_{i1},\dots,x_{ip})^{T}$ excludes the intercept. The relation between the Mahalanobis distance $MD(x_i)$ and the diagonal of the hat matrix $H=(h_{ij})=X(X^{T}X)^{-1}X^{T}$, where the design matrix $X$ includes the intercept column, is

$$h_{ii}=\frac{1}{n-1}MD^{2}(x_i)+\frac{1}{n}$$
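Both the empirical scatter formula and the hat-matrix relation can be checked numerically in base R. The sketch below uses simulated predictors (an assumption, not data from the package). Note that `mahalanobis()` returns squared distances, and `cov()` uses the same $1/(n-1)$ scaling as $\bar{C}(X)$.

```r
set.seed(42)
n <- 50
# Simulated predictors x_i (no intercept column), purely illustrative
x <- cbind(rnorm(n), runif(n), rnorm(n, mean = 5, sd = 2))

# Empirical scatter as a sum of outer products (x_i - xbar)(x_i - xbar)^T / (n - 1)
centered <- sweep(x, 2, colMeans(x))
C_bar <- crossprod(centered) / (n - 1)
stopifnot(isTRUE(all.equal(C_bar, cov(x), check.attributes = FALSE)))

# Squared classical Mahalanobis distances MD^2(x_i)
md2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Leverages h_ii from the design matrix that includes the intercept
X <- cbind(1, x)
h <- diag(X %*% solve(crossprod(X)) %*% t(X))

# h_ii = MD^2(x_i) / (n - 1) + 1 / n
all.equal(h, md2 / (n - 1) + 1 / n, check.attributes = FALSE)
```

Because the identity is exact, large classical Mahalanobis distances and high leverage are the same thing; the robust distance replaces $\bar{x}$ and $\bar{C}(X)$ so that a cluster of outliers cannot mask itself.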

The canonical robust distance is defined as

$$RD(x_i)=\left[(x_i-T(X))^{T}C(X)^{-1}(x_i-T(X))\right]^{1/2}$$

where $T(X)$ and $C(X)$ are the robust multivariate location and scatter, respectively, obtained by MCD. To achieve robustness, the MCD estimator computes the location and scatter from a subset of $h$ observations: the $h$-subset whose sample covariance matrix has the smallest determinant among all possible $h$-subsets. Accordingly, the breakdown value of the MCD estimator equals $\frac{n-h}{n}$. This means the MCD estimates remain reliable even if up to $\frac{100(n-h)}{n}$ percent of the observations are contaminated.
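The concentration idea behind fast-MCD can be sketched in base R. The function `mcd_cstep` below is hypothetical and deliberately simplified: from several random starts it repeats C-steps (compute the mean and covariance of the current $h$-subset, then keep the $h$ points with the smallest distances, a step that never increases the covariance determinant) and keeps the final subset with the smallest determinant. It omits the consistency factor and the reweighting step, so its values will differ from `robustbase::covMcd()`, the maintained implementation. The data and the choice `h = 75` are illustrative assumptions.

```r
# Hypothetical, simplified MCD via concentration steps (C-steps).
# No consistency factor and no reweighting step.
mcd_cstep <- function(x, h = floor((nrow(x) + ncol(x) + 1) / 2),
                      n_starts = 20, max_iter = 100) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  best <- NULL
  for (s in seq_len(n_starts)) {
    # Initial fit from a random (p + 1)-subset
    start <- sample.int(n, p + 1)
    center <- colMeans(x[start, , drop = FALSE])
    scatter <- cov(x[start, , drop = FALSE])
    if (det(scatter) < 1e-12) next  # skip (near-)singular starts
    subset <- order(mahalanobis(x, center, scatter))[seq_len(h)]
    for (it in seq_len(max_iter)) {
      center <- colMeans(x[subset, , drop = FALSE])
      scatter <- cov(x[subset, , drop = FALSE])
      # C-step: keep the h points closest to the current fit
      new_subset <- order(mahalanobis(x, center, scatter))[seq_len(h)]
      if (setequal(new_subset, subset)) break  # subset is stable
      subset <- new_subset
    }
    center <- colMeans(x[subset, , drop = FALSE])
    scatter <- cov(x[subset, , drop = FALSE])
    d <- det(scatter)
    # Keep the h-subset with the smallest covariance determinant
    if (is.null(best) || d < best$det) {
      best <- list(center = center, cov = scatter, det = d, subset = sort(subset))
    }
  }
  best
}

# Robust distance RD(x_i) from the MCD location T(X) and scatter C(X)
robust_distance <- function(x, fit) {
  sqrt(mahalanobis(x, center = fit$center, cov = fit$cov))
}

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)
x[1:5, ] <- x[1:5, ] + 10     # plant five leverage points
fit <- mcd_cstep(x, h = 75)   # breakdown value (100 - 75) / 100 = 25%
rd <- robust_distance(x, fit)
head(order(rd, decreasing = TRUE), 5)  # indices of the five largest RDs
```

Because the planted points never enter the minimum-determinant subset, they cannot inflate $C(X)$, and their robust distances stand far above the rest.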

## Value

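A list whose first element is a data frame for the residual-robust distance plot. As used in the example below, the second element is the cutoff on the robust distance axis and the third holds the residual cutoffs, one per quantile in `tau`.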

## Author(s)

Wenjing Wang wenjingwangr@gmail.com

## See Also

function `frame_distance_complex`

## Examples
```r
library(quantreg)
library(ggplot2)
library(ALDqr)
library(purrr)
library(robustbase)
library(tidyr)
library(gridExtra)
library(quokar)  # provides frame_distance()

tau <- c(0.1, 0.5, 0.9)
ais_female <- subset(ais, Sex == 1)
object <- rq(BMI ~ LBM + Ht, data = ais_female, tau = tau)

plot_distance <- frame_distance(object, tau = tau)
distance <- plot_distance[[1]]  # plot data: residuals, rd, tau_flag
cutoff_v <- plot_distance[[2]]  # cutoff on the robust distance axis
cutoff_h <- plot_distance[[3]]  # residual cutoffs, one per quantile

# Label every observation with its case number (repeated for each tau)
n <- nrow(object$model)
case <- rep(1:n, length(tau))
distance <- cbind(case, distance)
distance$residuals <- abs(distance$residuals)

# One residual-robust distance panel; points beyond either cutoff are labelled
plot_tau <- function(d, cutoff_res, cutoff_rd) {
  ggplot(d, aes(x = rd, y = residuals)) +
    geom_point() +
    geom_hline(yintercept = cutoff_res, colour = "red") +
    geom_vline(xintercept = cutoff_rd, colour = "red") +
    geom_text(data = subset(d, residuals > cutoff_res | rd > cutoff_rd),
              aes(label = case), hjust = 0, vjust = 0) +
    xlab("Robust Distance") +
    ylab("|Residuals|")
}

# Each panel uses the data for its own quantile
p1 <- plot_tau(subset(distance, tau_flag == "tau0.1"), cutoff_h[1], cutoff_v)
p2 <- plot_tau(subset(distance, tau_flag == "tau0.5"), cutoff_h[2], cutoff_v)
p3 <- plot_tau(subset(distance, tau_flag == "tau0.9"), cutoff_h[3], cutoff_v)
grid.arrange(p1, p2, p3, ncol = 3)
```