frame_distance: Residual-robust distance plot of quantile regression model

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/frame_distance.R

Description

the standardized residuals from quantile regression against the robust MCD distance. This display is used to diagnose both vertical outlier and horizontal leverage points. Function frame_distance only work for linear quantile regression model. With non-linear model, use frame_distance_implement

Usage

1
frame_distance(object, tau)

Arguments

object

model, quantile regression model

tau

singular or vectors, quantile

Details

The generalized MCD algorithm based on the fast-MCD algorithm formulated by Rousseeuw and Van Driessen(1999), which is similar to the algorithm for least trimmed squares(LTS). The canonical Mahalanobis distance is defined as

MD(x_i)=[(x_i-\bar{x})^{T}\bar{C}(X)^{-1}(x_i-\bar{x})]^{1/2}

where \bar{x}=\frac{1}{n}∑_{i=1}^{n}x_i and \bar{C}(X)=\frac{1}{n-1}∑_{i=1}^{n}(x_i-\bar{x})^{T}(x_i- \bar{x}) are the empirical multivariate location and scatter, respectively. Here x_i=(x_{i1},...,x_{ip})^{T} exclueds the intercept. The relation between the Mahalanobis distance MD(x_i) and the hat matrix H=(h_{ij})=X(X^{T}X)^{-1}X^{T} is

h_{ii}=\frac{1}{n-1}MD^{2}_{i}+\frac{1}{n}

The canonical robust distance is defined as

RD(x_{i})=[(x_{i}-T(X))^{T}C(X)^{-1}(x_{i}-T(X))]^{1/2}

where T(X) and C(X) are the robust multivariate location and scatter, respectively, obtained by MCD. To achieve robustness, the MCD algorithm estimates the covariance of a multivariate data set mainly through as MCD h-point subset of data set. This subset has the smallest sample-covariance determinant among all the possible h-subsets. Accordingly, the breakdown value for the MCD algorithm equals \frac{(n-h)}{n}. This means the MCD estimates is reliable, even if up to \frac{100(n-h)}{n} set are contaminated.

Value

dataframe for residual-robust distance plot

Author(s)

Wenjing Wang wenjingwangr@gmail.com

See Also

function frame_distance_complex

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
library(quantreg)
library(ggplot2)
library(ALDqr)
library(purrr)
library(robustbase)
library(tidyr)
library(gridExtra)
tau = c(0.1, 0.5, 0.9)
ais_female <- subset(ais, Sex == 1)
object <- rq(BMI ~ LBM + Ht, data = ais_female, tau = tau)
plot_distance <- frame_distance(object, tau = c(0.1, 0.5, 0.9))
distance <- plot_distance[[1]]
cutoff_v <- plot_distance[[2]]
cutoff_h <- plot_distance[[3]]
n <- nrow(object$model)
case <- rep(1:n, length(tau))
distance <- cbind(case, distance)
distance$residuals <- abs(distance$residuals)
distance1 <- subset(distance, tau_flag == "tau0.1")
p1 <- ggplot(distance1, aes(x = rd, y = residuals)) +
 geom_point() +
 geom_hline(yintercept = cutoff_h[1], colour = "red") +
 geom_vline(xintercept = cutoff_v, colour = "red") +
 geom_text(data = subset(distance1, residuals > cutoff_h[1]|rd > cutoff_v),
           aes(label = case), hjust = 0, vjust = 0) +
 xlab("Robust Distance") +
 ylab("|Residuals|")

distance2 <- subset(distance, tau_flag == "tau0.5")

p2 <- ggplot(distance1, aes(x = rd, y = residuals)) +
 geom_point() +
 geom_hline(yintercept = cutoff_h[2], colour = "red") +
 geom_vline(xintercept = cutoff_v, colour = "red") +
 geom_text(data = subset(distance1, residuals > cutoff_h[2]|rd > cutoff_v),
          aes(label = case), hjust = 0, vjust = 0) +
 xlab("Robust Distance") +
 ylab("|Residuals|")
distance3 <- subset(distance, tau_flag == "tau0.9")

p3 <- ggplot(distance1, aes(x = rd, y = residuals)) +
 geom_point() +
 geom_hline(yintercept = cutoff_h[3], colour = "red") +
 geom_vline(xintercept = cutoff_v, colour = "red") +
 geom_text(data = subset(distance1, residuals > cutoff_h[3]|rd > cutoff_v),
         aes(label = case), hjust = 0, vjust = 0) +
xlab("Robust Distance") +
 ylab("|Residuals|")
grid.arrange(p1, p2, p3, ncol = 3)

quokar documentation built on May 2, 2019, 6:39 a.m.