View source: R/frame_distance.R
Builds the data frame for plotting the standardized residuals from quantile regression
against the robust MCD distance. This display is used to diagnose
both vertical outliers and horizontal leverage points. The function
frame_distance only works for linear quantile regression models. For
non-linear models, use frame_distance_implement.
frame_distance(object, tau)
object: a fitted quantile regression model
tau: the quantile(s), a single value or a vector
The generalized MCD algorithm is based on the fast-MCD algorithm formulated by Rousseeuw and Van Driessen (1999), which is similar to the algorithm for least trimmed squares (LTS). The canonical Mahalanobis distance is defined as
MD(x_i)=[(x_i-\bar{x})^{T}\bar{C}(X)^{-1}(x_i-\bar{x})]^{1/2}
where \bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i and \bar{C}(X)=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})^{T} are the empirical multivariate location and scatter, respectively. Here x_i=(x_{i1},...,x_{ip})^{T} excludes the intercept. The relation between the Mahalanobis distance MD(x_i) and the hat matrix H=(h_{ij})=X(X^{T}X)^{-1}X^{T} is
h_{ii}=\frac{1}{n-1}MD^{2}_{i}+\frac{1}{n}
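The relation above can be checked numerically. The following is a minimal sketch using only base R and the built-in mtcars data, which are assumptions chosen for illustration and are not part of this package. It assumes the hat matrix comes from a linear model that includes an intercept, while the Mahalanobis distance is computed on the predictors alone.

```r
# Illustrative check of h_ii = MD_i^2 / (n - 1) + 1 / n (mtcars is an assumed example data set)
X <- as.matrix(mtcars[, c("wt", "hp")])  # predictors, intercept excluded
n <- nrow(X)

# mahalanobis() returns SQUARED distances; cov() uses the 1/(n - 1) scaling
md2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Diagonal of the hat matrix from a least-squares fit with an intercept
fit <- lm(mpg ~ wt + hp, data = mtcars)
h <- hatvalues(fit)

# Largest absolute difference between the two sides of the identity
max_gap <- max(abs(h - (md2 / (n - 1) + 1 / n)))
print(max_gap)
```

The printed gap should be zero up to floating-point error, which shows that leverage and the classical Mahalanobis distance carry the same information.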
The canonical robust distance is defined as
RD(x_{i})=[(x_{i}-T(X))^{T}C(X)^{-1}(x_{i}-T(X))]^{1/2}
where T(X) and C(X) are the robust multivariate location and scatter, respectively, obtained by MCD. To achieve robustness, the MCD algorithm estimates the covariance of a multivariate data set mainly through an MCD h-point subset of the data set. This subset has the smallest sample-covariance determinant among all possible h-subsets. Accordingly, the breakdown value of the MCD algorithm equals \frac{(n-h)}{n}. This means the MCD estimates are reliable even if up to \frac{100(n-h)}{n}% of the observations in the data set are contaminated.
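As a rough illustration of the robust distance, the sketch below computes RD(x_i) with covMcd() from the robustbase package on the built-in mtcars data. The data set and the chi-square cutoff are assumptions made for this example (the cutoff is a common convention), and the code is not necessarily what frame_distance does internally. Note that covMcd() returns reweighted estimates in its center and cov components by default.

```r
library(robustbase)

set.seed(1)  # fast-MCD uses random subsampling
X <- as.matrix(mtcars[, c("wt", "hp")])  # predictors, intercept excluded
n <- nrow(X)

# Robust location T(X) and scatter C(X) estimated by MCD
mcd <- covMcd(X)

# Robust distance: mahalanobis() returns squared distances, so take the root
rd <- sqrt(mahalanobis(X, center = mcd$center, cov = mcd$cov))

# Size h of the MCD subset and the corresponding breakdown value (n - h) / n
h <- mcd$quan
breakdown <- (n - h) / n

# Assumed cutoff: square root of the 0.975 chi-square quantile with p degrees of freedom
cutoff <- sqrt(qchisq(0.975, df = ncol(X)))
flagged <- which(rd > cutoff)  # candidate leverage points
```

Observations in flagged lie far from the bulk of the predictors according to the robust fit, which is what the vertical cutoff line in the residual versus robust distance plot is meant to mark.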
A data frame for the residual versus robust distance plot.
Wenjing Wang wenjingwangr@gmail.com
function frame_distance_complex
library(quantreg)
library(ggplot2)
library(ALDqr)
library(purrr)
library(robustbase)
library(tidyr)
library(gridExtra)

# Fit a linear quantile regression at three quantiles
tau <- c(0.1, 0.5, 0.9)
ais_female <- subset(ais, Sex == 1)
object <- rq(BMI ~ LBM + Ht, data = ais_female, tau = tau)

# Robust distances, residuals and the two cutoffs
plot_distance <- frame_distance(object, tau = tau)
distance <- plot_distance[[1]]
cutoff_v <- plot_distance[[2]]  # vertical line: cutoff on the robust distance
cutoff_h <- plot_distance[[3]]  # horizontal lines: one residual cutoff per quantile

# Add case numbers and plot absolute residuals
n <- nrow(object$model)
case <- rep(1:n, length(tau))
distance <- cbind(case, distance)
distance$residuals <- abs(distance$residuals)

# Panel for tau = 0.1
distance1 <- subset(distance, tau_flag == "tau0.1")
p1 <- ggplot(distance1, aes(x = rd, y = residuals)) +
  geom_point() +
  geom_hline(yintercept = cutoff_h[1], colour = "red") +
  geom_vline(xintercept = cutoff_v, colour = "red") +
  geom_text(data = subset(distance1, residuals > cutoff_h[1] | rd > cutoff_v),
            aes(label = case), hjust = 0, vjust = 0) +
  xlab("Robust Distance") +
  ylab("|Residuals|")

# Panel for tau = 0.5
distance2 <- subset(distance, tau_flag == "tau0.5")
p2 <- ggplot(distance2, aes(x = rd, y = residuals)) +
  geom_point() +
  geom_hline(yintercept = cutoff_h[2], colour = "red") +
  geom_vline(xintercept = cutoff_v, colour = "red") +
  geom_text(data = subset(distance2, residuals > cutoff_h[2] | rd > cutoff_v),
            aes(label = case), hjust = 0, vjust = 0) +
  xlab("Robust Distance") +
  ylab("|Residuals|")

# Panel for tau = 0.9
distance3 <- subset(distance, tau_flag == "tau0.9")
p3 <- ggplot(distance3, aes(x = rd, y = residuals)) +
  geom_point() +
  geom_hline(yintercept = cutoff_h[3], colour = "red") +
  geom_vline(xintercept = cutoff_v, colour = "red") +
  geom_text(data = subset(distance3, residuals > cutoff_h[3] | rd > cutoff_v),
            aes(label = case), hjust = 0, vjust = 0) +
  xlab("Robust Distance") +
  ylab("|Residuals|")

# Show the three panels side by side
grid.arrange(p1, p2, p3, ncol = 3)