R/frame_distance_complex.R

Defines functions frame_distance_complex

Documented in frame_distance_complex

globalVariables("tau_flag")
#' @title Residual-robust distance plot of quantile regression model
#' @description the standardized residuals from quantile regression
#' against the robust MCD distance. This display is used to diagnose
#' both vertical outlier and horizontal leverage points. Function
#' `frame_distance` only work for linear quantile regression model. With
#' non-linear model, use `frame_distance_complex`
#' @param x matrix, covariate of quantile regression model
#' @param resid matrix, residuals of quantile regression models
#' @param tau singular or vectors, quantile
#' @return dataframe for residual-robust distance plot
#' @details The generalized MCD algorithm based on the fast-MCD
#' algorithm formulated by Rousseeuw and Van Driessen(1999), which
#' is similar to the algorithm for least trimmed squares(LTS).
#' The canonical Mahalanobis distance is defined as
#' \deqn{MD(x_i)=[(x_i-\bar{x})^{T}\bar{C}(X)^{-1}(x_i-\bar{x})]^{1/2}}
#' where \eqn{\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i} and
#' \eqn{\bar{C}(X)=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^{T}(x_i-
#' \bar{x})} are the empirical multivariate location and scatter,
#' respectively. Here \eqn{x_i=(x_{i1},...,x_{ip})^{T}} exclueds the
#' intercept. The relation between the Mahalanobis distance
#' \eqn{MD(x_i)} and the hat matrix
#' \eqn{H=(h_{ij})=X(X^{T}X)^{-1}X^{T}} is
#' \deqn{h_{ii}=\frac{1}{n-1}MD^{2}_{i}+\frac{1}{n}}
#' The canonical robust distance is defined as
#' \deqn{RD(x_{i})=[(x_{i}-T(X))^{T}C(X)^{-1}(x_{i}-T(X))]^{1/2}}
#' where \eqn{T(X)} and \eqn{C(X)} are the robust multivariate
#' location and scatter, respectively, obtained by MCD.
#' To achieve robustness, the MCD algorithm estimates the covariance
#' of a multivariate data set mainly through as MCD \eqn{h}-point
#' subset of data set. This subset has the smallest sample-covariance
#' determinant among all the possible \eqn{h}-subsets. Accordingly,
#' the breakdown value for the MCD algorithm equals
#' \eqn{\frac{(n-h)}{n}}. This means the MCD estimates is reliable,
#' even if up to \eqn{\frac{100(n-h)}{n}}% observations in the data
#' set are contaminated.
#' @author Wenjing Wang \email{[email protected]}
#' @export
frame_distance_complex <- function(x,resid,tau){
  p <- ncol(x)
  n <- nrow(x)
  colnames(resid) <- paste("tau", tau, sep="")
  center_m <- apply(x, 2, mean)
  cov_m <- cov(x)
  md <- rep(0, n)
  for(i in 1:n){
    mid_m <- x[i,] - center_m
    mid_m <- as.matrix(mid_m)
    md[i] <- sqrt(matrix(mid_m, nrow = 1)%*%solve(cov_m)%*%matrix(mid_m, ncol= 1))
  }
  md <- matrix(md, ncol = 1)
  mcd <- covMcd(x)
  Tx <- mcd$center
  Cx <- mcd$cov
  rd <- rep(0, n)
  for(i in 1:n){
    mid_r <- x[i, ] - Tx
    mid_r <- as.matrix(mid_r)
    rd[i] <- sqrt(matrix(mid_r, nrow = 1)%*%solve(Cx)%*%matrix(mid_r,ncol = 1))
  }
  rd <- matrix(rd, ncol = 1)
  cutoff_v <- sqrt(qchisq(p = 0.975, df = p))
  cutoff_h <- rep(0, length(tau))
  for(i in 1:length(tau)){
    cutoff_h[i] <- median(abs(resid[,i])/qnorm(p=0.75, mean=0, sd = 1))
  }
  cutoff_h <- 5*cutoff_h
  rd_m <- data.frame(md, rd, resid)
  rd_f <- rd_m %>% gather(tau_flag, residuals, -md, -rd)
  return(list("Distance" = rd_f, "Vertical Cutoff" = cutoff_v,
              "Horizental Cutoff" = cutoff_h))
}

Try the quokar package in your browser

Any scripts or data that you put into this service are public.

quokar documentation built on Nov. 17, 2017, 6:20 a.m.