qrod_bayes: Outlier Diagnostic for Quantile Regression Based on Bayesian...
In quokar: Quantile Regression Outlier Diagnostics with K Left Out Analysis

Description Usage Arguments Details Value Author(s) References See Also

This function calculates the mean posterior probability of being an outlier, or the Kullback-Leibler divergence, for each observation in a Bayesian quantile regression model with an asymmetric Laplace distribution.

qrod_bayes(y, x, tau, M, burn, method = c("bayes.prob", "bayes.kl"))

`y`	dependent variable in quantile regression
`x`	matrix, design matrix for quantile regression. For a quantile regression model with an intercept, the first column of x is 1.
`tau`	quantile level, a value between 0 and 1
`M`	number of MCMC iterations used in Bayesian estimation
`burn`	number of initial MCMC draws discarded as burn-in
`method`	the diagnostic method for outlier detection, either "bayes.prob" or "bayes.kl"
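
A call might look like the following minimal sketch. The simulated data, the seed, and the values chosen for tau, M and burn are illustrative assumptions, not recommendations from the package. The call is guarded so that the snippet also runs when quokar is not installed.

```r
# Simulated data (assumption: for illustration only).
set.seed(1)
n <- 50
x1 <- rnorm(n)
y <- 1 + 2 * x1 + rnorm(n)
y[n] <- y[n] + 15                 # plant one artificial outlier

# Design matrix with an intercept: the first column of x is 1.
x <- cbind(1, x1)

if (requireNamespace("quokar", quietly = TRUE)) {
  # Arbitrary illustrative settings: median regression, 1000 iterations,
  # 200 draws discarded as burn-in.
  res <- quokar::qrod_bayes(y, x, tau = 0.5, M = 1000, burn = 200,
                            method = "bayes.prob")
  str(res)                        # inspect whatever the function returns
}
```

Passing method = "bayes.kl" instead would request the Kullback-Leibler diagnostic described in the details.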

If we define the variable O_{i}, which takes the value 1 when the i-th observation is an outlier and 0 otherwise, then we propose to calculate the probability of an observation being an outlier as:

P(O_{i} = 1) = \frac{1}{n-1} \sum_{j \neq i} P(v_{i} > v_{j} \mid data) \quad (1)

We believe that for points which are not outliers this probability should be small, possibly close to zero. Given the natural ordering of the residuals, some observations are expected to have larger values of this probability than others. An observation should be deemed an outlier when its P(O_{i} = 1) is high and, possibly, clearly separated from the values of the others.

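Each probability on the right-hand side of equation (1) can be approximated from the MCMC draws as follows:

P(v_{i} > v_{j} \mid data) \approx \frac{1}{M} \sum_{l=1}^{M} I\left(v^{(l)}_{i} > \max_{k \in 1:M} v^{(k)}_{j}\right)

where M is the size of the chain of the latent variable v_{i} after the burn-in period, v^{(l)}_{j} is the l-th draw of the chain of v_{j}, and I(\cdot) is the indicator function.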
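
The approximation above can be sketched in base R as follows. This is not the package's implementation: the function name outlier_prob, the layout of the draws (one row per draw, one column per observation) and the simulated chains are all assumptions made for illustration.

```r
# v: matrix of post-burn-in draws of the latent variables, one row per draw
# and one column per observation (assumed layout).
outlier_prob <- function(v) {
  n <- ncol(v)
  col_max <- apply(v, 2, max)     # maximum over the chain of each v_j
  vapply(seq_len(n), function(i) {
    others <- setdiff(seq_len(n), i)
    # P(v_i > v_j | data): share of draws of v_i above the maximum draw of v_j
    p_ij <- vapply(others, function(j) mean(v[, i] > col_max[j]), numeric(1))
    mean(p_ij)                    # average over the n - 1 other observations
  }, numeric(1))
}

# Stand-in chains (assumption: simulated, not the output of a real sampler).
set.seed(42)
v <- matrix(rexp(500 * 5), nrow = 500, ncol = 5)
v[, 5] <- v[, 5] + 20             # make the last latent variable clearly larger
round(outlier_prob(v), 3)
```

Comparing each draw of v_{i} with the maximum of the other chain is a strict criterion, so the probability stays near zero unless the chain of v_{i} sits clearly above the others.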

As another way to address these differences between the posterior distributions of the distinct latent variables in the model, we suggest using the Kullback-Leibler divergence, proposed by Kullback and Leibler (1951), as a more precise measure of the distance between those latent variables in the Bayesian quantile regression framework. In terms of this posterior information, the divergence is defined as

K(f_{i}, f_{j}) = \int \log\left(\frac{f_{i}(x)}{f_{j}(x)}\right) f_{i}(x) \, dx

where f_{i} could be the posterior conditional distribution of v_{i} and f_{j} the posterior conditional distribution of v_{j}. As with the probability proposal above, we average this divergence for one observation over its distance from all the others, i.e.,

KL(f_{i}) = \frac{1}{n-1} \sum_{j \neq i} K(f_{i}, f_{j})

We expect that an observation with a higher value of this divergence should also have a high probability of being an outlier. Based on the MCMC draws from the posterior of each latent variable, we estimate the densities using a normal kernel and compute the integral using the trapezoidal rule.
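
The density estimation and trapezoidal integration described above can be sketched in base R as follows. This is not the package's implementation: the names kl_pair and mean_kl, the common evaluation grid, the padding of the grid, the floor applied to the densities and the simulated chains are all assumptions made for illustration.

```r
# K(f_i, f_j) from two chains of draws: Gaussian kernel density estimates on
# a common grid, integrated with the trapezoidal rule.
kl_pair <- function(xi, xj, n_grid = 512) {
  pad <- 3 * max(bw.nrd0(xi), bw.nrd0(xj))  # assumed padding beyond the data
  lo <- min(xi, xj) - pad
  hi <- max(xi, xj) + pad
  fi <- density(xi, kernel = "gaussian", from = lo, to = hi, n = n_grid)
  fj <- density(xj, kernel = "gaussian", from = lo, to = hi, n = n_grid)
  eps <- .Machine$double.eps                # assumed floor, avoids log(0)
  di <- pmax(fi$y, eps)
  dj <- pmax(fj$y, eps)
  integrand <- di * log(di / dj)
  # Trapezoidal rule over the shared grid fi$x
  sum(diff(fi$x) * (head(integrand, -1) + tail(integrand, -1)) / 2)
}

# KL(f_i): average divergence of observation i from the n - 1 others.
mean_kl <- function(v) {
  n <- ncol(v)
  vapply(seq_len(n), function(i) {
    others <- setdiff(seq_len(n), i)
    mean(vapply(others, function(j) kl_pair(v[, i], v[, j]), numeric(1)))
  }, numeric(1))
}

# Stand-in chains (assumption: simulated, not the output of a real sampler).
set.seed(7)
v <- cbind(rexp(300), rexp(300), rexp(300) + 20)
mean_kl(v)
```

Because the divergence is not symmetric, K(f_{i}, f_{j}) and K(f_{j}, f_{i}) generally differ, and the floor on the densities caps how large a single term can become when the two chains barely overlap.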

Mean probability or Kullback-Leibler divergence for each observation in the Bayesian quantile regression model.

Wenjing Wang wenjingwangr@gmail.com

Benites L E, Lachos V H, Vilca F E. (2015) "Case-Deletion Diagnostics for Quantile Regression Using the Asymmetric Laplace Distribution", arXiv preprint arXiv:1509.05099.

Hawkins D M, Bradu D, Kass G V. (1984) "Location of several outliers in multiple-regression data using elemental sets", Technometrics, 26(3), 197-208.

Koenker R, Bassett Jr G. (1978) "Regression quantiles", Econometrica, 46(1), 33-50.

Santos B, Bolfarine H. (2016) "On Bayesian quantile regression and outliers", arXiv preprint arXiv:1601.07344.

Kozumi H, Kobayashi G. (2011) "Gibbs sampling methods for Bayesian quantile regression", Journal of Statistical Computation and Simulation, 81(11), 1565-1578.

qrod_mle

quokar documentation built on May 2, 2019, 6:39 a.m.