bayesProb: Mean posterior probability for each observation in Baysian... In quokar: Quantile Regression Outlier Diagnostics with K Left Out Analysis

Description

Mean posterior probability for each observation in Baysian quantile regression model

Usage

 1 bayesProb(y, x, tau, M, burn) 

Arguments

 y vector, dependent variable in quantile regression x matrix, design matrix in quantile regression tau quantile M MCMC draws burn burned MCMC draws

Details

If we define the variable O_i, which takes value equal to 1 when ith observation is an outlier, and 0 otherwise, then we propose to calculate the probability of an observation being an outlier as:

P(O_{i} = 1) = \frac{1}{n-1}∑{P(v_{i}>v_{j}|data)} \quad (1)

We believe that for points, which are not outliers, this probability should be small, possibly close to zero. Given the natrual ordering of the residuals, it is expected that some observations present greater values for this probability in comparison to others. What we think that should be deemed as an outlier, ought to be those observations with a higher P(O_{i} = 1), and possibly one that is particularly distant from the others.

The probability in the equation can be approximated given the MCMC draws, as follows

P(O_{i}=1)=\frac{1}{M}∑{I(v^{(l)}_{i}>max v^{k}_{j})}

where M is the size of the chain of v_{i} after the burn-in period and v^{(l)}_{j} is the lth draw of chain.

More details please refer to the paper in references

References

Santos B, Bolfarine H.(2016)“On Baysian quantile regression and outliers,arXiv:1601.07344

bayesKL
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ## Not run: ais_female <- subset(ais, Sex == 1) y <- ais_female$BMI x <- cbind(1, ais_female$LBM) tau <- 0.5 M <- 5000 burn <- 1000 prob <- bayesProb(y, x, tau, M, burn) case <- 1:100 dat <- data.frame(case, prob) ggplot(dat, aes(case, prob))+ geom_point() + geom_text(data = subset(dat, prob > mean(prob) + 2*sd(prob)), ## End(Not run)