bv.boxplot | R Documentation |
Creates diagnostic bivariate quelplot ellipses (bivariate boxplots) using the method of Goldberg and Iglewicz (1992).
The output can be used to check assumptions of bivariate normality and to identify multivariate outliers. The default robust=TRUE
option relies on a biweight correlation estimator function written by Everitt (2006). Quelplots
are potentially asymmetric, although the method currently employed here uses a
single "fence" definition and creates symmetric ellipses.
bv.boxplot(X, Y, robust = TRUE, D = 7, xlab = "X", ylab="Y", pch = 21,
pch.out = NULL, bg = "gray", bg.out = NULL, hinge.col = 1, fence.col = 1,
hinge.lty = 2, fence.lty = 3, xlim = NULL, ylim = NULL, names = 1:length(X),
ID.out = FALSE, cex.ID.out = 0.7, uni.CI = FALSE, uni.conf = 0.95,
uni.CI.col = 1, uni.CI.lty = 1, uni.CI.lwd = 2, show.points = TRUE, ...)
X |
First of two quantitative variables making up the bivariate distribution. |
Y |
Second of two quantitative variables making up the bivariate distribution. |
robust |
Logical. Robust estimators, i.e. |
D |
The default |
xlab |
Caption for X axis. |
ylab |
Caption for Y axis. |
pch |
Plotting character(s) for scatterplot. |
pch.out |
Plotting character for outliers. |
hinge.col |
Hinge color. |
fence.col |
Fence color. |
hinge.lty |
Hinge line type. |
fence.lty |
Fence line type. |
xlim |
A two element vector defining the X-limits of the plot. |
ylim |
The Y-limits of the plot. |
bg |
Background color for points in scatterplot, defaults to black if |
bg.out |
Background color for outlying points in scatterplot, defaults to black if |
names |
An optional vector of names for X, Y coordinates. |
ID.out |
Logical. Whether or not outlying points should be given labels (from argument |
cex.ID.out |
Character expansion for outlying ID labels. |
uni.CI |
Logical. If true, univariate confidence intervals for the true median at confidence |
uni.conf |
Univariate confidence, only used if |
uni.CI.col |
Univariate confidence bound line color, only used if |
uni.CI.lty |
Univariate confidence bound line type, only used if |
uni.CI.lwd |
Univariate confidence bound line width, only used if |
show.points |
Logical. Whether points should be shown in graph. |
... |
Additional arguments from |
Two ellipses are drawn. The inner is the "hinge" which contains 50 percent of the data. The outer is the "fence".
Observations outside of the "fence" constitute possible troublesome outliers.
The function bivariate from Everitt (2006) is used to calculate robust biweight measures of correlation, scale, and location if robust = TRUE (the default).
The quelplot model has the following form:
E_i = \sqrt{\frac{X^2_{si} + Y^2_{si} - 2R^*X_{si}Y_{si}}{1-R^{*2}}},
where X_{si} = (X_i - T^*_X)/S^*_X and Y_{si} = (Y_i - T^*_Y)/S^*_Y are standardized values for X_i and Y_i, respectively, T^*_X and T^*_Y are location estimators for X and Y, S^*_X and S^*_Y are scale estimators for X and Y, and R^* is a correlation estimator for X and Y. We have:
E_m = median\{E_i:i=1,2,...,n\},
and
E_{max} = \max\{E_i: E_i^2 < DE^2_m\},
where D is a constant that regulates the distance of the "fence" from the "hinge".
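The distances above can be sketched in a few lines of R. This is only an illustration: `quel_dist` is a hypothetical helper name, and `mean()`, `sd()` and `cor()` are used as stand-ins for the robust biweight estimators T^*, S^* and R^*, so the values will generally differ from those of `bv.boxplot(robust = TRUE)`.

```r
# Sketch of the quelplot distances E_i, E_m and E_max.
# ASSUMPTION: classical estimates (mean, sd, cor) replace the robust
# biweight estimators T*, S* and R* used by the documented function.
quel_dist <- function(x, y, D = 7) {
  Tx <- mean(x); Ty <- mean(y)      # location estimates
  Sx <- sd(x);   Sy <- sd(y)        # scale estimates
  R  <- cor(x, y)                   # correlation estimate
  xs <- (x - Tx) / Sx               # standardized X
  ys <- (y - Ty) / Sy               # standardized Y
  E  <- sqrt((xs^2 + ys^2 - 2 * R * xs * ys) / (1 - R^2))
  Em   <- median(E)                 # "hinge" distance
  Emax <- max(E[E^2 < D * Em^2])    # "fence" distance
  list(E = E, Em = Em, Emax = Emax, outliers = which(E > Emax))
}

set.seed(1)
d <- quel_dist(rnorm(100, 17, 3), rnorm(100, 13, 2))
d$Em
d$Emax
d$outliers
```

Increasing `D` in this sketch can only move E_{max} outward or leave it unchanged, so fewer or the same number of observations are flagged.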
To draw the "hinge" we have:
R_1 = E_m\sqrt{\frac{1 + R^*}{2}},
R_2 = E_m\sqrt{\frac{1 - R^*}{2}}.
To draw the "fence" we have:
R_1 = E_{max}\sqrt{\frac{1 + R^*}{2}},
R_2 = E_{max}\sqrt{\frac{1 - R^*}{2}}.
For \theta = 0 to 360 degrees, let:
\Theta_1 = R_1\cos(\theta),
\Theta_2 = R_2\sin(\theta).
The Cartesian coordinates of the "hinge" and "fence" are:
X = T^*_X + (\Theta_1+\Theta_2)S^*_X,
Y = T^*_Y + (\Theta_1-\Theta_2)S^*_Y.
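As a rough illustration of the construction above, the following R sketch traces an ellipse at a given distance E. The helper name `quel_ellipse` and all numeric inputs are hypothetical and are not output from `bv.boxplot`. The final lines plug the fence coordinates back into the formula for E_i as a consistency check.

```r
# Sketch: Cartesian coordinates of a quelplot ellipse at distance E.
# quel_ellipse() is a hypothetical helper, not a function from any package.
quel_ellipse <- function(E, Tx, Ty, Sx, Sy, R, n = 361) {
  theta  <- seq(0, 2 * pi, length.out = n)  # 0 to 360 degrees, in radians
  R1     <- E * sqrt((1 + R) / 2)
  R2     <- E * sqrt((1 - R) / 2)
  Theta1 <- R1 * cos(theta)
  Theta2 <- R2 * sin(theta)
  data.frame(x = Tx + (Theta1 + Theta2) * Sx,
             y = Ty + (Theta1 - Theta2) * Sy)
}

# Illustrative estimates only (assumed values)
hinge <- quel_ellipse(E = 1.2, Tx = 17, Ty = 13, Sx = 3, Sy = 2, R = 0.4)
fence <- quel_ellipse(E = 3.0, Tx = 17, Ty = 13, Sx = 3, Sy = 2, R = 0.4)

# Consistency check: every fence point should have distance E = 3
xs <- (fence$x - 17) / 3
ys <- (fence$y - 13) / 2
E_back <- sqrt((xs^2 + ys^2 - 2 * 0.4 * xs * ys) / (1 - 0.4^2))
range(E_back)
```

Passing `fence` to `plot(..., type = "l")` and then `hinge` to `lines()` would draw both ellipses on one set of axes.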
Quelplots are potentially asymmetric, although the current (and only) method used here defines a single value for E_{max} and hence creates symmetric ellipses. Under this implementation at least one point will define E_{max} and lie on the "fence".
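The following sketch shows how one might find the observation that defines E_{max}, and the observations beyond the fence, for a small data set. It again assumes classical estimates (`mean()`, `sd()`, `cor()`) in place of the robust biweight estimators, so the flagged points need not match those reported by `bv.boxplot`.

```r
# Sketch: which observation defines the fence, and which lie beyond it.
# ASSUMPTION: classical estimates stand in for the robust estimators.
X <- c(-0.24, 2.53, -0.3, -0.26, 0.021, 0.81, -0.85, -0.95, 1.0, 0.89, 0.59,
       0.61, -1.79, 0.60, -0.05, 0.39, -0.94, -0.89, -0.37, 0.18)
Y <- c(-0.83, -1.44, 0.33, -0.41, -1.0, 0.53, -0.72, 0.33, 0.27, -0.99, 0.15,
       -1.17, -0.61, 0.37, -0.96, 0.21, -1.29, 1.40, -0.21, 0.39)

xs <- (X - mean(X)) / sd(X)          # standardized X
ys <- (Y - mean(Y)) / sd(Y)          # standardized Y
R  <- cor(X, Y)
E  <- sqrt((xs^2 + ys^2 - 2 * R * xs * ys) / (1 - R^2))

Em     <- median(E)
inside <- E^2 < 7 * Em^2             # D = 7, the documented default
Emax   <- max(E[inside])

on_fence <- which(E == Emax)         # observation(s) lying on the fence
outliers <- which(!inside)           # observations beyond the fence
on_fence
outliers
```

Because E_{max} is taken directly from the observed E_i, `on_fence` always contains at least one index.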
A diagnostic plot is returned. Invisible objects from the function include location, scale and correlation estimates for X and Y, estimates for E_m and E_{max}, and a list of outliers (that exceed E_{max}).
Ken Aho. The function relies on an Everitt (2006) function for robust M-estimation.
Everitt, B. (2006) An R and S-plus Companion to Multivariate Analysis. Springer.
Goldberg, K. M., and B. Iglewicz (1992) Bivariate extensions of the boxplot. Technometrics 34: 307-320.
boxplot
# Simulated data: two independent normal variables
Y1 <- rnorm(100, 17, 3)
Y2 <- rnorm(100, 13, 2)
bv.boxplot(Y1, Y2)

# Small data set: label outlying points and fill them in red
X <- c(-0.24, 2.53, -0.3, -0.26, 0.021, 0.81, -0.85, -0.95, 1.0, 0.89, 0.59,
0.61, -1.79, 0.60, -0.05, 0.39, -0.94, -0.89, -0.37, 0.18)
Y <- c(-0.83, -1.44, 0.33, -0.41, -1.0, 0.53, -0.72, 0.33, 0.27, -0.99, 0.15,
-1.17, -0.61, 0.37, -0.96, 0.21, -1.29, 1.40, -0.21, 0.39)
b <- bv.boxplot(X, Y, ID.out = TRUE, bg.out = "red")
b  # print the returned object