bv.boxplot: Bivariate boxplots

bv.boxplot R Documentation

Bivariate boxplots

Description

Creates diagnostic bivariate quelplot ellipses (bivariate boxplots) using the method of Goldberg and Iglewicz (1992). The output can be used to check assumptions of bivariate normality and to identify multivariate outliers. The default robust = TRUE option relies on a biweight correlation estimator function written by Everitt (2006). Quelplots are potentially asymmetric, although the method currently employed here uses a single "fence" definition and creates symmetric ellipses.

Usage

bv.boxplot(X, Y, robust = TRUE, D = 7, xlab = "X", ylab="Y", pch = 21, 
pch.out = NULL, bg = "gray", bg.out = NULL, hinge.col = 1, fence.col = 1, 
hinge.lty = 2, fence.lty = 3, xlim = NULL, ylim = NULL, names = 1:length(X), 
ID.out = FALSE, cex.ID.out = 0.7, uni.CI = FALSE, uni.conf = 0.95, 
uni.CI.col = 1, uni.CI.lty = 1, uni.CI.lwd = 2, show.points = TRUE, ...)

Arguments

X

First of two quantitative variables making up the bivariate distribution.

Y

Second of two quantitative variables making up the bivariate distribution.

robust

Logical. If TRUE (the default), robust estimators are used; this setting is recommended.

D

Constant that regulates the distance between the "hinge" and the "fence" (see Details). The default, D = 7, sets the fence equal to a 99 percent confidence interval for an individual observation.

xlab

Caption for X axis.

ylab

Caption for Y axis.

pch

Plotting character(s) for scatterplot.

pch.out

Plotting character for outliers.

hinge.col

Hinge color.

fence.col

Fence color.

hinge.lty

Hinge line type.

fence.lty

Fence line type.

xlim

A two element vector defining the X-limits of the plot.

ylim

A two element vector defining the Y-limits of the plot.

bg

Background color for points in scatterplot, defaults to black if pch is not in the range 21:26.

bg.out

Background color for outlying points in scatterplot, defaults to black if pch is not in the range 21:26.

names

An optional vector of names for X, Y coordinates.

ID.out

Logical. Whether or not outlying points should be labeled in the plot (labels are taken from the argument names).

cex.ID.out

Character expansion for outlying ID labels.

uni.CI

Logical. If TRUE, univariate confidence intervals for the true median, at confidence level uni.conf, are shown.

uni.conf

Univariate confidence level, only used if uni.CI = TRUE.

uni.CI.col

Univariate confidence bound line color, only used if uni.CI = TRUE.

uni.CI.lty

Univariate confidence bound line type, only used if uni.CI = TRUE.

uni.CI.lwd

Univariate confidence bound line width, only used if uni.CI = TRUE.

show.points

Logical. Whether points should be shown in graph.

...

Additional arguments from points.

Details

Two ellipses are drawn. The inner ellipse is the "hinge", which contains 50 percent of the data. The outer ellipse is the "fence". Observations outside the "fence" are potentially troublesome outliers. The function bivariate from Everitt (2006) is used to calculate robust biweight measures of correlation, scale, and location if robust = TRUE (the default). The quelplot model has the following form:

E_i = \sqrt{\frac{X^2_{si} + Y^2_{si} - 2R^*X_{si}Y_{si}}{1-R^{*2}}},

where X_{si} = (X_i - T^*_X)/S^*_X and Y_{si} = (Y_i - T^*_Y)/S^*_Y are standardized values for X_i and Y_i, respectively, T^*_X and T^*_Y are location estimators for X and Y, S^*_X and S^*_Y are scale estimators for X and Y, and R^* is a correlation estimator for X and Y. We have:

E_m = median\{E_i:i=1,2,...,n\},

and

E_{max} = max\{E_i: E_i^2 < DE^2_m\},

where D is a constant that regulates the distance between the "hinge" and the "fence".
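The quantities E_i, E_m and E_{max} can be sketched in a few lines of R. This is an illustration only, not the code used by bv.boxplot: it assumes classical estimators (mean, sd, cor) as stand-ins for the starred robust estimators, and the simulated data and variable names are made up for the example.

```r
# Sketch of the quelplot distances. ASSUMPTION: mean(), sd() and cor() stand in
# for the robust biweight estimators T*, S* and R* used when robust = TRUE.
set.seed(1)
X <- rnorm(100, 17, 3)
Y <- rnorm(100, 13, 2)
D <- 7

Tx <- mean(X); Ty <- mean(Y)   # location estimates
Sx <- sd(X);   Sy <- sd(Y)     # scale estimates
R  <- cor(X, Y)                # correlation estimate

# Standardized values
Xs <- (X - Tx) / Sx
Ys <- (Y - Ty) / Sy

# Distance of each observation from the center, median distance, and the
# largest distance that still satisfies E_i^2 < D * E_m^2
E     <- sqrt((Xs^2 + Ys^2 - 2 * R * Xs * Ys) / (1 - R^2))
E_m   <- median(E)
E_max <- max(E[E^2 < D * E_m^2])

# Observations beyond the fence
outliers <- which(E > E_max)
```

Increasing D in this sketch can only keep or enlarge E_max, so fewer observations are flagged.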

To draw the "hinge" we have:

R_1 = E_m\sqrt{\frac{1 + R^*}{2}},

R_2 = E_m\sqrt{\frac{1 - R^*}{2}}.

To draw the "fence" we have:

R_1 = E_{max}\sqrt{\frac{1 + R^*}{2}},

R_2 = E_{max}\sqrt{\frac{1 - R^*}{2}}.

For \theta = 0 to 360 degrees, let:

\Theta_1 = R_1cos(\theta),

\Theta_2 = R_2sin(\theta).

The Cartesian coordinates of the "hinge" and "fence" are:

X = T^*_X + (\Theta_1+\Theta_2)S^*_X,

Y = T^*_Y + (\Theta_1-\Theta_2)S^*_Y.
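A minimal sketch of the ellipse construction follows. The helper quel_ellipse is hypothetical (it is not part of asbio), and the estimates passed to it are made-up numbers. The sketch adds the scaled offsets (\Theta_1 \pm \Theta_2) to the location estimates, so that substituting the resulting coordinates back into the formula for E_i returns the radius used to draw the ellipse.

```r
# Hypothetical helper, not part of asbio: coordinates of the ellipse with
# radius E (use E_m for the hinge and E_max for the fence).
quel_ellipse <- function(E, Tx, Ty, Sx, Sy, R, n = 361) {
  theta  <- seq(0, 2 * pi, length.out = n)   # 0 to 360 degrees, in radians
  R1     <- E * sqrt((1 + R) / 2)
  R2     <- E * sqrt((1 - R) / 2)
  Theta1 <- R1 * cos(theta)
  Theta2 <- R2 * sin(theta)
  data.frame(x = Tx + (Theta1 + Theta2) * Sx,
             y = Ty + (Theta1 - Theta2) * Sy)
}

# Made-up estimates, for illustration only
hinge <- quel_ellipse(E = 1.2, Tx = 17, Ty = 13, Sx = 3, Sy = 2, R = 0.4)
fence <- quel_ellipse(E = 3.0, Tx = 17, Ty = 13, Sx = 3, Sy = 2, R = 0.4)

# Draw the fence first so that the axes span both ellipses
plot(fence$x, fence$y, type = "l", lty = 3, xlab = "X", ylab = "Y")
lines(hinge$x, hinge$y, lty = 2)
```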

Quelplots are potentially asymmetric, although the current (and only) method used here defines a single value for E_{max} and hence creates symmetric ellipses. Under this implementation at least one point will define E_{max} and lie on the "fence".

Value

A diagnostic plot is returned. Invisible objects from the function include location, scale and correlation estimates for X and Y, estimates for E_m and E_{max}, and a list of outliers (that exceed E_{max}).

Author(s)

Ken Aho; the function relies on an Everitt (2006) function for robust M-estimation.

References

Everitt, B. (2006) An R and S-plus Companion to Multivariate Analysis. Springer.

Goldberg, K. M., and B. Iglewicz (1992) Bivariate extensions of the boxplot. Technometrics 34: 307-320.

See Also

boxplot

Examples

# Simulated data: two independent normal variables
Y1 <- rnorm(100, 17, 3)
Y2 <- rnorm(100, 13, 2)
bv.boxplot(Y1, Y2)

X <- c(-0.24, 2.53, -0.3, -0.26, 0.021, 0.81, -0.85, -0.95, 1.0, 0.89, 0.59, 
0.61, -1.79, 0.60, -0.05, 0.39, -0.94, -0.89, -0.37, 0.18)
Y <- c(-0.83, -1.44, 0.33, -0.41, -1.0, 0.53, -0.72, 0.33,  0.27, -0.99, 0.15, 
-1.17, -0.61, 0.37, -0.96, 0.21, -1.29, 1.40, -0.21, 0.39)
# Label outlying points and fill them in red; b stores the invisibly returned objects
b <- bv.boxplot(X, Y, ID.out = TRUE, bg.out = "red")
b

asbio documentation built on Aug. 20, 2023, 9:07 a.m.