adjbox: Plot an Adjusted Boxplot for Skew Distributions

View source: R/adjbox.R

adjboxR Documentation

Plot an Adjusted Boxplot for Skew Distributions

Description

Produces boxplots adjusted for skewed distributions as proposed in Hubert and Vandervieren (2008).

Usage

adjbox(x, ...)

## S3 method for class 'formula'
adjbox(formula, data = NULL, ..., subset, na.action = NULL)

## Default S3 method:
adjbox(x, ..., range = 1.5, doReflect = FALSE,
        width = NULL, varwidth = FALSE,
        notch = FALSE, outline = TRUE, names, plot = TRUE,
        border = par("fg"), col = NULL, log = "",
        pars = list(boxwex = 0.8, staplewex = 0.5, outwex = 0.5),
        horizontal = FALSE, add = FALSE, at = NULL)

Arguments

formula

a formula, such as y ~ grp, where y is a numeric vector of data values to be split into groups according to the grouping variable grp (usually a factor).

data

a data.frame (or list) from which the variables in formula should be taken.

subset

an optional vector specifying a subset of observations to be used for plotting.

na.action

a function which indicates what should happen when the data contain NAs. The default is to ignore missing values in either the response or the group.

x

for specifying data from which the boxplots are to be produced. Either a numeric vector, or a single list containing such vectors. Additional unnamed arguments specify further data as separate vectors (each corresponding to a component boxplot). NAs are allowed in the data.

...

For the formula method, named arguments to be passed to the default method.

For the default method, unnamed arguments are additional data vectors (unless x is a list when they are ignored), and named arguments are arguments and graphical parameters to be passed to bxp in addition to the ones given by argument pars (and override those in pars).

range

this determines how far the plot whiskers extend out from the box, and is simply passed as argument coef to adjboxStats(). If range is positive, the whiskers extend to the most extreme data point which is no more than range times the interquartile range from the box. A value of zero causes the whiskers to extend to the data extremes.

doReflect

logical indicating if the MC should also be computed on the reflected sample -x, and be averaged, see mc.

width

a vector giving the relative widths of the boxes making up the plot.

varwidth

if varwidth is TRUE, the boxes are drawn with widths proportional to the square-roots of the number of observations in the groups.

notch

if notch is TRUE, a notch is drawn in each side of the boxes. If the notches of two plots do not overlap this is ‘strong evidence’ that the two medians differ (Chambers et al., 1983, p. 62). See boxplot.stats for the calculations used.

outline

if outline is not true, the outliers are not drawn (as points whereas S+ uses lines).

names

group labels which will be printed under each boxplot.

boxwex

a scale factor to be applied to all boxes. When there are only a few groups, the appearance of the plot can be improved by making the boxes narrower.

staplewex

staple line width expansion, proportional to box width.

outwex

outlier line width expansion, proportional to box width.

plot

if TRUE (the default) then a boxplot is produced. If not, the summaries which the boxplots are based on are returned.

border

an optional vector of colors for the outlines of the boxplots. The values in border are recycled if the length of border is less than the number of plots.

col

if col is non-null it is assumed to contain colors to be used to colour the bodies of the box plots. By default they are in the background colour.

log

character indicating if x or y or both coordinates should be plotted in log scale.

pars

a list of (potentially many) more graphical parameters, e.g., boxwex or outpch; these are passed to bxp (if plot is true); for details, see there.

horizontal

logical indicating if the boxplots should be horizontal; default FALSE means vertical boxes.

add

logical, if true add boxplot to current plot.

at

numeric vector giving the locations where the boxplots should be drawn, particularly when add = TRUE; defaults to 1:n where n is the number of boxes.

Details

The generic function adjbox currently has a default method (adjbox.default) and a formula interface (adjbox.formula).

If multiple groups are supplied either as multiple arguments or via a formula, parallel boxplots will be plotted, in the order of the arguments or the order of the levels of the factor (see factor).

Missing values are ignored when forming boxplots.

Extremes of the upper and whiskers of the adjusted boxplots are computed using the medcouple (mc()), a robust measure of skewness. For details, cf. TODO

Value

A list with the following components:

stats

a matrix, each column contains the extreme of the lower whisker, the lower hinge, the median, the upper hinge and the extreme of the upper whisker for one group/plot. If all the inputs have the same class attribute, so will this component.

n

a vector with the number of observations in each group.

coef

a matrix where each column contains the lower and upper extremes of the notch.

out

the values of any data points which lie beyond the extremes of the whiskers.

group

a vector of the same length as out whose elements indicate to which group the outlier belongs.

names

a vector of names for the groups.

Note

The code and documentation only slightly modifies the code of boxplot.default, boxplot.formula and boxplot.stats

Author(s)

R Core Development Team, slightly adapted by Tobias Verbeke

References

Hubert, M. and Vandervieren, E. (2008). An adjusted boxplot for skewed distributions, Computational Statistics and Data Analysis 52, 5186–5201. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1016/j.csda.2007.11.008")}

See Also

The medcouple, mc; boxplot.

Examples

if(require("boot")) {
 ### Hubert and Vandervieren (2008), Fig. 5.%(2006): p. 10, Fig. 4.
 data(coal, package = "boot")
 coaldiff <- diff(coal$date)
 op <- par(mfrow = c(1,2))
 boxplot(coaldiff, main = "Original Boxplot")
 adjbox(coaldiff, main  = "Adjusted Boxplot")
 par(op)
}

### Hubert and Vandervieren (2008), p. 11, Fig. 7a -- enhanced
op <- par(mfrow = c(2,2), mar = c(1,3,3,1), oma = c(0,0,3,0))
with(condroz, {
 boxplot(Ca, main = "Original Boxplot")
 adjbox (Ca, main = "Adjusted Boxplot")
 boxplot(Ca, main = "Original Boxplot [log]", log = "y")
 adjbox (Ca, main = "Adjusted Boxplot [log]", log = "y")
})
mtext("'Ca' from data(condroz)",
      outer=TRUE, font = par("font.main"), cex = 2)
par(op)

robustbase documentation built on Nov. 1, 2024, 3 p.m.