bixplot    R Documentation

Boxplot version suited for bimodal and multimodal data, combining density, box, and rug elements with automatic cluster detection

Description

Draws a bixplot for one or more numeric variables. A bixplot extends the violin plot and boxplot by automatically testing each variable for unimodality (via Hartigan's dip test) and, when multimodality is detected, fitting a constrained k-medoids clustering to identify and separately display the modes. Each variable is rendered as a filled density body, a box-and-whisker summary, and a rug of individual data values. The rug can optionally be colored by an external numeric or factor variable. Input can be provided as vectors, a data frame, a matrix, a list of vectors, or a formula.

Usage

bixplot(...,
        names = NA,
        add = FALSE,
        at = NULL,
        horizontal = FALSE,
        col = "gray60", bodyCol = NULL,
        bodyOpaque = 0.5,
        bodyW = 0.80,
        bodysize = "area_from_count",
        modeCol = c("cadetblue3", "hotpink2",
                    "lawngreen", "orange", "cyan3",
                    "coral2", "gray60", "darkgoldenrod2",
                    "slateblue2", "brown2", "khaki3"),
        curveCol = "black",
        curveLwd = 1,
        border = "black", boxCol = NULL,
        boxOpaque = 0.5,
        boxW = 0.32,
        boxLwd = 2,
        makewhiskers = TRUE,
        innerwhiskers = TRUE,
        rugCol = "black",
        rugoutCol = par("fg"),
        rugNumeric = NULL,
        rugNumericColors = c("blue", "green"),
        colorbarW = 0.12,
        rugFactor = NULL,
        rugFactorColors = c("red", "blue", "forestgreen"),
        rugLwd = par("lwd"),
        rugoutLwd = 2 * rugLwd,
        rugOpaque = 0.4,
        jittering = TRUE,
        jitteramount = NULL,
        rugW = 0.12,
        stickCol = "black",
        stickLwd = 2,
        boxwex = 1,
        width = NULL,
        side = "no",
        tick = TRUE,
        diplevel = 0.01,
        minN = 15,
        kmax = NULL,
        bigN = 500,
        clusMinN = 3,
        bw = "SJ-dpi",
        kernel = "gaussian",
        cut = 3,
        cutmin = -Inf,
        cutmax = Inf,
        xlim = NULL,
        ylim = NULL,
        cex.axis = 1,
        main = "bixplot",
        cex.main = 1, line.main = 1,
        cex.colorbar = 1,
        xlab = "",
        ylab = "",
        cex.lab = 1, line.lab = 2.2,
        xaxs = "r", yaxs = "r", las = 1,
        ann = !add,
        plot = TRUE,
        log = NULL)

Arguments

...

one or more numeric vectors to plot, or a single data frame or matrix whose columns are plotted, or a single list of numeric vectors (which may have different lengths), or a formula such as y ~ grp where y is a numeric vector to be split into groups by the grouping variable grp (usually a factor). Note that ~ g1 + g2 is equivalent to ~ g1:g2. When a formula is used it may be necessary to also supply a data = argument. Missing values are silently removed.

names

character vector of group labels printed next to each variable. If NA (the default), labels are extracted from the input data; if the data carry no names the integers 1, 2, 3, ... are used. [Argument from boxplot.]

add

logical. If TRUE, the bixplot is added to the current plot without drawing new axes. Defaults to FALSE. [Argument from boxplot.]

at

numeric vector specifying the axis positions at which the bixplots are drawn, particularly useful when add = TRUE. Defaults to 1:p where p is the number of variables. [Argument from boxplot.]

horizontal

logical. If TRUE, bixplots are drawn horizontally. Defaults to FALSE (vertical). [Argument from boxplot.]

col

color(s) for the density body of variables deemed unimodal. Can be a single color or a vector recycled to length p. When NA, the body is not filled. [Argument from boxplot.]

bodyCol

if not NULL, overrides col for the body fill color.

bodyOpaque

numeric between 0 (transparent) and 1 (opaque) controlling the opacity of the filled density body. Defaults to 0.5.

bodyW

numeric vector (recycled to length p) giving the width of the widest density body for each variable. When two or more modes are shown for a variable, this is the width of the widest mode body. Set to zero to suppress the body entirely. Defaults to 0.80.

bodysize

character string determining how the density bodies of individual modes within the same variable are sized relative to each other. One of "area_from_count" (the default; body area proportional to cluster membership count), "area_is_constant" (all modes have equal area), or "width_is_constant" (all modes have equal width). In all cases the widest body has width bodyW[j], possibly scaled by width.

modeCol

color(s) of the density bodies for variables with more than one detected mode (cluster). Cycled across all modes of all multimodal variables. If NULL, each mode inherits the col color of its variable.

curveCol

color(s) of the density curve boundary drawn around each body. Can be a single color or a vector recycled to length p. Set to NA to suppress the curve. Defaults to "black".

curveLwd

single number giving the line width of the density curve.

border

color(s) of the box and whiskers. Can be a single color or a vector recycled to length p. Set to NA to suppress the box. [Argument from boxplot.]

boxCol

if not NULL, overrides border for the box color.

boxOpaque

numeric between 0 and 1 controlling the opacity of the lines making up the boxplot. Defaults to 0.5.

boxW

numeric vector (recycled to length p) giving the width of the interquartile box for each variable. Set to zero to suppress the box. Defaults to 0.32.

boxLwd

single number giving the line width of the box and whiskers.

makewhiskers

logical. If TRUE (the default), whiskers are drawn extending from the box to the most extreme non-outlying values.

innerwhiskers

logical. Relevant only when a variable has more than one mode. If TRUE (the default), whiskers are also drawn between adjacent modes. If FALSE, inner whiskers are omitted, which can be preferable because their interpretation is ambiguous when mode densities overlap.

rugCol

color(s) of the rug tick marks for each variable. Can be a single color or a vector recycled to length p. Set to NA to suppress the rug. Defaults to "black".

rugoutCol

color(s) of the portion of rug lines that extends outside the density body. If NULL or NA, the same color as rugCol is used throughout. Defaults to par("fg").

rugNumeric

optional numeric vector of the same length as each y variable (requires all variables to have the same length) used to color the rug lines via a continuous color palette. Cannot be combined with rugFactor.

rugNumericColors

vector of two or three colors passed to colorRampPalette to construct the palette for rugNumeric. Defaults to c("blue", "green").

colorbarW

width of the color bar legend for rugNumeric relative to the main plot width. Only has an effect when rugNumeric is specified. Defaults to 0.12.

rugFactor

optional factor variable of the same length as each y variable used to color the rug lines by factor level. Cannot be combined with rugNumeric.

rugFactorColors

character vector of colors for the levels of rugFactor, recycled as needed. Defaults to c("red", "blue", "forestgreen").

rugLwd

single number giving the line width of the rug marks. Defaults to par("lwd").

rugoutLwd

single number giving the line width of the part of rug lines outside the density body. Defaults to 2 * rugLwd to make isolated points visually prominent.

rugOpaque

numeric between 0 and 1 controlling the opacity of the rug marks. Defaults to 0.4.

jittering

logical. If TRUE (the default), rug tick positions are jittered via jitter to make tied values distinguishable.

jitteramount

amount of jittering passed to jitter. If NULL, the default amount is used.

rugW

numeric vector (recycled to length p) giving the width of the rug (i.e. the length of each tick mark). Set to zero to suppress the rug. Defaults to 0.12.

stickCol

color(s) of the vertical or horizontal "stick" drawn when side = "both", which separates the two half-bixplots sharing an axis. Set to NA to suppress the stick. Can be a single color or a vector recycled to length p. Defaults to "black".

stickLwd

single number giving the line width of the stick. Set to zero to suppress the stick. Defaults to 2.

boxwex

scale factor applied uniformly to the widths of the body, box and rug across all bixplots. Defaults to 1. [Argument from boxplot.]

width

optional numeric vector of length p giving the relative widths of the bixplots. Entries must be strictly positive; they are divided by their maximum so that the widest bixplot has relative width 1. The resulting ratios multiply the body, box and rug widths. If NULL (the default), all bixplots have equal width. [Argument from boxplot.]

side

character string specifying which side of the variable axis the body, box and rug are drawn on. One of "no" (the default; each bixplot is symmetric about its axis), "first" or "second" (all half-bixplots on that side), or "both" (adjacent variables are plotted on alternate sides of shared axes, halving the number of axes needed). [Argument from beanplot.]

tick

logical indicating whether tick marks are drawn on the group-label axis. Only has an effect when add = FALSE. Defaults to TRUE.

diplevel

significance level for Hartigan's dip test for unimodality. A cluster search is only performed for variables whose dip test p-value is at most diplevel. Defaults to 0.01. Increasing this value triggers the cluster search for more variables, so more variables may be displayed as multimodal; decreasing it has the opposite effect.

minN

minimum number of observations required per potential cluster. Defaults to 15. The maximum number of clusters searched is bounded by floor(n / minN). If n < 2 * minN, clustering is not attempted.

kmax

maximum number of clusters to consider. Internally capped at 5. If NULL (the default), it is set to min(floor(n / minN), 5). Setting kmax = 1 treats all variables as unimodal and the display resembles a violin plot.

bigN

when a variable has more than bigN non-missing values, a sample of bigN observations (always including the minimum and maximum) is drawn without replacement before computing densities and clusters, to reduce computation time. Defaults to 500; values below 300 are silently raised to 300.

clusMinN

minimum number of unique values that each cluster must contain. The constrained clustering enforces this bound. Defaults to 3.

bw

bandwidth for density, used to construct the density body. Can be a numeric value or a character string naming a bandwidth selector (see ?density). Defaults to "SJ-dpi".

kernel

kernel for density. Defaults to "gaussian".

cut

the density is computed from min(y) - cut * bw to max(y) + cut * bw where bw is the numeric bandwidth. Defaults to 3. [Argument from beanplot.]

cutmin

if finite, the density of every variable and mode is truncated to begin no lower than cutmin. Defaults to -Inf (no effect). [Argument from beanplot.]

cutmax

if finite, the density of every variable and mode is truncated to end no higher than cutmax. Defaults to Inf (no effect). [Argument from beanplot.]

xlim

numeric vector of length 2 giving the limits of the group axis, regardless of whether horizontal is TRUE or FALSE. If NULL, limits are set automatically. [Convention from boxplot.]

ylim

numeric vector of length 2 giving the limits of the numeric value axis, regardless of whether horizontal is TRUE or FALSE. If NULL, limits are set automatically. [Convention from boxplot.]

cex.axis

character expansion factor for axis tick labels.

main

title of the plot. Defaults to "bixplot".

cex.main

character expansion factor for the title.

line.main

margin line for the title (passed to title).

cex.colorbar

character expansion factor for the color bar axis labels when rugNumeric is used.

xlab

label for the horizontal axis. Defaults to "".

ylab

label for the vertical axis. Defaults to "".

cex.lab

character expansion factor for axis labels.

line.lab

margin line for axis labels. Defaults to 2.2.

xaxs

axis interval calculation style for the x-axis (see par). Defaults to "r".

yaxs

axis interval calculation style for the y-axis. Defaults to "r".

las

orientation of axis tick labels (see par). Defaults to 1 (always horizontal).

ann

logical indicating whether the plot should be annotated with xlab, ylab and main. Defaults to !add. [Argument from boxplot.]

plot

if TRUE (the default), the summary list is returned invisibly. If FALSE, it is also printed to the console. [Argument from boxplot.]

log

this argument from boxplot must be NULL. If a non-NULL value is supplied, a warning is issued and the argument is ignored, because a log transformation can change the number and position of modes. Apply the log transform explicitly to the data before calling bixplot if desired.
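Several of the argument descriptions above reduce to simple arithmetic. The following base R sketch restates three of them (the default kmax, the normalization of width, and the evaluation range implied by cut, cutmin and cutmax). The helper names default_kmax, relative_width and density_range are hypothetical and are not part of the package; the code only mirrors the documented rules and is not the package's implementation.

```r
# Hypothetical helpers that restate the documented rules; not package code.

# Default kmax: min(floor(n / minN), 5); no cluster search when n < 2 * minN.
default_kmax <- function(n, minN = 15) {
  if (n < 2 * minN) return(1L)
  as.integer(min(floor(n / minN), 5))
}

# width: strictly positive entries, divided by their maximum.
relative_width <- function(width) {
  stopifnot(is.numeric(width), all(width > 0))
  width / max(width)
}

# Density evaluation range: data range extended by cut * bw,
# then truncated to [cutmin, cutmax].
density_range <- function(y, bw, cut = 3, cutmin = -Inf, cutmax = Inf) {
  c(max(min(y) - cut * bw, cutmin), min(max(y) + cut * bw, cutmax))
}

default_kmax(120)                            # 5
default_kmax(40)                             # 2
default_kmax(20)                             # 1
relative_width(c(2, 4, 1))                   # 0.50 1.00 0.25
density_range(c(0, 10), bw = 1, cutmin = 0)  # 0 13
```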

Details

For each variable, bixplot proceeds as follows. Hartigan's dip test is applied to test for unimodality; if the p-value exceeds diplevel, or if the sample is too small (n < 2 * minN), the variable is treated as unimodal (k = 1). Otherwise, constrained k-medoids clustering (via pamc1d) is fitted for k = 2, ..., kmax clusters, and the best k is selected by the highest mean silhouette width (computed via silhouette). If no k > 1 yields a positive mean silhouette width, the variable is treated as unimodal.
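The decision rule in the previous paragraph can be sketched as follows. This is an illustration under stated assumptions, not the package's code: it uses dip.test from the diptest package and plain pam from the cluster package as a stand-in for the constrained pamc1d, so the clusMinN constraint is not enforced. The function name choose_k is hypothetical.

```r
# Sketch of the documented rule for choosing the number of modes k.
# Assumption: cluster::pam replaces the constrained pamc1d used by bixplot.
choose_k <- function(x, diplevel = 0.01, minN = 15, kmax = NULL) {
  x <- x[!is.na(x)]
  n <- length(x)
  if (n < 2 * minN) return(1L)                 # sample too small
  if (diptest::dip.test(x)$p.value > diplevel) # unimodality not rejected
    return(1L)
  if (is.null(kmax)) kmax <- floor(n / minN)
  kmax <- min(kmax, floor(n / minN), 5)        # documented caps
  if (kmax < 2) return(1L)
  ks <- 2:kmax
  # Mean silhouette width of a PAM fit for each candidate k
  sil <- vapply(ks, function(k) {
    cluster::pam(matrix(x, ncol = 1), k = k)$silinfo$avg.width
  }, numeric(1))
  if (max(sil) <= 0) return(1L)                # no k > 1 is supported
  as.integer(ks[which.max(sil)])
}

if (requireNamespace("diptest", quietly = TRUE) &&
    requireNamespace("cluster", quietly = TRUE)) {
  set.seed(1)
  x1 <- rnorm(100)                                    # unimodal sample
  x2 <- c(rnorm(60, mean = -3), rnorm(60, mean = 3))  # two separated modes
  print(c(unimodal = choose_k(x1), bimodal = choose_k(x2)))
}
```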

A single global bandwidth is computed once per variable and reused for the density of every mode, ensuring that density bodies are comparable across modes. Mode sizes are scaled according to bodysize.
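To make the shared-bandwidth idea concrete, here is a minimal base R sketch. It assumes the "SJ-dpi" selector (computed here with bw.SJ), uses a simple threshold at zero as a stand-in for the detected clusters, and scales the mode densities as bodysize = "area_from_count" is described. The variable names are illustrative only.

```r
set.seed(1)
x  <- c(rnorm(60, mean = -3), rnorm(60, mean = 3))
cl <- ifelse(x < 0, 1L, 2L)    # stand-in for the detected clusters

# One bandwidth from the whole variable, reused for every mode
h <- bw.SJ(x, method = "dpi")

dens <- lapply(split(x, cl), function(xk) {
  d <- density(xk, bw = h, kernel = "gaussian", cut = 3)
  d$y <- d$y * length(xk) / length(x)  # area proportional to cluster count
  d
})

# Rescale so that the widest (symmetric) body has total width bodyW
bodyW <- 0.80
peak  <- max(vapply(dens, function(d) max(d$y), numeric(1)))
halfwidths <- lapply(dens, function(d) d$y * (bodyW / 2) / peak)

vapply(dens, function(d) d$bw, numeric(1))  # the same bandwidth for each mode
```

Because every mode is smoothed with the same h, differences in body shape reflect the data rather than differing amounts of smoothing.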

When side = "both", adjacent variables are paired and plotted as half-bixplots on opposite sides of shared axes, and axis labels are combined automatically from the names of each pair.

The arguments rugNumeric and rugFactor require all variables to have the same number of observations. They cannot be specified simultaneously.
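As a rough illustration of these constraints, the base R sketch below derives one rug color per observation from either a numeric covariate or a factor, and refuses to do both. The function rug_colors and its 100-step palette are assumptions made for this example; the package may map values to colors differently.

```r
# Hypothetical helper: one rug color per observation.
rug_colors <- function(n, rugNumeric = NULL, rugFactor = NULL,
                       rugNumericColors = c("blue", "green"),
                       rugFactorColors = c("red", "blue", "forestgreen"),
                       rugCol = "black", rugOpaque = 0.4) {
  if (!is.null(rugNumeric) && !is.null(rugFactor))
    stop("rugNumeric and rugFactor cannot be combined")
  if (!is.null(rugNumeric)) {
    stopifnot(length(rugNumeric) == n)
    pal  <- colorRampPalette(rugNumericColors)(100)  # continuous palette
    cols <- pal[cut(rugNumeric, breaks = 100, labels = FALSE)]
  } else if (!is.null(rugFactor)) {
    stopifnot(length(rugFactor) == n)
    f    <- as.factor(rugFactor)
    cols <- rep_len(rugFactorColors, nlevels(f))[as.integer(f)]  # recycled
  } else {
    cols <- rep_len(rugCol, n)
  }
  adjustcolor(cols, alpha.f = rugOpaque)  # apply the rug opacity
}

set.seed(42)
covariate <- runif(160)
head(rug_colors(160, rugNumeric = covariate))
table(rug_colors(6, rugFactor = c("a", "b", "c", "d", "a", "b")))
```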

Value

A list returned invisibly when plot = TRUE (or visibly when plot = FALSE), containing:

call

the matched call.

p

the number of variables plotted.

<name_1>, <name_2>, ...

one list entry per variable, named after the variable. For a unimodal variable the list contains values (the sorted non-missing observations) and fivenumbersummary (the five-number summary from fivenum used to draw the box). For a multimodal variable the list additionally contains clustering (an integer vector of cluster assignments) and one sub-list per cluster named cluster_1, cluster_2, ..., each containing members and fivenumbersummary for that cluster.
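The returned list can be inspected with standard tools. The sketch below assumes that bixplot is exported by the classmap package (as the footer of this page suggests) and makes no assumption about the exact names of the per-variable entries; str simply displays whatever is returned.

```r
if (requireNamespace("classmap", quietly = TRUE) &&
    "bixplot" %in% getNamespaceExports("classmap")) {
  set.seed(1)
  x1 <- rnorm(100)
  x2 <- c(rnorm(60, mean = -3), rnorm(60, mean = 3))
  res <- classmap::bixplot(x1, x2, names = c("unimodal", "bimodal"))
  print(res$p)              # number of variables plotted
  str(res, max.level = 2)   # call, p, and one entry per variable
}
```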

Author(s)

P.J. Rousseeuw

References

Montalcini, C., Rousseeuw, P.J. (2025). The bixplot: A variation on the boxplot suited for bimodal data, doi:10.48550/arXiv.2510.09276 (open access).

See Also

pamc1d, pam, silhouette, dip.test, density, boxplot

Examples

set.seed(1)
# A unimodal and a clearly bimodal variable
x1 <- rnorm(100)
x2 <- c(rnorm(60, mean = -3), rnorm(60, mean = 3))
bixplot(x1, x2, names = c("unimodal", "bimodal"),
        main = "Basic bixplot example")

# Formula interface: one bixplot per level of grp
grp <- factor(rep(c("A", "B", "C"), each = 50))
y   <- c(rnorm(50, 0), rnorm(50, 4), rnorm(50, 8))
bixplot(y ~ grp, main = "Formula interface")

# Horizontal layout with an external numeric rug variable
set.seed(42)
vals <- c(rnorm(80, mean = -2), rnorm(80, mean = 2))
covariate <- runif(160)
bixplot(vals, horizontal = TRUE,
        rugNumeric = covariate,
        rugNumericColors = c("purple", "yellow"),
        main = "Horizontal bixplot with numeric rug")

# Side-by-side ("both") half-bixplots
bixplot(x1, x2, side = "both",
        names = c("unimodal vs bimodal"),
        main = "side = both")

# For more examples, we refer to the vignette:
## Not run: 
vignette("bixplot_examples")

## End(Not run)

classmap documentation built on April 29, 2026, 5:10 p.m.