db_compute_boxplot: Returns a dataframe with boxplot calculations


db_compute_boxplot    R Documentation

Returns a dataframe with boxplot calculations

Description

Uses dplyr operations to compute the boxplot statistics. Because of this approach, the calculations run inside the database automatically when 'data' is backed by a database or sparklyr connection. In R, such tables have one of the following classes: tbl_sql, tbl_dbi, tbl_spark

Requires database support for percentile/quantile functions. Confirmed to work with:

  • DuckDB (recommended for local examples) - uses quantile()

  • Spark/Hive (via sparklyr) - uses percentile_approx()

  • SQL Server (2012+) - uses PERCENTILE_CONT()

  • PostgreSQL (9.4+) - uses percentile_cont()

  • Oracle (9i+) - uses PERCENTILE_CONT()

Does not work with SQLite, MySQL < 8.0, or MariaDB, which lack percentile support.

Note that this function supports input tbl that already contains grouping variables. This can be useful when creating faceted boxplots.
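As a minimal sketch of the pre-grouped case, the snippet below groups an in-memory DuckDB copy of mtcars by cyl before calling db_compute_boxplot(). It assumes the dbplot, DBI, dplyr, dbplyr and duckdb packages are installed and that the existing grouping is kept alongside 'x' in the result; the object name boxplot_stats is illustrative only.

```r
library(DBI)
library(dplyr)
library(dbplot)

# In-memory DuckDB database holding a copy of mtcars
con <- dbConnect(duckdb::duckdb(), ":memory:")
db_mtcars <- copy_to(con, mtcars, "mtcars")

# cyl is an existing grouping variable (for example, a facet);
# am is the variable that separates the boxes within each cyl group
boxplot_stats <- db_mtcars |>
  group_by(cyl) |>
  db_compute_boxplot(am, mpg)

boxplot_stats

dbDisconnect(con)
```

Under these assumptions the result should hold one row per combination of cyl and am, which is the shape needed to draw one panel per cyl value.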

Usage

db_compute_boxplot(data, x, var, coef = 1.5)

Arguments

data

A table (tbl) that can already contain grouping variables

x

A discrete variable by which to group the boxplots

var

A continuous variable

coef

Length of the whiskers as multiple of IQR. Defaults to 1.5
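To illustrate what 'coef' controls, here is a local, base R sketch of a whisker calculation for a single group. whisker_bounds() is a hypothetical helper, not part of dbplot, and it assumes the common convention that whiskers reach at most coef times the IQR beyond the quartiles and are clamped to the observed minimum and maximum. Database percentile functions may also interpolate differently from R's quantile(), so exact values can differ from what the function returns.

```r
# Hypothetical helper, not part of dbplot: whisker bounds for one group
whisker_bounds <- function(values, coef = 1.5) {
  # Lower and upper quartiles (R's default quantile type)
  q <- stats::quantile(values, probs = c(0.25, 0.75), names = FALSE)

  # Maximum whisker length, expressed as a multiple of the IQR
  reach <- coef * (q[2] - q[1])

  c(
    ymin = max(min(values), q[1] - reach),  # never below the smallest value
    ymax = min(max(values), q[2] + reach)   # never above the largest value
  )
}

whisker_bounds(mtcars$mpg)              # default whisker length
whisker_bounds(mtcars$mpg, coef = 0.5)  # shorter whiskers
```

A smaller 'coef' pulls the whisker bounds closer to the quartiles, so more observations fall outside them.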

Value

An ungrouped data.frame with boxplot statistics including columns for the grouping variable(s), quartiles (lower, middle, upper), whisker bounds (ymin, ymax), and the count of observations per group.
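Because the result already contains the five boxplot statistics, it can be drawn without recomputing anything. The sketch below assumes ggplot2 is installed and uses a hand-written data frame with made-up values, shaped like the return value described above (a grouping column named am plus lower, middle, upper, ymin and ymax). In practice the output of db_compute_boxplot() would take its place.

```r
library(ggplot2)

# Made-up values for illustration only; the column names follow the
# description of the return value, the numbers are not real results
boxplot_stats <- data.frame(
  am     = c(0, 1),
  ymin   = c(10, 15),
  lower  = c(15, 21),
  middle = c(17, 23),
  upper  = c(19, 30),
  ymax   = c(24, 34)
)

# stat = "identity" tells geom_boxplot() to use the supplied statistics
# instead of computing them from raw observations
p <- ggplot(boxplot_stats) +
  geom_boxplot(
    aes(
      x = factor(am),
      ymin = ymin, lower = lower, middle = middle,
      upper = upper, ymax = ymax
    ),
    stat = "identity"
  ) +
  labs(x = "am", y = "mpg")

p
```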

Examples

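## Not run: 
library(DBI)
library(dplyr)
library(dbplot)

# Create an in-memory DuckDB database and copy mtcars into it
con <- dbConnect(duckdb::duckdb(), ":memory:")
db_mtcars <- copy_to(con, mtcars, "mtcars")

# Compute boxplot statistics for mpg, grouped by am, inside the database
db_mtcars |>
  db_compute_boxplot(am, mpg)

# Close the connection when done
dbDisconnect(con)

## End(Not run)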

dbplot documentation built on March 13, 2026, 5:06 p.m.