stat_summary
operates on unique x
; stat_summary_bin
operators on binned x
. They are more flexible versions of
stat_bin
: instead of just counting, they can compute any
aggregate.
1 2 3 4 5 6 7 8 9  stat_summary_bin(mapping = NULL, data = NULL, geom = "pointrange",
position = "identity", ..., fun.data = NULL, fun.y = NULL,
fun.ymax = NULL, fun.ymin = NULL, fun.args = list(), na.rm = FALSE,
show.legend = NA, inherit.aes = TRUE)
stat_summary(mapping = NULL, data = NULL, geom = "pointrange",
position = "identity", ..., fun.data = NULL, fun.y = NULL,
fun.ymax = NULL, fun.ymin = NULL, fun.args = list(), na.rm = FALSE,
show.legend = NA, inherit.aes = TRUE)

mapping 
Set of aesthetic mappings created by 
data 
The data to be displayed in this layer. There are three options: If A A 
geom 
Use to override the default connection between

position 
Position adjustment, either as a string, or the result of a call to a position adjustment function. 
... 
other arguments passed on to 
fun.data 
A function that is given the complete data and should
return a data frame with variables 
fun.ymin, fun.y, fun.ymax 
Alternatively, supply three individual functions that are each passed a vector of x's and should return a single number. 
fun.args 
Optional additional arguments passed on to the functions. 
na.rm 
If 
show.legend 
logical. Should this layer be included in the legends?

inherit.aes 
If 
statsummary
You can either supply summary functions individually (fun.y
,
fun.ymax
, fun.ymin
), or as a single function (fun.data
):
Complete summary function. Should take numeric vector as input and return data frame as output
ymin summary function (should take numeric vector and return single number)
y summary function (should take numeric vector and return single number)
ymax summary function (should take numeric vector and return single number)
A simple vector function is easiest to work with as you can return a single
number, but is somewhat less flexible. If your summary function computes
multiple values at once (e.g. ymin and ymax), use fun.data
.
If no aggregation functions are suppled, will default to
mean_se
.
geom_errorbar
, geom_pointrange
,
geom_linerange
, geom_crossbar
for geoms to
display summarised data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60  d < ggplot(mtcars, aes(cyl, mpg)) + geom_point()
d + stat_summary(fun.data = "mean_cl_boot", colour = "red", size = 2)
# You can supply individual functions to summarise the value at
# each x:
d + stat_summary(fun.y = "median", colour = "red", size = 2, geom = "point")
d + stat_summary(fun.y = "mean", colour = "red", size = 2, geom = "point")
d + aes(colour = factor(vs)) + stat_summary(fun.y = mean, geom="line")
d + stat_summary(fun.y = mean, fun.ymin = min, fun.ymax = max,
colour = "red")
d < ggplot(diamonds, aes(cut))
d + geom_bar()
d + stat_summary_bin(aes(y = price), fun.y = "mean", geom = "bar")
# Don't use ylim to zoom into a summary plot  this throws the
# data away
p < ggplot(mtcars, aes(cyl, mpg)) +
stat_summary(fun.y = "mean", geom = "point")
p
p + ylim(15, 30)
# Instead use coord_cartesian
p + coord_cartesian(ylim = c(15, 30))
# A set of useful summary functions is provided from the Hmisc package:
stat_sum_df < function(fun, geom="crossbar", ...) {
stat_summary(fun.data = fun, colour = "red", geom = geom, width = 0.2, ...)
}
d < ggplot(mtcars, aes(cyl, mpg)) + geom_point()
# The crossbar geom needs grouping to be specified when used with
# a continuous x axis.
d + stat_sum_df("mean_cl_boot", mapping = aes(group = cyl))
d + stat_sum_df("mean_sdl", mapping = aes(group = cyl))
d + stat_sum_df("mean_sdl", fun.args = list(mult = 1), mapping = aes(group = cyl))
d + stat_sum_df("median_hilow", mapping = aes(group = cyl))
# An example with highly skewed distributions:
if (require("ggplot2movies")) {
set.seed(596)
mov < movies[sample(nrow(movies), 1000), ]
m2 < ggplot(mov, aes(x = factor(round(rating)), y = votes)) + geom_point()
m2 < m2 + stat_summary(fun.data = "mean_cl_boot", geom = "crossbar",
colour = "red", width = 0.3) + xlab("rating")
m2
# Notice how the overplotting skews off visual perception of the mean
# supplementing the raw data with summary statistics is _very_ important
# Next, we'll look at votes on a log scale.
# Transforming the scale means the data are transformed
# first, after which statistics are computed:
m2 + scale_y_log10()
# Transforming the coordinate system occurs after the
# statistic has been computed. This means we're calculating the summary on the raw data
# and stretching the geoms onto the log scale. Compare the widths of the
# standard errors.
m2 + coord_trans(y="log10")
}

Questions? Problems? Suggestions? Tweet to @rdrrHQ or email at ian@mutexlabs.com.
All documentation is copyright its authors; we didn't write any of that.