Description Usage Arguments Aesthetics Summary functions See Also Examples
stat_summary
operates on unique x
; stat_summary_bin
operates on binned x
. They are more flexible versions of
stat_bin()
: instead of just counting, they can compute any
aggregate.
1 2 3 4 5 6 7 8 9 10 | stat_summary_bin(mapping = NULL, data = NULL, geom = "pointrange",
position = "identity", ..., fun.data = NULL, fun.y = NULL,
fun.ymax = NULL, fun.ymin = NULL, fun.args = list(), bins = 30,
binwidth = NULL, breaks = NULL, na.rm = FALSE, show.legend = NA,
inherit.aes = TRUE)
stat_summary(mapping = NULL, data = NULL, geom = "pointrange",
position = "identity", ..., fun.data = NULL, fun.y = NULL,
fun.ymax = NULL, fun.ymin = NULL, fun.args = list(),
na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)
|
mapping |
Set of aesthetic mappings created by |
data |
The data to be displayed in this layer. There are three options: If A A |
geom |
Use to override the default connection between
|
position |
Position adjustment, either as a string, or the result of a call to a position adjustment function. |
... |
Other arguments passed on to |
fun.data |
A function that is given the complete data and should
return a data frame with variables |
fun.ymin, fun.y, fun.ymax |
Alternatively, supply three individual functions that are each passed a vector of x's and should return a single number. |
fun.args |
Optional additional arguments passed on to the functions. |
bins |
Number of bins. Overridden by |
binwidth |
The width of the bins. Can be specified as a numeric value,
or a function that calculates width from x.
The default is to use The bin width of a date variable is the number of days in each time; the bin width of a time variable is the number of seconds. |
breaks |
Alternatively, you can supply a numeric vector giving
the bin boundaries. Overrides |
na.rm |
If |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
stat_summary()
understands the following aesthetics (required aesthetics are in bold):
x
y
group
Learn more about setting these aesthetics in vignette("ggplot2-specs")
.
You can either supply summary functions individually (fun.y
,
fun.ymax
, fun.ymin
), or as a single function (fun.data
):
Complete summary function. Should take numeric vector as input and return data frame as output
ymin summary function (should take numeric vector and return single number)
y summary function (should take numeric vector and return single number)
ymax summary function (should take numeric vector and return single number)
A simple vector function is easiest to work with as you can return a single
number, but is somewhat less flexible. If your summary function computes
multiple values at once (e.g. ymin and ymax), use fun.data
.
If no aggregation functions are supplied, will default to
mean_se()
.
geom_errorbar()
, geom_pointrange()
,
geom_linerange()
, geom_crossbar()
for geoms to
display summarised data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 | d <- ggplot(mtcars, aes(cyl, mpg)) + geom_point()
d + stat_summary(fun.data = "mean_cl_boot", colour = "red", size = 2)
# You can supply individual functions to summarise the value at
# each x:
d + stat_summary(fun.y = "median", colour = "red", size = 2, geom = "point")
d + stat_summary(fun.y = "mean", colour = "red", size = 2, geom = "point")
d + aes(colour = factor(vs)) + stat_summary(fun.y = mean, geom="line")
d + stat_summary(fun.y = mean, fun.ymin = min, fun.ymax = max,
colour = "red")
d <- ggplot(diamonds, aes(cut))
d + geom_bar()
d + stat_summary_bin(aes(y = price), fun.y = "mean", geom = "bar")
# Don't use ylim to zoom into a summary plot - this throws the
# data away
p <- ggplot(mtcars, aes(cyl, mpg)) +
stat_summary(fun.y = "mean", geom = "point")
p
p + ylim(15, 30)
# Instead use coord_cartesian
p + coord_cartesian(ylim = c(15, 30))
# A set of useful summary functions is provided from the Hmisc package:
stat_sum_df <- function(fun, geom="crossbar", ...) {
stat_summary(fun.data = fun, colour = "red", geom = geom, width = 0.2, ...)
}
d <- ggplot(mtcars, aes(cyl, mpg)) + geom_point()
# The crossbar geom needs grouping to be specified when used with
# a continuous x axis.
d + stat_sum_df("mean_cl_boot", mapping = aes(group = cyl))
d + stat_sum_df("mean_sdl", mapping = aes(group = cyl))
d + stat_sum_df("mean_sdl", fun.args = list(mult = 1), mapping = aes(group = cyl))
d + stat_sum_df("median_hilow", mapping = aes(group = cyl))
# An example with highly skewed distributions:
if (require("ggplot2movies")) {
set.seed(596)
mov <- movies[sample(nrow(movies), 1000), ]
m2 <- ggplot(mov, aes(x = factor(round(rating)), y = votes)) + geom_point()
m2 <- m2 + stat_summary(fun.data = "mean_cl_boot", geom = "crossbar",
colour = "red", width = 0.3) + xlab("rating")
m2
# Notice how the overplotting skews off visual perception of the mean
# supplementing the raw data with summary statistics is _very_ important
# Next, we'll look at votes on a log scale.
# Transforming the scale means the data are transformed
# first, after which statistics are computed:
m2 + scale_y_log10()
# Transforming the coordinate system occurs after the
# statistic has been computed. This means we're calculating the summary on the raw data
# and stretching the geoms onto the log scale. Compare the widths of the
# standard errors.
m2 + coord_trans(y="log10")
}
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.