layer_stats: Layer statistical transformations


Description

In ggplot2, a plot is constructed by adding layers to it. A layer consists of two important parts: the geometry (geom) and the statistical transformation (stat). The 'stat' part of a layer is important because it performs a computation on the data before it is displayed. Stats determine what is displayed, not how it is displayed.

For example, adding stat_density() to a plot computes a kernel density estimate, which is then displayed by the 'geom' part of the layer. Many geom_*() functions use stat_identity(), which performs no extra computation on the data.
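A minimal sketch of the difference, assuming ggplot2 is installed and using its bundled mpg data. layer_data() returns the data a layer holds after its stat has run: the identity stat keeps one row per observation, whereas the density stat replaces the observations with points on an estimated density curve.

```r
library(ggplot2)

# geom_point() uses stat_identity() by default: no extra computation
p_point <- ggplot(mpg, aes(displ, hwy)) + geom_point()
class(p_point$layers[[1]]$stat)[1]
nrow(layer_data(p_point)) == nrow(mpg)

# geom_density() uses stat_density(): the rows of the built layer are
# points on the estimated density curve, not the original observations
p_dens <- ggplot(mpg, aes(displ)) + geom_density()
nrow(layer_data(p_dens))
head(layer_data(p_dens)[, c("x", "density")])
```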

Specifying stats

There are five ways in which the 'stat' part of a layer can be specified.

# 1. The stat can have a layer constructor
stat_density()

# 2. A geom can default to a particular stat
geom_density() # has `stat = "density"` as default

# 3. It can be given to a geom as a string
geom_line(stat = "density")

# 4. The ggproto object of a stat can be given
geom_area(stat = StatDensity)

# 5. It can be given to `layer()` directly:
layer(
  geom = "line",
  stat = "density",
  position = "identity"
)

Many of these forms are equivalent: using stat_density(geom = "line") is equivalent to using geom_line(stat = "density"). Note that layer() also requires the position argument. To give a stat as a string, take the name of its constructor function and remove the stat_ prefix, so that stat_bin becomes "bin".
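To check the equivalence in practice, the sketch below (assuming ggplot2 is attached and using the mpg data) builds both forms and compares the computed coordinates with layer_data(). Comparing only x and y is a deliberate choice: each layer keeps its own default position ("stack" for stat_density(), "identity" for geom_line()), which can add extra columns but does not change the density values for a single group.

```r
library(ggplot2)

# The same density line, specified from the stat side and from the geom side
p_stat <- ggplot(mpg, aes(displ)) + stat_density(geom = "line")
p_geom <- ggplot(mpg, aes(displ)) + geom_line(stat = "density")

d_stat <- layer_data(p_stat)
d_geom <- layer_data(p_geom)

# Both layers evaluate the same kernel density estimate
all.equal(d_stat$x, d_geom$x)
all.equal(d_stat$y, d_geom$y)
```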

Some of the better-known stats that can be used for the stat argument are: "density", "bin", "count", "function" and "smooth".
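A string is matched by name to a ggproto Stat object. A minimal sketch, assuming ggplot2 is attached, that builds a bare layer with layer() for each of the strings above and records which Stat object it resolved to:

```r
library(ggplot2)

stat_names <- c("density", "bin", "count", "function", "smooth")

# Build a bare layer for each string and record the class of its stat
stat_classes <- vapply(stat_names, function(s) {
  lyr <- layer(geom = "point", stat = s, position = "identity")
  class(lyr$stat)[1]
}, character(1))

stat_classes
```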

Paired geoms and stats

Some geoms have paired stats. In some cases, like geom_density(), the geom is just a variant of another geom, geom_area(), with slightly different defaults.
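The variant relationship can be inspected directly. A small sketch, assuming ggplot2 is attached: GeomDensity is a ggproto object that inherits from GeomArea while defining its own default aesthetics.

```r
library(ggplot2)

# GeomDensity is derived from GeomArea ...
inherits(GeomDensity, "GeomArea")

# ... but does not share all of its default aesthetics
identical(GeomDensity$default_aes, GeomArea$default_aes)
```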

In other cases, the relationship is more complex. In the case of boxplots, for example, the stat and the geom have distinct roles. The role of the stat is to compute the five-number summary of the data. Beyond drawing the box from that summary, the geom also provides display options for the outliers and the widths of the boxes. In such cases, geoms and stats cannot be freely exchanged: both stat_boxplot(geom = "line") and geom_area(stat = "boxplot") give errors.
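A sketch of the stat's output, assuming ggplot2 is attached: layer_data() shows the per-box statistics computed by stat_boxplot(). The lower, middle and upper columns hold the quartiles and the median; ymin and ymax hold the whisker ends, which exclude points flagged as outliers, so they are not always the sample minimum and maximum.

```r
library(ggplot2)

p <- ggplot(mpg, aes(factor(cyl), displ)) + geom_boxplot()
box_stats <- layer_data(p)

# One row per box (one per value of cyl), holding the computed summary
box_stats[, c("x", "ymin", "lower", "middle", "upper", "ymax")]
```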

Some stats and geoms that are paired are:

  • geom_violin() and stat_ydensity()

  • geom_histogram() and stat_bin()

  • geom_contour() and stat_contour()

  • geom_function() and stat_function()

  • geom_bin_2d() and stat_bin_2d()

  • geom_boxplot() and stat_boxplot()

  • geom_count() and stat_sum()

  • geom_density() and stat_density()

  • geom_density_2d() and stat_density_2d()

  • geom_hex() and stat_binhex()

  • geom_quantile() and stat_quantile()

  • geom_smooth() and stat_smooth()
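Each geom in these pairs defaults to its partner stat. As a sketch, assuming ggplot2 is attached, the stat stored in a freshly constructed layer can be inspected; default_stat() is a hypothetical throwaway helper for this example, not a ggplot2 function.

```r
library(ggplot2)

# Hypothetical helper: class name of the stat a layer will use
default_stat <- function(lyr) class(lyr$stat)[1]

pairs <- c(
  histogram = default_stat(geom_histogram()),
  violin    = default_stat(geom_violin()),
  count     = default_stat(geom_count()),
  smooth    = default_stat(geom_smooth())
)
pairs
```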

Using computed variables

As mentioned above, the role of stats is to perform computation on the data. As a result, stats have 'computed variables' that determine compatibility with geoms. These computed variables are documented in the Computed variables section of each stat's documentation, for example in ?stat_bin. While after_stat() documents this more thoroughly, it is worth mentioning briefly that these computed variables can be accessed in aes().

For example, the ?stat_density documentation states that, in addition to a variable called density, the stat computes a variable named count. Whereas density is scaled so that the area under the curve integrates to 1, count scales the computed density so that its values can be interpreted as counts. Using stat_density(aes(y = after_stat(count))) displays these count-scaled densities instead of the regular densities.
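A sketch of the count scaling, assuming ggplot2 is attached and using the mpg data (no missing values in displ): after mapping y = after_stat(count), the built layer's y column matches the computed count column, and count equals density multiplied by the number of observations.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ)) + stat_density(aes(y = after_stat(count)))
d <- layer_data(p)

# The y aesthetic now holds the count-scaled density ...
all.equal(d$y, d$count)
# ... which is the regular density multiplied by the number of observations
all.equal(d$count, d$density * nrow(mpg))
```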

The computed variables offer flexibility in that arbitrary geom-stat pairings can be made. While not necessarily recommended, geom_line() can be paired with stat = "boxplot" if the line is instructed on how to use the boxplot computed variables:

ggplot(mpg, aes(factor(cyl))) +
  geom_line(
    # Stage gives 'displ' to the stat, and afterwards chooses 'middle' as
    # the y-variable to display
    aes(y = stage(displ, after_stat = middle),
        # Regroup after computing the stats to display a single line
        group = after_stat(1)),
    stat = "boxplot"
  )
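To see which computed variable the line ends up displaying, the built data of the example above can be compared with the median of displ for each number of cylinders. A sketch, assuming ggplot2 is attached; all.equal() is used because the stat's quantile-based median and median() may differ in the last floating-point digit.

```r
library(ggplot2)

p <- ggplot(mpg, aes(factor(cyl))) +
  geom_line(
    aes(y = stage(displ, after_stat = middle), group = after_stat(1)),
    stat = "boxplot"
  )
line_data <- layer_data(p)

# Medians per cyl, in the same order as the factor levels on the x axis
medians <- as.numeric(tapply(mpg$displ, mpg$cyl, median))
all.equal(line_data$y, medians)
```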

Under the hood

Internally, stats are represented as ggproto classes that occupy a slot in a layer. All these classes inherit from the parent Stat ggproto object, which orchestrates how stats work. Briefly, stats are given the opportunity to perform computation on the layer as a whole, on a facet panel, or on individual groups. For more information on extending stats, see the Creating a new stat section of vignette("extending-ggplot2"), as well as the New stats section of the online book.
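A minimal sketch of a custom stat, assuming ggplot2 is attached. StatGroupMean is a hypothetical example, not part of ggplot2: it implements only compute_group(), so the inherited compute_panel() splits each panel by group, calls compute_group() once per group, and re-attaches columns such as colour and group that are constant within the group.

```r
library(ggplot2)

# Hypothetical stat (not part of ggplot2): reduces each group to its mean point
StatGroupMean <- ggproto("StatGroupMean", Stat,
  required_aes = c("x", "y"),
  # compute_group() is called once per group; the returned data frame
  # replaces that group's rows
  compute_group = function(data, scales) {
    data.frame(x = mean(data$x), y = mean(data$y))
  }
)

# The ggproto object can be passed to a geom's stat argument directly
p <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point(stat = StatGroupMean, size = 4)
means <- layer_data(p)
means[order(means$group), c("group", "x", "y")]
```

A stat that needs information across groups would override compute_panel() (or compute_layer()) instead of compute_group().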

See Also

For an overview of all stat layers, see the online reference.

How computed aesthetics work.

Other layer documentation: layer(), layer_geoms, layer_positions


ggplot2 documentation built on June 22, 2024, 11:35 a.m.