histogram: Histograms and Kernel Density Plots In lattice: Trellis Graphics for R

Description

Draw Histograms and Kernel Density Plots, possibly conditioned on other variables.

Usage

```
histogram(x, data, ...)
densityplot(x, data, ...)

## S3 method for class 'formula'
histogram(x,
          data,
          allow.multiple, outer = TRUE,
          auto.key = FALSE,
          aspect = "fill",
          panel = lattice.getOption("panel.histogram"),
          prepanel, scales, strip, groups,
          xlab, xlim, ylab, ylim,
          type = c("percent", "count", "density"),
          nint = if (is.factor(x)) nlevels(x)
                 else round(log2(length(x)) + 1),
          endpoints = extend.limits(range(as.numeric(x),
                                          finite = TRUE),
                                    prop = 0.04),
          breaks,
          equal.widths = TRUE,
          drop.unused.levels =
              lattice.getOption("drop.unused.levels"),
          ...,
          lattice.options = NULL,
          default.scales = list(),
          default.prepanel =
              lattice.getOption("prepanel.default.histogram"),
          subscripts,
          subset)

## S3 method for class 'numeric'
histogram(x, data = NULL, xlab, ...)

## S3 method for class 'factor'
histogram(x, data = NULL, xlab, ...)

## S3 method for class 'formula'
densityplot(x,
            data,
            allow.multiple = is.null(groups) || outer,
            outer = !is.null(groups),
            auto.key = FALSE,
            aspect = "fill",
            panel = lattice.getOption("panel.densityplot"),
            prepanel, scales, strip, groups, weights,
            xlab, xlim, ylab, ylim,
            bw, adjust, kernel, window, width, give.Rkern,
            n = 512,
            from, to, cut, na.rm,
            drop.unused.levels =
                lattice.getOption("drop.unused.levels"),
            ...,
            lattice.options = NULL,
            default.scales = list(),
            default.prepanel =
                lattice.getOption("prepanel.default.densityplot"),
            subscripts,
            subset)

## S3 method for class 'numeric'
densityplot(x, data = NULL, xlab, ...)

do.breaks(endpoints, nint)
```

Arguments

`x` The object on which method dispatch is carried out.

For the `formula` method, `x` can be a formula of the form `~ x | g1 * g2 * ...`, indicating that histograms or kernel density estimates of the `x` variable should be produced conditioned on the levels of the (optional) variables `g1`, `g2`, .... `x` should be numeric (or possibly a factor in the case of `histogram`), and each of `g1`, `g2`, ... should be either factors or shingles.

As a special case, the right hand side of the formula can contain more than one term separated by ‘+’ signs (e.g., `~ x1 + x2 | g1 * g2`). What happens in this case is described in the documentation for `xyplot`. Note that in either form, all the terms in the formula must have the same length after evaluation.

For the `numeric` and `factor` methods, `x` is the variable whose histogram or kernel density estimate is drawn. Conditioning is not allowed in these cases.

`data` For the `formula` method, an optional data source (usually a data frame) in which variables are to be evaluated (see `xyplot` for details). `data` should not be specified for the other methods, and is ignored with a warning if it is.

`type` A character string indicating the type of histogram that is to be drawn. `"percent"` and `"count"` give relative frequency and frequency histograms respectively, and can be misleading when breakpoints are not equally spaced. `"density"` produces a density histogram. The default is `"density"` when the breakpoints are unequally spaced, and when `breaks` is `NULL` or a function, and `"percent"` otherwise.

`nint` An integer specifying the number of histogram bins, applicable only when `breaks` is unspecified or `NULL` in the call. Ignored when the variable being plotted is a factor.

`endpoints` A numeric vector of length 2 indicating the range of x-values that is to be covered by the histogram. This applies only when `breaks` is unspecified and the variable being plotted is not a factor. In `do.breaks`, this specifies the interval that is to be divided up.

`breaks` Usually a numeric vector of length (number of bins + 1) defining the breakpoints of the bins. Note that when breakpoints are not equally spaced, the only value of `type` that makes sense is `"density"`.

When `breaks` is unspecified, the value of `lattice.getOption("histogram.breaks")` is first checked. If this value is `NULL`, then the default is to use `breaks = seq_len(1 + nlevels(x)) - 0.5` when `x` is a factor, and `breaks = do.breaks(endpoints, nint)` otherwise. Breakpoints calculated in such a manner are used in all panels.

If the retrieved value is not `NULL`, or if `breaks` is explicitly specified, it affects the display in each panel independently. Valid values are those accepted as the `breaks` argument in `hist`. In particular, this allows specification of `breaks` as an integer giving the number of bins (similar to `nint`), as a character string denoting a method, or as a function.

When specified explicitly, a special value of `breaks` is `NULL`, in which case the number of bins is determined by `nint` and then breakpoints are chosen according to the value of `equal.widths`.

`equal.widths` A logical flag, relevant only when `breaks = NULL`. If `TRUE`, equally spaced bins will be selected; otherwise, approximately equal area bins will be selected (typically producing unequally spaced breakpoints).

`n` Integer, giving the number of points at which the kernel density is to be evaluated. Passed on as an argument to `density`.

`panel` A function, called once for each panel, that uses the packet (subset of panel variables) corresponding to the panel to create a display. The default panel functions `panel.histogram` and `panel.densityplot` are documented separately, and have arguments that can be used to customize their output in various ways. Such arguments can usually be directly supplied to the high-level function.

`allow.multiple, outer` See `xyplot`.

`auto.key` See `xyplot`.

`aspect` See `xyplot`.

`prepanel` See `xyplot`.

`scales` See `xyplot`.

`strip` See `xyplot`.

`groups` See `xyplot`. Note that the default panel function for `histogram` does not support grouped displays, whereas the one for `densityplot` does.

`xlab, ylab` See `xyplot`.

`xlim, ylim` See `xyplot`.

`drop.unused.levels` See `xyplot`.

`lattice.options` See `xyplot`.

`default.scales` See `xyplot`.

`subscripts` See `xyplot`.

`subset` See `xyplot`.

`default.prepanel` Fallback prepanel function. See `xyplot`.

`weights` A numeric vector of weights for the density calculations, evaluated in the non-standard manner used for `groups` and terms in the formula, if any. If this is specified, it is subsetted using `subscripts` inside the panel function to match it to the corresponding `x` values. At the time of writing, `weights` does not work in conjunction with an extended formula specification (this is not too hard to fix, so just bug the maintainer if you need this feature).

`bw, adjust, width` Arguments controlling bandwidth. Passed on as arguments to `density`.

`kernel, window` The choice of kernel. Passed on as arguments to `density`.

`give.Rkern` Logical flag, passed on as an argument to `density`. This argument is made available only for ease of implementation, and will produce an error if `TRUE`.

`from, to, cut` Control the range over which the density is evaluated. Passed on as arguments to `density`.

`na.rm` Logical flag specifying whether `NA` values should be ignored. Passed on as an argument to `density`, but unlike in `density`, the default is `TRUE`.

`...` Further arguments. See the corresponding entry in `xyplot` for non-trivial details.
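The way `breaks`, `nint`, `equal.widths` and `type` interact is easier to see side by side. The following is a minimal sketch, not taken from the package examples: the data frame `d` and the object names `h_fixed`, `h_area` and `h_rule` are made up for illustration, and synthetic normal data are assumed so that the equal-area case has no ties.

```r
library(lattice)

# Hypothetical data: 200 normal draws split into two groups.
set.seed(1)
d <- data.frame(y = rnorm(200), g = gl(2, 100, labels = c("a", "b")))

# Explicit numeric breakpoints: the same equally spaced bins in every
# panel, so a frequency histogram is not misleading.
h_fixed <- histogram(~ y | g, data = d,
                     breaks = seq(-5, 5, by = 0.5), type = "count")

# breaks = NULL: nint bins chosen separately in each panel.
# equal.widths = FALSE requests approximately equal-area bins, which are
# unequally spaced, so only a density histogram makes sense.
h_area <- histogram(~ y | g, data = d,
                    breaks = NULL, nint = 6, equal.widths = FALSE,
                    type = "density")

# A character string names a rule understood by hist().
h_rule <- histogram(~ y | g, data = d,
                    breaks = "Scott", type = "density")
```

Each call only builds a `"trellis"` object; printing any of the three draws the corresponding display.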

Details

`histogram` draws conditional histograms, and `densityplot` draws conditional kernel density plots. The default panel function for `densityplot` uses the `density` function to compute the density estimate, and all arguments accepted by `density` can be specified in the call to `densityplot` to control the output. See the documentation of `density` for details.
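As an illustration of arguments being forwarded to `density`, here is a short sketch using the `singer` data shipped with lattice. The specific values chosen for `bw`, `kernel` and `n` are arbitrary, and `plot.points` is an argument of `panel.densityplot` (documented separately) rather than of `density`.

```r
library(lattice)

# bw, kernel and n are forwarded to stats::density() in each panel.
d1 <- densityplot(~ height | voice.part, data = singer,
                  bw = 2, kernel = "epanechnikov", n = 256,
                  xlab = "Height (inches)")

# With 'groups', the default panel function superposes one density
# curve per group in a single panel; plot.points = FALSE suppresses
# the points drawn for the raw data.
d2 <- densityplot(~ height, data = singer, groups = voice.part,
                  plot.points = FALSE, auto.key = TRUE)
```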

These and all other high-level Trellis functions have several arguments in common. These are extensively documented only in the help page for `xyplot`, which should be consulted for more detailed usage.

`do.breaks` is a utility function that calculates breakpoints given an interval and the number of pieces to break it into.
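A small worked case may help; the interval and the number of pieces below are arbitrary choices for illustration.

```r
library(lattice)

# Split the interval [0, 10] into 5 equal pieces, giving 6 breakpoints:
# 0, 2, 4, 6, 8, 10.
b <- do.breaks(c(0, 10), 5)
b

# For comparison, the same points from base R.
all.equal(b, seq(0, 10, length.out = 6))
```

Dividing an interval into `nint` pieces always yields `nint + 1` breakpoints, which matches the length expected of a numeric `breaks` vector.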

Value

An object of class `"trellis"`. The `update` method can be used to update components of the object and the `print` method (usually called by default) will plot it on an appropriate plotting device.
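The separation between building and drawing can be sketched as follows, again assuming the `singer` data from lattice; the names `p` and `p2` are illustrative.

```r
library(lattice)

# Calling histogram() only constructs the object.
p <- histogram(~ height | voice.part, data = singer)
class(p)   # "trellis"

# update() returns a modified copy and leaves p unchanged.
p2 <- update(p, xlab = "Height (inches)", layout = c(2, 4))

# Nothing is drawn until the object is printed. At the top level this
# happens automatically; inside a function or loop, call print(p2).
```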

Note

The form of the arguments accepted by the default panel function `panel.histogram` is different from that in S-PLUS. Whereas S-PLUS calculates the heights inside `histogram` and passes only the breakpoints and the heights to the panel function, lattice simply passes along the original variable `x` along with the breakpoints. This approach is more flexible; see the example below with an estimated density superimposed over the histogram.

Author(s)

Deepayan Sarkar Deepayan.Sarkar@R-project.org

References

Sarkar, Deepayan (2008) Lattice: Multivariate Data Visualization with R, Springer. http://lmdvr.r-forge.r-project.org/

See Also

`xyplot`, `panel.histogram`, `density`, `panel.densityplot`, `panel.mathdensity`, `Lattice`

Examples

```
library(lattice)
require(stats)

## Conditional histograms with explicit endpoints and bin count
histogram( ~ height | voice.part, data = singer, nint = 17,
          endpoints = c(59.5, 76.5), layout = c(2,4), aspect = 1,
          xlab = "Height (inches)")

## Density histograms with a fitted normal density superimposed
histogram( ~ height | voice.part, data = singer,
          xlab = "Height (inches)", type = "density",
          panel = function(x, ...) {
              panel.histogram(x, ...)
              panel.mathdensity(dmath = dnorm, col = "black",
                                args = list(mean=mean(x),sd=sd(x)))
          } )

## Conditional kernel density plots with a fixed bandwidth
densityplot( ~ height | voice.part, data = singer, layout = c(2, 4),
            xlab = "Height (inches)", bw = 5)
```


lattice documentation built on Sept. 22, 2021, 5:11 p.m.