hist: Modified Version of hist() with additional functionality



Description

The definition of a histogram differs by source (with country-specific biases). With equally spaced breaks, R's default is to plot the counts in the cells defined by breaks. Thus the height of a rectangle is proportional to the number of points falling into the cell, as is its area, provided the breaks are equally spaced. The modified hist() function included in the Bayezilla package, however, uses unequally sized breaks by default. More details are given below.

If right = TRUE (default), the histogram cells are intervals of the form (a, b], i.e., they include their right-hand endpoint but not their left one, with the exception of the first cell when include.lowest is TRUE. For right = FALSE, the intervals are of the form [a, b), and include.lowest means 'include highest'.

A numerical tolerance of 1e-7 times the median bin size (for more than four bins, otherwise the median is substituted) is applied when counting entries on the edges of bins. This tolerance is not included in the reported breaks nor in the calculation of density.
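The interval conventions can be checked on a small vector whose values sit exactly on the breakpoints. This is a minimal sketch that calls base R's graphics::hist() explicitly with plot = FALSE; it assumes the modified hist() assigns boundary values the same way, which the description above suggests but which is not verified here.

```r
# Values placed exactly on the breakpoints to expose the boundary rules
x <- c(1, 2, 2, 3, 4)
b <- c(1, 2, 3, 4)

# right = TRUE: cells are [1, 2], (2, 3], (3, 4]
# (the first cell is closed on the left because include.lowest = TRUE)
h_right <- graphics::hist(x, breaks = b, right = TRUE,
                          include.lowest = TRUE, plot = FALSE)

# right = FALSE: cells are [1, 2), [2, 3), [3, 4]
# (include.lowest now closes the last cell on the right)
h_left <- graphics::hist(x, breaks = b, right = FALSE,
                         include.lowest = TRUE, plot = FALSE)

h_right$counts  # 3 1 1
h_left$counts   # 1 2 2
```

The two values equal to 2 move from the first cell to the second when right is switched to FALSE, which is the only difference between the two calls.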

Breaks Algorithms:

The default breaks algorithm is "dhist", which implements the varying-binwidth algorithm of Lorraine Denby. This algorithm has some notable statistical advantages. Regions of high density have not only taller bins (as is usual) but narrower bins as well, while regions of lower density have not only shorter but wider bins. This makes the probability density more immediately obvious and captures interesting features such as heavy tails and skew more effectively. It is a compromise between two conventional approaches: algorithms that yield equal-width bins oversmooth in regions of high density and are poor at identifying sharp peaks and multimodality, whereas algorithms that yield equal-area bins oversmooth in regions of low density and can mask outliers and the heavy tails of more leptokurtic distributions. For more information, see Denby & Mallows (2009).

Other options include "scott", "sturges", and "fd" / "Freedman-Diaconis". Case is ignored and partial matching is used. Alternatively, a function can be supplied that computes the intended number of breaks or the actual breakpoints as a function of x.
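Denby & Mallows (2009) present their method as a compromise between equal-width and equal-area bins. The sketch below reproduces only those two extremes by supplying break functions; it is not the "dhist" algorithm itself. The helper names equal_width and equal_area and the bin count k are hypothetical, and base R's graphics::hist() is called explicitly so the result does not depend on which hist() is attached.

```r
set.seed(1)
x <- rgamma(1000, shape = 2, rate = 0.25)
k <- 10  # number of bins, chosen arbitrarily for illustration

# Hypothetical helper: k bins of identical width across the data range
equal_width <- function(x) seq(min(x), max(x), length.out = k + 1)

# Hypothetical helper: k bins holding about the same number of points
equal_area <- function(x) {
  quantile(x, probs = seq(0, 1, length.out = k + 1), names = FALSE)
}

# A function passed as breaks is evaluated on x to obtain the breakpoints
ew <- graphics::hist(x, breaks = equal_width, plot = FALSE)
ea <- graphics::hist(x, breaks = equal_area, plot = FALSE)

range(diff(ew$breaks))  # a single constant width
ea$counts               # roughly length(x) / k points in every bin
diff(ea$breaks)         # narrow where the data are dense, widest in the right tail
```

Comparing diff(ew$breaks) with diff(ea$breaks) shows the trade-off: constant widths give sparse, noisy tail bins, while constant counts stretch the last bin across the whole tail.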


Usage

hist(x, breaks = "dhist", freq = FALSE, probability = !freq,
  include.lowest = TRUE, right = TRUE, density = NULL, angle = 45,
  col = "#00ff3cCC", border = NULL, main = NULL,
  xlim = range(breaks), ylim = NULL, xlab = xname, ylab,
  axes = TRUE, plot = TRUE, labels = FALSE, nclass = NULL,
  warn.unused = TRUE, rug = TRUE, rug.col = "#8d03e7", ...)

Arguments

x

a vector of values for which the histogram is desired.

breaks

one of: a vector giving the breakpoints between histogram cells; a function to compute the vector of breakpoints; a single number giving the number of cells for the histogram; a character string naming an algorithm to compute the number of cells (see 'Breaks Algorithms'); or a function to compute the number of cells. Defaults to "dhist".

freq

logical; if TRUE, the histogram graphic is a representation of frequencies, the counts component of the result; if FALSE, probability densities, component density, are plotted (so that the histogram has a total area of one). Defaults to FALSE.

probability

an alias for !freq, for S compatibility.

include.lowest

logical; if TRUE, an x[i] equal to the breaks value will be included in the first (or last, for right = FALSE) bar. This will be ignored (with a warning) unless breaks is a vector.

right

logical; if TRUE, the histogram cells are right-closed (left open) intervals.

density

the density of shading lines, in lines per inch. The default value of NULL means that no shading lines are drawn. Non-positive values of density also inhibit the drawing of shading lines.

angle

the slope of shading lines, given as an angle in degrees (counter-clockwise).

col

a color to be used to fill the bars. The default is "#00ff3cCC".

border

the color of the border around the bars. The default is to use the standard foreground color.

main

the main title of the plot, passed to title; it has a useful default here.

xlim

the range of x values, with a sensible default. Note that xlim is not used to define the histogram (breaks), but only for plotting (when plot = TRUE).

ylim

the range of y values, with a sensible default. Like xlim, it is used only for plotting (when plot = TRUE).

xlab

the x-axis label, passed to title; it has a useful default here.

ylab

the y-axis label, passed to title; it has a useful default here.

axes

logical. If TRUE (default), axes are drawn if the plot is drawn.

plot

logical. If TRUE (default), a histogram is plotted; otherwise a list of breaks and counts is returned. In the latter case, a warning is issued if (typically graphical) arguments are specified that only apply to the plot = TRUE case.

labels

logical or character string. Additionally draw labels on top of bars, if not FALSE; see plot.histogram.

nclass

numeric (integer). For S(-PLUS) compatibility only, nclass is equivalent to breaks for a scalar or character argument.

warn.unused

logical. If plot = FALSE and warn.unused = TRUE, a warning will be issued when graphical parameters are passed to hist.default().

rug

logical. If TRUE (default), a rug is plotted under the histogram.

rug.col

the rug color. Defaults to "#8d03e7".

...

further arguments and graphical parameters passed to plot.histogram and thence to title and axis (if plot = TRUE).
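How freq, plot, and warn.unused interact can be checked without drawing anything. The sketch below uses base R's graphics::hist(), called explicitly so it does not depend on which hist() is attached. That the modified version returns the same components is an assumption, not something this page confirms.

```r
x <- c(0.5, 1.5, 1.7, 2.5, 6)
b <- c(0, 1, 2, 4, 8)  # deliberately unequal bin widths

# plot = FALSE returns the computed histogram instead of drawing it
h <- graphics::hist(x, breaks = b, plot = FALSE)

# density = counts / (n * bin width), so the bar areas sum to one
widths <- diff(h$breaks)
manual_density <- h$counts / (length(x) * widths)
total_area <- sum(h$density * widths)

h$counts        # 1 2 1 1
manual_density  # 0.20 0.40 0.10 0.05

# A graphical argument with plot = FALSE triggers a warning ...
warned <- tryCatch({
  graphics::hist(x, breaks = b, plot = FALSE, col = "red")
  FALSE
}, warning = function(w) TRUE)

# ... unless warn.unused = FALSE
silent <- tryCatch({
  graphics::hist(x, breaks = b, plot = FALSE, col = "red", warn.unused = FALSE)
  TRUE
}, warning = function(w) FALSE)
```

With unequal widths, counts alone would overstate the wide bins, which is why plotting densities (freq = FALSE) is the sensible default when the breaks vary in size.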

Value

a histogram

References

Denby, L., & Mallows, C. (2009). Variations on the Histogram. Journal of Computational and Graphical Statistics, 18(1), 21–31. doi:10.1198/jcgs.2009.0002

Examples

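# Assumes Bayezilla is attached, so that hist() is the modified version.
x <- rgamma(1000, 2, .25)  # right-skewed sample: shape = 2, rate = 0.25

# Side by side: the default "dhist" breaks versus Freedman-Diaconis
par(mfrow = c(1, 2))
hist(x, breaks = "dhist")
hist(x, breaks = "fd")

# 2 x 2 grid comparing all four named breaks methods
par(mfrow = c(2, 2))
hist(x, breaks = "dhist", main = "'dhist' breaks method")
hist(x, breaks = "fd", main = "'Freedman-Diaconis' breaks method")
hist(x, breaks = "scott", main = "'Scott' breaks method")
hist(x, breaks = "sturges", main = "'Sturges' breaks method")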

abnormally-distributed/Bayezilla documentation built on Oct. 31, 2019, 1:57 a.m.