Plot a Histogram

Description

Plots a histogram for a data set. The user has options for defining the axis and main titles, the x-axis limits, arithmetic or logarithmic x-axis scaling, the method for calculating the number of bins in which the data are displayed, and the colour of the infill.

Usage

gx.hist(xx, xlab = deparse(substitute(xx)), 
	ylab = "Number of Observations", log = FALSE, xlim = NULL, 
	main = "", nclass = NULL, colr = NULL, ifnright = TRUE,
	cex = 0.8, ...)

Arguments

xx

name of the variable to be plotted

xlab

by default the character string for xx is used for the x-axis title. An alternate title can be displayed with xlab = "text string", see Examples.

ylab

a default y-axis title of "Number of Observations" is provided. This may be changed, e.g., ylab = "Counts".

log

to display the data with logarithmic (x-axis) scaling, set log = TRUE.

xlim

default limits of the x-axis are determined in the function for use in other panel plots of function shape. However, when used stand-alone the limits may be user-defined by setting xlim, see Note below.

main

when used stand-alone a title may be added optionally above the plot by setting main, e.g., main = "Kola Project, 1995".

nclass

the default procedure for preparing the histogram depends on sample size: where N <= 500 the Scott (1979) rule is used, and where N > 500 the Freedman-Diaconis (1981) rule is used. Both rules are resistant to the presence of outliers and usually provide informative histograms. Alternatively, the user may define the histogram binning by setting nclass, i.e. nclass = "scott", nclass = "fd" or nclass = "sturges", the last being designed for normal distributions (Scott, 1992). See Venables and Ripley (2001) for details.

colr

by default the histogram is infilled in grey, colr = 8. If no infill is required, set colr = 0. See function display.lty for the range of available colours.

ifnright

controls where the sample size is plotted in the histogram display. By default this is in the upper right corner of the plot. If the data distribution is such that the upper left corner would be preferable, set ifnright = FALSE. If neither option generates an acceptable plot, setting ifnright = NULL suppresses the display of the data set size.

cex

by default the size of the text for data set size, N, is set to 80%, i.e. cex = 0.8, and may be changed if required.

...

further arguments to be passed to methods. For example, the size of the axis titles may be changed by setting cex.lab, the size of the axis labels by setting cex.axis, and the size of the plot title by setting cex.main. To make the plot title smaller, for instance, add cex.main = 0.9 to reduce the font size by 10%.
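
The binning rules named under nclass are also available in base R (package grDevices) as nclass.Sturges, nclass.scott and nclass.FD. The following is a minimal sketch, on synthetic data, of how the three rules can give different bin counts and of the sample-size switch described above. The helper default_rule is hypothetical and is not taken from the source of gx.hist.

```r
# Base R sketch of the bin-count rules named under nclass.
# Assumption: default_rule() only restates the documented default
# (Scott for N <= 500, Freedman-Diaconis otherwise); it is not the
# function's own source code.
set.seed(1)
x <- rlnorm(200, meanlog = 3, sdlog = 1)   # synthetic, right-skewed data

# Hypothetical helper: choose the rule name from the sample size
default_rule <- function(n) if (n <= 500) "scott" else "fd"

# Suggested number of bins under each rule (functions from grDevices)
bins <- c(
  sturges = nclass.Sturges(x),
  scott   = nclass.scott(x),
  fd      = nclass.FD(x)
)

print(bins)
print(default_rule(length(x)))
```

Comparing the three counts on the data at hand is a quick way to decide whether overriding the default with nclass is worthwhile.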

Value

xlim

A two-element vector containing the actual minimum [1] and maximum [2] x-axis limits used in the histogram display is returned. These are used in function shape to ensure all panels have the same x-axis limits.

Note

Any less than detection limit values represented by negative values, or zeros or other numeric codes representing blanks in the data, must be removed prior to executing this function, see ltdl.fix.df.

Any NAs in the data vector are removed prior to displaying the plots.
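
As an illustration of the two points above, the sketch below cleans a small hypothetical vector in base R before it would be passed to the function. It assumes that negative values code less than detection limit results and that zeros code blanks. It is a hand-rolled stand-in that simply discards those codes, and does not reproduce the behaviour of ltdl.fix.df.

```r
# Hypothetical raw values: -5 codes "< 5" (below detection limit),
# 0 codes a blank, and NA is a genuine missing value.
raw <- c(12, -5, 30, 0, NA, 45, 8)

# Assumption: treat every non-positive code as missing.
cleaned <- raw
cleaned[!is.na(cleaned) & cleaned <= 0] <- NA

# Drop the NAs, mirroring the documented removal of NAs before plotting
x <- cleaned[!is.na(cleaned)]

print(x)
print(length(x))   # number of observations that would remain
```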

If the default selection for xlim is inappropriate it can be set, e.g., xlim = c(0, 200) or c(2, 200), the latter being appropriate for a logarithmically scaled plot, i.e. log = TRUE. If the defined limits lie within the observed data range a truncated plot will be displayed. If this occurs the number of data points omitted is displayed below the total number of observations.
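
The count of omitted points can be anticipated before plotting. The following base R sketch, using hypothetical data and limits, counts the values that fall outside a chosen xlim. It illustrates the arithmetic only and is not the code used inside the function.

```r
# Hypothetical data and user-defined limits
x    <- c(1, 3, 8, 15, 40, 120, 250, 900)
xlim <- c(2, 200)

# Values below the lower limit or above the upper limit are truncated
omitted <- sum(x < xlim[1] | x > xlim[2])
shown   <- length(x) - omitted

print(omitted)
print(shown)
```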

If it is desired to prepare a display of data falling within a defined part of the actual data range, then either a data subset can be prepared externally using the appropriate R syntax, or xx may be defined in the function call as, for example, Cu[Cu < some.value] which would remove the influence of one or more outliers having values greater than some.value. In this case the number of data values displayed will be the number that are <some.value.
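
One detail of the subsetting syntax is worth checking when the data contain NAs. The sketch below uses a hypothetical vector named Cu (not the kola.o data) and shows that logical subsetting keeps NA entries, whereas wrapping the condition in which() drops them. Because NAs are removed before plotting, the number of values displayed is the same either way.

```r
# Hypothetical values with one missing entry and one high outlier
Cu <- c(9.7, 12, NA, 15, 4080, 22)
some.value <- 1000

# Logical subsetting: the NA comparison yields NA, so an NA is kept
sub1 <- Cu[Cu < some.value]

# which() returns only the positions where the condition is TRUE
sub2 <- Cu[which(Cu < some.value)]

print(sub1)
print(sub2)
print(sum(!is.na(sub1)))   # non-missing values below some.value
```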

Author(s)

Robert G. Garrett

References

Venables, W.N. and Ripley, B.D., 2001. Modern Applied Statistics with S-Plus, 3rd Edition, Springer, 501 p. See p. 119 for a description of histogram bin selection computations.

See Also

display.lty, ltdl.fix.df, remove.na

Examples

## Make test data available
data(kola.o) 
attach(kola.o)

## Generates an initial display to have a first look at the data and
## decide how best to proceed
gx.hist(Cu)

## Provides a more appropriate initial display
gx.hist(Cu, xlab = "Cu (mg/kg) in <2 mm Kola O-horizon soil", log = TRUE)

## Causes the Sturges rule to be used to select the number 
## of histogram bins
gx.hist(Cu, xlab = "Cu (mg/kg) in <2 mm Kola O-horizon soil", log = TRUE, 
	nclass = "sturges")

## Detach test data
detach(kola.o)
