wtd.hist: Weighted Histograms

Description Usage Arguments Details Value Author(s) Examples

Description

Produces weighted histograms by adding a "weight" option to the his.default function from the graphics package (Copyright R-core). The code here was copied from that function and modified slightly to allow for weighted histograms as well as unweighted histograms. The generic function hist computes a histogram of the given data values. If plot=TRUE, the resulting object of class "histogram" is plotted by plot.histogram, before it is returned.

Usage

1
2
3
4
5
6
7
8
9
wtd.hist(x, breaks = "Sturges",
     freq = NULL, probability = !freq,
     include.lowest = TRUE, right = TRUE,
     density = NULL, angle = 45, col = NULL, border = NULL,
     main = paste("Histogram of" , xname),
     xlim = range(breaks), ylim = NULL,
     xlab = xname, ylab,
     axes = TRUE, plot = TRUE, labels = FALSE,
     nclass = NULL, weight = NULL, ...)

Arguments

x

a vector of values for which the histogram is desired.

breaks

one of:

  • a vector giving the breakpoints between histogram cells,

  • a single number giving the number of cells for the histogram,

  • a character string naming an algorithm to compute the number of cells (see ‘Details’),

  • a function to compute the number of cells.

In the last three cases the number is a suggestion only.

freq

logical; if TRUE, the histogram graphic is a representation of frequencies, the counts component of the result; if FALSE, probability densities, component density, are plotted (so that the histogram has a total area of one). Defaults to TRUE if and only if breaks are equidistant (and probability is not specified).

probability

an alias for !freq, for S compatibility.

include.lowest

logical; if TRUE, an x[i] equal to the breaks value will be included in the first (or last, for right = FALSE) bar. This will be ignored (with a warning) unless breaks is a vector.

right

logical; if TRUE, the histogram cells are right-closed (left open) intervals.

density

the density of shading lines, in lines per inch. The default value of NULL means that no shading lines are drawn. Non-positive values of density also inhibit the drawing of shading lines.

angle

the slope of shading lines, given as an angle in degrees (counter-clockwise).

col

a colour to be used to fill the bars. The default of NULL yields unfilled bars.

border

the color of the border around the bars. The default is to use the standard foreground color.

main, xlab, ylab

these arguments to title have useful defaults here.

xlim, ylim

the range of x and y values with sensible defaults. Note that xlim is not used to define the histogram (breaks), but only for plotting (when plot = TRUE).

axes

logical. If TRUE (default), axes are draw if the plot is drawn.

plot

logical. If TRUE (default), a histogram is plotted, otherwise a list of breaks and counts is returned. In the latter case, a warning is used if (typically graphical) arguments are specified that only apply to the plot = TRUE case.

labels

logical or character. Additionally draw labels on top of bars, if not FALSE; see plot.histogram in the graphics package.

nclass

numeric (integer). For S(-PLUS) compatibility only, nclass is equivalent to breaks for a scalar or character argument.

weight

numeric. Defines a set of weights to produce a weighted histogram. Will default to 1 for each case if no other weight is defined.

...

further arguments and graphical parameters passed to plot.histogram and thence to title and axis (if plot=TRUE).

Details

The definition of histogram differs by source (with country-specific biases). R's default with equi-spaced breaks (also the default) is to plot the (weighted) counts in the cells defined by breaks. Thus the height of a rectangle is proportional to the (weighted) number of points falling into the cell, as is the area provided the breaks are equally-spaced.

The default with non-equi-spaced breaks is to give a plot of area one, in which the area of the rectangles is the fraction of the data points falling in the cells.

If right = TRUE (default), the histogram cells are intervals of the form (a, b], i.e., they include their right-hand endpoint, but not their left one, with the exception of the first cell when include.lowest is TRUE.

For right = FALSE, the intervals are of the form [a, b), and include.lowest means ‘include highest’.

The default for breaks is "Sturges": see nclass.Sturges. Other names for which algorithms are supplied are "Scott" and "FD" / "Freedman-Diaconis" (with corresponding functions nclass.scott and nclass.FD). Case is ignored and partial matching is used. Alternatively, a function can be supplied which will compute the intended number of breaks as a function of x.

Value

an object of class "histogram" which is a list with components:

breaks

the n+1 cell boundaries (= breaks if that was a vector). These are the nominal breaks, not with the boundary fuzz.

counts

n values; for each cell, the number of x[] inside.

density

values for each bin such that the area under the histogram totals 1. \hat f(x_i ω_i) / f^(x[i] ω[i]), as estimated density values. If all(diff(breaks) == 1), they are the relative frequencies counts/n and in general satisfy ∑_i \hat f(x_i ω_i) (b_{i+1}-b_i) = 1 / sum[i; f^(x[i] ω[i]) (b[i+1]-b[i])] = 1, where b[i] = breaks[i].

intensities

same as density. Deprecated, but retained for compatibility.

mids

the n cell midpoints.

xname

a character string with the actual x argument name.

equidist

logical, indicating if the distances between breaks are all the same.

Author(s)

Josh Pasek, Assistant Professor of Communication Studies at the University of Michigan (www.joshpasek.com) was responsible for the updates to the hist function necessary to implement weighted counts. The hist.default code from the graphics package on which the current function was based was written by R-core. All modifications are noted in code and the copyright for all original code remains with R-core.

Examples

1
2
3
4
5
6
var1 <- c(1:100)
wgt <- var1/mean(var1)
par(mfrow=c(2, 2))
wtd.hist(var1)
wtd.hist(var1, weight=wgt)
wtd.hist(var1, weight=var1)

Example output

Loading required package: Hmisc
Loading required package: lattice
Loading required package: survival
Loading required package: Formula
Loading required package: ggplot2

Attaching package: 'Hmisc'

The following objects are masked from 'package:base':

    format.pval, round.POSIXt, trunc.POSIXt, units

Loading required package: gdata
sh: 1: cannot create /dev/null: Permission denied
gdata: Unable to locate valid perl interpreter
gdata: 
gdata: read.xls() will be unable to read Excel XLS and XLSX files
gdata: unless the 'perl=' argument is used to specify the location of a
gdata: valid perl intrpreter.
gdata: 
gdata: (To avoid display of this message in the future, please ensure
gdata: perl is installed and available on the executable search path.)
sh: 1: cannot create /dev/null: Permission denied
gdata: Unable to load perl libaries needed by read.xls()
gdata: to support 'XLX' (Excel 97-2004) files.

gdata: Unable to load perl libaries needed by read.xls()
gdata: to support 'XLSX' (Excel 2007+) files.

gdata: Run the function 'installXLSXsupport()'
gdata: to automatically download and install the perl
gdata: libaries needed to support Excel XLS and XLSX formats.

Attaching package: 'gdata'

The following object is masked from 'package:Hmisc':

    combine

The following object is masked from 'package:stats':

    nobs

The following object is masked from 'package:utils':

    object.size

The following object is masked from 'package:base':

    startsWith

Loading required package: mice

weights documentation built on June 11, 2021, 1:06 a.m.

Related to wtd.hist in weights...