bin: Bin in 1d


Description

Bin a numeric vector and count how many observations fall in each bin. Supports weights so that you can re-bin pre-binned data.
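As a rough illustration of what re-binning with weights means, the sketch below uses only base R rather than this package's functions. The bin midpoints, counts, and new width are made-up values, and left-closed bins starting at 0 are assumed.

```r
# Pre-binned data (hypothetical values): bin midpoints and the count in each bin.
mids   <- c(0.05, 0.15, 0.25, 0.35, 0.45, 0.55)
counts <- c(3, 5, 2, 7, 1, 4)

# Re-bin into wider bins of width 0.2, using the old counts as weights.
# Assumes left-closed bins [0, 0.2), [0.2, 0.4), ... anchored at 0.
width  <- 0.2
bin_id <- floor(mids / width) + 1

# Summing the weights per new bin gives the re-binned counts.
rebinned <- tapply(counts, bin_id, sum)
rebinned
```

Each old bin contributes its whole count to the new bin that contains its midpoint, which is why the old counts act as weights.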

Usage

bin_fixed(x, width = NULL, center = NULL, boundary = NULL,
  origin = NULL, terminus = NULL, bins = 30, pad = FALSE,
  closed = c("right", "left"))

bin_breaks(breaks, closed = c("right", "left"))

bin_date(x, bins = 30, closed = c("right", "left"))

Arguments

x

A numeric vector to guess parameters from.

width

(Positive real). The width of a bin. For S3 objects, the interpretation of width depends on the interpretation of the underlying numeric vector. For example, for dates, 1 = 1 day; for times 1 = 1 second; and for difftime, the units vary.

If NULL, the width will be derived from the data, picking approximately bins bins with nice widths. You should always override this value, exploring multiple widths to find the best to illustrate the stories in your data.
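One way to picture "approximately bins bins with nice widths" is base R's pretty(), which returns round, evenly spaced values covering a range. This is only an analogy under the assumption that a similar rounding idea is used; it is not necessarily how the package derives the width.

```r
# Hypothetical data; any numeric vector would do.
x    <- seq(0.013, 0.987, length.out = 200)
bins <- 30

# pretty() picks round break points, so the number of intervals is
# only approximately `bins`.
breaks <- pretty(range(x), n = bins)
width  <- diff(breaks)[1]
n_bins <- length(breaks) - 1

c(width = width, n_bins = n_bins)
```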

boundary, center

Set the position of the first bin by specifying the position of either a boundary or the center of a bin. For example, you can always center the bins on integers with center = 0 regardless of where the first bin actually falls.

Think of binning as tiling the real line into an infinite sequence of intervals. center and boundary set the position of one of those intervals.
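The tiling idea can be sketched in base R. The helpers bin_index() and center_to_boundary() below are hypothetical names introduced for illustration, and left-closed bins are assumed; they are not part of this package.

```r
# Hypothetical helper: index of the tile containing each x, for left-closed
# bins of the given width with one edge placed at `boundary`.
bin_index <- function(x, width, boundary = 0) {
  floor((x - boundary) / width)
}

# A center is equivalent to a boundary shifted left by half a bin width.
center_to_boundary <- function(center, width) center - width / 2

x     <- c(-0.4, 0.4, 0.6, 1.4, 1.6)
width <- 1
b     <- center_to_boundary(0, width)   # -0.5, so bins are centered on integers

idx     <- bin_index(x, width, boundary = b)
centers <- b + idx * width + width / 2  # center of the bin each x falls in
centers
```

Because only one interval is pinned down, the same tiling results from any boundary that differs by a whole multiple of width.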

origin, terminus

The locations of the left-most and right-most bins. Any values outside this range will be treated as missing. You should usually leave origin as NULL so that it is automatically computed from center and boundary.
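The effect of treating out-of-range values as missing can be imitated with base R's cut(), which returns NA for values outside the breaks. The origin, terminus, width, and data below are illustrative assumptions, and this is not the package's own code path.

```r
origin   <- 0
terminus <- 1
width    <- 0.25
x        <- c(-0.1, 0.1, 0.3, 0.99, 1.2)

# Left-closed bins between origin and terminus; values outside become NA.
breaks <- seq(origin, terminus, by = width)
bin    <- as.integer(cut(x, breaks, right = FALSE, include.lowest = TRUE))

# Count per bin, with out-of-range values reported as missing.
table(bin, useNA = "ifany")
```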

bins

Number of bins to use if width is not specified. Pretty bin sizes are preferred over matching this value exactly.

pad

If TRUE, adds empty bins at either end of x. This ensures frequency polygons touch 0 outside the range of x. Defaults to FALSE.
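A minimal base R sketch of what padding amounts to: one empty bin is added on each side so that a polygon drawn through the counts starts and ends at 0. The breaks and data are made up, and the variable names are illustrative only.

```r
breaks <- seq(0, 1, by = 0.25)
x      <- c(0.1, 0.2, 0.6, 0.9)

# Counts per bin, keeping empty bins.
counts <- as.vector(table(cut(x, breaks, include.lowest = TRUE)))

# Add one empty bin of the same width at either end.
width         <- diff(breaks)[1]
padded_breaks <- c(breaks[1] - width, breaks, breaks[length(breaks)] + width)
padded_counts <- c(0, counts, 0)

padded_counts
```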

closed

One of "right" or "left", indicating whether the bin intervals are right-closed (i.e. (a, b]) or left-closed (i.e. [a, b)).
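The difference between the two settings only shows up for values that sit exactly on a break. The base R function findInterval() has a left.open argument that makes this easy to see; the breaks and data here are illustrative.

```r
breaks <- c(0, 1, 2, 3)
x      <- c(0.5, 1, 1.5, 2)

# Right-closed intervals (a, b]: a value on a break goes to the bin on its left.
right_closed <- findInterval(x, breaks, left.open = TRUE)

# Left-closed intervals [a, b): a value on a break goes to the bin on its right.
left_closed <- findInterval(x, breaks)

rbind(x, right_closed, left_closed)
```

Only the values 1 and 2 change bins between the two settings; 0.5 and 1.5 are unaffected.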

breaks

A numeric vector of break points.

Floating point

If a point is less than binwidth / 10^8 from the boundary between two bins, it is shifted to fall in the bin with the closest "closed" side.
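The reason for this adjustment can be seen with a value that should sit exactly on a boundary but does not because of floating point rounding. The sketch below assumes left-closed bins anchored at 0 and mimics the tolerance with a simple additive fuzz; it is an illustration of the idea, not the package's implementation.

```r
width <- 0.1
x     <- 0.3   # intended to be the left edge of the bin [0.3, 0.4)

# 0.3 / 0.1 evaluates to slightly less than 3, so naive flooring
# puts x in the bin below the one intended.
naive <- floor(x / width)

# Shift by 1e-8 bin widths (i.e. width / 10^8) toward the closed side.
fuzz  <- 1e-8
fuzzy <- floor(x / width + fuzz)

c(naive = naive, fuzzy = fuzzy)
```

For right-closed bins the shift would go in the opposite direction, so that a point just above a boundary falls into the bin that closes on that boundary.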

Examples

x <- runif(1e6)
compute_stat(bin_fixed(x), x)
compute_stat(bin_fixed(x, width = 0.25), x)
compute_stat(bin_breaks(c(0, 0.1, 0.9, 1)), x)

# Can also create fixed bins without data, if you supply the origin,
# terminus, and width
bin_fixed(origin = 0, terminus = 1, width = 0.25)

bin_fixed(x, bins = 37)

# Bin other types of object
x1 <- Sys.time() + runif(1000) * 60
compute_stat(bin_date(x1), x1)
x2 <- Sys.Date() + sample(30, 10)
compute_stat(bin_date(x2), x2)

# For fixed bin width, performance scales linearly with the size of x.
x <- runif(1e7)
system.time(compute_stat(bin_fixed(x, width = 1e-1), x))
system.time(compute_stat(bin_fixed(x, width = 1e-2), x))
system.time(compute_stat(bin_fixed(x, width = 1e-5), x))

# For arbitrary breaks, performance scales linearly with x and
# logarithmically with the number of bins.
system.time(compute_stat(bin_breaks(seq(0, 1, length = 10)), x))
system.time(compute_stat(bin_breaks(seq(0, 1, length = 100)), x))
system.time(compute_stat(bin_breaks(seq(0, 1, length = 1000)), x))
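The scaling behaviour described in the comments above is what one would expect if fixed-width bins are located by arithmetic and arbitrary breaks by a search over sorted break points. The base R sketch below illustrates that difference under those assumptions; it does not use or reproduce this package's internals.

```r
set.seed(1)
x     <- runif(1e4)
width <- 0.1

# Fixed width: one division and floor per value, independent of the
# number of bins.
fixed_idx <- floor(x / width) + 1

# Arbitrary breaks: findInterval() searches the sorted break points,
# so the cost per value grows with the log of the number of breaks.
breaks     <- seq(0, 1, by = width)
search_idx <- findInterval(x, breaks)

# Both approaches assign (almost) every value to the same bin; values within
# rounding error of a break may differ.
mean(fixed_idx == search_idx)
```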

hadley/ggstat documentation built on May 17, 2019, 10:40 a.m.