merge_hist | R Documentation |
Quantize a variable by merging similar histogram bins.
merge_hist(x, b = NULL, n = b, trace = T)
x | a numerical vector |
b | the starting number of bins, or a vector of starting break locations. If NULL, chosen automatically by |
n | the desired number of bins. |
The desired number of bins is achieved by successively merging the two most similar histogram bins. The distance between bins of height (f1,f2) and width (w1,w2) is measured according to the chi-square statistic
w1*(f1-f)^2/f + w2*(f2-f)^2/f
where f is the height of the merged bin:
f = (f1*w1 + f2*w2)/(w1 + w2)
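The distance and a single merge step can be sketched in base R as follows. This is only an illustration of the formulas above, not the package's source: the helper names bin_distance and merge_once are hypothetical, and the sketch assumes that only adjacent bins are candidates for merging and that no candidate pair has a merged height of zero.

```r
# Hypothetical helpers illustrating the formulas above; not package code.

# Chi-square distance between two bins with heights f1, f2 and widths w1, w2.
bin_distance <- function(f1, f2, w1, w2) {
  f <- (f1 * w1 + f2 * w2) / (w1 + w2)   # height of the merged bin
  w1 * (f1 - f)^2 / f + w2 * (f2 - f)^2 / f
}

# One merge step: join the most similar pair of adjacent bins (assumption).
merge_once <- function(breaks, heights) {
  w <- diff(breaks)                      # bin widths
  k <- length(heights)
  d <- bin_distance(heights[-k], heights[-1], w[-k], w[-1])
  i <- which.min(d)                      # left bin of the closest pair
  merged <- (heights[i] * w[i] + heights[i + 1] * w[i + 1]) / (w[i] + w[i + 1])
  list(breaks   = breaks[-(i + 1)],      # drop the break between the two bins
       heights  = append(heights[-c(i, i + 1)], merged, after = i - 1),
       distance = d[i])                  # value that a merging trace would plot
}

# Two unit-width bins of height 2 and 4 merge to height 3, distance 2/3.
bin_distance(2, 4, 1, 1)

# The first two bins are nearly equal in height, so they are merged first.
res <- merge_once(c(0, 1, 2, 3), c(2, 2.1, 5))
res$breaks
```

Calling merge_once repeatedly until n bins remain, and recording distance at each step, would give the kind of sequence the merging trace displays.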
A vector of bin breaks, suitable for use in hist, bhist, or cut. Two plots are shown: a bhist using the returned bin breaks, and a merging trace. The trace shows, for each merge, the chi-square distance of the bins which were merged. This is useful for determining the appropriate number of bins. An interesting number of bins is one that directly precedes a sudden jump in the chi-square distance.
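As a rough illustration of using a vector of breaks with base R, the sketch below passes one to cut and hist. The breaks vector here is made up for the example and stands in for the value merge_hist would return; include.lowest = TRUE is used so that a point equal to the first break is not dropped.

```r
# 'breaks' is a hypothetical stand-in for the value returned by merge_hist.
set.seed(1)
x <- runif(200)
breaks <- c(0, 0.25, 0.6, 1)

# Quantize x into a factor with one level per bin.
q <- cut(x, breaks, include.lowest = TRUE)
counts <- table(q)

# The same breaks give matching counts from hist (plot suppressed here).
h <- hist(x, breaks = breaks, plot = FALSE)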
Tom Minka
x <- c(rnorm(100,-2,0.5),rnorm(100,2,0.5))
b <- seq(-4,4,by=0.25)
merge_hist(x,b,10)
# according to the merging trace, n=5 and n=11 are most interesting.

x <- runif(1000)
b <- seq(0,1,by=0.05)
merge_hist(x,b,10)
# according to the merging trace, n=6 and n=9 are most interesting.
# because the data is uniform, there should only be one bin,
# but chance deviations in density prevent this.
# a multiple comparisons correction in merge_hist may fix this.