Creates a smooth cubic spline CDF and piecewise-quadratic PDF based on a set of binned data (edges and counts).
splinebins(bEdges, bCounts, m = NULL,
           numIterations = 16, monoMethod = c("hyman", "monoH.FC"))
bEdges | A vector e_1, e_2, …, e_n giving the right endpoints of each bin. The value in e_n is ignored and assumed to be
bCounts | A vector c_1, c_2, …, c_n giving the counts for each bin (i.e., the number of data elements in each bin). Assumed to be nonnegative.
m | An estimate for the mean of the distribution. If no value is supplied, the mean will be estimated by (temporarily) setting e_n equal to 2e_{n-1}, and a warning message will be generated.
numIterations | The number of iterations performed by a binary search that optimizes the CDF to fit the mean.
monoMethod | The method for constructing a monotone spline. Must be one of "hyman" or "monoH.FC".
Fits a monotone cubic spline to the points specified by the binned data to produce a smooth cumulative distribution function. The PDF is then obtained by differentiating, so it will be piecewise quadratic and preserve the area of each bin.
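As a minimal sketch of the idea described above (not the package's internal code), base R's splinefun() can interpolate the cumulative proportions at the bin edges with a monotone cubic spline, and its deriv argument returns the piecewise-quadratic derivative. The toy edges and counts below are made up, and the top bin is closed at an arbitrary finite edge to keep the example short.

```r
# Hypothetical toy data: right bin edges and counts (not from the package)
edges  <- c(10, 20, 30, 50, 100)
counts <- c(5, 12, 9, 6, 2)

# Cumulative proportions at the bin edges, anchored at (0, 0)
x <- c(0, edges)
y <- c(0, cumsum(counts) / sum(counts))

# Monotone cubic interpolation of the CDF points (Hyman filtering)
cdfFun <- splinefun(x, y, method = "hyman")

# The PDF is the first derivative of the CDF spline, hence piecewise quadratic
pdfFun <- function(t) cdfFun(t, deriv = 1)

# Because the CDF interpolates the cumulative proportions exactly at the edges,
# the area under pdfFun over each bin equals that bin's share of the data
binArea  <- diff(cdfFun(x))
binShare <- counts / sum(counts)
all.equal(binArea, binShare)
```

Swapping method = "hyman" for method = "monoH.FC" gives the other monotone construction named in monoMethod; both keep the CDF nondecreasing, so the derivative stays nonnegative.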
Returns a list with the following components.
splinePDF | A piecewise-quadratic function giving the fitted PDF.
splineCDF | A piecewise-cubic function giving the CDF.
E | The right-hand endpoint of the support of the PDF.
shrinkFactor | If the supplied estimate for the mean is too small to be fitted with our method, the bin edges will be scaled by shrinkFactor.
splineInvCDF | An approximate inverse of splineCDF.
fitWarn | Flag set to
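The numIterations argument and the E component together suggest a bisection on the right-hand endpoint of the support. The sketch below is one plausible reading of that idea, not the package's actual algorithm: it assumes the mean is computed as the integral of 1 - CDF over [0, E], that the search bracket is chosen by hand, and that the toy data, the target mean, and the helper meanForE are all hypothetical.

```r
# Hypothetical toy data: four bounded bins plus one unbounded top bin
edges  <- c(10, 20, 30, 50)    # right edges of the bounded bins
counts <- c(5, 12, 9, 6, 2)    # the last count belongs to the top bin
target <- 27                   # made-up target mean

# Cumulative proportions at the bounded edges and at the trial endpoint E
p <- cumsum(counts) / sum(counts)

# Mean implied by a monotone spline CDF whose support ends at E,
# using mean = integral of (1 - CDF) over [0, E]
meanForE <- function(E) {
  cdfFun <- splinefun(c(0, edges, E), c(0, p), method = "hyman")
  integrate(function(t) 1 - cdfFun(t), 0, E)$value
}

# Bisection on E; assumes the target lies between the means at the bracket ends
lo <- max(edges)
hi <- 10 * max(edges)
for (i in seq_len(16)) {
  mid <- (lo + hi) / 2
  if (meanForE(mid) < target) lo <- mid else hi <- mid
}
E <- (lo + hi) / 2
fittedMean <- meanForE(E)
```

Each iteration halves the bracket, so a few extra iterations tighten the match to the target mean, which is consistent with the numIterations = 20 run in the example further down landing closer to the supplied mean.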
David J. Hunter and McKalie Drown
Paul T. von Hippel, David J. Hunter, McKalie Drown. Better Estimates from Binned Income Data: Interpolated CDFs and Mean-Matching, Sociological Science, November 15, 2017. https://www.sociologicalscience.com/articles-v4-26-641/
# 2005 ACS data from Cook County, Illinois
library(binsmooth)  # provides splinebins() and stepbins()
binedges <- c(10000,15000,20000,25000,30000,35000,40000,45000,
              50000,60000,75000,100000,125000,150000,200000,NA)
bincounts <- c(157532,97369,102673,100888,90835,94191,87688,90481,
               79816,153581,195430,240948,155139,94527,92166,103217)
sb <- stepbins(binedges, bincounts, 76091)
splb <- splinebins(binedges, bincounts, 76091)
plot(splb$splinePDF, 0, 300000, n=500)
plot(sb$stepPDF, do.points=FALSE, col="gray", add=TRUE)
# notice that the curve preserves bin area
library(pracma)
integral(splb$splinePDF, 0, splb$E)
integral(function(x){1-splb$splineCDF(x)}, 0, splb$E) # should be the mean
splb <- splinebins(binedges, bincounts, 76091, numIterations=20)
integral(function(x){1-splb$splineCDF(x)}, 0, splb$E) # closer to given mean
[1] 1
[1] 76090.98
[1] 76091.01