Description Usage Arguments Details Value Author(s) References Examples
Correct for age heaping using truncated (log)normal distributions
1 2  correctHeaps(x, heaps = "10year", method = "lnorm", start = 0,
fixed = NULL)

x 
numeric vector 
heaps 

method 
a character specifying the algorithm used to correct the age heaps. Allowed values are

start 
a numeric value for the starting of the 5 or 10 year sequences (e.g. 0, 5 or 10) 
fixed 
numeric index vector with observation that should not be changed 
Age heaping can cause substantial bias in important measures and thus age heaping should be corrected.
For method “lnorm”, a truncated lognormal is fit to the whole age distribution. Then for each age heap (at 0, 5, 10, 15, ...) random numbers of a truncated lognormal (with lower and upper bound) is drawn in the interval + 2 around the heap (rounding of degree 2) using the inverse transformation method. A ratio of randomly chosen observations on an age heap are replaced by these random draws. For the ratio the age distribution is chosen, whereas on an age heap (e.g. 5) the arithmetic means of the two neighboring ages are calculated (average counts on age 4 and age 6 for age heap equals 5, for example). The ratio on, e.g. age equals 5 is then given by the count on age 5 divided by this mean This is done for any age heap at (0, 5, 10, 15, ...).
Method “norm” replace the draws from truncated lognormals to draws from truncated normals. It depends on the age distrubution (if rightskewed or not) if method “lnorm” or “norm” should be used. Many distributions with heaping problems are rightskewed.
Method “unif” draws the mentioned ratio of observations on truncated uniform distributions around the age heaps.
Repeated calls of this function mimics multiple imputation, i.e. repeating this procedure m times provides m imputed datasets that properly reflect the uncertainty from imputation.
a numeric vector without age heaps
Matthias Templ, Bernhard Meindl, Alexander Kowarik
M. Templ, B. Meindl, A. Kowarik, A. Alfons, O. Dupriez (2017) Simulation of Synthetic Populations for Survey Data Considering Auxiliary Information. Journal of Statistical Survey, 79 (10), 1–38. doi: 10.18637/jss.v079.i10
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24  ## create some artificial data
age < rlnorm(10000, meanlog=2.466869, sdlog=1.652772)
age < round(age[age < 93])
barplot(table(age))
## artificially introduce age heaping and correct it:
# heaps every 5 years
year5 < seq(0, max(age), 5)
age5 < sample(c(age, age[age %in% year5]))
cc5 < rep("darkgrey", length(unique(age)))
cc5[year5+1] < "yellow"
barplot(table(age5), col=cc5)
barplot(table(correctHeaps(age5, heaps="5year", method="lnorm")), col=cc5)
# heaps every 10 years
year10 < seq(0, max(age), 10)
age10 < sample(c(age, age[age %in% year10]))
cc10 < rep("darkgrey", length(unique(age)))
cc10[year10+1] < "yellow"
barplot(table(age10), col=cc10)
barplot(table(correctHeaps(age10, heaps="10year", method="lnorm")), col=cc10)
# the first 5 observations should be unchanged
barplot(table(correctHeaps(age10, heaps="10year", method="lnorm", fixed=1:5)), col=cc10)

Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.