knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
`DemoTools` to smooth population counts
Smoothing data over age is traditionally intended to produce plausible, corrected estimates of population counts from census data. Smoothing procedures derive figures that are corrected primarily for net error by fitting different curves to the original 5- or 10-year totals, thereby modifying the original counts [@siegel2004methods]. Several methods have been developed for this purpose, and the major ones are included in `DemoTools`: the Carrier-Farrag [@carrier1959reduction], Arriaga [@arriaga1994population], Karup-King-Newton, United Nations [@united1955manual], Spencer [@spencer1987improvements] and Zelnik methods. Below we give a brief overview of each method and apply it to the male Indian population in 1991.
library(DemoTools)
# 'pop1m_ind' is available as package data
Value <- pop1m_ind
Age <- 0:(length(Value) - 1)
plot(Age, Value / sum(Value), type = 'l',
     ylab = 'Proportion of population', xlab = 'Single age',
     main = 'Unsmoothed population')
The Carrier-Farrag method considers the ratio, $K$, of the population in one five-year age group to the next one [@carrier1959reduction]. If $v_0$ is a ten-year age group, $v_{-2}$ and $v_2$ are the preceding and succeeding age groups, respectively, and $K^4 = v_{-2}/v_2$, then the older five-year group $v_1$ can be estimated as $v_0/(1+K)$. This equation connects the populations in two ten-year age groups separated by an interval of ten years, so the value $K$ can be seen as applying at the middle point between the two ten-year age groups. To run this method in `DemoTools`, the function `smooth_age_5()` is used with `method = "Carrier-Farrag"`. The figure below shows the smoothed population by five-year age groups.
cf <- smooth_age_5(Value = Value, Age = Age, method = "Carrier-Farrag", OAG = TRUE)
plot(seq(0, 100, 5), cf / sum(cf), type = 'l',
     ylab = 'Proportion of population', xlab = 'Age-group',
     main = 'Smoothed population with Carrier-Farrag', xaxt = 'n')
axis(1, labels = paste0(seq(0, 100, 5), '-', seq(4, 104, 5)), at = seq(0, 100, 5))
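The Carrier-Farrag ratio logic can be sketched in a few lines of base R. This is only an illustration of the formulas $K^4 = v_{-2}/v_2$ and $v_1 = v_0/(1+K)$ for one interior ten-year group; the helper name `cf_split()` and the input counts are hypothetical and are not part of `DemoTools`, whose implementation may handle edge groups and the open age group differently.

```r
# Hypothetical helper (not a DemoTools function): split one ten-year count
# v_0 into two five-year counts using the neighbouring ten-year counts.
cf_split <- function(v_prev, v_0, v_next) {
  # K is the fourth root of the ratio of the preceding to the succeeding group
  K <- (v_prev / v_next)^(1 / 4)
  # Older five-year group, v_1 = v_0 / (1 + K)
  older <- v_0 / (1 + K)
  # Younger five-year group is the remainder, so the ten-year total is kept
  c(younger = v_0 - older, older = older)
}

# Made-up counts chosen so that K = 2, because (1600 / 100)^(1/4) = 2
halves <- cf_split(v_prev = 1600, v_0 = 1000, v_next = 100)
halves
sum(halves)  # the ten-year total is preserved
```

Because the younger half is computed as a remainder, the two five-year counts always add back to the original ten-year count.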
As in the Carrier-Farrag method, the Arriaga method splits each 10-year age group into two 5-year groups. When the 10-year age group to be separated is the central group of three, the following formulas are used [@arriaga1968new]:
\begin{equation}
_{5}P_{x+5} = \frac{-\,_{10}P_{x-10} + 11\,_{10}P_{x} + 2\,_{10}P_{x+10}}{24}
\end{equation}
and
\begin{equation}
_{5}P_{x} = \,_{10}P_{x} - \,_{5}P_{x+5}
\end{equation}
where $_{5}P_{x+5}$ is the population between ages $x+5$ and $x+9$; $_{10}P_{x}$ is the population between ages $x$ and $x+9$; and $_{5}P_{x}$ is the population between ages $x$ and $x+4$. When the 10-year age group to be separated is an extreme age group (the youngest or the oldest), the formulas are different. For the youngest age group, the following formulas are used:
\begin{equation}
_{5}P_{x+5} = \frac{8\,_{10}P_{x} + 5\,_{10}P_{x+10} - \,_{10}P_{x+20}}{24}
\end{equation}
and
\begin{equation}
_{5}P_{x} = \,_{10}P_{x} - \,_{5}P_{x+5}
\end{equation}
and for the last age group the coefficients are reversed:
\begin{equation}
_{5}P_{x} = \frac{-\,_{10}P_{x-20} + 5\,_{10}P_{x-10} + 8\,_{10}P_{x}}{24}.
\end{equation}
To apply this method, choose `method = "Arriaga"` in the `smooth_age_5()` function.
cf <- smooth_age_5(Value = Value, Age = Age, method = "Arriaga", OAG = TRUE)
plot(seq(0, 100, 5), cf / sum(cf), type = 'l',
     ylab = 'Proportion of population', xlab = 'Age-group',
     main = 'Smoothed population with Arriaga', xaxt = 'n')
axis(1, labels = paste0(seq(0, 100, 5), '-', seq(4, 104, 5)), at = seq(0, 100, 5))
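To make the three Arriaga cases concrete, the sketch below applies the central, youngest and oldest formulas to a short vector of 10-year counts. The function name `arriaga_split()` and the input counts are hypothetical; the sketch assumes at least three closed 10-year groups and ignores the open age group, so it should not be read as the `DemoTools` implementation.

```r
# Hypothetical helper (not a DemoTools function): split each 10-year count
# into two 5-year counts using the Arriaga coefficients.
arriaga_split <- function(p10) {
  n <- length(p10)
  stopifnot(n >= 3)
  older <- numeric(n)

  # Youngest group: older half from the group itself and the next two groups
  older[1] <- (8 * p10[1] + 5 * p10[2] - p10[3]) / 24

  # Central groups: older half from the group and its two neighbours
  i <- 2:(n - 1)
  older[i] <- (-p10[i - 1] + 11 * p10[i] + 2 * p10[i + 1]) / 24

  # Oldest group: the formula gives the younger half; the older half is the rest
  younger_last <- (-p10[n - 2] + 5 * p10[n - 1] + 8 * p10[n]) / 24
  older[n] <- p10[n] - younger_last

  younger <- p10 - older
  # Interleave younger and older halves in age order
  as.vector(rbind(younger, older))
}

# Made-up 10-year counts that decline linearly with age
p5 <- arriaga_split(c(1200, 1000, 800))
p5
sum(p5)  # equals the sum of the 10-year counts
```

With linearly declining 10-year counts the resulting 5-year counts also decline linearly, which is a quick sanity check on the coefficients.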
Following the same logic, the Karup-King-Newton (KKN) method uses the following formulas:
\begin{equation}
_{5}P_{x} = \frac{1}{2}\,_{10}P_{x} + \frac{1}{16} \big( \,_{10}P_{x-10} - \,_{10}P_{x+10} \big)
\end{equation}
and
\begin{equation}
_{5}P_{x+5} = \,_{10}P_{x} - \,_{5}P_{x}.
\end{equation}
To implement this smoothing process, select `method = "KKN"` in the `smooth_age_5()` function.
# TODO smooth_age_5 is throwing an internal error
# Error in tapply(Value, AgeN, sum) : arguments must have same length
cf <- smooth_age_5(Value = Value, Age = Age, method = "KKN", OAG = TRUE)
plot(seq(0, 100, 5), cf / sum(cf), type = 'l',
     ylab = 'Proportion of population', xlab = 'Age-group',
     main = 'Smoothed population with Karup-King-Newton', xaxt = 'n')
axis(1, labels = paste0(seq(0, 100, 5), '-', seq(4, 104, 5)), at = seq(0, 100, 5))
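The KKN split for a single interior 10-year group can be written directly from the two formulas above. The helper `kkn_split()` and its inputs are hypothetical names used only for illustration, and the sketch does not cover the first and last groups, for which the formula has no neighbour on one side.

```r
# Hypothetical helper (not a DemoTools function): KKN split of one interior
# 10-year count p_0, given the preceding and succeeding 10-year counts.
kkn_split <- function(p_prev, p_0, p_next) {
  # Younger half: half the group plus 1/16 of the neighbour difference
  younger <- p_0 / 2 + (p_prev - p_next) / 16
  # Older half is the remainder of the 10-year count
  c(younger = younger, older = p_0 - younger)
}

# Made-up counts: a population that declines with age
kkn_halves <- kkn_split(p_prev = 1200, p_0 = 1000, p_next = 800)
kkn_halves
```

When the preceding group is larger than the succeeding one, the correction term is positive and the younger half receives more than 50% of the 10-year count.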
The United Nations [@united1955manual] developed the following formula to smooth population counts:
\begin{equation}
_5\hat{P}_x = \frac{-\,_5P_{x-10} + 4\,_5P_{x-5} + 10\,_5P_{x} + 4\,_5P_{x+5} - \,_5P_{x+10}}{16}
\end{equation}
where $_5\hat{P}_x$ represents the smoothed population between ages $x$ and $x+4$. This method can be applied in `DemoTools` using the `"United Nations"` method of `smooth_age_5()` as follows
un_result <- smooth_age_5(Value = Value, Age = Age, method = "United Nations", OAG = TRUE)
plot(seq(0, 100, 5), un_result / sum(un_result, na.rm = TRUE), type = 'l',
     ylab = 'Proportion of population', xlab = 'Age-group',
     main = 'Smoothed population with UN Method', xaxt = 'n')
axis(1, labels = paste0(seq(0, 100, 5), '-', seq(4, 104, 5)), at = seq(0, 100, 5))
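The United Nations formula is a five-term weighted moving average over 5-year groups, which the following sketch applies to a vector of 5-year counts. The function `un_smooth()` and the data are hypothetical; returning `NA` for the two groups at each end, where the formula lacks neighbours, is an assumption of this sketch rather than a statement about how `DemoTools` treats those groups.

```r
# Hypothetical helper (not a DemoTools function): UN five-term weighted average.
un_smooth <- function(p5) {
  n <- length(p5)
  stopifnot(n >= 5)
  out <- rep(NA_real_, n)
  # The formula needs two neighbours on each side, so only groups 3..(n - 2)
  i <- 3:(n - 2)
  out[i] <- (-p5[i - 2] + 4 * p5[i - 1] + 10 * p5[i] +
               4 * p5[i + 1] - p5[i + 2]) / 16
  out
}

# Made-up 5-year counts with a dip in the third group
un_demo <- un_smooth(c(100, 90, 60, 70, 60, 50, 40))
un_demo
```

The weights sum to 16, so dividing by 16 leaves a linear sequence of counts unchanged while pulling isolated dips and peaks towards their neighbours.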
The Strong formula first smooths the 10-year age groups and then adjusts the smoothed groups proportionally to the census population in those ages. After this procedure the 10-year age groups can be subdivided into 5-year age groups [@arriaga1968new]:
\begin{equation}
_{10}\hat{P}_x = \frac{_{10}P_{x-10} + 2\,_{10}P_{x} + \,_{10}P_{x+10}}{4}
\end{equation}
where $_{10}\hat{P}_x$ represents the smoothed population aged $x$ to $x+9$. It is implemented in `DemoTools` as follows
strong_result <- smooth_age_5(Value = Value, Age = Age, method = "Strong", OAG = TRUE)
plot(seq(0, 100, 5), strong_result / sum(strong_result, na.rm = TRUE), type = 'l',
     ylab = 'Proportion of population', xlab = 'Age-group',
     main = 'Smoothed population with Strong formula', xaxt = 'n')
axis(1, labels = paste0(seq(0, 100, 5), '-', seq(4, 104, 5)), at = seq(0, 100, 5))
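A rough sketch of the two steps in the Strong formula, the 1-2-1 weighted average followed by a proportional adjustment, is given below. The helper `strong_smooth()` is hypothetical, and two choices in it are assumptions of this sketch: the first and last groups are left unchanged, and the adjustment rescales the smoothed interior groups so that they sum to the original total for the same ages. The subsequent split into 5-year groups is not shown.

```r
# Hypothetical helper (not a DemoTools function): Strong smoothing of
# 10-year counts, followed by a proportional adjustment.
strong_smooth <- function(p10) {
  n <- length(p10)
  stopifnot(n >= 3)
  out <- p10  # assumption: edge groups are left as observed
  i <- 2:(n - 1)
  # 1-2-1 weighted average of each interior group and its neighbours
  out[i] <- (p10[i - 1] + 2 * p10[i] + p10[i + 1]) / 4
  # Assumption: rescale so the interior total matches the observed total
  out[i] <- out[i] * sum(p10[i]) / sum(out[i])
  out
}

# Made-up 10-year counts with an irregular second group
strong_demo <- strong_smooth(c(1000, 1300, 900, 1000, 700))
strong_demo
```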
The zigzag method assumes that persons incorrectly reported in peak age groups are evenly divided between the two adjacent age groups [@feeney2013]. It relies on minimizing a measure of "roughness": consider the difference $R[i]$ between the number of persons in the $i$-th age group and the average of the numbers in the adjacent age groups, $R[i] = N[i] - (N[i-1] + N[i+1])/2$. If the distribution displays a zigzag pattern, the $R[i]$ will be relatively large. If the distribution is smooth, they will be relatively small. This suggests taking the sum of squares of these differences as the measure of roughness [@feeney2013]. It can be implemented in `DemoTools` as follows
zz_result <- smooth_age_5(Value, Age, method = "Zigzag", OAG = TRUE, ageMin = 0, ageMax = 100)
plot(seq(0, 100, 5), zz_result / sum(zz_result, na.rm = TRUE), type = 'l',
     ylab = 'Proportion of population', xlab = 'Age-group',
     main = 'Smoothed population with Zig Zag formula', xaxt = 'n')
axis(1, labels = paste0(seq(0, 100, 5), '-', seq(4, 104, 5)), at = seq(0, 100, 5))
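The roughness measure described above is easy to compute by hand, which helps to see why a zigzag pattern is penalized. The function `roughness()` and both example vectors are hypothetical; the sketch only evaluates the measure and does not perform the minimization that the zigzag method relies on.

```r
# Hypothetical helper (not a DemoTools function): sum of squared differences
# between each interior count and the average of its two neighbours.
roughness <- function(N) {
  stopifnot(length(N) >= 3)
  i <- 2:(length(N) - 1)
  R <- N[i] - (N[i - 1] + N[i + 1]) / 2
  sum(R^2)
}

# Made-up counts: one zigzag series and one smooth (linear) series
rough_zigzag <- roughness(c(100, 120, 80, 100, 60))
rough_smooth <- roughness(c(100, 90, 80, 70, 60))
c(zigzag = rough_zigzag, smooth = rough_smooth)
```

For the zigzag series each interior difference is 30 in absolute value, while for the linear series every difference is zero, so the measure separates the two patterns clearly.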
A more general approach is to smooth with linear models. Polynomial fitting smooths data over age or time by fitting linear models. It can be tuned by changing the polynomial degree and by log- or power-transforming the data, and it can be used on any age groups, including irregularly spaced, single-age, or 5-year age groups. It is implemented in `DemoTools` by the function `agesmth1()` with `method = "poly"`, as follows
poly_result <- agesmth1(Value, Age, method = "poly", OAG = TRUE)
# Smoothed counts as a line, with the original counts overlaid as points
plot(0:100, poly_result, type = 'l',
     ylab = 'Population count', xlab = 'Age',
     main = 'Smoothed population with Poly formula', xaxt = 'n')
points(0:100, Value)
axis(1, labels = seq(0, 100, 5), at = seq(0, 100, 5))
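The idea of polynomial smoothing with a log transform can be illustrated with base R alone. The sketch below fits a degree-2 polynomial to the logarithm of synthetic single-age counts using `lm()` and `poly()`; the data, the degree and the object names are illustrative assumptions, and this is not a description of what `agesmth1()` does internally.

```r
# Synthetic single-age counts (made up for illustration): an exponential
# decline with extra reporting on ages ending in 0 or 5.
age <- 0:100
counts <- 1000 * exp(-age / 40) * ifelse(age %% 5 == 0, 1.3, 0.9)

# Fit a degree-2 polynomial to the log counts, then back-transform
fit_poly <- lm(log(counts) ~ poly(age, degree = 2))
poly_smoothed <- exp(fitted(fit_poly))

# Compare the raw and smoothed series
plot(age, counts, xlab = 'Age', ylab = 'Population count')
lines(age, poly_smoothed)
```

Fitting on the log scale keeps the smoothed counts positive after back-transformation, which is one reason a transform is often paired with polynomial smoothing.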
LOESS (locally weighted smoothing) smooths data over age, preserving the open age group if necessary. It is a popular tool for drawing a smooth line through a time plot or scatter plot. As with polynomial fitting, it can be tuned by changing the degree and by log- or power-transforming the data, and it can be used on any age groups, including irregularly spaced, single-age, or 5-year age groups. It is implemented in `DemoTools` by the function `agesmth1()` with `method = "loess"`, as follows
loess_result <- agesmth1(Value, Age, method = "loess", OAG = TRUE)
# Smoothed counts as a line, with the original counts overlaid as points
plot(0:100, loess_result, type = 'l',
     ylab = 'Population count', xlab = 'Age',
     main = 'Smoothed population with LOESS formula', xaxt = 'n')
points(0:100, Value)
axis(1, labels = seq(0, 100, 5), at = seq(0, 100, 5))
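For comparison, local regression is available in base R through `stats::loess()`. The sketch below smooths synthetic single-age counts with an assumed span of 0.3 and local quadratic fits; the data, the span and the object names are illustrative choices and do not reflect the defaults used by `agesmth1()`.

```r
# Synthetic single-age counts (made up for illustration): an exponential
# decline with extra reporting on ages ending in 0 or 5.
age <- 0:100
counts <- 1000 * exp(-age / 40) * ifelse(age %% 5 == 0, 1.3, 0.9)

# Local quadratic regression; span controls how much data each local fit uses
fit_loess <- loess(counts ~ age, span = 0.3, degree = 2)
loess_smoothed <- predict(fit_loess)

# Sum of squared second differences as a simple check on smoothness
rough_raw <- sum(diff(counts, differences = 2)^2)
rough_loess <- sum(diff(loess_smoothed, differences = 2)^2)
c(raw = rough_raw, loess = rough_loess)
```

A larger span averages over more ages and gives a smoother curve, while a smaller span follows the observed counts more closely.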