ggb | R Documentation |
Given two censuses and an average annual number of deaths in each age class between censuses, we can use stable population assumptions to estimate the degree of underregistration of deaths. The method is based on finding a best-fitting linear relationship between two modeled parameters (right term and left term), but the fit, and the resulting coverage estimate, depend on exactly which age range is used. This function either finds a suitable age range automatically or uses an exact vector of ages that you specify.
ggb(
X,
minA = 15,
maxA = 75,
minAges = 8,
exact.ages = NULL,
lm.method = "tukey",
opt.method = "r2",
scale = 1,
nx.method = 2,
deaths.summed = FALSE,
mig.summed = deaths.summed
)
X |
|
minA |
the lowest age to be included in search |
maxA |
the highest age to be included in search (the lower bound thereof) |
minAges |
the minimum number of adjacent ages to be used in estimating |
exact.ages |
optional. A user-specified vector of exact ages to use for coverage estimation |
lm.method |
character, one of:
|
opt.method |
what kind of residual do we minimize? choices |
scale |
multiplicative scale factor for the minimized residual |
nx.method |
either 2 or 4. 4 is smoother. |
deaths.summed |
logical. is the deaths column given as the total per age in the intercensal period ( |
mig.summed |
logical. Is the (optional) net migration column |
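To make the roles of minA, maxA, and minAges concrete, the following is a minimal sketch of an age-range search on synthetic data. It is not the package's internal code: the function name search_ages, the synthetic leftterm and rightterm values, the use of ordinary least squares via lm(), and the choice of the left term as the response are all assumptions made for illustration.

```r
# Illustrative sketch only: synthetic left/right terms for ages 0, 5, ..., 85.
set.seed(1)
age       <- seq(0, 85, by = 5)
rightterm <- seq(0.01, 0.06, length.out = length(age))
leftterm  <- 0.002 + 1.25 * rightterm + rnorm(length(age), sd = 0.0005)
# Distort the youngest and oldest ages so the linear pattern breaks there
leftterm[age < 15] <- leftterm[age < 15] + 0.01
leftterm[age > 75] <- leftterm[age > 75] - 0.01

# Hypothetical helper: try every run of adjacent ages within [minA, maxA]
# that has at least minAges age groups, fit a line by ordinary least squares,
# and keep the run with the highest r2.
search_ages <- function(age, leftterm, rightterm,
                        minA = 15, maxA = 75, minAges = 8) {
  idx  <- which(age >= minA & age <= maxA)
  best <- NULL
  for (i in seq_along(idx)) {
    for (j in seq_along(idx)) {
      if (j - i + 1 < minAges) next          # run too short (or j < i)
      use <- idx[i:j]
      fit <- lm(leftterm[use] ~ rightterm[use])
      r2  <- summary(fit)$r.squared
      if (is.null(best) || r2 > best$r2) {
        best <- list(lower = age[use[1]],
                     upper = age[use[length(use)]],
                     a  = unname(coef(fit)[1]),   # intercept
                     b  = unname(coef(fit)[2]),   # slope
                     r2 = r2)
      }
    }
  }
  best
}

best <- search_ages(age, leftterm, rightterm)
best[c("lower", "upper", "a", "b", "r2")]
```

Passing a vector to exact.ages skips this kind of search entirely, which is useful when the age range has already been chosen by eye.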
Census dates can be given in a variety of ways: 1) using Date classes and column names $date1 and $date2 (or an unambiguous character string of the date, like "1981-05-13"), or 2) by giving column names "day1", "month1", "year1", "day2", "month2", "year2" containing integers. If only year1 and year2 are given, then we assume January 1 dates. If year and month are given, then we assume dates on the first of the month.
If you want coverage estimates for a variety of intercensal periods, regions, or sexes, then stack them and use a variable called $id with unique values for each data chunk. Different values of $id could indicate sexes, regions, intercensal periods, etc. The $deaths column should refer to the average annual deaths for each age class in the intercensal period. Sometimes one uses the arithmetic average of recorded deaths in each age, or simply the average of the deaths around the time of census 1 and census 2.
To identify an age range in the traditional visual way, see ggbChooseAges() when working with a single year/sex/region of data. The automatic age-range determination feature of this function tries to implement an intuitive way of picking ages that follows the advice typically given for doing so visually. We minimize the square of the average squared residual between the fitted line and right term.
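The date rules described above (year only means January 1; year and month means the first of the month) can be sketched in base R as follows. The helpers resolve_date and decimal_date are hypothetical names introduced here, and the decimal-year formula is one common convention; the package may compute decimal dates differently.

```r
# Hypothetical helper (not part of the package): build a Date from integer
# year/month/day pieces, defaulting any missing piece to 1.
resolve_date <- function(year, month = NULL, day = NULL) {
  if (is.null(month)) month <- 1L   # only year given: January 1
  if (is.null(day))   day   <- 1L   # no day given: first of the month
  as.Date(sprintf("%04d-%02d-%02d",
                  as.integer(year), as.integer(month), as.integer(day)))
}

# Assumed convention: decimal year = year + (days elapsed / days in that year)
decimal_date <- function(d) {
  yr    <- as.integer(format(d, "%Y"))
  start <- as.Date(sprintf("%04d-01-01", yr))
  end   <- as.Date(sprintf("%04d-01-01", yr + 1L))
  yr + as.numeric(d - start) / as.numeric(end - start)
}

d1 <- resolve_date(1980)        # year only
d2 <- resolve_date(1991, 9)     # year and month
d3 <- as.Date("1981-05-13")     # unambiguous character string

t1 <- decimal_date(d1)
t2 <- decimal_date(d2)
t2 - t1                         # intercensal interval in years
```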
a data.frame with columns for:
id group id
Mxcoverage coverage of the intercensal Mx values: sqrt(k1*k2)/b
lower lower bound of ages used for fitting
upper upper bound of ages used for fitting
a intercept
b slope
delta empirical link: exp(t*a) = k1/k2
k1 completeness of census 1
k2 completeness of census 2
k3 completeness of deaths relative to census 2 (1/b)
t1 decimal date of census 1
t2 decimal date of census 2
t intercensal interval (t2 - t1)
lm.method line fitting method used
nx.method birthday (Nx) approximation used (2 or 4 points)
r2 the r2 of the optimized ages used for fitting (only if ages were automatically selected)
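As a worked illustration of how the returned columns relate to one another, the sketch below starts from made-up values of a, b, and t and applies the formulas listed above. The numbers are hypothetical, and the rule used to split delta into k1 and k2 (treating the better-covered census as complete) is an assumption for this example; only the ratio k1/k2 is fixed by delta.

```r
# Hypothetical fitted values, chosen only for illustration
a <- 0.002    # intercept of the fitted line
b <- 1.25     # slope of the fitted line
t <- 10       # intercensal interval in years (t2 - t1)

delta <- exp(t * a)               # relative census completeness, k1 / k2

# Assumption: the better-covered census is treated as complete (k = 1)
k1 <- min(1, delta)
k2 <- k1 / delta

k3         <- 1 / b               # completeness of deaths relative to census 2
Mxcoverage <- sqrt(k1 * k2) / b   # coverage of the intercensal Mx values

round(c(delta = delta, k1 = k1, k2 = k2, k3 = k3, Mxcoverage = Mxcoverage), 4)
```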
Hill K. Estimating census and death registration completeness. Asian and Pacific Population Forum. 1987; 1:1-13.
Brass, William, 1975. Methods for Estimating Fertility and Mortality from Limited and Defective Data, Carolina Population Center, Laboratory for Population Studies, University of North Carolina, Chapel Hill.
# The Mozambique data
res <- ggb(Moz)
res
# The Brasil data
BM <- ggb(BrasilMales)
BF <- ggb(BrasilFemales)
head(BM)
head(BF)