knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
The goal of demogsurv is to:
For analysis of DHS data, the package interacts well with rdhs
. See the vignette for an example.
You can install the development version from GitHub with:
# install.packages("devtools") devtools::install_github("mrc-ide/demogsurv")
The package will be released on CRAN in due course.
Load the package and example datasets created from DHS Model Datasets.
library(demogsurv) data(zzbr) # Births recode (child mortality) data(zzir) # Individuals recode (fertility, adult mortality)
By default, the function calc_nqx
calculates U5MR by periods 0-4, 5-9, and 10-14
years before the survey. Before calculating mortality rates, create a binary variable
indicator whether a death occurred and a variable giving the date of death, placed
0.5 months in the month the death occurred.
zzbr$death <- zzbr$b5 == "no" # b5: child is alive ("yes" or "no") zzbr$dod <- zzbr$b3 + zzbr$b7 + 0.5 u5mr <- calc_nqx(zzbr) u5mr
Note that calc_nqx()
does not reproduce child mortality estimates
produced in DHS reports. calc_nqx()
conducts a standard demographic rate
calculation based on observed events and person years within each age group
and then converts the cumulative hazard to survival probabilities. The standard
DHS indicator uses a rule-based approach to allocate child deaths and person
years across age groups and proceeds by calculating direct probabilities of death
in each age group (see Rutstein and Rojas 2006).
A function calc_dhs_u5mr()
will reproduce the DHS calculation, but is not
yet fully implemented.
Use the argument by=
to specify factor variables by which to stratify the rate
calculation.
calc_nqx(zzbr, by=~v102) # by urban/rural residence calc_nqx(zzbr, by=~v190, tips=c(0, 10)) # by wealth quintile, 0-9 years before calc_nqx(zzbr, by=~v101+v102, tips=c(0, 10)) # by region and residence
The sample covariance or correlation matrix of the estimates can be obtained via vcov()
.
vcov(u5mr) # sample covariance cov2cor(vcov(u5mr)) # sample correlation
Standard error estimation can be done via Taylor linearisation, unstratified jackknife, or stratified jackknife. Results are very similar.
calc_nqx(zzbr, varmethod = "lin") # default is linearization calc_nqx(zzbr, varmethod = "jkn") # stratified jackknife (varmethod = "jkn") ## Compare unstratified standard error estimates for linearization and jackknife calc_nqx(zzbr, strata=NULL, varmethod = "lin") # unstratified design calc_nqx(zzbr, strata=NULL, varmethod = "jk1") # unstratififed jackknife
To calculate different child mortality indicators (neonatal, infant, etc.), specify different age groups over which to aggregate.
calc_nqx(zzbr, agegr=c(0, 1)/12) # neonatal calc_nqx(zzbr, agegr=c(1, 3, 5, 12)/12) # postneonatal calc_nqx(zzbr, agegr=c(0, 1, 3, 5, 12)/12) # infant (1q0) calc_nqx(zzbr, agegr=c(12, 24, 36, 48, 60)/12) # child (4q1) calc_nqx(zzbr, agegr=c(0, 1, 3, 5, 12, 24, 36, 48, 60)/12) # u5mr (5q0)
Calculate annual ~5~q~0~ by calendar year (rather than years preceding survey).
calc_nqx(zzbr, period=2005:2015, tips=NULL)
The function calc_nqx()
can also used to calculate adult mortality indicators
such as ~35~q~15~. First, the convenience function reshape_sib_data()
transforms
respondent-level data to a dataset with one row for each sibling reported. Then
define a binary variable for whether the sibling is alive or dead.
zzsib <- reshape_sib_data(zzir) zzsib$death <- factor(zzsib$mm2, c("dead", "alive")) == "dead"
Calculate ~35~q~15~ for the seven year period before the survey.
calc_nqx(zzsib, agegr=seq(15, 50, 5), tips=c(0, 7), dob="mm4", dod="mm8")
Calculate ~35~q~15~ by sex, replicating Table MM2.2.
zzsib$sex <- factor(zzsib$mm1, c("female", "male")) # drop mm2 = 3: "missing" calc_nqx(zzsib, by=~sex, agegr=seq(15, 50, 5), tips=c(0, 7), dob="mm4", dod="mm8")
This calculation exactly reproduces the ~35~q~15~ estiamtes produced for Table MM2 for DHS reports. Additional functionality will be added in future for producing ASMRs (Table MM1), MMR, and PM (Table MM3) will be added in future.
The functions calc_asfr()
and calc_tfr()
calculate age-specific fertility
rates and total fertility rate, respectively. The default calculation is by
five-year age groups for three years before the survey, exactly reproducing
the estimates produced in DHS reports.
## Replicate DHS Table 5.1. ## Total ASFR and TFR in 3 years preceding survey calc_asfr(zzir, tips=c(0, 3)) calc_tfr(zzir) ## ASFR and TFR by urban/rural residence reshape2::dcast(calc_asfr(zzir, ~v025, tips=c(0, 3)), agegr ~ v025, value.var = "asfr") calc_tfr(zzir, by=~v025) calc_tfr(zzir, by=~v025, varmethod="jkn")
Replicate fertility estimates stratified by various sociodemographic characteristics.
## Replicate DHS Table 5.2 calc_tfr(zzir, ~v102) # residence calc_tfr(zzir, ~v101) # region calc_tfr(zzir, ~v106) # education calc_tfr(zzir, ~v190) # wealth quintile calc_tfr(zzir) # total
Generate estimates stratified by both calendar period and time preceding survey.
calc_tfr(zzir, period = c(2010, 2013, 2015), tips=0:5)
Calculate ASFR by birth cohort.
asfr_coh <- calc_asfr(zzir, cohort=c(1980, 1985, 1990, 1995), tips=NULL) reshape2::dcast(asfr_coh, agegr ~ cohort, value.var = "asfr")
calc_rate()
function used by mortality and fertility calculations.Extend interface to allow specification of formula, variable name, or vectors.
Add some basic data checking before rate calculation, good warnings, potentially automatic handling. [ ] Variable names all exist. [ ] No missing data in date variables. [ ] Episode start date is before episode end date. [ ] All birth history ids appear in respondent dataset.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.