In ihmeuw-demographics/demCore: Core Functions for Demography

library(data.table)
library(knitr)
library(kableExtra)
library(ggplot2)
devtools::load_all()
knitr::opts_chunk$set(echo = TRUE, warning = FALSE)

What is a life table?

A life table is a table that includes information to describe the dying out of a birth cohort. This can also be a synthetic birth cohort, in which case we refer to it as a period life table.

Life tables are one of the most important devices in demography -- they have been used since the 1600s! They can also be useful for other fields, because they are generalizable to other discrete "time to event" data.

Typically, life tables have one row per age group, with columns representing life table metrics, also known as parameters. The life table parameters used in this package are $_nm_x$, $_na_x$, $_nq_x$, $_np_x$, $l_x$, $_nd_x$, $e_x$, $_nL_x$, and $T_x$. In this notation, $x$ refers to age, and the metrics apply to either the age $x$ directly or to the interval between ages $x$ and $x+n$ where $n$ indicates the length of the interval, typically in years. We often shorthand by removing the "n" from the notation, with interval length implied.

Here's an example of a period life table, for males in Austria in 1992, which is saved in the package as austria_1992_lt (source: Preston 2001):

data("austria_1992_lt")
dt <- austria_1992_lt[
  , .SD, .SDcols = c("age_start", "age_end", "deaths", "pop", "mx", "ax", "qx",
                     "px", "lx", "dx", "nLx", "Tx", "ex")
]
dt_rounded <- round(dt, 2)
setnames(dt_rounded, "age_start", "x")
setnames(dt_rounded, "age_end", "x+n")
kable(dt_rounded, format = "markdown")

Life table metrics

For reference, the following is a list of life table metrics and their definitions.

$\mathbf{_nm_x}$: mortality rate between ages $x$ and $x+n$. Shorthand to $m_x$ with implied interval width ($n$). Equals deaths divided by person-years lived in the interval. Mid-year population is commonly used as an adequate approximation of the person-years denominator.

$\mathbf{_na_x}$: mean person-years lived between ages $x$ and $x+n$ for those who die within the interval. Shorthand to $a_x$ with implied interval width ($n$).

$\mathbf{_nq_x}$: probability of death between ages $x$ and $x+n$, conditional on survival to age $x$. Shorthand to $q_x$ with implied interval width ($n$). Equals deaths in the interval divided by survivors to $x$-th birthday. Examples: $5q_0$ = probability of death between birth and age $5$; ${45}q_{15}$ = probability of death between age $15$ and age $60$ conditional on survival to age $15$.

$\mathbf{l_x}$: proportion of the cohort surviving to age $x$.

$\mathbf{e_x}$: life expectancy at age $x$ -- mean number of years lived after $x$-th birthday by those surviving to age $x$. Life expectancy at birth is $e_0$.

$\mathbf{_nL_x}$: total person-years lived between age $x$ and $x+n$.

$\mathbf{T_x}$: total person-years lived above age $x$.

$\mathbf{_nd_x}$: proportion of the cohort dying between ages $x$ and $x+n$. Shorthand to $d_x$.

$\mathbf{_np_x}$: probability of survival between ages $x$ and $x+n$ conditional on survival to age $x$. Inverse of $qx$.

Representing life tables graphically

We often reduce life tables to age patterns of log probability of death ($\text{log}(q_x)$) or to survival curves ($l_x$ over age), which can be easily displayed and vetted in plots.

ggplot(data = dt,
       aes(x = age_start,
           y = log(qx))) +
       geom_line(color = "magenta") +
       geom_point() +
       scale_x_continuous(breaks = c(0, 1, seq(5, 95, 5))) +
       theme_bw() +
       theme(axis.text.x = element_text(size = 6)) +
       ggtitle("log-qx over age")

ggplot(data = dt,
       aes(x = age_start,
           y = lx)) +
       geom_line(color = "magenta") +
       geom_point() +
       scale_x_continuous(breaks = c(0, 1, seq(5, 95, 5))) +
       theme_bw() +
       theme(axis.text.x = element_text(size = 6)) +
       ggtitle("lx over age (survival curve)")

Calculations and relationships between life table metrics

The demCore package includes many utility functions for calculations that leverage the mathematical relationships between life table metrics to build out a complete life table. This section will provide details and examples regarding the use of these functions and their underlying methods. We will accomplish this by following along the example of building the example life table above from death counts and population.

Note that this document and this package do not contain an exhaustive list of relationships between metrics. Additionally, some equations presented rely on assumptions and others are true relationships that are always valid. For more details, see the Preston Demography textbook, from which many of these details were drawn.

mx

From raw death count and population data, the place to start with a life table is $m_x$.

$$m_x = \frac{\text{deaths}}{\text{person-years}} \approx \frac{\text{observed deaths}}{\text{mid-interval population}}$$

Let's load in our example data and calculate $m_x$:

data("austria_1992_lt")
dt <- austria_1992_lt[, c("age_start", "age_end", "deaths", "pop")]
dt[, mx := deaths / pop]

ax

If we have $m_x$ and $q_x$ we can directly calculate $a_x$. However, we often use $m_x$ and $a_x$ to get $q_x$ in the first place, and so have to make some assumptions to get $a_x$. Empirical calculations of $a_x$ would require detailed and accurate data on age of death in days (such as paired date of birth and date of death), which is typically unavailable.

Rule of thumb:

One option is to assume all deaths occur in the middle of the interval, so $a_x \approx n/2$. This assumption works well for most ages, but it doesn't work as well for very young or very old where mortality can change rapidly over the interval.

Another assumption we can make is that the age-specific death rate is constant between $x$ and $x+n$. Under this assumption, $$_na_x = n + \frac{1}{_nm_x} - \frac{n}{1- e^{-n \cdot {_nm_x}}}.$$ The function mx_to_ax implements this assumption.

Using our example data, we get:

dt <- hierarchyUtils::gen_length(dt, col_stem = "age")
dt[, ax := mx_to_ax(mx = mx, age_length = age_length)]

Note that we can use hierarchyUtils::gen_length to add the age_length column given age_start and age_end.

1a0 and 4a1:

Preston et al adapted an analysis first completed by Coale and Demeny (1983) to derive a relationship between infant mortality rate ($_1m_0$) and under-5 $a_x$ values ($_1a_0$ and $_4a_1$). In the absence of reliable data to produce $a_x$, these relationships can be used to predict $a_x$ from infant $m_x$:

| | Males | Females | | ------------------- |-------------| ---------| | 1a0: | | | | If 1m0 >= 0.107 | 0.330 | 0.350 | | If 1m0 < 0.107 | 0.045 + 2.684 * 1m0 | 0.053 + 2.800 * 1m0 | | | | | | 4a1: | | | | If 1m0 >= 0.107 | 1.352 | 1.361 | | If 1m0 < 0.107 | 1.651 - 2.816 * 1m0 | 1.522 - 1.518 * 1m0 |

Use the gen_u5_ax_from_mx function to implement this method:

dt[, sex := "male"]
gen_u5_ax_from_mx(dt, id_cols = c("age_start", "age_end", "sex"))

Graduation method: One strategy for selecting $a_x$ values is based on the level and slope of the $_nm_x$ function. Comparing two populations with the same $_5m_60$, the population with more rapidly rising mortality rate with respect to age will have deaths that are more concentrated in the later part of the interval (higher $a_x$). Comparing two populations with the same slope in $m_x$, the one with higher mortality rate will have more deaths at the beginning of the interval (lower $a_x$).

To utilize this theory, we can implement iteration as described in the Preston book, and originally proposed by Keyfitz (1966): $$na_x = \frac{\frac{-n}{24} {_nd{x-n}} + \frac{n}{2} {nd_x} + \frac{n}{24} {_nd{x+n}}}{_nd_x}$$ Where $d_x$ is derived from the conversion from $m_x$ to $q_x$. However, since the $m_x$ to $q_x$ conversion requires $a_x$, this requires us to pick a starting place for $a_x$ (like $n/2$), solve for $d_x$, solve for $a_x$, and so on until convergence. Use demCore::iterate_ax to implement this method.

qx

From $m_x$ and $a_x$, we can solve directly for $q_x$:

$$_nq_x = \frac{n \cdot {_nm_x}}{1 + (n - {_na_x}) \cdot {_nm_x}}$$ For the terminal age group, $q_x$ should be $1$ because all individuals surviving to the terminal age group will die in that age group (probability of death = $1$).

The mx_ax_to_qx combines the equation for $q_x$ and the requirement that terminal $q_x$ equal one by setting $q_x = 1$ if age_length = Inf.

dt[, qx := mx_ax_to_qx(mx = mx, ax = ax, age_length = age_length)]

Other functions that utilize this relationship but solve for different metrics are mx_qx_to_ax and qx_ax_to_mx.

You can also solve for $q_x$ under the assumption of constant mortality rate within an interval, which removes $a_x$ from the relationship:

$$_nq_x = 1 - e^{-n \cdot {_nm_x}}.$$

dt[, qx_compare := mx_to_qx(mx = mx, age_length = age_length)]

These two $q_x$ values are the same, because the implied $a_x$ in mx_to_qx is equivalent to the $a_x$ we generate under the assumption in mx_to_ax.

lx

To calculate the proportion of a cohort surviving to age $x$ ($l_x$), we set $l_0 = 1$ (100% survive to birth), and recursively calculate:

$$l_{x+n} = l_x \cdot (1 - _nq_x)$$

or in words, the proportion surviving to age $x$ times the proportion of those survivors who do not die between $x$ and $x+n$ is the proportion surviving to age $x+n$.

Our gen_lx_from_qx function can perform this calculation:

gen_lx_from_qx(dt, id_cols = c("age_start", "age_end"))

dx

Proportion of cohort dying between ages $x$ and $x+n$ ($d_x$) is $nq_0$ to start, then $_nd_x = l_x - l{x+n}$ thereafter (difference between proportion surviving to age $x$ and proportion surviving to age $x+n$).

To calculate $d_x$, use gen_dx_from_lx:

gen_dx_from_lx(dt, id_cols = c("age_start", "age_end"))

nLx

The person-years lived between ages $x$ and $x+n$ ($_nL_x$) can be broken down into:

Person-years lived by those who survive the interval = $n \cdot l_{x+n}$
Person-years lived by those who die during the interval = $_na_x \cdot {_nd_x}$

such that:

$$nL_x = n \cdot l{x+n} + _na_x \cdot {_nd_x}.$$

For the terminal age group:

$${{\infty}L_x} = \text{person-years lived above age } x = \frac{\text{person-years lived above age }x}{\text{deaths over age }x} \cdot \text{deaths over age }x= \frac{l_x}{{\infty}m_x}$$

Use the gen_nLx function to calculate with this method:

gen_nLx(dt, id_cols = c("age_start", "age_end"))

Tx

Next, use $_nL_x$ to get $T_x$:

$$T_x = \sum_{x}^{\infty} {_nL_x}.$$

gen_Tx(dt, id_cols = c("age_start", "age_end"))

ex

Life expectancy above age $x$ (mean person-years lived above age $x$) is equal to the total person years over age $x$ divided by the persons surviving to age $x$:

$$e_x = \frac{T_x}{lx}.$$

For the terminal age group, $a_x = e_x$ because everyone surviving to the interval dies in the interval.

Calculate $e_x$ with gen_ex:

gen_ex(dt)

Summary

One possible set of steps for calculating a complete period life table from deaths and mid-year population is:

Compute mx = deaths / population
Estimate <5 ax from mx using gen_u5_ax_from_mx, and set ax over age 5 as n/2
Calculate qx directly using mx_ax_to_qx
Use ax iteration with iterate_ax to modify ax and qx values, improving ax over the naive n/2 values
Calculate px, lx, dx, nLx, Tx, and ex directly, in that order, from mx, ax, and qx, with lifetable function. The lifetable function combines many of the functions described in this vignette for convenience.
In the terminal age group, ax is equal to ex. So, once ex is solved for, terminal ax can be replaced with that value.

References

Preston Samuel H, Patrick H, Michel G. Demography: measuring and modeling population processes. MA: Blackwell Publishing. 2001.

Coale AJ, Demeny P, Vaughan B. Regional model life tables and stable populations: studies in population. Elsevier; 2013 Oct 22.

Keyfitz N. A life table that agrees with the data. Journal of the American Statistical Association. 1966 Jun 1;61(314):305-12.

ihmeuw-demographics/demCore documentation built on Feb. 24, 2024, 11:05 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

ihmeuw-demographics/demCore
Core Functions for Demography

In ihmeuw-demographics/demCore: Core Functions for Demography

What is a life table?

Life table metrics

Representing life tables graphically

Calculations and relationships between life table metrics

mx

ax

qx

lx

dx

nLx

Tx

ex

Summary

References

R Package Documentation

Browse R Packages

We want your feedback!

ihmeuw-demographics/demCore Core Functions for Demography

In ihmeuw-demographics/demCore: Core Functions for Demography

What is a life table?

Life table metrics

Representing life tables graphically

Calculations and relationships between life table metrics

mx

ax

qx

lx

dx

nLx

Tx

ex

Summary

References

R Package Documentation

Browse R Packages

We want your feedback!

ihmeuw-demographics/demCore
Core Functions for Demography