library(data.table)
library(knitr)
library(kableExtra)
library(ggplot2)
devtools::load_all()
knitr::opts_chunk$set(echo = TRUE, warning = FALSE)
A life table is a table that includes information to describe the dying out of a birth cohort. This can also be a synthetic birth cohort, in which case we refer to it as a period life table.
Life tables are one of the most important devices in demography -- they have been used since the 1600s! They can also be useful for other fields, because they are generalizable to other discrete "time to event" data.
Typically, life tables have one row per age group, with columns representing life table metrics, also known as parameters. The life table parameters used in this package are $_nm_x$, $_na_x$, $_nq_x$, $_np_x$, $l_x$, $_nd_x$, $e_x$, $_nL_x$, and $T_x$. In this notation, $x$ refers to age, and the metrics apply to either the age $x$ directly or to the interval between ages $x$ and $x+n$ where $n$ indicates the length of the interval, typically in years. We often shorthand by removing the "n" from the notation, with interval length implied.
Here's an example of a period life table for males in Austria in 1992, which is saved in the package as `austria_1992_lt` (source: Preston 2001):
data("austria_1992_lt")
dt <- austria_1992_lt[
  , .SD,
  .SDcols = c("age_start", "age_end", "deaths", "pop", "mx", "ax",
              "qx", "px", "lx", "dx", "nLx", "Tx", "ex")
]
dt_rounded <- round(dt, 2)
setnames(dt_rounded, "age_start", "x")
setnames(dt_rounded, "age_end", "x+n")
kable(dt_rounded, format = "markdown")
For reference, the following is a list of life table metrics and their definitions.
$\mathbf{_nm_x}$: mortality rate between ages $x$ and $x+n$. Shorthand to $m_x$ with implied interval width ($n$). Equals deaths divided by person-years lived in the interval. Mid-year population is commonly used as an adequate approximation of the person-years denominator.
$\mathbf{_na_x}$: mean person-years lived between ages $x$ and $x+n$ for those who die within the interval. Shorthand to $a_x$ with implied interval width ($n$).
$\mathbf{_nq_x}$: probability of death between ages $x$ and $x+n$, conditional on survival to age $x$. Shorthand to $q_x$ with implied interval width ($n$). Equals deaths in the interval divided by survivors to $x$-th birthday. Examples: $_5q_0$ = probability of death between birth and age $5$; $_{45}q_{15}$ = probability of death between age $15$ and age $60$ conditional on survival to age $15$.
$\mathbf{l_x}$: proportion of the cohort surviving to age $x$.
$\mathbf{e_x}$: life expectancy at age $x$ -- mean number of years lived after $x$-th birthday by those surviving to age $x$. Life expectancy at birth is $e_0$.
$\mathbf{_nL_x}$: total person-years lived between age $x$ and $x+n$.
$\mathbf{T_x}$: total person-years lived above age $x$.
$\mathbf{_nd_x}$: proportion of the cohort dying between ages $x$ and $x+n$. Shorthand to $d_x$.
$\mathbf{_np_x}$: probability of survival between ages $x$ and $x+n$ conditional on survival to age $x$. Complement of $q_x$, so $_np_x = 1 - {_nq_x}$. Shorthand to $p_x$.
We often reduce life tables to age patterns of log probability of death ($\text{log}(q_x)$) or to survival curves ($l_x$ over age), which can be easily displayed and vetted in plots.
ggplot(data = dt, aes(x = age_start, y = log(qx))) +
  geom_line(color = "magenta") +
  geom_point() +
  scale_x_continuous(breaks = c(0, 1, seq(5, 95, 5))) +
  theme_bw() +
  theme(axis.text.x = element_text(size = 6)) +
  ggtitle("log-qx over age")

ggplot(data = dt, aes(x = age_start, y = lx)) +
  geom_line(color = "magenta") +
  geom_point() +
  scale_x_continuous(breaks = c(0, 1, seq(5, 95, 5))) +
  theme_bw() +
  theme(axis.text.x = element_text(size = 6)) +
  ggtitle("lx over age (survival curve)")
The `demCore` package includes many utility functions that leverage the mathematical relationships between life table metrics to build out a complete life table. This section provides details and examples on the use of these functions and their underlying methods. We do this by rebuilding the example life table above from death counts and population.
Note that this document and this package do not contain an exhaustive list of relationships between metrics. Additionally, some equations presented rely on assumptions and others are true relationships that are always valid. For more details, see the Preston Demography textbook, from which many of these details were drawn.
From raw death count and population data, the place to start with a life table is $m_x$.
$$m_x = \frac{\text{deaths}}{\text{person-years}} \approx \frac{\text{observed deaths}}{\text{mid-interval population}}$$
Let's load in our example data and calculate $m_x$:
data("austria_1992_lt")
dt <- austria_1992_lt[, c("age_start", "age_end", "deaths", "pop")]
dt[, mx := deaths / pop]
If we have $m_x$ and $q_x$ we can directly calculate $a_x$. However, we often use $m_x$ and $a_x$ to get $q_x$ in the first place, and so have to make some assumptions to get $a_x$. Empirical calculations of $a_x$ would require detailed and accurate data on age of death in days (such as paired date of birth and date of death), which is typically unavailable.
Rule of thumb:
One option is to assume all deaths occur in the middle of the interval, so $a_x \approx n/2$. This assumption works well for most ages, but less well at very young or very old ages, where mortality can change rapidly within the interval.
Another assumption we can make is that the age-specific death rate is constant
between $x$ and $x+n$. Under this assumption,
$$_na_x = n + \frac{1}{_nm_x} - \frac{n}{1- e^{-n \cdot {_nm_x}}}.$$
The function `mx_to_ax` implements this assumption. Using our example data, we get:
dt <- hierarchyUtils::gen_length(dt, col_stem = "age")
dt[, ax := mx_to_ax(mx = mx, age_length = age_length)]
Note that we can use `hierarchyUtils::gen_length` to add the `age_length` column given `age_start` and `age_end`.
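As a rough illustration of the constant-rate formula, here is a minimal base R sketch. The helper name `ax_constant_rate` and the input rates are made up for this example; it is not the package's `mx_to_ax`.

```r
# Hypothetical helper (not part of demCore): mean years lived in the interval
# by those who die in it, assuming a constant death rate mx over n years.
ax_constant_rate <- function(mx, n) {
  n + 1 / mx - n / (1 - exp(-n * mx))
}

# Made-up death rates for a 5-year interval
mx_example <- c(0.001, 0.01, 0.1)
ax_example <- ax_constant_rate(mx_example, n = 5)
print(ax_example)
```

Under this assumption $a_x$ sits slightly below $n/2$ and falls as $m_x$ rises, because a constant rate applied to a shrinking group of survivors places more deaths early in the interval.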
$_1a_0$ and $_4a_1$:
Preston et al. adapted an analysis first completed by Coale and Demeny (1983) to derive a relationship between infant mortality rate ($_1m_0$) and under-5 $a_x$ values ($_1a_0$ and $_4a_1$). In the absence of reliable data to produce $a_x$, these relationships can be used to predict $a_x$ from infant $m_x$:
| | Males | Females |
| ------------------- | ------------- | --------- |
| **$_1a_0$:** | | |
| If $_1m_0 \geq 0.107$ | 0.330 | 0.350 |
| If $_1m_0 < 0.107$ | $0.045 + 2.684 \cdot {_1m_0}$ | $0.053 + 2.800 \cdot {_1m_0}$ |
| **$_4a_1$:** | | |
| If $_1m_0 \geq 0.107$ | 1.352 | 1.361 |
| If $_1m_0 < 0.107$ | $1.651 - 2.816 \cdot {_1m_0}$ | $1.522 - 1.518 \cdot {_1m_0}$ |
Use the `gen_u5_ax_from_mx` function to implement this method:
dt[, sex := "male"]
gen_u5_ax_from_mx(dt, id_cols = c("age_start", "age_end", "sex"))
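To make the lookup in the table above concrete, here is a small base R sketch of the same rule. The function `u5_ax_from_m0` is a hypothetical helper written for this illustration, not the package's `gen_u5_ax_from_mx`, and the infant mortality rate is made up.

```r
# Hypothetical helper (not demCore's gen_u5_ax_from_mx): return 1a0 and 4a1
# for a single infant mortality rate (1m0) and sex, using the table above.
u5_ax_from_m0 <- function(m0, sex = c("male", "female")) {
  sex <- match.arg(sex)
  high <- m0 >= 0.107
  if (sex == "male") {
    a0 <- ifelse(high, 0.330, 0.045 + 2.684 * m0)
    a1 <- ifelse(high, 1.352, 1.651 - 2.816 * m0)
  } else {
    a0 <- ifelse(high, 0.350, 0.053 + 2.800 * m0)
    a1 <- ifelse(high, 1.361, 1.522 - 1.518 * m0)
  }
  c(a0 = a0, a1 = a1)
}

# Made-up infant mortality rate of 0.01 for males
u5_example <- u5_ax_from_m0(0.01, sex = "male")
print(u5_example)
```

With low infant mortality, $_1a_0$ is far below $1/2$: most infant deaths are concentrated in the first weeks of life.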
Graduation method:
One strategy for selecting $a_x$ values is based on the level and slope of the $_nm_x$ function. Comparing two populations with the same $_5m_{60}$, the population whose mortality rate rises more rapidly with age will have deaths that are more concentrated in the later part of the interval (higher $a_x$). Comparing two populations with the same slope in $m_x$, the one with the higher mortality rate will have more deaths at the beginning of the interval (lower $a_x$).
To utilize this theory, we can implement iteration as described in the Preston book, and originally proposed by Keyfitz (1966):
$$_na_x = \frac{-\frac{n}{24} {_nd_{x-n}} + \frac{n}{2} {_nd_x} + \frac{n}{24} {_nd_{x+n}}}{_nd_x}$$
Here $d_x$ is derived from the conversion from $m_x$ to $q_x$. However, since the $m_x$ to $q_x$ conversion requires $a_x$, this requires us to pick a starting place for $a_x$ (like $n/2$), solve for $d_x$, solve for $a_x$, and so on until convergence. Use `demCore::iterate_ax` to implement this method.
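The loop described above can be sketched in base R. This is a simplified illustration, not the implementation of `demCore::iterate_ax`: the function name `iterate_ax_sketch`, its arguments, and the input rates are all hypothetical, and it assumes equal-width intervals and leaves the first and last age groups at $n/2$ because they lack a neighbour on one side.

```r
# Hypothetical sketch (not demCore::iterate_ax). Assumes equal-width intervals
# and at least three age groups; first and last groups stay at n / 2.
iterate_ax_sketch <- function(mx, n, tol = 1e-8, max_iter = 100) {
  k <- length(mx)
  ax <- rep(n / 2, k)                          # starting values
  for (i in seq_len(max_iter)) {
    qx <- n * mx / (1 + (n - ax) * mx)         # mx and ax -> qx
    lx <- c(1, cumprod(1 - qx))[seq_len(k)]    # survivors at start of interval
    dx <- lx * qx                              # deaths within each interval
    ax_new <- ax
    mid <- 2:(k - 1)                           # groups with both neighbours
    ax_new[mid] <- (-n / 24 * dx[mid - 1] + n / 2 * dx[mid] +
                      n / 24 * dx[mid + 1]) / dx[mid]
    converged <- max(abs(ax_new - ax)) < tol
    ax <- ax_new
    if (converged) break
  }
  ax
}

# Made-up death rates rising with age, for 5-year groups from 30 to 70
ages <- seq(30, 70, by = 5)
mx_example <- 0.001 * exp(0.09 * (ages - 30))
ax_iterated <- iterate_ax_sketch(mx_example, n = 5)
print(round(ax_iterated, 3))
```

Because mortality rises with age in this made-up input, the interior $a_x$ values end up above $n/2$, in line with the reasoning about slope given earlier.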
From $m_x$ and $a_x$, we can solve directly for $q_x$:
$$_nq_x = \frac{n \cdot {_nm_x}}{1 + (n - {_na_x}) \cdot {_nm_x}}$$

For the terminal age group, $q_x$ should be $1$ because all individuals surviving to the terminal age group will die in that age group (probability of death = $1$).
The `mx_ax_to_qx` function combines the equation for $q_x$ and the requirement that terminal $q_x$ equal one by setting $q_x = 1$ if `age_length = Inf`.
dt[, qx := mx_ax_to_qx(mx = mx, ax = ax, age_length = age_length)]
Other functions that utilize this relationship but solve for different metrics are `mx_qx_to_ax` and `qx_ax_to_mx`.
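For readers who want to see the arithmetic, here is a minimal base R sketch of the $m_x$, $a_x$ to $q_x$ conversion with the terminal-age rule. The helper `qx_from_mx_ax` and all input values are hypothetical; this is not the package's `mx_ax_to_qx`.

```r
# Hypothetical helper (not demCore's mx_ax_to_qx)
qx_from_mx_ax <- function(mx, ax, n) {
  qx <- n * mx / (1 + (n - ax) * mx)
  qx[is.infinite(n)] <- 1   # terminal age group: everyone dies in it
  qx
}

# Made-up inputs: two 5-year groups followed by an open-ended terminal group
mx_example <- c(0.01, 0.05, 0.2)
ax_example <- c(2.5, 2.5, 5)
n_example  <- c(5, 5, Inf)
qx_example <- qx_from_mx_ax(mx_example, ax_example, n_example)
print(qx_example)
```

The first value works out to $0.05 / 1.025$, slightly less than $n \cdot m_x = 0.05$, because the denominator accounts for people who die partway through the interval.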
You can also solve for $q_x$ under the assumption of constant mortality rate within an interval, which removes $a_x$ from the relationship:
$$_nq_x = 1 - e^{-n \cdot {_nm_x}}.$$
dt[, qx_compare := mx_to_qx(mx = mx, age_length = age_length)]
These two $q_x$ values are the same, because the implied $a_x$ in `mx_to_qx` is equivalent to the $a_x$ we generate under the assumption in `mx_to_ax`.
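To see why, substitute the constant-rate expression for $_na_x$ into the general $m_x$, $a_x$ to $q_x$ formula. This short derivation uses only the equations already given:

$$n - {_na_x} = \frac{n}{1 - e^{-n \cdot {_nm_x}}} - \frac{1}{_nm_x}$$

$$1 + (n - {_na_x}) \cdot {_nm_x} = \frac{n \cdot {_nm_x}}{1 - e^{-n \cdot {_nm_x}}}$$

$$_nq_x = \frac{n \cdot {_nm_x}}{1 + (n - {_na_x}) \cdot {_nm_x}} = 1 - e^{-n \cdot {_nm_x}}$$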
To calculate the proportion of a cohort surviving to age $x$ ($l_x$), we set $l_0 = 1$ (100% survive to birth), and recursively calculate:
$$l_{x+n} = l_x \cdot (1 - _nq_x)$$
or in words, the proportion surviving to age $x$ times the proportion of those survivors who do not die between $x$ and $x+n$ is the proportion surviving to age $x+n$.
Our `gen_lx_from_qx` function can perform this calculation:
gen_lx_from_qx(dt, id_cols = c("age_start", "age_end"))
The proportion of the cohort dying between ages $x$ and $x+n$ ($d_x$) is $_nq_0$ for the first age group, then $_nd_x = l_x - l_{x+n}$ thereafter (the difference between the proportion surviving to age $x$ and the proportion surviving to age $x+n$).
To calculate $d_x$, use `gen_dx_from_lx`:
gen_dx_from_lx(dt, id_cols = c("age_start", "age_end"))
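The recursion for $l_x$ and the differencing for $d_x$ can be sketched in base R as follows. This is an illustration with made-up $q_x$ values and hypothetical variable names, not the implementation of `gen_lx_from_qx` or `gen_dx_from_lx`.

```r
# Made-up probabilities of death; the last age group is terminal (qx = 1)
qx_example <- c(0.01, 0.002, 0.05, 1)

# l_0 = 1, then l_{x+n} = l_x * (1 - qx); keep one value per age group
lx_example <- c(1, cumprod(1 - qx_example))[seq_along(qx_example)]

# d_x = l_x - l_{x+n}; nobody survives the terminal group, so append 0
dx_example <- lx_example - c(lx_example[-1], 0)

print(data.frame(qx = qx_example, lx = lx_example, dx = dx_example))
```

Two quick consistency checks follow from the definitions: $d_x$ equals $l_x \cdot q_x$ in every age group, and the $d_x$ column sums to one because every member of the cohort eventually dies.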
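The person-years lived between ages $x$ and $x+n$ ($_nL_x$) can be broken down into the person-years lived by those who survive the whole interval ($n \cdot l_{x+n}$) and the person-years lived by those who die within it (${_na_x} \cdot {_nd_x}$), such that:
$$_nL_x = n \cdot l_{x+n} + {_na_x} \cdot {_nd_x}.$$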
For the terminal age group:
$${_{\infty}L_x} = \text{person-years lived above age } x = \frac{\text{person-years lived above age }x}{\text{deaths over age }x} \cdot \text{deaths over age }x = \frac{l_x}{{_{\infty}m_x}}$$
Use the `gen_nLx` function to calculate $_nL_x$ with this method:
gen_nLx(dt, id_cols = c("age_start", "age_end"))
Next, use $_nL_x$ to get $T_x$ by summing over all age groups starting at age $x$ or above:
$$T_x = \sum_{y = x}^{\infty} {_nL_y}.$$
gen_Tx(dt, id_cols = c("age_start", "age_end"))
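The $_nL_x$ and $T_x$ steps can be sketched in base R as below. All inputs are made up and the variable names are hypothetical; this is not the implementation of `gen_nLx` or `gen_Tx`.

```r
# Made-up inputs; the last age group is open-ended (terminal)
n  <- c(1, 4, 5, Inf)                # interval widths in years
mx <- c(0.01, 0.0005, 0.01, 0.2)     # death rates
ax <- c(0.1, 1.6, 2.5, NA)           # terminal ax is not needed here
k  <- length(mx)

# qx from mx and ax for non-terminal groups; terminal qx = 1
qx <- c(n[-k] * mx[-k] / (1 + (n[-k] - ax[-k]) * mx[-k]), 1)
lx <- c(1, cumprod(1 - qx))[seq_len(k)]
dx <- lx * qx

# nLx = n * l_{x+n} + ax * dx for closed intervals, lx / mx for the terminal one
lx_next <- c(lx[-1], 0)
nLx <- c(n[-k] * lx_next[-k] + ax[-k] * dx[-k], lx[k] / mx[k])

# Tx = sum of nLx over this and all older age groups (reverse cumulative sum)
Tx <- rev(cumsum(rev(nLx)))

print(data.frame(n, mx, lx, nLx, Tx))
```

A useful sanity check is that $_nd_x / {_nL_x}$ recovers the input $m_x$ in every closed interval, since the $q_x$ formula was derived from that definition.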
Life expectancy at age $x$ (mean person-years lived above age $x$) is equal to the total person-years lived above age $x$ divided by the proportion surviving to age $x$:
$$e_x = \frac{T_x}{l_x}.$$
For the terminal age group, $a_x = e_x$ because everyone surviving to the interval dies in the interval.
Calculate $e_x$ with `gen_ex`:
gen_ex(dt)
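For the terminal age group, combining the terminal formula for $_{\infty}L_x$ with the definition of $e_x$ gives a quick check on the output. This is a short derivation from the equations above, not an additional method:

$$e_x = \frac{T_x}{l_x} = \frac{{_{\infty}L_x}}{l_x} = \frac{l_x / {_{\infty}m_x}}{l_x} = \frac{1}{{_{\infty}m_x}}$$

For example, a terminal death rate of $0.2$ implies $5$ remaining years of life on average for those who reach the terminal age.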
One possible set of steps for calculating a complete period life table from deaths and mid-year population is:

1. Calculate mx as deaths divided by population.
2. Get initial ax values: use `gen_u5_ax_from_mx` under age 5, and set ax over age 5 as n/2.
3. Calculate qx with `mx_ax_to_qx`.
4. Use `iterate_ax` to modify ax and qx values, improving ax over the naive n/2 values.
5. Calculate the remaining life table parameters with the `lifetable` function. The `lifetable` function combines many of the functions described in this vignette for convenience.

Preston SH, Heuveline P, Guillot M. Demography: measuring and modeling population processes. Malden, MA: Blackwell Publishing; 2001.
Coale AJ, Demeny P, Vaughan B. Regional model life tables and stable populations: studies in population. Elsevier; 2013 Oct 22.
Keyfitz N. A life table that agrees with the data. Journal of the American Statistical Association. 1966 Jun 1;61(314):305-12.