knitr::opts_chunk$set( eval = nzchar(Sys.getenv("VIGNETTES")), # Only compile locally collapse = TRUE, comment = "#>", fig.width = 8, fig.height = 5, out.width = "100%" ) # Okabe-Ito colours for discrete scales options( ggplot2.discrete.colour = c("#D55E00", "#0072B2", "#009E73", "#CC79A7", "#E69F00", "#56B4E9", "#F0E442"), ggplot2.discrete.fill = c("#D55E00", "#0072B2", "#009E73", "#CC79A7", "#E69F00", "#56B4E9", "#F0E442") )
library(vital) library(fable) library(dplyr) library(ggplot2) set.seed(2025)
The vital package can be used for stochastic population forecasting with coherent components. This is based on the papers by @HB08 and @hby. Following @HB08, we use the following demographic growth-balance equations: $$ P_t(x) = P_{t-1}(x) + B_t(x) - D_t(x) + G_t(x) $$ where
There are slightly different equations for handling the upper age group, and for baby migrants. See @HB08 for details.
To simulate future births, deaths and net migrants, we develop three functional data models for fertility, mortality and migration. The models for mortality and migration use coherent components, so that the rates for males and females do not diverge over time.
The following example uses Norwegian data up to 2022, and produces simulated populations for the ten future years. Different models are used for each component to demonstrate the flexibility of the package, but other models can be used as well.
We use a coherent functional data model [@hby] for the log mortality rates. To ensure coherence, we compute the geometric mean of the sex-specific mortality rates and the corresponding ratios, using the make_pr()
function. The data are smoothed first, and 6 components are used by default in each FDM, although only the first two are plotted here.
fit_mortality <- norway_mortality |> filter(Sex != "Total") |> smooth_mortality(Mortality) |> make_pr(.smooth) |> model(fdm = FDM(log(.smooth), coherent = TRUE)) autoplot(fit_mortality, 2)
For fertility, we use a functional mean model with a square root transformation, applied to the last 13 years of data. The plotted model shows the fitted values on the square root scale.
fit_fertility <- norway_fertility |> filter(Year > 2010) |> smooth_fertility(Fertility) |> model(fmean = FMEAN(sqrt(.smooth))) autoplot(fit_fertility)
For net migration, we use a coherent functional data model. Because net migration values can be positive or negative, we can't take products and ratios. Instead, we need to compute the means and corresponding differences using the mean_sd()
function.
netmig <- net_migration( norway_mortality |> filter(Sex != "Total"), norway_births ) |> make_sd(NetMigration) fit_migration <- netmig |> model(fdm = FDM(NetMigration, coherent = TRUE)) autoplot(fit_migration)
The generate_population()
function takes a starting population, and the three component models, and simulates future age-sex-specific population values. Here we produce ten replicates of the future population.
pop <- norway_mortality |> filter(Sex != "Total", Year == max(Year)) future <- generate_population( starting_population = pop, mortality_model = fit_mortality, fertility_model = fit_fertility, migration_model = fit_migration, h = 10, n_reps = 500 )
The first replicate is plotted below, along with the last few years of historical data.
future |> filter(.rep == "100") |> ggplot(aes(x = Age, y = Population, group = Year, color = Year)) + geom_line( data = norway_mortality |> filter(Year > 2010, Sex != "Total"), color = "grey", mapping = aes(group = Year) ) + geom_line() + scale_color_gradientn(colours = rainbow(10)[1:9]) + facet_grid(. ~ Sex)
The simulated populations can be used to compute any quantities that can be derived from populations numbers by sex and age. For example, the mean age of the population for the next 10 years
future |> group_by(Sex, .rep) |> summarise(mean_age = sum(Population * (Age + 0.5)) / sum(Population)) |> group_by(Sex) |> summarise(mean_age = mean(mean_age))
We can also plot population pyramids with prediction intervals. For example, here is the population pyramid for 2032 with a 95% prediction interval.
pyramid_2032 <- future |> filter(Year == 2032) |> mutate(Population = if_else(Sex == "Female", -Population, Population)) |> group_by(Age, Sex) |> summarise( lo = quantile(Population, 0.025), med = quantile(Population, 0.5), hi = quantile(Population, 0.975) ) pyramid_2032 |> ggplot(aes(x = Age)) + geom_ribbon(aes(ymin = lo, ymax = hi, colour = NULL), fill = "#c14b14", alpha = 0.2 ) + geom_line(aes(y = med), color = "#c14b14") + facet_grid(. ~ Sex, scales = "free_x") + labs(y = "Population") + coord_flip() + guides(fill = "none", alpha = "none")
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.