knitr::opts_chunk$set(echo=TRUE, eval=TRUE, include=TRUE)
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(rmarkdown.html_vignette.check_title = FALSE)
options(knitr.kable.max_rows = 10)
set.seed(1)
## Avoid using pacman here, as it causes an error during installation if it is not installed already
library(healthiar)
library(tibble)
library(dplyr)
library(purrr)
library(tidyr)
library(knitr)
library(terra)
library(sf)
Hi there!
This vignette will tell you about healthiar and show you how to use healthiar with the help of examples.
NOTE: Before using healthiar, please read carefully the information provided in the readme file or the welcome webpage. By using healthiar, you agree to the terms of use and disclaimer.
The healthiar functions allow you to quantify and monetize the health impacts (or burden of disease) attributable to exposure. The main focus of the EU project that initiated the development of healthiar (BEST-COST) has been two environmental exposures: air pollution and noise. However, healthiar can also be used for other exposures such as green spaces, chemicals or physical activity.
See below an overview of healthiar, taken from the first page of the cheat sheet. The full list of functions included in healthiar is linked there and available in the reference.

You can enter data in healthiar functions using:
- hard coded values or
- columns inside pre-loaded data frames or tibbles.
Let's see some examples calling the most important function in healthiar: attribute_health().
Depending on the function argument, you will need to enter numeric or character values.
results_pm_copd <- attribute_health(
  exp_central = 8.85,
  rr_central = 1.369,
  rr_increment = 10,
  erf_shape = "log_linear",
  cutoff_central = 5,
  bhd_central = 30747
)
healthiar comes with example data sets, whose names start with exdat_, that allow you to test the functions. Some of these example data are used in this vignette.
Now let's call attribute_health() with input data from the healthiar example data. Note that you can easily pass input data to a function argument using the $ operator.
results_pm_copd <- attribute_health(
  erf_shape = "log_linear",
  rr_central = exdat_pm$relative_risk,
  rr_increment = 10,
  exp_central = exdat_pm$mean_concentration,
  cutoff_central = exdat_pm$cut_off_value,
  bhd_central = exdat_pm$incidence
)
Be aware that healthiar functions are easier to use if your data is prepared in a tidy format, i.e.:
Each variable is a column; each column is a variable.
Each observation is a row; each row is an observation.
Each value is a cell; each cell is a single value.
To know more about the concept of tidy format, see the article by [@Wickham2014_jss].
For example, in attribute_health() the length of the input vectors entered in the arguments must be either 1 or match the number of combinations of the different values of:
geo_id_micro
exp_...
sex
age
(info for further sub-group analysis)
The output of the healthiar functions attribute_health() and attribute_lifetable() consists of two lists ("folders"):
health_main contains the main results
health_detailed contains detailed results and additional info about the assessment.
Other healthiar functions have a similar output structure but use different prefixes, e.g., social_ in socialize() and monetization_ in monetize().
A similar structure can be found in other large functions in healthiar, e.g., attribute_lifetable(), compare(), socialize() or monetize(). In some functions, different elements are available in the output. For instance, attribute_lifetable() creates additional output that is specific to life table calculations.
There exist different, equivalent ways of accessing the output:
With $ operator: results_pm_copd$health_main$impact_rounded (as in the example above)
By mouse: go to the Environment tab in RStudio and click on the variable you want to inspect, and then open the health_main results table
With [[]] operator results_pm_copd[["health_main"]]
With pluck() & pull(): use purrr::pluck() to select a list and then dplyr::pull() to extract values from a specified column, e.g. results_pm_copd |> purrr::pluck("health_main") |> dplyr::pull("impact_rounded")
The descriptions of the healthiar functions provide examples that you can execute (with healthiar loaded) by running example("function_name"), e.g. example("attribute_health"). In the sections below in this vignette, you find additional examples and more detailed explanations.
E.g., to quantify the COPD cases attributable to PM2.5 (air pollution) exposure in a country.
The comparative risk assessment approach [@Murray2003_e] is applied, obtaining the population attributable fraction (the percentage of cases attributable to the exposure) from the relative risk. The exposure scenario is compared with a counterfactual scenario.
This approach has been extensively documented and applied [e.g., @WHO2003_report; @Steenland2006-e; @Soares2020_report; @Pozzer2023_gh; @GBD2020_tl; @Lehtomaki_2025_eh].
General integral form for the population attributable fraction (PAF):
$$PAF = \frac{\int rr\_at\_exp(x) \times PE(x) \, dx - 1}{\int rr\_at\_exp(x) \times PE(x) \, dx}$$
Where:
$x$ = exposure level
$PE(x)$ = population distribution of exposure
$rr\_at\_exp(x)$ = relative risk at exposure level $x$ compared to the reference
If exposure is categorical, the integrals are converted to sums:
$$PAF = \frac{\sum_i rr\_at\_exp_i \times PE_i - 1}{\sum_i rr\_at\_exp_i \times PE_i}$$
Alternatively, an equivalent form is:
$$PAF = \frac{\sum_i PE_i \times (rr\_at\_exp_i - 1)}{\sum_i PE_i \times (rr\_at\_exp_i - 1) + 1}$$
If there is one single exposure value, corresponding to the population-weighted mean concentration, the equation simplifies as follows:
$$PAF = \frac{rr\_at\_exp - 1}{rr\_at\_exp}$$
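These categorical forms can be checked numerically in a few lines of base R (hypothetical relative risks and population fractions, not healthiar output):

```r
## Two equivalent categorical PAF forms give the same value
rr_at_exp <- c(1.00, 1.05, 1.12, 1.20)  # relative risk per exposure category
pe        <- c(0.40, 0.30, 0.20, 0.10)  # population fraction per category (sums to 1)

## Form 1: (sum(RR_i * PE_i) - 1) / sum(RR_i * PE_i)
paf_1 <- (sum(rr_at_exp * pe) - 1) / sum(rr_at_exp * pe)

## Form 2: sum(PE_i * (RR_i - 1)) / (sum(PE_i * (RR_i - 1)) + 1)
paf_2 <- sum(pe * (rr_at_exp - 1)) / (sum(pe * (rr_at_exp - 1)) + 1)

all.equal(paf_1, paf_2)  # TRUE
```

The equivalence holds because the population fractions sum to 1, so $\sum_i rr\_at\_exp_i \times PE_i = \sum_i PE_i \times (rr\_at\_exp_i - 1) + 1$.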
How do you get this relative risk at exposure level (rr_at_exp)? It is normally different from the relative risk published in the epidemiological literature (rr) together with the (concentration/dose) increment to which it corresponds. The equations used for scaling the relative risk depend on the chosen exposure-response function shape:
linear [@Lehtomaki_2025_eh] $$RRexp = 1 + \frac{rr - 1}{increment} \times (exp - cutoff)$$
log-linear [@Lehtomaki_2025_eh] $$RRexp = e^{\frac{\log(rr)}{increment} \times (exp - cutoff)}$$
log-log [@Lehtomaki_2025_eh] $$RRexp = \left( \frac{exp + 1}{cutoff + 1} \right)^{\frac{\log(rr)}{\log(increment + cutoff + 1) - \log(cutoff + 1)}}$$
linear-log [@Pozzer2023_gh] $$RRexp = 1 + \frac{rr - 1}{\log(increment + cutoff + 1) - \log(cutoff + 1)} \times (\log(exp + 1) - \log(cutoff + 1))$$
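As a sanity check, the log-linear scaling can be applied by hand in base R; with the inputs of the first example in this vignette (rr 1.369 per 10 units, exposure 8.85, cutoff 5, 30747 baseline cases) it reproduces the roughly 3502 attributable cases shown in the interpretation further below:

```r
## Base-R sketch: scale the published RR to the exposure level (log-linear)
## and derive the attributable impact from the PAF
rr        <- 1.369  # published RR per increment
increment <- 10
exposure  <- 8.85
cutoff    <- 5
bhd       <- 30747  # baseline health data (COPD incidence)

rr_at_exp <- exp(log(rr) / increment * (exposure - cutoff))  # approx. 1.13
paf       <- (rr_at_exp - 1) / rr_at_exp
impact    <- paf * bhd
round(impact)  # 3502
```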
The relative risk at exposure level (rr_at_exp) is part of the output of attribute_health() and attribute_lifetable(). rr_at_exp can also be calculated using get_risk().
For the conversion of hazard ratios and/or odds ratios to relative risks, refer to [@VanderWeele2019_biom] and/or use the conversion tools developed by the Teaching group in EBM in 2022 for hazard ratios (https://ebm-helper.cn/en/Conv/HR_RR.html) and odds ratios (https://ebm-helper.cn/en/Conv/OR_RR.html).
results_pm_copd <- attribute_health(
  approach_risk = "relative_risk", # If you do not call this argument, "relative_risk" will be assigned by default.
  erf_shape = "log_linear",
  rr_central = exdat_pm$relative_risk,
  rr_increment = 10,
  exp_central = exdat_pm$mean_concentration,
  cutoff_central = exdat_pm$cut_off_value,
  bhd_central = exdat_pm$incidence
)
results_pm_copd$health_main
The main output is a tibble with 3 rows and 23 columns. Be aware that it contains input data, some intermediate steps and the final results in different formats.
Let's zoom in on some relevant aspects.
results_pm_copd$health_main |>
  dplyr::select(exp, bhd, rr, erf_ci, pop_fraction, impact_rounded) |>
  knitr::kable() # For formatting reasons only: prints tibble in a nice layout
Interpretation: the table shows that the exposure was 8.85 $\mu g/m^3$ and the baseline health data (bhd_central) were 30747 (COPD incidence in this instance). The 1st row further shows that the impact attributable to this exposure using the central relative risk (rr_central) estimate of 1.369 is 3502 COPD cases, or \~11% of all baseline cases.
Some of the most relevant results columns include:
E.g., to quantify the incident cases of high annoyance attributable to (road traffic) noise exposure.
In the absolute risk calculation pathway, estimates are based on the size and distribution of the exposed population, rather than on baseline health data, as is the case in the relative risk pathway [@WHO2011_report].
$$N = \sum AR_i \times PE_i$$
Where:
$N$ = attributed cases
$AR_i$ = absolute risk at category $i$
$PE_i$ = absolute population exposed at category $i$
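The sum above can be written out in a few lines of base R, using the exposure categories, exposed population and high-annoyance ERF from the example below (the ERF returns a percentage, hence the division by 100):

```r
## Absolute-risk impact as the sum over exposure categories
exp_mid <- c(57.5, 62.5, 67.5, 72.5, 77.5)         # category midpoints (dB)
pop_exp <- c(387500, 286000, 191800, 72200, 7700)  # exposed population per category

## ERF gives the percentage of highly annoyed people at noise level c
erf <- function(c) 78.9270 - 3.1162 * c + 0.0342 * c^2

ar <- erf(exp_mid) / 100  # absolute risk per category (fraction)
n  <- sum(ar * pop_exp)   # attributed cases, N = sum(AR_i * PE_i)
round(n)
```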
results_noise_ha <- attribute_health(
  approach_risk = "absolute_risk", # default is "relative_risk"
  exp_central = c(57.5, 62.5, 67.5, 72.5, 77.5), # mean of the exposure categories
  pop_exp = c(387500, 286000, 191800, 72200, 7700), # population exposed per exposure category
  erf_eq_central = "78.9270-3.1162*c+0.0342*c^2" # exposure-response function
)
The erf_eq_central argument can digest other types of functions (see section on user-defined ERF).
results_noise_ha |>
  purrr::pluck("health_main") |>
  dplyr::select(erf_eq, erf_ci, impact_rounded) |>
  knitr::kable() # Prints tibble in a minimal layout
results_noise_ha$health_detailed$results_raw
results_noise_ha[["health_detailed"]][["results_raw"]] |>
  dplyr::select(exp_category, exp, pop_exp, impact) |>
  knitr::kable()
Remember that if the equation of the exposure-response function (erf_eq_...)
requires taking a maximum in a vectorised context, pmax() must be used instead of max().
pmax() returns the element-wise maximum (the output is a vector),
while max() returns a single global maximum for the entire vector. For example:
erf_eq_central <- "exp(0.2969*log((pmax(0,c-2.4)/1.9+1))/(1+exp(-(pmax(0,c-2.4)-12)/40.2)))"
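A quick illustration of the difference (toy exposure values, not healthiar-specific):

```r
## pmax() is element-wise (returns a vector), max() is global (returns a scalar)
c_vals <- c(1, 3, 7)

pmax(0, c_vals - 2.4)  # one truncated value per exposure level
max(0, c_vals - 2.4)   # a single number: the overall maximum
```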
Alternatively, it's also possible to only assess the absolute risk impacts for one exposure category (e.g., a single noise exposure band).
results_noise_ha <- attribute_health(
  approach_risk = "absolute_risk",
  exp_central = 57.5,
  pop_exp = 387500,
  erf_eq_central = "78.9270-3.1162*c+0.0342*c^2"
)
results_noise_ha$health_main
results_noise_ha[["health_main"]] |>
  dplyr::select(exp_category, impact) |>
  knitr::kable()
rm(results_noise_ha)
E.g., to quantify the disease cases attributable to PM2.5 exposure in multiple cities using one single command.
Enter unique IDs as a vector (numeric or character) to the geo_id_micro argument (e.g., municipality names or province abbreviations)
Optional: aggregate unit-specific results by providing higher-level IDs (e.g., region names or country abbreviations) as a vector (numeric or character) to the geo_id_macro argument
Input to the other function arguments is specified as usual, either as a vector or as a single value (which will be recycled to match the length of the other input vectors).
results_iteration <- attribute_health(
  # Names of Swiss cantons
  geo_id_micro = c("Zurich", "Basel", "Geneva", "Ticino", "Jura"),
  # Names of languages spoken in the selected Swiss cantons
  geo_id_macro = c("German", "German", "French", "Italian", "French"),
  rr_central = 1.369,
  rr_increment = 10,
  cutoff_central = 5,
  erf_shape = "log_linear",
  exp_central = c(11, 11, 10, 8, 7),
  bhd_central = c(4000, 2500, 3000, 1500, 500)
)
In this example we want to aggregate the lower-level geographic units (cantons) by the higher-level language region ("German", "French", "Italian").
The main output contains aggregated results
results_iteration$health_main
results_iteration[["health_main"]] |>
  dplyr::select(geo_id_macro, impact_rounded, erf_ci, exp_ci, bhd_ci) |>
  knitr::kable()
In this case health_main contains the cumulative / summed number of disease cases attributable to PM2.5 exposure in the 5 geo units, which is `r sum(results_iteration[["health_main"]]$impact_rounded)` (using a relative risk of 1.369).
The geo unit-specific information and results are stored under health_detailed>results_raw .
results_iteration$health_detailed$results_raw
results_iteration[["health_detailed"]][["results_raw"]] |>
  dplyr::select(geo_id_micro, impact_rounded, geo_id_macro) |>
  knitr::kable()
Besides the results per geo unit, health_detailed also contains the impacts obtained through all combinations of the central, lower and upper estimates of the input data (as usual; not shown above).
rm(results_iteration)
E.g., to quantify high annoyance cases attributable to noise exposure in rural and urban areas.
data <- exdat_noise |>
  ## Filter for urban and rural regions
  dplyr::filter(region == "urban" | region == "rural")
results_iteration_ar <- attribute_health(
  # Both the rural and urban areas belong to the higher-level "total" region
  geo_id_macro = "total",
  geo_id_micro = data$region,
  approach_risk = "absolute_risk",
  exp_central = data$exposure_mean,
  pop_exp = data$exposed,
  erf_eq_central = "78.9270-3.1162*c+0.0342*c^2"
)
NOTE: the length of the input vectors fed to geo_id_micro, exp_central, pop_exp must match and must be
(number of geo units) x (number of exposure categories) = 2 x 5 = 10,
because we have 2 geo units ("rural" and "urban") and 5 exposure categories.
health_main contains the aggregated results (i.e. sum of impacts in rural and urban areas).
results_iteration_ar$health_main
results_iteration_ar[["health_main"]] |>
  dplyr::select(geo_id_macro, impact_rounded, erf_ci, exp_ci) |>
  knitr::kable()
Impact by geo unit, in this case impact in the rural and in the urban area.
results_iteration_ar$health_detailed$results_by_geo_id_micro
results_iteration_ar[["health_detailed"]][["results_by_geo_id_micro"]] |>
  dplyr::select(geo_id_micro, geo_id_macro, impact) |>
  knitr::kable()
rm(results_iteration_ar, data)
E.g., to quantify the COPD cases attributable to PM2.5 exposure taking into account uncertainty (lower and upper bound of confidence interval) in several input arguments: relative risk, exposure and baseline health data.
results_pm_copd <- attribute_health(
  erf_shape = "log_linear",
  rr_central = 1.369,
  rr_lower = 1.124, # lower 95% confidence interval (CI) bound of RR
  rr_upper = 1.664, # upper 95% CI bound of RR
  rr_increment = 10,
  exp_central = 8.85,
  exp_lower = 8, # lower 95% CI bound of exposure
  exp_upper = 10, # upper 95% CI bound of exposure
  cutoff_central = 5,
  bhd_central = 30747,
  bhd_lower = 28000, # lower 95% CI bound of BHD
  bhd_upper = 32000 # upper 95% CI bound of BHD
)
Let's inspect the detailed results:
results_pm_copd$health_detailed$results_raw
results_pm_copd$health_detailed$results_raw |>
  dplyr::select(erf_ci, exp_ci, bhd_ci, impact_rounded) |>
  dplyr::slice(1:9) |>
  knitr::kable() # Prints tibble in a minimal layout
Each row contains the estimated attributable cases (impact_rounded) obtained by the input data specified in the columns ending in "_ci" and the other calculation pathway specifications in that row (not shown).
The 1st row contains the estimated attributable impact when using the central estimates of relative risk, exposure and baseline health data.
The 2nd row shows the impact when using the central estimates of the relative risk and exposure in combination with the lower estimate of the baseline health data.
...
NOTE: only `r length(1:9)` of the `r length(results_pm_copd$health_detailed$results_raw$impact)` possible combinations are displayed due to space constraints.
NOTE: only a selection of columns is shown.
E.g., to summarize uncertainty of attributable health impacts (i.e. to get a single confidence interval instead of many combinations) by using a Monte Carlo simulation.
A Monte Carlo simulation is a statistical method
based on repeated random sampling [@Robert2004_book; @Rubinstein2016_book].
In healthiar, you can use the function summarize_uncertainty()
to simulate values in the arguments with uncertainty and
estimate a single confidence interval in the results.
For each entered input argument that includes a (95%) confidence interval
(i.e. _lower and _upper bound value)
a distribution is fitted (see distributions below).
The median of the simulated attributable impacts is reported as the central estimate.
The 2.5th and 97.5th percentiles of these simulated impacts define
the lower and upper bounds of the 95% summary uncertainty interval.
Aggregated central, lower and upper estimates are obtained
by summing the corresponding values of each lower level unit.
summarize_uncertainty() assumes the following shapes of the distributions
in the simulations:
Relative risk: The values are simulated based on an optimized gamma distribution,
which fits well as relative risks are positive and their distributions are usually right-skewed.
The gamma distribution is parametrized such that its mean is equal to the central relative risk estimate
(rate= shape/rr_central). The shape parameter is then optimized using
stats::optimize() to match the inputted 95% confidence interval bounds,
with stats::qgamma() used to evaluate candidate distributions.
Finally, n_sim relative risk values are simulated using stats::rgamma().
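A minimal base-R sketch of this idea (not the healthiar internals; the optimization target and search interval are illustrative assumptions), using the COPD relative risk 1.369 with 95% CI 1.124-1.664:

```r
## Fit a gamma distribution to an RR and its 95% CI by optimizing the shape
## parameter while keeping the mean fixed (rate = shape / rr_central)
rr_central <- 1.369
rr_lower   <- 1.124
rr_upper   <- 1.664

ci_mismatch <- function(shape) {
  q <- stats::qgamma(c(0.025, 0.975), shape = shape, rate = shape / rr_central)
  sum((q - c(rr_lower, rr_upper))^2)  # squared distance to the target CI bounds
}

shape_opt <- stats::optimize(ci_mismatch, interval = c(1, 1000))$minimum

set.seed(1)
rr_sim <- stats::rgamma(10000, shape = shape_opt, rate = shape_opt / rr_central)
mean(rr_sim)  # close to the central estimate 1.369
```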
Exposure, cutoff, baseline health data and duration:
The values are simulated based on a normal distribution
using stats::rnorm() with
mean = exp_central, mean = cutoff_central, mean = bhd_central or mean = duration_central
and a standard deviation based on corresponding lower and upper 95% exposure
confidence interval values. The standard deviation is calculated as $(upper - lower)/(2 \times 1.96)$,
since for a normal distribution the 95% CI spans approximately two standard deviations on either side of the mean.
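For example, for the exposure input 8.85 with 95% CI [8, 10] (values from the uncertainty example above), a base-R sketch:

```r
## SD recovered from the 95% CI, then used to simulate normal exposure values
exp_central <- 8.85
exp_lower   <- 8
exp_upper   <- 10

exp_sd <- (exp_upper - exp_lower) / (2 * 1.96)  # approx. 0.51

set.seed(1)
exp_sim <- stats::rnorm(10000, mean = exp_central, sd = exp_sd)
```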
Disability weights: The values are simulated based on a beta distribution,
as both the disability weights and the beta distribution are bounded by 0 and 1.
The beta distribution best fitting the inputted central disability weight estimate and
corresponding lower and upper 95% confidence interval values is fitted using
stats::qbeta() (the best fitting distribution parameters
shape1 and shape2 are determined using stats::optimize()). For this purpose,
we partly adapted the R function prevalence::beta_expert
with permission of one of the authors [@Devleesschauwer2022_package].
Finally, n_sim disability weight values are simulated using stats::rbeta().
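A base-R sketch of the beta fit (hypothetical disability weight values; fixing the mean via shape2 is an illustrative simplification of the optimization described above):

```r
## Fit a beta distribution to a disability weight and its 95% CI
dw_central <- 0.22  # hypothetical central disability weight
dw_lower   <- 0.15  # hypothetical lower 95% CI bound
dw_upper   <- 0.30  # hypothetical upper 95% CI bound

ci_mismatch <- function(shape1) {
  shape2 <- shape1 * (1 - dw_central) / dw_central  # fixes the mean at dw_central
  q <- stats::qbeta(c(0.025, 0.975), shape1 = shape1, shape2 = shape2)
  sum((q - c(dw_lower, dw_upper))^2)
}

shape1_opt <- stats::optimize(ci_mismatch, interval = c(0.1, 1000))$minimum
shape2_opt <- shape1_opt * (1 - dw_central) / dw_central

set.seed(1)
dw_sim <- stats::rbeta(10000, shape1 = shape1_opt, shape2 = shape2_opt)
```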
For stability of the 95% confidence interval, a large number of simulations (e.g., 10,000) is recommended in practice. The example below uses n_sim = 100 for brevity.
results_pm_copd_summarized <- summarize_uncertainty(
  output_attribute = results_pm_copd,
  n_sim = 100
)
The outcome of the Monte Carlo analysis is added to the variable entered in the output_attribute argument, which is results_pm_copd in our case.
Two lists ("folders") are added:
uncertainty_main contains the central estimate and the corresponding 95% confidence intervals obtained through the Monte Carlo assessment and
uncertainty_detailed contains all n_sim simulations of the Monte Carlo assessment.
results_pm_copd_summarized$uncertainty_main
results_pm_copd_summarized$uncertainty_main |> knitr::kable()
The folder uncertainty_detailed contains all single simulations.
Let's look at the impact of the first 10 simulations.
The columns erf_ci, exp_ci, bhd_ci, and cutoff_ci indicate
the uncertainty component used for each simulation
(in the first 10 simulations, all use central estimates).
results_pm_copd_summarized$uncertainty_detailed$impact_by_sim
results_pm_copd_summarized$uncertainty_detailed$impact_by_sim |> knitr::kable()
rm(results_pm_copd_summarized)
E.g., to quantify COPD cases attributable to air pollution exposure by applying a user-defined exposure-response function (ERF), such as the MR-BRT curves from Global Burden of Disease study.
In this case, the erf_eq_... arguments require a function as input, so we use an auxiliary function (splinefun()) to transform points on the ERF into an object of type function.
results_pm_copd_mr_brt <- attribute_health(
  exp_central = 8.85,
  bhd_central = 30747,
  cutoff_central = 0,
  # Specify the function based on x-y point pairs that lie on the ERF
  erf_eq_central = splinefun(
    x = c(0, 5, 10, 15, 20, 25, 30, 50, 70, 90, 110),
    y = c(1.00, 1.04, 1.08, 1.12, 1.16, 1.20, 1.23, 1.35, 1.45, 1.53, 1.60),
    method = "natural"
  )
)
The ERF curve created looks as follows:
x <- c(0, 5, 10, 15, 20, 25, 30, 50, 70, 90, 110)
y <- c(1.00, 1.04, 1.08, 1.12, 1.16, 1.20, 1.23, 1.35, 1.45, 1.53, 1.60)

spline_func <- stats::splinefun(x = x, y = y, method = "natural")

## Generate finer x-values
x_dense <- seq(min(x), max(x), length.out = 200)
## Evaluate the spline function at these points
y_dense <- spline_func(x_dense)

## Plot
plot(x, y,
     # main = "User-defined ERF",
     main = "",
     xlab = "Exposure", ylab = "Relative risk",
     col = "blue", pch = 19)
lines(x_dense, y_dense, col = "red", lwd = 2)
legend("topleft",
       legend = c("Original points", "Spline curve"),
       col = c("blue", "red"),
       pch = c(19, NA), lty = c(NA, 1), lwd = c(NA, 2))
Alternatively, other functions (e.g. approxfun()) can be used to create the ERF.
E.g., to quantify health impacts attributable to air pollution in a country by age group.
To obtain age-group-specific results, the baseline health data (and possibly exposure) must be available by age group.
If the age argument was specified, age-group-specific results are available under health_detailed in the sub-folder results_by_age_group.
results_age_group <- attribute_health(
  approach_risk = "relative_risk",
  age = c("below_65", "65_plus"),
  exp_central = c(8, 7),
  cutoff_central = c(5, 5),
  bhd_central = c(1000, 5000),
  rr_central = 1.06,
  rr_increment = 10,
  erf_shape = "log_linear"
)
results_age_group$health_detailed$results_by_age_group |>
  dplyr::select(age_group, impact_rounded, exp, bhd) |>
  knitr::kable()
rm(results_age_group)
E.g., to quantify health impacts attributable to air pollution in a country by sex.
The baseline health data (and possibly exposure) must be entered by sex.
results_sex <- attribute_health(
  approach_risk = "relative_risk",
  sex = c("female", "male"),
  exp_central = c(8, 8),
  cutoff_central = c(5, 5),
  bhd_central = c(1000, 1100),
  rr_central = 1.06,
  rr_increment = 10,
  erf_shape = "log_linear"
)
If the sex argument was specified, sex-specific results are available under health_detailed in the sub-folder results_by_sex.
results_sex$health_detailed$results_by_sex |>
  dplyr::select(sex, impact_rounded, exp, bhd) |>
  knitr::kable()
rm(results_sex)
E.g., to quantify attributable health impacts stratified by a sub-group other than age and sex, e.g., education level.
A single vector (or a data frame / tibble with multiple columns) to group the results by can be entered in the info argument. In this example, this is information about the education level.
In a second step, the results can be grouped by one or more columns to summarize them by the preferred sub-groups.
output_attribute <- healthiar::attribute_health(
  rr_central = 1.063,
  rr_increment = 10,
  erf_shape = "log_linear",
  cutoff_central = 0,
  exp_central = c(6, 7, 8, 7, 8, 9, 8, 9, 10, 9, 10, 11),
  bhd_central = c(600, 700, 800, 700, 800, 900, 800, 900, 1000, 900, 1000, 1100),
  geo_id_micro = rep(c("a", "b", "c", "d"), each = 3),
  info = data.frame(
    education = rep(c("secondary", "bachelor", "master"), times = 4)) # education level
)
output_stratified <- output_attribute$health_detailed$results_raw |>
  dplyr::group_by(info_column_1) |>
  dplyr::summarize(mean_impact = mean(impact)) |>
  dplyr::pull(mean_impact) |>
  print()
rm(output_attribute, output_stratified)
E.g., to quantify attributable health impacts stratified by age, sex and additional sub-group e.g. education level.
output_attribute <- healthiar::attribute_health(
  rr_central = 1.063,
  rr_increment = 10,
  erf_shape = "log_linear",
  cutoff_central = 0,
  age_group = base::rep(c("50_and_younger", "50_plus"), each = 4, times = 2),
  sex = base::rep(c("female", "male"), each = 2, times = 4),
  exp_central = c(6, 7, 8, 7, 8, 9, 8, 9, 10, 9, 10, 11, 10, 11, 12, 13),
  bhd_central = c(600, 700, 800, 700, 800, 900, 800, 900, 1000, 900, 1000, 1100, 1000, 1100, 1200, 1000),
  geo_id_micro = base::rep(c("a", "b"), each = 8),
  info = base::data.frame(
    education = base::rep(c("without_master", "with_master"), times = 8)) # education level
)
output_stratified <- output_attribute$health_detailed$results_raw |>
  dplyr::group_by(info_column_1) |>
  dplyr::summarize(mean_impact = mean(impact)) |>
  dplyr::pull(mean_impact) |>
  print()
rm(output_attribute, output_stratified)
E.g., to quantify the years of life lost (YLL) due to deaths from COPD attributable to PM2.5 exposure during one year.
The life table approach to obtain YLL and deaths requires population and baseline mortality data to be stratified by one year age groups. However, in some cases these data are only available for larger age groups (e.g., 5-year data: 0-4 years old, 5-9 years old, ...). What to do?
If your population and mortality data are not available by one-year age group, your data must be prepared by interpolating values. The healthiar function prepare_lifetable() makes this conversion using the same approach as the WHO tool AirQ+ [@WHO2020_report].
If your population and death data are already stratified by one-year age group, you are lucky: you can skip this initial step.
The life table methodology of attribute_lifetable() follows that of the WHO tool AirQ+ [@WHO2020_report], which is described in more detail by @Miller2003_jech.
In short, two scenarios are compared:
a) a scenario with the exposure level specified in the function ("exposed scenario") and
b) a scenario with no exposure ("unexposed scenario").
First, the entry and mid-year populations of the (first) year of analysis in the unexposed scenario is determined using modified survival probabilities. Second, age-specific population projections using scenario-specific survival probabilities are done for both scenarios. Third, by subtracting the populations in the unexposed scenario from the populations in the exposed scenario the premature deaths/years of life lost attributable to the exposure are determined.
An expansive life table case study is available in a report of @Miller2010_report.
The entry (i.e. start-of-year) populations in both scenarios (exposed and unexposed) are determined as follows:
$$entry\_population_{year_1} = midyear\_population_{year_1} + \frac{deaths_{year_1}}{2}$$
The survival probabilities in the exposed scenario from start of year $i$ to start of year $i+1$ are calculated as follows:
$$prob\_survival = \frac{midyear\_population_i - \frac{deaths_i}{2}}{midyear\_population_i + \frac{deaths_i}{2}}$$
Analogously, the probability of survival from start of year $i$ to mid-year $i$:
$$prob\_survival\_until\_midyear = 1 - \frac{1 - prob\_survival}{2}$$
The survival probabilities in the unexposed scenario are calculated as follows:
First, the age-group specific hazard rate in the exposed scenario is calculated using the inputted age-specific mid-year populations and deaths.
$$hazard\_rate = \frac{deaths}{midyear\_population}$$
Second, the hazard rate is multiplied with the modification factor ($= 1 - PAF$) to obtain the age-specific hazard rate in the unexposed scenario.
$$hazard\_rate\_mod = hazard\_rate \times modification\_factor$$
Third, the age-specific survival probabilities (from the start until the end of a given age group) in the unexposed scenario are calculated as follows (cf. Miller & Hurley 2003):
$$prob\_survival\_mod = \frac{2 - hazard\_rate\_mod}{2 + hazard\_rate\_mod}$$
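Putting the exposed-scenario and unexposed-scenario probabilities together for one age group (hypothetical mid-year population, deaths and PAF):

```r
## Survival probabilities for one age group in both scenarios
midyear_population <- 100000
deaths             <- 1200
paf                <- 0.114  # population attributable fraction (hypothetical)

## Exposed scenario
prob_survival <- (midyear_population - deaths / 2) /
                 (midyear_population + deaths / 2)

## Unexposed scenario: modify the hazard rate, then convert back
hazard_rate       <- deaths / midyear_population
hazard_rate_mod   <- hazard_rate * (1 - paf)  # modification factor = 1 - PAF
prob_survival_mod <- (2 - hazard_rate_mod) / (2 + hazard_rate_mod)

prob_survival < prob_survival_mod  # TRUE: survival improves without exposure
```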
The mid-year populations of the (first) year of analysis (year_1) in the unexposed scenario are determined as follows:
First, the survival probability from the start of year $i$ to mid-year $i$ in the unexposed scenario is calculated as:
$$prob\_survival\_until\_midyear\_mod = 1 - \frac{1 - prob\_survival\_mod}{2}$$
Second, the mid-year populations of the (first) year of analysis (year_1) in the unexposed scenario is calculated:
$$midyear\_population\_unexposed_{year_1} = entry\_population_{year_1} \times prob\_survival\_until\_midyear\_mod$$
Using the age group-specific and scenario-specific survival probabilities calculated above, future populations of each age-group under each scenario are calculated.
The population projections for the unexposed scenario differ between the two possible options of approach_exposure ("single_year" and "constant"). In the case of "single_year" exposure, the population projection for the years after the year of exposure is the same as in the exposed scenario.
In the case of "constant" the population projection is done as follows:
First, the entry population of year $i+1$ is calculated (which is the same as the end-of-year population of year $i$) using the entry population of year $i$ and the modified survival probabilities.
$$entry\_population_{i+1} = entry\_population_i \times prob\_survival\_mod$$
Second, the mid-year population of year $i+1$ is calculated.
$$midyear\_population_{i+1} = entry\_population_{i+1} \times prob\_survival\_until\_midyear\_mod$$
The entry and mid-year population projections in the exposed scenario are done as follows:
First, the entry population of year $i+1$ is calculated (which is the same as the end-of-year population of year $i$) by multiplying the entry population of year $i$ with the survival probabilities of the exposed scenario.
$$entry\_population_{i+1} = entry\_population_i \times prob\_survival$$
Second, the mid-year population of year $i+1$ is calculated.
$$midyear\_population_{i+1} = entry\_population_{i+1} \times prob\_survival\_until\_midyear$$
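One projection step for a single age group can be sketched as follows (hypothetical entry population and survival probability; the same step is repeated each year with the scenario-specific probabilities):

```r
## Carry the entry population one year forward, then derive the mid-year
## population via the survival probability until mid-year
entry_population <- 100600
prob_survival    <- 0.9881

prob_survival_until_midyear <- 1 - (1 - prob_survival) / 2

entry_population_next   <- entry_population * prob_survival
midyear_population_next <- entry_population_next * prob_survival_until_midyear
```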
We can use attribute_lifetable() combined with life table input data to determine YLL attributable to an environmental stressor.
results_pm_yll <- attribute_lifetable(
  year_of_analysis = 2019,
  health_outcome = "yll",
  rr_central = 1.118,
  rr_increment = 10,
  erf_shape = "log_linear",
  exp_central = 8.85,
  cutoff_central = 5,
  min_age = 20, # age from which population is affected by the exposure
  # Life table information
  age_group = exdat_lifetable$age_group,
  sex = exdat_lifetable$sex,
  population = exdat_lifetable$midyear_population,
  # In the life table case, BHD refers to deaths
  bhd_central = exdat_lifetable$deaths
)
Total YLL attributable to exposure (sum of sex-specific impacts).
results_pm_yll$health_main
results_pm_yll$health_main |>
  dplyr::slice_head() |>
  dplyr::select(impact_rounded, erf_ci, exp_ci, bhd_ci) |>
  knitr::kable()
Use the two arguments approach_exposure and approach_newborns to modify the life table calculation:
approach_exposure
"single_year" (default) population is exposed for a single year and the impacts of that exposure are calculated
"constant" population is exposed every year
approach_newborns
"without_newborns" (default) assumes that the population in the year of analysis is followed over time, without considering newborns being born
"with_newborns" assumes that for each year after the year of analysis n babies are born, with n being equal to the (male and female) population aged 0 that is provided in the argument population
Attributable YLL results
per year
per age (group)
per sex (if sex-specific life table data entered)
are available.
Note: We will inspect the results for females; male results are also available.
NOTE: only a selection of years is shown.
results_pm_yll$health_detailed$results_raw |>
  dplyr::summarize(
    .by = year,
    impact = sum(impact, na.rm = TRUE)
  )
results_pm_yll$health_detailed$results_raw |>
  dplyr::summarize(
    .by = year,
    impact = sum(impact, na.rm = TRUE)) |>
  knitr::kable()
results_pm_yll$health_detailed$intermediate_calculations |>
  dplyr::filter(sex == "female") |>
  dplyr::pull(impact_by_age_and_year) |>
  purrr::pluck(1)
results_pm_yll$health_detailed$intermediate_calculations |>
  dplyr::filter(sex == "female") |>
  dplyr::pull(impact_by_age_and_year) |>
  purrr::pluck(1) |>
  dplyr::select(age_start, age_end, impact_2019) |>
  dplyr::slice((dplyr::n() - 8):dplyr::n()) |>
  knitr::kable()
Baseline scenario refers to the scenario with exposure (i.e. the scenario specified in the assessment).
results_pm_yll$health_detailed$intermediate_calculations |>
  dplyr::filter(sex == "female") |>
  dplyr::pull(projection_if_exposed_by_age_and_year) |>
  purrr::pluck(1)
results_pm_yll$health_detailed$intermediate_calculations |>
  dplyr::filter(sex == "female") |>
  dplyr::pull(projection_if_exposed_by_age_and_year) |>
  purrr::pluck(1) |>
  dplyr::select(age_start, midyear_population_2019, midyear_population_2020,
                midyear_population_2021, midyear_population_2022) |>
  dplyr::slice((dplyr::n() - 8):dplyr::n()) |>
  knitr::kable()
Impacted scenario refers to the scenario without exposure.
results_pm_yll$health_detailed$intermediate_calculations|> dplyr::filter(sex == "female") |> dplyr::pull(projection_if_unexposed_by_age_and_year) |> purrr::pluck(1)
results_pm_yll$health_detailed$intermediate_calculations|> dplyr::filter(sex == "female") |> dplyr::pull(projection_if_unexposed_by_age_and_year) |> purrr::pluck(1) |> dplyr::select(age_start, midyear_population_2019, midyear_population_2020, midyear_population_2021, midyear_population_2022) |> dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |> knitr::kable()
E.g., to determine premature deaths from COPD attributable to PM2.5 exposure during one year.
See example on YLL for additional info on attribute_lifetable() calculations and its output.
results_pm_deaths <- attribute_lifetable(
  health_outcome = "deaths",
  year_of_analysis = 2019,
  rr_central = 1.118,
  rr_increment = 10,
  erf_shape = "log_linear",
  exp_central = 8.85,
  cutoff_central = 5,
  min_age = 20, # age from which population is affected by the exposure
  # Life table information
  age_group = exdat_lifetable$age_group,
  sex = exdat_lifetable$sex,
  population = exdat_lifetable$midyear_population,
  bhd_central = exdat_lifetable$deaths
)
Total premature deaths attributable to exposure (sum of sex-specific impacts).
results_pm_deaths$health_main$impact
results_pm_deaths$health_main |> dplyr::slice_head() |> dplyr::select(impact_rounded, erf_ci, exp_ci, bhd_ci) |> knitr::kable()
Attributable premature deaths results are available:
- per year (if the argument approach_exposure = "constant")
- per age (group)
- per sex (if sex-specific life table data are entered)
Note: we will inspect the results for females; male results are also available.
Note: because we kept the default approach_exposure = "single_year" in the function call, results are only available for one year (the year of analysis).
results_pm_deaths$health_detailed$intermediate_calculations|> dplyr::filter(sex == "female") |> dplyr::pull(projection_if_unexposed_by_age_and_year) |> purrr::pluck(1)
results_pm_deaths$health_detailed$intermediate_calculations |> dplyr::filter(sex == "female") |> dplyr::pull(projection_if_unexposed_by_age_and_year) |> purrr::pluck(1) |> dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |> knitr::kable()
rm(results_pm_deaths)
E.g., to quantify the years lived with disability (YLD) attributable to air pollution exposure using disability weights.
To quantify the YLDs, you can use a prevalence-based or an incidence-based approach [@Kim2022_kspm].
Prevalence-based: enter 1 (year) in the argument(s) duration_... and prevalence cases in bhd_....
Incidence-based: enter a value above 1 (the disease duration in years) in duration_... and incidence cases in bhd_....
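To make the two accounting approaches concrete, here is a minimal base-R sketch of the YLD arithmetic with hypothetical case counts (these numbers are illustrative assumptions, not healthiar output):

```r
dw <- 0.2                 # disability weight

# Prevalence-based: prevalent cases live one year with the condition
prevalent_cases <- 1000
yld_prevalence <- prevalent_cases * dw * 1   # duration fixed at 1 year

# Incidence-based: incident cases carry the condition for its full duration
incident_cases <- 100
duration <- 10                               # years, > 1
yld_incidence <- incident_cases * dw * duration
```

Both approaches multiply attributable cases by the disability weight; they differ only in which cases are counted and over which duration.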
results_pm_copd_yld <- attribute_health( rr_central = 1.1, rr_increment = 10, erf_shape = "log_linear", exp_central = 8.85, cutoff_central = 5, bhd_central = 1000, duration_central = 10, dw_central = 0.2 )
results_pm_copd_yld$health_main
results_pm_copd_yld$health_main |> dplyr::select(erf_ci, impact) |> knitr::kable()
E.g., to obtain the disability-adjusted life years (DALY) as the sum of YLLs and YLDs.
To obtain the attributable DALY, its two components, i.e. years of life lost (YLL) and years lived with disability (YLD), must be summed [@GBD2020_tl].
$$DALY = YLL + YLD$$
This is possible using the function daly().
results_daly <- daly( output_attribute_yll = results_pm_yll, output_attribute_yld = results_pm_copd_yld )
YLL, YLD & DALY
results_daly$health_main |> dplyr::select(impact_yll_rounded, impact_yld_rounded, impact_rounded)
results_daly$health_main |> dplyr::select(impact_yll_rounded, impact_yld_rounded, impact_rounded) |> knitr::kable()
rm(results_daly, results_pm_yll, results_pm_copd_yld)
E.g., to quantify health impacts using attribute_health() in a scenario B that is very similar to a previous scenario A.
scenario_A <- attribute_health(
  exp_central = 8.85, # EXPOSURE 1
  cutoff_central = 5,
  bhd_central = 25000,
  approach_risk = "relative_risk",
  erf_shape = "log_linear",
  rr_central = 1.118,
  rr_increment = 10
)
The function attribute_mod() can be used to modify one or multiple arguments of an existing attribute_health() assessment, e.g. scenario_A.
scenario_B <- attribute_mod( output_attribute = scenario_A, exp_central = 6 )
This is equivalent to building the whole scenario again (see below), but more time- and code-efficient.
scenario_B <- attribute_health(
  exp_central = 6, # EXPOSURE 2
  cutoff_central = 5,
  bhd_central = 25000,
  approach_risk = "relative_risk",
  erf_shape = "log_linear",
  rr_central = 1.118,
  rr_increment = 10
)
E.g., to compare the health impacts in the scenario "before intervention" vs. "after intervention".
Two approaches can be used for the comparison of scenarios:
Delta: subtraction of the health impact in scenario 1 minus that in scenario 2 (i.e., two PAFs) [@WHO2014_book]
Population impact fraction (PIF) [@WHO2003_report; @Murray2003_spbm; @Askari2020_ijph].
Note that the PIF comparison approach assumes the same baseline health data for scenarios 1 and 2 (e.g., comparison of two scenarios at the same time point), while in the delta comparison approach the difference between two scenarios is obtained by subtraction. Therefore, the delta approach is suited for the comparison of scenarios at different time points.
IMPORTANT: If your aim is to quantify the health impacts of a policy intervention, be aware that you should use the same year of analysis, and therefore the same baseline health data, in both scenarios. The only variable that should change in the second scenario is the exposure (which changes as a result of the intervention).
The Population Impact Fraction (PIF) is defined as the proportional change in disease or mortality when exposure to a risk factor is changed, for instance due to an intervention.
The most general equation describing this mathematically is an integral form [@WHO2003_report; @Murray2003_spbm]:
$$PIF = \frac{\int RR(x)\,PE(x)\,dx - \int RR(x)\,PE'(x)\,dx}{\int RR(x)\,PE(x)\,dx}$$
Where:
- $RR(x)$ = relative risk at exposure level $x$
- $PE(x)$ = population exposure distribution in the baseline scenario
- $PE'(x)$ = population exposure distribution in the alternative (counterfactual) scenario
If the population exposure is described as a categorical rather than continuous exposure, the integrals may be converted to sums [@WHO2003_report; @Murray2003_spbm]:
$$PIF = \frac{\sum_i RR_i \times PE_i - \sum_i RR_i \times PE'_i}{\sum_i RR_i \times PE_i}$$
Where:
- $RR_i$ = relative risk at the exposure level of category $i$
- $PE_i$, $PE'_i$ = proportion of the population in exposure category $i$ in the baseline and alternative scenarios, respectively
Finally, if the exposure is provided as the population weighted mean concentration, the equation for the PIF is reduced to:
$$PIF = \frac{RR - RR_{alt}}{RR}$$
Where:
- $RR$ = relative risk at the population-weighted mean exposure in the baseline scenario
- $RR_{alt}$ = relative risk at the population-weighted mean exposure in the alternative scenario
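The categorical and mean-exposure forms of the PIF can be sketched in a few lines of base R. All numbers below are hypothetical, and rr_at_exp() is an illustrative log-linear ERF, not a healthiar internal:

```r
# Illustrative log-linear ERF (assumed parameters)
rr_at_exp <- function(exposure, rr = 1.118, increment = 10, cutoff = 5) {
  exp(log(rr) / increment * pmax(exposure - cutoff, 0))
}

exp_levels <- c(6, 8, 10)       # exposure category midpoints
pe     <- c(0.5, 0.3, 0.2)      # population shares, baseline scenario
pe_alt <- c(0.7, 0.2, 0.1)      # population shares, alternative scenario

# Categorical form: sums over exposure categories
rr_i <- rr_at_exp(exp_levels)
pif_categorical <- (sum(rr_i * pe) - sum(rr_i * pe_alt)) / sum(rr_i * pe)

# Reduced form with population-weighted mean exposures
pif_mean <- (rr_at_exp(8.85) - rr_at_exp(6)) / rr_at_exp(8.85)
```

Shifting population shares toward lower exposure categories yields a small positive PIF, i.e. a proportional reduction in the health outcome.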
attribute_health() to calculate the burden of scenarios A & B.

scenario_A <- attribute_health(
  exp_central = 8.85, # EXPOSURE 1
  cutoff_central = 5,
  bhd_central = 25000,
  approach_risk = "relative_risk",
  erf_shape = "log_linear",
  rr_central = 1.118,
  rr_increment = 10
)
scenario_B <- attribute_mod( output_attribute = scenario_A, exp_central = 6 )
compare() to compare scenarios A & B.

results_comparison <- healthiar::compare(
  approach_comparison = "delta", # or "pif" (population impact fraction)
  output_attribute_scen_1 = scenario_A,
  output_attribute_scen_2 = scenario_B
)
The default value for the argument approach_comparison is "delta". The alternative is "pif" (population impact fraction). See the function documentation of compare() for more details.
results_comparison$health_main
results_comparison[["health_main"]] |>
  dplyr::select(
    impact, impact_rounded, impact_scen_1, impact_scen_2, bhd,
    dplyr::starts_with("exp_"),
    -dplyr::starts_with("exp_ci"), # remove col "exp_ci"
    dplyr::starts_with("rr_con")
  ) |>
  dplyr::slice_head() |>
  knitr::kable()
The compare() results contain two outputs in addition to those we have already seen:
health_detailed
scen_1 contains results of scenario 1 (scenario A in our case)
scen_2 contains results of scenario 2 (scenario B in our case)
rm(results_comparison, scenario_A, scenario_B)
E.g., to quantify the total health impact attributable to PM2.5 and NO2.
A methodological report of the EU project BEST-COST [@Strak2024_report] identified three approaches to add up attributable health impacts from correlated exposures:
$$PAF_{additive} = PAF_{exposure1} + PAF_{exposure2}$$
$$PAF_{multiplicative} = \frac{\sum PE \times (RR_{multiplicative} - 1)}{\sum PE \times (RR_{multiplicative} - 1) + 1}$$
$$RR_{multiplicative} = RR_{exposure1} \times RR_{exposure2}$$
$$PAF_{combined} = 1-[(1-PAF_{exposure1}) \times (1-PAF_{exposure2})]$$
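The three combination formulas can be illustrated in base R with hypothetical, mutually adjusted relative risks (all numbers are assumptions; for a single exposure category the PAF reduces to (RR − 1)/RR):

```r
paf_from_rr <- function(rr) (rr - 1) / rr   # single-category shortcut

rr_1 <- 1.05   # e.g. exposure 1, assumed already adjusted for exposure 2
rr_2 <- 1.02   # e.g. exposure 2, assumed already adjusted for exposure 1

# Additive
paf_additive <- paf_from_rr(rr_1) + paf_from_rr(rr_2)

# Multiplicative
rr_mult <- rr_1 * rr_2
paf_multiplicative <- (rr_mult - 1) / rr_mult

# Combined
paf_combined <- 1 - (1 - paf_from_rr(rr_1)) * (1 - paf_from_rr(rr_2))
```

Note that in this single-category case the combined and multiplicative approaches coincide, since (1 − PAF1)(1 − PAF2) = 1/(RR1 × RR2).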
Attention: to apply any of these approaches, the relative risks for one exposure must be adjusted for the second exposure and the other way round.
For this purpose, you can use the function multiexpose().
results_pm <- attribute_health(
  erf_shape = "log_linear",
  rr_central = 1.369,
  rr_increment = 10,
  exp_central = 8.85,
  cutoff_central = 5,
  bhd_central = 30747
)

results_no2 <- attribute_mod(
  output_attribute = results_pm,
  exp_central = 10.9,
  rr_central = 1.031
)

results_multiplicative <- multiexpose(
  output_attribute_exp_1 = results_pm,
  output_attribute_exp_2 = results_no2,
  exp_name_1 = "pm2.5",
  exp_name_2 = "no2",
  approach_multiexposure = "multiplicative"
)
rm(results_no2, results_pm)
results_multiplicative$health_main
results_multiplicative$health_main |> dplyr::select(impact_rounded) |> knitr::kable()
rm(results_multiplicative)
E.g., to obtain the age-standardized attributable health impacts of two age groups.
Age standardization involves adjusting the observed rates of a particular
outcome to a standard population with a specific age structure.
This is a technique used to allow the comparison of populations
with different age structures [@GBD2020_tldemo; @Ahmad2001_report].
In healthiar, the function standardize() applies the direct method,
where the age-specific rates observed in a study population
are applied to a standard (reference) population distribution.
The standardized health impact rate is computed as $$ \mathit{impact\_per\_100k\_inhab}_{std} = \sum_{i=1}^{k} \mathit{impact\_per\_100k\_inhab}_i \times \mathit{ref\_prop\_pop}_i $$
where:
- $i$ indexes the $k$ age groups
- $\mathit{impact\_per\_100k\_inhab}_i$ = impact rate observed in age group $i$
- $\mathit{ref\_prop\_pop}_i$ = proportion of age group $i$ in the reference population
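The direct standardization step itself is simple arithmetic; a base-R sketch with hypothetical age-specific rates (not healthiar output):

```r
# Hypothetical age-specific impact rates per 100,000 inhabitants
impact_per_100k <- c(below_40 = 50, above_40 = 400)

# Reference population structure (proportions must sum to 1)
ref_prop_pop <- c(below_40 = 0.5, above_40 = 0.5)

# Weighted sum of age-specific rates = age-standardized rate
impact_per_100k_std <- sum(impact_per_100k * ref_prop_pop)
impact_per_100k_std  # 225
```

Two populations with different age structures become comparable once both are weighted to the same reference structure.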
output_attribute <- attribute_health(
  rr_central = 1.063,
  rr_increment = 10,
  erf_shape = "log_linear",
  cutoff_central = 0,
  age_group = c("below_40", "above_40"),
  exp_central = c(8.1, 10.9),
  bhd_central = c(1000, 4000),
  population = c(100000, 500000)
)

results <- standardize(
  output_attribute = output_attribute,
  age_group = c("below_40", "above_40"),
  ref_prop_pop = c(0.5, 0.5)
)
Age-standardized impact rate:
print(results$health_main$impact_per_100k_inhab)
Age group-specific impact rate:
print(results$health_detailed$results_raw$impact_per_100k_inhab)
rm(results)
E.g., to determine the population-weighted mean PM2.5 exposure for several neighborhoods of Brussels (Belgium).
The healthiar function prepare_exposure() helps users who do not have
exposure data (needed for healthiar functions), but only spatial
concentration and population data.
The function calculates an average concentration value in each geographic unit,
weighted with population at each location.
$$ exp = \frac{\sum_{i=1}^{n} (C_i \times population_i)}{\sum_{i=1}^{n} population_i} $$
where:
- $C_i$ = concentration at location $i$
- $population_i$ = population at location $i$
- $n$ = number of locations within the geographic unit
If population is entered as a count by geographic sub-unit, the function calculates the mean concentration in each sub-unit and aggregates it to higher-level geographic units. If no population data are entered, the function calculates a simple spatial mean concentration as the exposure value.
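The weighting itself reduces to a weighted average; a base-R sketch with hypothetical concentrations and populations (not the prepare_exposure() implementation):

```r
# Hypothetical concentration and population at three locations
conc       <- c(7.2, 8.9, 10.1)    # concentration at each location
population <- c(1000, 5000, 2000)  # population at each location

# Population-weighted mean exposure
exp_pwm <- sum(conc * population) / sum(population)
```

Locations with more inhabitants pull the mean toward their concentration, which is the point of weighting by population rather than by area.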
The output of prepare_exposure() can be entered in the arguments
exp_mean, exp_lower and/or exp_upper of healthiar functions
such as attribute_health().
# exdat_pwm_1 = pollution grid data
exdat_pwm_1 <- terra::rast(system.file("extdata", "exdat_pwm_1.tif", package = "healthiar"))

# exdat_pwm_2 = data with the geo units and population data. This is pre-loaded in healthiar.
# If your raw data are in .gpkg format, you can use e.g. sf::st_read()

pwm <- healthiar::prepare_exposure(
  poll_grid = exdat_pwm_1, # formal class SpatRaster
  geo_units = exdat_pwm_2, # sf of the geographic sub-units
  population = sf::st_drop_geometry(exdat_pwm_2$population), # population per geographic sub-unit
  geo_id_macro = sf::st_drop_geometry(exdat_pwm_2$region) # higher-level IDs to aggregate
)
Within the function output, the list main contains
the population-weighted mean exposures for the (higher-level) geographic units
in the column exposure_mean and the total population in each unit in column population_total.
knitr::kable(pwm$main)
rm(exdat_pwm_1, exdat_pwm_2, pwm)
E.g., to quantify health impacts in the exposure group 55 dB+ (calculation threshold) that is affected by an exposure above the effect threshold of 45 dB (cut-off).
The function arguments erf_eq_... require a function as input. Instead of a splinefun(), this can also be a user-defined function(c), i.e. any object of type 'function'.
# Set up function parameters
threshold_effect <- 45
RR <- 1.055
threshold_calculation <- 55
rr_increment <- 10

# Define the categorical function; the ifelse condition enables the case distinction
erf_function <- function(c) {
  output <- ifelse(
    c < threshold_calculation,
    1,
    exp((log(RR) / rr_increment) * (c - threshold_effect))
  )
  return(output)
}

# attribute_health
results_catERF_different_calc_thesh <- healthiar::attribute_health(
  approach_risk = "relative_risk",
  erf_eq_central = erf_function,
  prop_pop_exp = c(300000, 200000, 150000, 120000, 100000, 70000, 60000) / 10000000,
  exp_central = c(47, 52, 57, 62, 67, 72, 77),
  cutoff_central = 0,
  bhd_central = 50000
)$health_main$impact_rounded
The function used is equal to $$ f(c) = \begin{cases} 1, & c < threshold_{calculation} \\ \exp\left( \frac{\log(RR)}{rr_{increment}} \, (c - threshold_{effect}) \right), & c \ge threshold_{calculation} \end{cases} $$
The categorical ERF curve created looks as follows.
# Evaluate exp_central points
x <- c(47, 52, 57, 62, 67, 72, 77) # central exposure values
y <- erf_function(x)

# Generate finer x-values
x_dense <- seq(min(x), max(x), length.out = 200)
# Evaluate the ERF on the finer grid
y_dense <- erf_function(x_dense)

## Plot
plot(x, y,
     # main = "User-defined ERF",
     main = "",
     xlab = "Exposure", ylab = "Relative risk",
     col = "blue", pch = 19)
lines(x_dense, y_dense, col = "red", lwd = 2)
legend("topleft",
       legend = c("Original points", "Categorical ERF curve"),
       col = c("blue", "red"),
       pch = c(19, NA), lty = c(NA, 1), lwd = c(NA, 2))
E.g., to monetize the attributable health impact of a policy that will have health benefits five years from now.
In health economic evaluations and economic burden of disease assessments,
health impacts may need to be converted into monetary values.
For this purpose, you can use monetize().
Several valuation metrics are available, depending on how outcomes are quantified in natural or health units (e.g. cases reduced, deaths prevented, reductions in mortality risk, life-years gained, QALYs gained, DALYs averted). Common metrics include the Value of a Statistical Life (VSL) [@OECD2025_book], the Value of a Life-Year (VOLY) [@Hammitt2007_reep] and the Value of a Quality-Adjusted Life-Year (VAQALY) [@Bobinac2010_vh].
Discounting is the practice of converting future costs (or health impacts, as a prior step to valuing them) into their present value. The underlying rationale is that the value placed on outcomes declines the further in the future they occur. Therefore, future costs and effects are converted into present-value terms to make them comparable over time [@Attema2018_p].
Discounting is implemented by selecting a discount rate, which is used to compute a discount factor for each time period. This factor is then multiplied by the corresponding future cost (or effect) to express it in present-value terms.
If you need the discounted values of a cost or health outcome,
you can call the healthiar function discount().
If you just need the discount factor, you can alternatively call get_discount_factor() (entering is_deflation = TRUE).
If you just need the inflation factor, you can call get_inflation_factor().
Different functional forms can be used to apply discounting. The most common is exponential discounting, also referred to as constant discounting, since outcomes are discounted proportionally as time increases. An alternative is hyperbolic discounting, which tends to better capture human behavior by discounting the near present more heavily than outcomes further in the future [@Lipman2024_jru].
See below the equations that are used behind these functions.
$$\mathit{monetized\_impact} = impact \times valuation \times \mathit{discount\_factor} \times \mathit{deflator\_factor} \times \mathit{real\_growth\_factor}$$
The factors discount_factor, deflator_factor and real_growth_factor
are only applied if a value is entered in the arguments
discount_rate, inflation_rate and real_growth_rate, respectively
(otherwise they are ignored).
Exponential discounting
As suggested by @Frederick2002_jel
$$\mathit{discount\_factor} = \frac{1}{(1 + \mathit{discount\_rate})^{\mathit{n\_years}}}$$
Hyperbolic discounting Harvey
As suggested by @Harvey1986_ms
$$\mathit{discount\_factor} = \frac{1}{(1 + \mathit{n\_years})^{\mathit{discount\_rate}}}$$
Hyperbolic discounting Mazur
As suggested by @Mazur1987_book
$$\mathit{discount\_factor} = \frac{1}{1 + \mathit{discount\_rate} \times \mathit{n\_years}}$$
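The three shapes can be compared numerically in base R (3% rate over 5 years; the values are purely illustrative):

```r
discount_rate <- 0.03
n_years <- 5

# Exponential (constant) discounting
df_exponential <- 1 / (1 + discount_rate)^n_years

# Hyperbolic discounting (Harvey)
df_harvey <- 1 / (1 + n_years)^discount_rate

# Hyperbolic discounting (Mazur)
df_mazur <- 1 / (1 + discount_rate * n_years)
```

At this horizon the exponential factor is the smallest, i.e. exponential discounting devalues year-5 outcomes the most, while the Harvey form discounts them only mildly.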
Inflation can be handled in monetize() by applying a deflator on future values,
projected in nominal terms, in order to convert them into real values
(i.e., express these values in terms of constant prices of a single base year).
Therefore, if the user of the function provides a value for the inflation_rate argument,
a deflator factor [@HMTreasury2022_greenbook; @Brealey2023_book; @Samuelson1937_res]
is applied according to the following formula
$$\mathit{deflator\_factor} = \frac{1}{(1 + \mathit{inflation\_rate})^{\mathit{n\_years}}}$$
A rising societal value of health over time can also be reflected in the monetization. Unlike inflation, this represents a "real" increase in value: as societies become wealthier, their willingness to pay to avoid mortality and morbidity risks tends to rise [@OECD2012_book].
For this use case, you have to enter a value in the argument real_growth_rate in monetize(),
which allows you to project this growth by applying a valuation growth factor to base-year unit values:
$$\mathit{real\_growth\_factor} = (1 + \mathit{real\_growth\_rate})^{\mathit{n\_years}}$$
Where $\mathit{real\_growth\_rate}$ represents the annual real growth rate in health valuation. This ensures that long-term environmental impacts are not undervalued.
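Putting the factors together, the monetization equation can be sketched in base R with hypothetical inputs (exponential discounting assumed; none of these numbers come from healthiar):

```r
impact           <- 1000   # attributable cases (hypothetical)
valuation        <- 50000  # monetary value per case, base year
discount_rate    <- 0.03
inflation_rate   <- 0.02
real_growth_rate <- 0.01
n_years          <- 5

discount_factor    <- 1 / (1 + discount_rate)^n_years   # exponential shape
deflator_factor    <- 1 / (1 + inflation_rate)^n_years
real_growth_factor <- (1 + real_growth_rate)^n_years

monetized_impact <- impact * valuation * discount_factor *
  deflator_factor * real_growth_factor
```

Discounting and deflation shrink the nominal value, while real growth in health valuation partially offsets them.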
monetized_pm_copd <- monetize(
  output_attribute = results_pm_copd,
  discount_shape = "exponential",
  discount_rate = 0.03,
  n_years = 5,
  valuation = 50000 # e.g. EURO
)
The outcome of the monetization is added to the object entered in the output_attribute argument, which is results_pm_copd in our case.
Two list elements are added:
monetization_main contains the central monetization estimate and the corresponding 95% confidence intervals obtained through the specified monetization.
monetization_detailed contains the monetized results for each unique combination of the input variable estimates that were provided to the initial attribute_health() call.
monetized_pm_copd$monetization_main
monetized_pm_copd$monetization_main |> dplyr::select(erf_ci, monetized_impact) |> knitr::kable()
We see that the monetized impact (discounted) is more than 160 million EURO.
Alternatively, you can also monetize (attributable) health impacts from a non-healthiar source.
results <- monetize( impact = 1151, valuation = 100 )
E.g., to perform an economic evaluation for an intervention by comparing its benefits and costs via Cost-Benefit Analysis (CBA).
The CBA is a type of economic evaluation that compares the costs and the benefits of an intervention, considering both measures expressed in monetary terms.
To perform a CBA, you can use the function cba().
This approach requires monetizing benefits so they can be directly compared with costs.
Since interventions typically generate costs and benefits over multi-year time horizons,
discounting is a common practice to obtain the present value of future costs and benefits.
Depending on the reference guidelines,
the discount rate can be specified as the same for costs and benefits or different across them.
The outputs of a Cost-Benefit Analysis can be expressed
as three main indicators [@Boardman2018_book]:
- Net Benefit: the difference between monetized benefits and costs
- Cost-Benefit Ratio (CBR): monetized benefits divided by costs
- Return on Investment (ROI): the return generated per unit of expenditure, relating net benefits to the intervention's costs
An intervention is recommended from a Cost-Benefit Analysis perspective,
if it yields a positive net benefit or a positive ROI, or equivalently,
a CBR greater than one, meaning that the intervention’s monetized benefits exceed its costs.
These three outputs are available when running cba()
and are calculated considering the following formulas.
Net Benefit $$\mathit{net\_benefit} = benefit - cost$$
Cost-Benefit Ratio (CBR) $$cbr = \frac{benefit}{cost}$$
Return on Investment (ROI) $$roi = \frac{benefit - cost}{cost} \times 100$$
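The three indicators are straightforward to compute; a base-R sketch with hypothetical, already-discounted monetary values:

```r
benefit <- 150e6   # monetized, discounted benefits (EUR, hypothetical)
cost    <- 100e6   # discounted costs (EUR, hypothetical)

net_benefit <- benefit - cost                  # EUR
cbr         <- benefit / cost                  # dimensionless ratio
roi         <- (benefit - cost) / cost * 100   # percent
```

Here all three indicators agree that the intervention is worthwhile: a positive net benefit, a CBR above one, and a positive ROI.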
Let's imagine we design a policy that would reduce air pollution to 5 $\mu g/m^3$, which is the concentration specified in the cutoff_central argument in the initial attribute_health() call. So we could avoid all COPD cases attributed to air pollution.
Considering the cost to implement the policy (estimated at 100 million EURO), what would be the monetary net benefit of such a policy? We can find out using the healthiar function cba().
cba <- cba(
  output_attribute = results_pm_copd,
  valuation = 50000,
  cost = 100000000,
  discount_shape = "exponential",
  discount_rate_benefit = 0.03,
  discount_rate_cost = 0.03,
  n_years_benefit = 5,
  n_years_cost = 5
)
The outcome of the CBA is contained in two list elements, which are added to the existing assessment:
cba_main contains the central estimate and the corresponding 95% confidence intervals
cba_detailed contains additional intermediate results for both cost and benefit
benefit contains results by_year and raw results health_raw
cost contains the costs of the policy at the end of the period specified in the n_years_cost argument
cba$cba_main |> dplyr::select(benefit, cost, net_benefit) |> knitr::kable()
rm(cba)
We see that the central and upper 95% confidence interval estimates of avoided attributable COPD cases result in a net monetary benefit of the policy, while the lower 95% confidence interval estimate results in a net cost!
E.g., to estimate the health impact that is theoretically attributable to the difference in degree of deprivation of the population exposed.
Taking into account socio-economic indicators, e.g. a multiple deprivation index [@Mogin2025_ejph], the differences in attributable health impacts across the study areas can be estimated [@Renard2019_bmc; @Otavova_2022_bmc].
Social inequalities are quantified as the difference between the least deprived areas (the last n-quantile) and
- the most deprived areas or
- the population overall.

These differences can be absolute or relative.
$$ \mathit{absolute\_quantile} = first - last $$ Where:
- $first$ = health impact in the most deprived quantile
- $last$ = health impact in the least deprived quantile

$$ \mathit{relative\_quantile} = \frac{\mathit{absolute\_quantile}}{last} $$

$$ \mathit{absolute\_overall} = overall - last $$ Where:
- $overall$ = health impact in the population overall
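A base-R sketch of these differences with hypothetical impact rates per deprivation quantile (illustrative numbers only):

```r
first   <- 120   # impact rate in the most deprived quantile
last    <- 80    # impact rate in the least deprived quantile
overall <- 95    # impact rate in the overall population

absolute_quantile <- first - last               # absolute gap between quantiles
relative_quantile <- absolute_quantile / last   # gap relative to least deprived
absolute_overall  <- overall - last             # gap of overall vs least deprived
```

In this toy example the most deprived areas carry a 50% higher impact rate than the least deprived ones.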
If you assume that the least deprived areas are similar to counter-factual cases (no exposure to deprivation), the relative difference regarding the overall average health impact could be interpreted as some kind of relative risk attributable to social inequalities.
First, quantify health impacts.
health_impact <- healthiar::attribute_health(
  age_group = exdat_socialize$age_group,
  exp_central = exdat_socialize$pm25_mean,
  cutoff_central = 0,
  rr_central = exdat_socialize$rr,
  erf_shape = "log_linear",
  rr_increment = 10,
  bhd_central = exdat_socialize$mortality,
  population = exdat_socialize$population,
  geo_id_micro = exdat_socialize$geo_unit
)
Second, use the function socialize() entering the whole output of attribute_health() in the argument output_attribute.
social_t <- healthiar::socialize(
  output_attribute = health_impact,
  age_group = exdat_socialize$age_group, # must be the same in socialize() and in attribute_health()
  ref_prop_pop = exdat_socialize$ref_prop_pop,
  # Population already provided in output_attribute
  geo_id_micro = exdat_socialize$geo_unit,
  social_indicator = exdat_socialize$score,
  n_quantile = 10,
  increasing_deprivation = TRUE
)
Alternatively, you can directly enter the health impact in the socialize() argument impact.
social <- healthiar::socialize(
  impact = health_impact$health_detailed$results_by_age_group$impact,
  age_group = exdat_socialize$age_group, # must be the same in socialize() and in attribute_health()
  ref_prop_pop = exdat_socialize$ref_prop_pop,
  geo_id_micro = exdat_socialize$geo_unit,
  social_indicator = exdat_socialize$score,
  population = exdat_socialize$population, # population must be provided because no output_attribute is entered
  n_quantile = 10,
  increasing_deprivation = TRUE
)
social$social_main |> dplyr::select(c(parameter, difference_type, difference_compared_with, difference_value, comment)) |> print()
E.g., to estimate the multiple deprivation index (MDI) to use it for the argument social_indicator in the function socialize().
Socio-economic indicators (e.g., education level, employment status and family structure) can be condensed into a multiple deprivation index (MDI) [@Mogin2025_ejph]. For this purpose, the indicators can be normalized using min-max scaling.
The reliability of the MDI can be assessed using Cronbach's alpha [@Cronbach1951_p].
$$ \alpha = \frac{k}{k - 1} \left( 1 - \frac{\sum_{i=1}^{k} \sigma^2_{y_i}}{\sigma^2_x} \right) $$ where:
- $k$ = number of indicators
- $\sigma^2_{y_i}$ = variance of indicator $i$
- $\sigma^2_x$ = variance of the total score (the sum of all indicators)
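Cronbach's alpha can be computed directly from an indicator matrix; a base-R sketch with a toy data set (this is not the healthiar implementation):

```r
# Toy matrix: 4 geo units x 3 normalized indicators
y <- cbind(ind1 = c(1, 2, 3, 4),
           ind2 = c(2, 4, 5, 8),
           ind3 = c(1, 3, 3, 5))

k     <- ncol(y)                 # number of indicators
total <- rowSums(y)              # total score per geo unit

# alpha = k/(k-1) * (1 - sum of indicator variances / variance of total score)
alpha <- k / (k - 1) * (1 - sum(apply(y, 2, var)) / var(total))
```

Because the three toy indicators move together across units, the total-score variance dominates the indicator variances and alpha comes out above 0.9 (excellent reliability on the scale below).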
To apply this approach, you should ensure that the data set is as complete as possible. Otherwise, you can try to impute missing data using:
- Time-based imputation: linear regression based on historical trends if prior years' data are complete.
- Indicator-based imputation: multiple linear regression if the missing indicator correlates strongly with others.
Imputation models should have an $R^2$ greater than or equal to 0.7. If $R^2$ is lower than 0.7, consider alternative data sources or methods.
mdi <- prepare_mdi(
  geo_id_micro = exdat_prepare_mdi$id,
  edu = exdat_prepare_mdi$edu,
  unemployed = exdat_prepare_mdi$unemployed,
  single_parent = exdat_prepare_mdi$single_parent,
  pop_change = exdat_prepare_mdi$pop_change,
  no_heating = exdat_prepare_mdi$no_heating,
  n_quantile = 10,
  verbose = FALSE
)
Note: verbose = FALSE suppresses any output to the console (default: verbose = TRUE, i.e. with printing turned on).
Function output includes:
mdi_main, a tibble containing the BEST-COST MDI

mdi$mdi_main |> dplyr::select(geo_id_micro, MDI, MDI_index)
mdi$mdi_main |> dplyr::select(geo_id_micro, MDI, MDI_index) |> knitr::kable()
The function assesses the reliability of the MDI based on the Cronbach's alpha value as follows:
- 0.9 and higher: excellent reliability
- between 0.8 (included) and 0.9: good reliability
- between 0.7 (included) and 0.8: acceptable reliability
- between 0.6 (included) and 0.7: questionable reliability
- lower than 0.6: poor reliability
mdi_detailed, containing:
- descriptive statistics
- Pearson's correlation coefficients
- Cronbach's α, including the reliability rating this value indicates
- code for boxplots of the single indicators
- code for a histogram of the MDI values of the geo units with a normal distribution curve
To reproduce the boxplots run
eval(mdi$mdi_detailed$boxplot)
Analogously, to reproduce the histogram run
eval(mdi$mdi_detailed$histogram)
rm(mdi)
healthiar can be used inside the native pipe |>. See the example below.
exdat_noise |>
  (\(df) {
    healthiar::attribute_health(
      approach_risk = df$risk_estimate_type,
      exp_central = df$exposure_mean,
      pop_exp = df$exposed,
      erf_eq_central = df$erf
    )$health_main$impact_rounded
  })()
A shorter version makes use of the base R function with().
exdat_noise |>
  (\(df) {
    with(df, healthiar::attribute_health(
      approach_risk = risk_estimate_type,
      exp_central = exposure_mean,
      pop_exp = exposed,
      erf_eq_central = erf
    ))$health_main$impact_rounded
  })()
healthiar can also be used inside magrittr pipes %>% as follows.
exdat_noise %>%
  {
    healthiar::attribute_health(
      approach_risk = .$risk_estimate_type,
      exp_central = .$exposure_mean,
      pop_exp = .$exposed,
      erf_eq_central = .$erf
    )$health_main$impact_rounded
  }
Exporting and visualizing results is out of the scope of healthiar. To export and visualize, you can make use of existing functions in other packages, as indicated below.
Export as .csv file
write.csv(x = results_pm_copd$health_main, file = "exported_results/results_pm_copd.csv")
Save as .Rdata file
save(results_pm_copd, file = "exported_results/results_pm_copd.Rdata")
Export to Excel (as .xlsx file)
openxlsx::write.xlsx(x = results_pm_copd$health_main, file = "exported_results/results_pm_copd.xlsx")
Visualization is out of the scope of healthiar. You can visualize with, e.g., ggplot2 [@Wickham2016_ggplot].

BHD/bhd = baseline health data
CI = confidence interval
CBA/cba = cost-benefit analysis
exp = exposure
ERF = exposure-response function
RR/rr = relative risk
WHO = World Health Organization
YLL/yll = years of life lost