knitr::opts_chunk$set(echo=TRUE, eval=TRUE, include=TRUE)
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(rmarkdown.html_vignette.check_title = FALSE)
options(knitr.kable.max_rows = 10)
set.seed(1)
## Avoid using pacman here, as it causes an error during installation if it is not installed already
library(healthiar)
library(tibble)
library(dplyr)
library(purrr)
library(tidyr)
library(knitr)
library(terra)
library(sf)
Hi there!
This vignette will tell you about healthiar and show you how to use healthiar with the help of examples.
NOTE: Before using healthiar, please read carefully the information provided in the readme file or the welcome webpage. By using healthiar, you agree to the terms of use and disclaimer.
The healthiar functions allow you to quantify and monetize the health impacts (or burden of disease) attributable to exposure. The main focus of the EU project that initiated the development of healthiar (BEST-COST) has been two environmental exposures: air pollution and noise. However, healthiar can also be used for other exposures such as green spaces, chemicals or physical activity.
See below an overview of healthiar, taken from the first page of the cheat sheet. The full list of functions included in healthiar is linked there and available in the reference.

You can enter data in healthiar functions using:
- hard coded values or
- columns inside pre-loaded data frames or tibbles.
Let's see some examples calling the most important function in healthiar: attribute_health().
Depending on the function argument, you will need to enter numeric or character values.
results_pm_copd <- attribute_health(
  exp_central = 8.85,
  rr_central = 1.369,
  rr_increment = 10,
  erf_shape = "log_linear",
  cutoff_central = 5,
  bhd_central = 30747
)
healthiar comes with example data sets, whose names start with exdat_, that allow you to test the functions. Some of these example data are used in this vignette.
Now let's call attribute_health() with input data from the healthiar example data. Note that you can easily pass input data to a function argument using the $ operator.
results_pm_copd <- attribute_health(
  erf_shape = "log_linear",
  rr_central = exdat_pm$relative_risk,
  rr_increment = 10,
  exp_central = exdat_pm$mean_concentration,
  cutoff_central = exdat_pm$cut_off_value,
  bhd_central = exdat_pm$incidence
)
Be aware that healthiar functions are easier to use if your data is prepared in a tidy format, i.e.:
Each variable is a column; each column is a variable.
Each observation is a row; each row is an observation.
Each value is a cell; each cell is a single value.
To know more about the concept of tidy format, see the article by [@Wickham2014_jss].
For example, in attribute_health() the length of the input vectors entered in the arguments must be either 1 or match the number of combinations of the different values of:
geo_id_micro
exp_...
sex
age
(info for further sub-group analysis)
The output of the healthiar functions attribute_health() and attribute_lifetable() consists of two lists ("folders"):
health_main contains the main results
health_detailed contains detailed results and additional info about the assessment.
Other healthiar functions have a similar output structure but use different prefixes, e.g., social_ in socialize() and monetization_ in monetize().
A similar structure can be found in other large functions in healthiar, e.g., attribute_lifetable(), compare(), socialize() or monetize(). In some functions, different elements are available in the output. For instance, attribute_lifetable() creates additional output that is specific to life table calculations.
There exist different, equivalent ways of accessing the output:
With $ operator: results_pm_copd$health_main$impact_rounded (as in the example above)
By mouse: go to the Environment tab in RStudio and click on the variable you want to inspect, and then open the health_main results table
With [[]] operator results_pm_copd[["health_main"]]
With pluck() & pull(): use purrr::pluck() to select a list and then dplyr::pull() to extract values from a specified column, e.g. results_pm_copd |> purrr::pluck("health_main") |> dplyr::pull("impact_rounded")
The descriptions of the healthiar functions provide examples that you can execute (with healthiar loaded) by running example("function_name"), e.g. example("attribute_health"). In the sections below in this vignette, you find additional examples and more detailed explanations.
E.g., to quantify the COPD cases attributable to PM2.5 (air pollution) exposure in a country.
The comparative risk assessment approach [@Murray2003_e] is applied, obtaining the population attributable fraction (the percentage of cases attributable to the exposure) from the relative risk. The exposure scenario is compared with a counterfactual scenario.
This approach has been extensively documented and applied [e.g., @WHO2003_report; @Steenland2006-e; @Soares2020_report; @Pozzer2023_gh; @GBD2020_tl; @Lehtomaki_2025_eh].
General integral form for the population attributable fraction (PAF):
$$PAF = \frac{\int rr\_at\_exp(x) \times PE(x) \, dx - 1}{\int rr\_at\_exp(x) \times PE(x) \, dx}$$
Where:
$x$ = exposure level
$PE(x)$ = population distribution of exposure
$rr\_at\_exp(x)$ = relative risk at exposure level $x$ compared to the reference
If exposure is categorical, the integrals are converted to sums:
$$PAF = \frac{\sum_i rr\_at\_exp_i \times PE_i - 1}{\sum_i rr\_at\_exp_i \times PE_i}$$
Alternatively, an equivalent form is:
$$PAF = \frac{\sum_i PE_i \times (rr\_at\_exp_i - 1)}{\sum_i PE_i \times (rr\_at\_exp_i - 1) + 1}$$
If there is one single exposure value, corresponding to the population-weighted mean concentration, the equation simplifies as follows:
$$PAF = \frac{rr\_at\_exp - 1}{rr\_at\_exp}$$
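These categorical forms can be checked numerically in a few lines of base R (hypothetical relative risks and population fractions, not healthiar output):

```r
## Two equivalent categorical PAF forms give the same value
rr_at_exp <- c(1.00, 1.05, 1.12, 1.20)  # relative risk per exposure category
pe        <- c(0.40, 0.30, 0.20, 0.10)  # population fraction per category (sums to 1)

## Form 1: (sum(RR_i * PE_i) - 1) / sum(RR_i * PE_i)
paf_1 <- (sum(rr_at_exp * pe) - 1) / sum(rr_at_exp * pe)

## Form 2: sum(PE_i * (RR_i - 1)) / (sum(PE_i * (RR_i - 1)) + 1)
paf_2 <- sum(pe * (rr_at_exp - 1)) / (sum(pe * (rr_at_exp - 1)) + 1)

all.equal(paf_1, paf_2)  # TRUE
```

The equivalence holds because the population fractions sum to 1, so $\sum_i rr\_at\_exp_i \times PE_i = \sum_i PE_i \times (rr\_at\_exp_i - 1) + 1$.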
How do you get this relative risk at exposure level (rr_at_exp)? It is normally different from the relative risk published in the epidemiological literature (rr) together with the (concentration/dose) increment to which it corresponds. The equations used for scaling the relative risk depend on the chosen exposure-response function shape:
linear [@Lehtomaki_2025_eh] $$RRexp = 1 + \frac{rr - 1}{increment} \times (exp - cutoff)$$
log-linear [@Lehtomaki_2025_eh] $$RRexp = e^{\frac{\log(rr)}{increment} \times (exp - cutoff)}$$
log-log [@Lehtomaki_2025_eh] $$RRexp = \left( \frac{exp + 1}{cutoff + 1} \right)^{\frac{\log(rr)}{\log(increment + cutoff + 1) - \log(cutoff + 1)}}$$
linear-log [@Pozzer2023_gh] $$RRexp = 1 + \frac{rr - 1}{\log(increment + cutoff + 1) - \log(cutoff + 1)} \times (\log(exp + 1) - \log(cutoff + 1))$$
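As a sanity check, the log-linear scaling can be applied by hand in base R; with the inputs of the first example in this vignette (rr 1.369 per 10 units, exposure 8.85, cutoff 5, 30747 baseline cases) it reproduces the roughly 3502 attributable cases shown in the interpretation further below:

```r
## Base-R sketch: scale the published RR to the exposure level (log-linear)
## and derive the attributable impact from the PAF
rr        <- 1.369  # published RR per increment
increment <- 10
exposure  <- 8.85
cutoff    <- 5
bhd       <- 30747  # baseline health data (COPD incidence)

rr_at_exp <- exp(log(rr) / increment * (exposure - cutoff))  # approx. 1.13
paf       <- (rr_at_exp - 1) / rr_at_exp
impact    <- paf * bhd
round(impact)  # 3502
```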
The relative risk at exposure level (rr_at_exp) is part of the output of attribute_health() and attribute_lifetable(). rr_at_exp can also be calculated using get_risk().
For the conversion of hazard ratios and/or odds ratios to relative risks, refer to [@VanderWeele2019_biom] and/or use the conversion tools developed by the Teaching group in EBM in 2022 for hazard ratios (https://ebm-helper.cn/en/Conv/HR_RR.html) and odds ratios (https://ebm-helper.cn/en/Conv/OR_RR.html).
results_pm_copd <- attribute_health(
  approach_risk = "relative_risk", # If you do not call this argument, "relative_risk" will be assigned by default.
  erf_shape = "log_linear",
  rr_central = exdat_pm$relative_risk,
  rr_increment = 10,
  exp_central = exdat_pm$mean_concentration,
  cutoff_central = exdat_pm$cut_off_value,
  bhd_central = exdat_pm$incidence
)
results_pm_copd$health_main
The main output is a tibble with 3 rows and 23 columns. Be aware that it contains input data, some intermediate steps and the final results in different formats.
Let's zoom in on some relevant aspects.
results_pm_copd$health_main |>
  dplyr::select(exp, bhd, rr, erf_ci, pop_fraction, impact_rounded) |>
  knitr::kable() # For formatting reasons only: prints tibble in a nice layout
Interpretation: the table shows that the exposure was 8.85 $\mu g/m^3$ and the baseline health data (bhd_central) were 30747 (COPD incidence in this instance). The 1st row further shows that the impact attributable to this exposure using the central relative risk (rr_central) estimate of 1.369 is 3502 COPD cases, or \~11% of all baseline cases.
Some of the most relevant results columns include:
E.g., to quantify the incident cases of high annoyance attributable to (road traffic) noise exposure.
In the absolute risk calculation pathway, estimates are based on the size and distribution of the exposed population, rather than on baseline health data, as is the case in the relative risk pathway [@WHO2011_report].
$$N = \sum AR_i \times PE_i$$
Where:
$N$ = attributed cases
$AR_i$ = absolute risk at category $i$
$PE_i$ = absolute population exposed at category $i$
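The sum above can be written out in a few lines of base R, using the exposure categories, exposed population and high-annoyance ERF from the example below (the ERF returns a percentage, hence the division by 100):

```r
## Absolute-risk impact as the sum over exposure categories
exp_mid <- c(57.5, 62.5, 67.5, 72.5, 77.5)         # category midpoints (dB)
pop_exp <- c(387500, 286000, 191800, 72200, 7700)  # exposed population per category

## ERF gives the percentage of highly annoyed people at noise level c
erf <- function(c) 78.9270 - 3.1162 * c + 0.0342 * c^2

ar <- erf(exp_mid) / 100  # absolute risk per category (fraction)
n  <- sum(ar * pop_exp)   # attributed cases, N = sum(AR_i * PE_i)
round(n)
```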
results_noise_ha <- attribute_health(
  approach_risk = "absolute_risk", # default is "relative_risk"
  exp_central = c(57.5, 62.5, 67.5, 72.5, 77.5), # mean of the exposure categories
  pop_exp = c(387500, 286000, 191800, 72200, 7700), # population exposed per exposure category
  erf_eq_central = "78.9270-3.1162*c+0.0342*c^2" # exposure-response function
)
The erf_eq_central argument can digest other types of functions (see section on user-defined ERF).
results_noise_ha |>
  purrr::pluck("health_main") |>
  dplyr::select(erf_eq, erf_ci, impact_rounded) |>
  knitr::kable() # Prints tibble in a minimal layout
results_noise_ha$health_detailed$results_raw
results_noise_ha[["health_detailed"]][["results_raw"]] |>
  dplyr::select(exp_category, exp, pop_exp, impact) |>
  knitr::kable()
Remember that if the equation of the exposure-response function (erf_eq_...)
requires taking a maximum in a vectorised context, pmax() must be used instead of max().
pmax() returns the element-wise maximum (the output is a vector),
while max() returns a single global maximum for the entire vector. For example:
erf_eq_central <- "exp(0.2969*log((pmax(0,c-2.4)/1.9+1))/(1+exp(-(pmax(0,c-2.4)-12)/40.2)))"
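A quick illustration of the difference (toy exposure values, not healthiar-specific):

```r
## pmax() is element-wise (returns a vector), max() is global (returns a scalar)
c_vals <- c(1, 3, 7)

pmax(0, c_vals - 2.4)  # one truncated value per exposure level
max(0, c_vals - 2.4)   # a single number: the overall maximum
```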
Alternatively, it's also possible to only assess the absolute risk impacts for one exposure category (e.g., a single noise exposure band).
results_noise_ha <- attribute_health(
  approach_risk = "absolute_risk",
  exp_central = 57.5,
  pop_exp = 387500,
  erf_eq_central = "78.9270-3.1162*c+0.0342*c^2"
)
results_noise_ha$health_main
results_noise_ha[["health_main"]] |>
  dplyr::select(exp_category, impact) |>
  knitr::kable()
rm(results_noise_ha)
E.g., to quantify the disease cases attributable to PM2.5 exposure in multiple cities using one single command.
Enter unique IDs as a vector (numeric or character) to the geo_id_micro argument (e.g., municipality names or province abbreviations)
Optional: aggregate unit-specific results by providing higher-level IDs (e.g., region names or country abbreviations) as a vector (numeric or character) to the geo_id_macro argument
Input to the other function arguments is specified as usual, either as a vector or as a single value (which will be recycled to match the length of the other input vectors).
results_iteration <- attribute_health(
  # Names of Swiss cantons
  geo_id_micro = c("Zurich", "Basel", "Geneva", "Ticino", "Jura"),
  # Names of languages spoken in the selected Swiss cantons
  geo_id_macro = c("German", "German", "French", "Italian", "French"),
  rr_central = 1.369,
  rr_increment = 10,
  cutoff_central = 5,
  erf_shape = "log_linear",
  exp_central = c(11, 11, 10, 8, 7),
  bhd_central = c(4000, 2500, 3000, 1500, 500)
)
In this example we want to aggregate the lower-level geographic units (cantons) by the higher-level language region ("German", "French", "Italian").
The main output contains aggregated results
results_iteration$health_main
results_iteration[["health_main"]] |>
  dplyr::select(geo_id_macro, impact_rounded, erf_ci, exp_ci, bhd_ci) |>
  knitr::kable()
In this case health_main contains the cumulative / summed number of disease cases attributable to PM2.5 exposure in the 5 geo units, which is `r sum(results_iteration[["health_main"]]$impact_rounded)` (using a relative risk of 1.369).
The geo unit-specific information and results are stored under health_detailed>results_raw .
results_iteration$health_detailed$results_raw
results_iteration[["health_detailed"]][["results_raw"]] |>
  dplyr::select(geo_id_micro, impact_rounded, geo_id_macro) |>
  knitr::kable()
Besides the results per geo unit, health_detailed also contains the impacts obtained through all combinations of the central, lower and upper estimates of the input data (as usual; not shown above).
rm(results_iteration)
E.g., to quantify high annoyance cases attributable to noise exposure in rural and urban areas.
data <- exdat_noise |>
  ## Filter for urban and rural regions
  dplyr::filter(region == "urban" | region == "rural")
results_iteration_ar <- attribute_health(
  # Both the rural and urban areas belong to the higher-level "total" region
  geo_id_macro = "total",
  geo_id_micro = data$region,
  approach_risk = "absolute_risk",
  exp_central = data$exposure_mean,
  pop_exp = data$exposed,
  erf_eq_central = "78.9270-3.1162*c+0.0342*c^2"
)
NOTE: the length of the input vectors fed to geo_id_micro, exp_central, pop_exp must match and must be
(number of geo units) x (number of exposure categories) = 2 x 5 = 10,
because we have 2 geo units ("rural" and "urban") and 5 exposure categories.
health_main contains the aggregated results (i.e. sum of impacts in rural and urban areas).
results_iteration_ar$health_main
results_iteration_ar[["health_main"]] |>
  dplyr::select(geo_id_macro, impact_rounded, erf_ci, exp_ci) |>
  knitr::kable()
Impact by geo unit, in this case impact in the rural and in the urban area.
results_iteration_ar$health_detailed$results_by_geo_id_micro
results_iteration_ar[["health_detailed"]][["results_by_geo_id_micro"]] |>
  dplyr::select(geo_id_micro, geo_id_macro, impact) |>
  knitr::kable()
rm(results_iteration_ar, data)
E.g., to quantify the COPD cases attributable to PM2.5 exposure taking into account uncertainty (lower and upper bound of confidence interval) in several input arguments: relative risk, exposure and baseline health data.
results_pm_copd <- attribute_health(
  erf_shape = "log_linear",
  rr_central = 1.369,
  rr_lower = 1.124, # lower 95% confidence interval (CI) bound of RR
  rr_upper = 1.664, # upper 95% CI bound of RR
  rr_increment = 10,
  exp_central = 8.85,
  exp_lower = 8, # lower 95% CI bound of exposure
  exp_upper = 10, # upper 95% CI bound of exposure
  cutoff_central = 5,
  bhd_central = 30747,
  bhd_lower = 28000, # lower 95% CI bound of BHD
  bhd_upper = 32000 # upper 95% CI bound of BHD
)
Let's inspect the detailed results:
results_pm_copd$health_detailed$results_raw
results_pm_copd$health_detailed$results_raw |>
  dplyr::select(erf_ci, exp_ci, bhd_ci, impact_rounded) |>
  dplyr::slice(1:9) |>
  knitr::kable() # Prints tibble in a minimal layout
Each row contains the estimated attributable cases (impact_rounded) obtained by the input data specified in the columns ending in "_ci" and the other calculation pathway specifications in that row (not shown).
The 1st row contains the estimated attributable impact when using the central estimates of relative risk, exposure and baseline health data.
The 2nd row shows the impact when using the central estimates of the relative risk and exposure in combination with the lower estimate of the baseline health data.
...
NOTE: only `r length(1:9)` of the `r length(results_pm_copd$health_detailed$results_raw$impact)` possible combinations are displayed due to space constraints.
NOTE: only a selection of columns is shown.
E.g., to summarize uncertainty of attributable health impacts (i.e. to get a single confidence interval instead of many combinations) by using a Monte Carlo simulation.
A Monte Carlo simulation is a statistical method
based on repeated random sampling [@Robert2004_book; @Rubinstein2016_book].
In healthiar, you can use the function summarize_uncertainty()
to simulate values in the arguments with uncertainty and
estimate a single confidence interval in the results.
For each entered input argument that includes a (95%) confidence interval
(i.e. _lower and _upper bound value)
a distribution is fitted (see distributions below).
The median of the simulated attributable impacts is reported as the central estimate.
The 2.5th and 97.5th percentiles of these simulated impacts define
the lower and upper bounds of the 95% summary uncertainty interval.
Aggregated central, lower and upper estimates are obtained
by summing the corresponding values of each lower level unit.
summarize_uncertainty() assumes the following shapes of the distributions
in the simulations:
Relative risk: The values are simulated based on an optimized gamma distribution,
which fits well as relative risks are positive and their distributions are usually right-skewed.
The gamma distribution is parametrized such that its mean is equal to the central relative risk estimate
(rate= shape/rr_central). The shape parameter is then optimized using
stats::optimize() to match the inputted 95% confidence interval bounds,
with stats::qgamma() used to evaluate candidate distributions.
Finally, n_sim relative risk values are simulated using stats::rgamma().
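A minimal base-R sketch of this idea (not the healthiar internals; the optimization target and search interval are illustrative assumptions), using the COPD relative risk 1.369 with 95% CI 1.124-1.664:

```r
## Fit a gamma distribution to an RR and its 95% CI by optimizing the shape
## parameter while keeping the mean fixed (rate = shape / rr_central)
rr_central <- 1.369
rr_lower   <- 1.124
rr_upper   <- 1.664

ci_mismatch <- function(shape) {
  q <- stats::qgamma(c(0.025, 0.975), shape = shape, rate = shape / rr_central)
  sum((q - c(rr_lower, rr_upper))^2)  # squared distance to the target CI bounds
}

shape_opt <- stats::optimize(ci_mismatch, interval = c(1, 1000))$minimum

set.seed(1)
rr_sim <- stats::rgamma(10000, shape = shape_opt, rate = shape_opt / rr_central)
mean(rr_sim)  # close to the central estimate 1.369
```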
Exposure, cutoff, baseline health data and duration:
The values are simulated based on a normal distribution
using stats::rnorm() with
mean = exp_central, mean = cutoff_central, mean = bhd_central or mean = duration_central
and a standard deviation based on corresponding lower and upper 95% exposure
confidence interval values. The standard deviation is calculated as $(upper - lower)/(2 \times 1.96)$,
since for a normal distribution the 95% CI spans approximately two standard deviations on either side of the mean.
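For example, for the exposure input 8.85 with 95% CI [8, 10] (values from the uncertainty example above), a base-R sketch:

```r
## SD recovered from the 95% CI, then used to simulate normal exposure values
exp_central <- 8.85
exp_lower   <- 8
exp_upper   <- 10

exp_sd <- (exp_upper - exp_lower) / (2 * 1.96)  # approx. 0.51

set.seed(1)
exp_sim <- stats::rnorm(10000, mean = exp_central, sd = exp_sd)
```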
Disability weights: The values are simulated based on a beta distribution,
as both the disability weights and the beta distribution are bounded by 0 and 1.
The beta distribution best fitting the inputted central disability weight estimate and
corresponding lower and upper 95% confidence interval values is fitted using
stats::qbeta() (the best fitting distribution parameters
shape1 and shape2 are determined using stats::optimize()). For this purpose,
we partly adapted the R function prevalence::beta_expert
with permission of one of the authors [@Devleesschauwer2022_package].
Finally, n_sim disability weight values are simulated using stats::rbeta().
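A base-R sketch of the beta fit (hypothetical disability weight values; fixing the mean via shape2 is an illustrative simplification of the optimization described above):

```r
## Fit a beta distribution to a disability weight and its 95% CI
dw_central <- 0.22  # hypothetical central disability weight
dw_lower   <- 0.15  # hypothetical lower 95% CI bound
dw_upper   <- 0.30  # hypothetical upper 95% CI bound

ci_mismatch <- function(shape1) {
  shape2 <- shape1 * (1 - dw_central) / dw_central  # fixes the mean at dw_central
  q <- stats::qbeta(c(0.025, 0.975), shape1 = shape1, shape2 = shape2)
  sum((q - c(dw_lower, dw_upper))^2)
}

shape1_opt <- stats::optimize(ci_mismatch, interval = c(0.1, 1000))$minimum
shape2_opt <- shape1_opt * (1 - dw_central) / dw_central

set.seed(1)
dw_sim <- stats::rbeta(10000, shape1 = shape1_opt, shape2 = shape2_opt)
```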
For stability of the 95% confidence interval, a large number of simulations (e.g., 10,000) is recommended in practice. The example below uses n_sim = 100 for brevity.
results_pm_copd_summarized <- summarize_uncertainty(
  output_attribute = results_pm_copd,
  n_sim = 100
)
The outcome of the Monte Carlo analysis is added to the variable entered in the output_attribute argument, which is results_pm_copd in our case.
Two lists ("folders") are added:
uncertainty_main contains the central estimate and the corresponding 95% confidence intervals obtained through the Monte Carlo assessment and
uncertainty_detailed contains all n_sim simulations of the Monte Carlo assessment.
results_pm_copd_summarized$uncertainty_main
results_pm_copd_summarized$uncertainty_main |> knitr::kable()
The folder uncertainty_detailed contains all single simulations.
Let's look at the impact of the first 10 simulations.
The columns erf_ci, exp_ci, bhd_ci, and cutoff_ci indicate
the uncertainty component used for each simulation
(in the first 10 simulations, all use central estimates).
results_pm_copd_summarized$uncertainty_detailed$impact_by_sim
results_pm_copd_summarized$uncertainty_detailed$impact_by_sim |> knitr::kable()
rm(results_pm_copd_summarized)
E.g., to quantify COPD cases attributable to air pollution exposure by applying a user-defined exposure-response function (ERF), such as the MR-BRT curves from Global Burden of Disease study.
In this case, the erf_eq_... arguments require a function as input, so we use an auxiliary function (splinefun()) to transform points on the ERF into an object of type function.
results_pm_copd_mr_brt <- attribute_health(
  exp_central = 8.85,
  bhd_central = 30747,
  cutoff_central = 0,
  # Specify the function based on x-y point pairs that lie on the ERF
  erf_eq_central = splinefun(
    x = c(0, 5, 10, 15, 20, 25, 30, 50, 70, 90, 110),
    y = c(1.00, 1.04, 1.08, 1.12, 1.16, 1.20, 1.23, 1.35, 1.45, 1.53, 1.60),
    method = "natural"
  )
)
The ERF curve created looks as follows:
x <- c(0, 5, 10, 15, 20, 25, 30, 50, 70, 90, 110)
y <- c(1.00, 1.04, 1.08, 1.12, 1.16, 1.20, 1.23, 1.35, 1.45, 1.53, 1.60)

spline_func <- stats::splinefun(x = x, y = y, method = "natural")

## Generate finer x-values
x_dense <- seq(min(x), max(x), length.out = 200)
## Evaluate the spline function at these points
y_dense <- spline_func(x_dense)

## Plot
plot(x, y,
     # main = "User-defined ERF",
     main = "",
     xlab = "Exposure", ylab = "Relative risk",
     col = "blue", pch = 19)
lines(x_dense, y_dense, col = "red", lwd = 2)
legend("topleft",
       legend = c("Original points", "Spline curve"),
       col = c("blue", "red"),
       pch = c(19, NA), lty = c(NA, 1), lwd = c(NA, 2))
Alternatively, other functions (e.g. approxfun()) can be used to create the ERF.
E.g., to quantify health impacts attributable to air pollution in a country by age group.
To obtain age-group-specific results, the baseline health data (and possibly exposure) must be available by age group.
If the age argument was specified, age-group-specific results are available under health_detailed in the sub-folder results_by_age_group.
results_age_group <- attribute_health(
  approach_risk = "relative_risk",
  age = c("below_65", "65_plus"),
  exp_central = c(8, 7),
  cutoff_central = c(5, 5),
  bhd_central = c(1000, 5000),
  rr_central = 1.06,
  rr_increment = 10,
  erf_shape = "log_linear"
)
results_age_group$health_detailed$results_by_age_group |>
  dplyr::select(age_group, impact_rounded, exp, bhd) |>
  knitr::kable()
rm(results_age_group)
E.g., to quantify health impacts attributable to air pollution in a country by sex.
The baseline health data (and possibly exposure) must be entered by sex.
results_sex <- attribute_health(
  approach_risk = "relative_risk",
  sex = c("female", "male"),
  exp_central = c(8, 8),
  cutoff_central = c(5, 5),
  bhd_central = c(1000, 1100),
  rr_central = 1.06,
  rr_increment = 10,
  erf_shape = "log_linear"
)
If the sex argument was specified, sex-specific results are available under health_detailed in the sub-folder results_by_sex.
results_sex$health_detailed$results_by_sex |>
  dplyr::select(sex, impact_rounded, exp, bhd) |>
  knitr::kable()
rm(results_sex)
E.g., to quantify attributable health impacts stratified by a sub-group other than age and sex, e.g., education level.
A single vector (or a data frame / tibble with multiple columns) to group the results by can be entered in the info argument. In this example, this is information about the education level.
In a second step, the results can be grouped by one or more columns to summarize them by the preferred sub-groups.
output_attribute <- healthiar::attribute_health(
  rr_central = 1.063,
  rr_increment = 10,
  erf_shape = "log_linear",
  cutoff_central = 0,
  exp_central = c(6, 7, 8, 7, 8, 9, 8, 9, 10, 9, 10, 11),
  bhd_central = c(600, 700, 800, 700, 800, 900, 800, 900, 1000, 900, 1000, 1100),
  geo_id_micro = rep(c("a", "b", "c", "d"), each = 3),
  info = data.frame(
    education = rep(c("secondary", "bachelor", "master"), times = 4)) # education level
)
output_stratified <- output_attribute$health_detailed$results_raw |>
  dplyr::group_by(info_column_1) |>
  dplyr::summarize(mean_impact = mean(impact)) |>
  dplyr::pull(mean_impact) |>
  print()
rm(output_attribute, output_stratified)
E.g., to quantify attributable health impacts stratified by age, sex and additional sub-group e.g. education level.
output_attribute <- healthiar::attribute_health(
  rr_central = 1.063,
  rr_increment = 10,
  erf_shape = "log_linear",
  cutoff_central = 0,
  age_group = base::rep(c("50_and_younger", "50_plus"), each = 4, times = 2),
  sex = base::rep(c("female", "male"), each = 2, times = 4),
  exp_central = c(6, 7, 8, 7, 8, 9, 8, 9, 10, 9, 10, 11, 10, 11, 12, 13),
  bhd_central = c(600, 700, 800, 700, 800, 900, 800, 900, 1000, 900, 1000, 1100, 1000, 1100, 1200, 1000),
  geo_id_micro = base::rep(c("a", "b"), each = 8),
  info = base::data.frame(
    education = base::rep(c("without_master", "with_master"), times = 8)) # education level
)
output_stratified <- output_attribute$health_detailed$results_raw |>
  dplyr::group_by(info_column_1) |>
  dplyr::summarize(mean_impact = mean(impact)) |>
  dplyr::pull(mean_impact) |>
  print()
rm(output_attribute, output_stratified)
E.g., to quantify the years of life lost (YLL) due to deaths from COPD attributable to PM2.5 exposure during one year.
The life table approach to obtain YLL and deaths requires population and baseline mortality data to be stratified by one year age groups. However, in some cases these data are only available for larger age groups (e.g., 5-year data: 0-4 years old, 5-9 years old, ...). What to do?
If your population and mortality data are not available by one-year age group, your data must be prepared by interpolating values. The healthiar function prepare_lifetable() makes this conversion using the same approach as the WHO tool AirQ+ [@WHO2020_report].
If your population and death data are already stratified by one-year age group, you are lucky: you can skip this initial step.
The life table methodology of attribute_lifetable() follows that of the WHO tool AirQ+ [@WHO2020_report], which is described in more detail by @Miller2003_jech.
In short, two scenarios are compared:
a) a scenario with the exposure level specified in the function ("exposed scenario") and
b) a scenario with no exposure ("unexposed scenario").
First, the entry and mid-year populations of the (first) year of analysis in the unexposed scenario is determined using modified survival probabilities. Second, age-specific population projections using scenario-specific survival probabilities are done for both scenarios. Third, by subtracting the populations in the unexposed scenario from the populations in the exposed scenario the premature deaths/years of life lost attributable to the exposure are determined.
An expansive life table case study is available in a report of @Miller2010_report.
The entry (i.e. start-of-year) populations in both scenarios (exposed and unexposed) are determined as follows:
$$entry\_population_{year_1} = midyear\_population_{year_1} + \frac{deaths_{year_1}}{2}$$
The survival probabilities in the exposed scenario from start of year $i$ to start of year $i+1$ are calculated as follows:
$$prob\_survival = \frac{midyear\_population_i - \frac{deaths_i}{2}}{midyear\_population_i + \frac{deaths_i}{2}}$$
Analogously, the probability of survival from start of year $i$ to mid-year $i$:
$$prob\_survival\_until\_midyear = 1 - \frac{1 - prob\_survival}{2}$$
The survival probabilities in the unexposed scenario are calculated as follows:
First, the age-group specific hazard rate in the exposed scenario is calculated using the inputted age-specific mid-year populations and deaths.
$$hazard\_rate = \frac{deaths}{midyear\_population}$$
Second, the hazard rate is multiplied with the modification factor ($= 1 - PAF$) to obtain the age-specific hazard rate in the unexposed scenario.
$$hazard\_rate\_mod = hazard\_rate \times modification\_factor$$
Third, the age-specific survival probabilities (from the start until the end of a given age group) in the unexposed scenario are calculated as follows (cf. Miller & Hurley 2003):
$$prob\_survival\_mod = \frac{2 - hazard\_rate\_mod}{2 + hazard\_rate\_mod}$$
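Putting the exposed-scenario and unexposed-scenario probabilities together for one age group (hypothetical mid-year population, deaths and PAF):

```r
## Survival probabilities for one age group in both scenarios
midyear_population <- 100000
deaths             <- 1200
paf                <- 0.114  # population attributable fraction (hypothetical)

## Exposed scenario
prob_survival <- (midyear_population - deaths / 2) /
                 (midyear_population + deaths / 2)

## Unexposed scenario: modify the hazard rate, then convert back
hazard_rate       <- deaths / midyear_population
hazard_rate_mod   <- hazard_rate * (1 - paf)  # modification factor = 1 - PAF
prob_survival_mod <- (2 - hazard_rate_mod) / (2 + hazard_rate_mod)

prob_survival < prob_survival_mod  # TRUE: survival improves without exposure
```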
The mid-year populations of the (first) year of analysis (year_1) in the unexposed scenario are determined as follows:
First, the survival probability from the start of year $i$ to mid-year $i$ in the unexposed scenario is calculated as:
$$prob\_survival\_until\_midyear\_mod = 1 - \frac{1 - prob\_survival\_mod}{2}$$
Second, the mid-year populations of the (first) year of analysis (year_1) in the unexposed scenario is calculated:
$$midyear\_population\_unexposed_{year_1} = entry\_population_{year_1} \times prob\_survival\_until\_midyear\_mod$$
Using the age group-specific and scenario-specific survival probabilities calculated above, future populations of each age-group under each scenario are calculated.
The population projections for the unexposed scenario differ between the two possible options of approach_exposure ("single_year" and "constant"). In the case of "single_year" exposure, the population projection for the years after the year of exposure is the same as in the exposed scenario.
In the case of "constant" the population projection is done as follows:
First, the entry population of year $i+1$ is calculated (which is the same as the end-of-year population of year $i$) using the entry population of year $i$ and the modified survival probabilities.
$$entry\_population_{i+1} = entry\_population_i \times prob\_survival\_mod$$
Second, the mid-year population of year $i+1$ is calculated.
$$midyear\_population_{i+1} = entry\_population_{i+1} \times prob\_survival\_until\_midyear\_mod$$
The entry and mid-year population projections in the exposed scenario are done as follows:
First, the entry population of year $i+1$ is calculated (which is the same as the end-of-year population of year $i$) by multiplying the entry population of year $i$ with the survival probabilities of the exposed scenario.
$$entry\_population_{i+1} = entry\_population_i \times prob\_survival$$
Second, the mid-year population of year $i+1$ is calculated.
$$midyear\_population_{i+1} = entry\_population_{i+1} \times prob\_survival\_until\_midyear$$
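One projection step for a single age group can be sketched as follows (hypothetical entry population and survival probability; the same step is repeated each year with the scenario-specific probabilities):

```r
## Carry the entry population one year forward, then derive the mid-year
## population via the survival probability until mid-year
entry_population <- 100600
prob_survival    <- 0.9881

prob_survival_until_midyear <- 1 - (1 - prob_survival) / 2

entry_population_next   <- entry_population * prob_survival
midyear_population_next <- entry_population_next * prob_survival_until_midyear
```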
We can use attribute_lifetable() combined with life table input data to determine YLL attributable to an environmental stressor.
results_pm_yll <- attribute_lifetable(
  year_of_analysis = 2019,
  health_outcome = "yll",
  rr_central = 1.118,
  rr_increment = 10,
  erf_shape = "log_linear",
  exp_central = 8.85,
  cutoff_central = 5,
  min_age = 20, # age from which population is affected by the exposure
  # Life table information
  age_group = exdat_lifetable$age_group,
  sex = exdat_lifetable$sex,
  population = exdat_lifetable$midyear_population,
  # In the life table case, BHD refers to deaths
  bhd_central = exdat_lifetable$deaths
)
Total YLL attributable to exposure (sum of sex-specific impacts).
results_pm_yll$health_main
results_pm_yll$health_main |>
  dplyr::slice_head() |>
  dplyr::select(impact_rounded, erf_ci, exp_ci, bhd_ci) |>
  knitr::kable()
Use the two arguments approach_exposure and approach_newborns to modify the life table calculation:
approach_exposure
"single_year" (default) population is exposed for a single year and the impacts of that exposure are calculated
"constant" population is exposed every year
approach_newborns
"without_newborns" (default) assumes that the population in the year of analysis is followed over time, without considering newborns being born
"with_newborns" assumes that for each year after the year of analysis n babies are born, with n being equal to the (male and female) population aged 0 that is provided in the argument population
Attributable YLL results
per year
per age (group)
per sex (if sex-specific life table data entered)
are available.
Note: We will inspect the results for females; male results are also available.
NOTE: only a selection of years is shown.
results_pm_yll$health_detailed$results_raw |>
  dplyr::summarize(
    .by = year,
    impact = sum(impact, na.rm = TRUE)
  )
results_pm_yll$health_detailed$results_raw |>
  dplyr::summarize(
    .by = year,
    impact = sum(impact, na.rm = TRUE)) |>
  knitr::kable()
results_pm_yll$health_detailed$intermediate_calculations |>
  dplyr::filter(sex == "female") |>
  dplyr::pull(impact_by_age_and_year) |>
  purrr::pluck(1)
results_pm_yll$health_detailed$intermediate_calculations |>
  dplyr::filter(sex == "female") |>
  dplyr::pull(impact_by_age_and_year) |>
  purrr::pluck(1) |>
  dplyr::select(age_start, age_end, impact_2019) |>
  dplyr::slice((dplyr::n() - 8):dplyr::n()) |>
  knitr::kable()
Baseline scenario refers to the scenario with exposure (i.e. the scenario specified in the assessment).
results_pm_yll$health_detailed$intermediate_calculations |>
  dplyr::filter(sex == "female") |>
  dplyr::pull(projection_if_exposed_by_age_and_year) |>
  purrr::pluck(1)
results_pm_yll$health_detailed$intermediate_calculations |>
  dplyr::filter(sex == "female") |>
  dplyr::pull(projection_if_exposed_by_age_and_year) |>
  purrr::pluck(1) |>
  dplyr::select(age_start, midyear_population_2019, midyear_population_2020,
                midyear_population_2021, midyear_population_2022) |>
  dplyr::slice((dplyr::n() - 8):dplyr::n()) |>
  knitr::kable()
Impacted scenario refers to the scenario without exposure.
results_pm_yll$health_detailed$intermediate_calculations|> dplyr::filter(sex == "female") |> dplyr::pull(projection_if_unexposed_by_age_and_year) |> purrr::pluck(1)
results_pm_yll$health_detailed$intermediate_calculations|> dplyr::filter(sex == "female") |> dplyr::pull(projection_if_unexposed_by_age_and_year) |> purrr::pluck(1) |> dplyr::select(age_start, midyear_population_2019, midyear_population_2020, midyear_population_2021, midyear_population_2022) |> dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |> knitr::kable()
E.g., to determine premature deaths from COPD attributable to PM2.5 exposure during one year.
See example on YLL for additional info on attribute_lifetable() calculations and its output.
results_pm_deaths <- attribute_lifetable(
  health_outcome = "deaths",
  year_of_analysis = 2019,
  rr_central = 1.118,
  rr_increment = 10,
  erf_shape = "log_linear",
  exp_central = 8.85,
  cutoff_central = 5,
  min_age = 20, # age from which population is affected by the exposure
  # Life table information
  age_group = exdat_lifetable$age_group,
  sex = exdat_lifetable$sex,
  population = exdat_lifetable$midyear_population,
  bhd_central = exdat_lifetable$deaths
)
Total premature deaths attributable to exposure (sum of sex-specific impacts).
results_pm_deaths$health_main$impact
results_pm_deaths$health_main |> dplyr::slice_head() |> dplyr::select(impact_rounded, erf_ci, exp_ci, bhd_ci) |> knitr::kable()
Attributable premature deaths results are available:
- per year (if the argument approach_exposure = "constant")
- per age (group)
- per sex (if sex-specific life table data are entered)
Note: we will inspect the results for females; male results are also available.
Note: because we kept the default approach_exposure = "single_year" in the function call, results are only available for one year (the year of analysis).
results_pm_deaths$health_detailed$intermediate_calculations|> dplyr::filter(sex == "female") |> dplyr::pull(projection_if_unexposed_by_age_and_year) |> purrr::pluck(1)
results_pm_deaths$health_detailed$intermediate_calculations |> dplyr::filter(sex == "female") |> dplyr::pull(projection_if_unexposed_by_age_and_year) |> purrr::pluck(1) |> dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |> knitr::kable()
rm(results_pm_deaths)
E.g., to quantify the years lived with disability (YLD) attributable to air pollution exposure using disability weights.
To quantify the YLDs, you can use a prevalence-based or an incidence-based approach [@Kim2022_kspm].
Prevalence-based: enter 1 (year) in the argument(s) duration_... and prevalence cases in bhd_....
Incidence-based: enter a value above 1 (the disease duration in years) in duration_... and incidence cases in bhd_....
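To make the two accounting approaches concrete, here is a minimal base-R sketch of the YLD arithmetic with hypothetical case counts (these numbers are illustrative assumptions, not healthiar output):

```r
dw <- 0.2                 # disability weight

# Prevalence-based: prevalent cases live one year with the condition
prevalent_cases <- 1000
yld_prevalence <- prevalent_cases * dw * 1   # duration fixed at 1 year

# Incidence-based: incident cases carry the condition for its full duration
incident_cases <- 100
duration <- 10                               # years, > 1
yld_incidence <- incident_cases * dw * duration
```

Both approaches multiply attributable cases by the disability weight; they differ only in which cases are counted and over which duration.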
results_pm_copd_yld <- attribute_health( rr_central = 1.1, rr_increment = 10, erf_shape = "log_linear", exp_central = 8.85, cutoff_central = 5, bhd_central = 1000, duration_central = 10, dw_central = 0.2 )
results_pm_copd_yld$health_main
results_pm_copd_yld$health_main |> dplyr::select(erf_ci, impact) |> knitr::kable()
E.g., to obtain the disability-adjusted life years (DALY) as the sum of YLLs and YLDs.
To obtain the attributable DALY, its two components, i.e. years of life lost (YLL) and years lived with disability (YLD), must be summed [@GBD2020_tl].
$$DALY = YLL + YLD$$
This is possible using the function daly().
results_daly <- daly( output_attribute_yll = results_pm_yll, output_attribute_yld = results_pm_copd_yld )
YLL, YLD & DALY
results_daly$health_main |> dplyr::select(impact_yll_rounded, impact_yld_rounded, impact_rounded)
results_daly$health_main |> dplyr::select(impact_yll_rounded, impact_yld_rounded, impact_rounded) |> knitr::kable()
rm(results_daly, results_pm_yll, results_pm_copd_yld)
E.g., to quantify health impacts using attribute_health() in a scenario B that is very similar to a previous scenario A.
scenario_A <- attribute_health(
  exp_central = 8.85, # EXPOSURE 1
  cutoff_central = 5,
  bhd_central = 25000,
  approach_risk = "relative_risk",
  erf_shape = "log_linear",
  rr_central = 1.118,
  rr_increment = 10
)
The function attribute_mod() can be used to modify one or multiple arguments of an existing attribute_health() assessment, e.g. scenario_A.
scenario_B <- attribute_mod( output_attribute = scenario_A, exp_central = 6 )
This is equivalent to building the whole scenario again (see below), but more time- and code-efficient.
scenario_B <- attribute_health(
  exp_central = 6, # EXPOSURE 2
  cutoff_central = 5,
  bhd_central = 25000,
  approach_risk = "relative_risk",
  erf_shape = "log_linear",
  rr_central = 1.118,
  rr_increment = 10
)
E.g., to compare the health impacts in the scenario "before intervention" vs. "after intervention".
Two approaches can be used for the comparison of scenarios:
Delta: subtraction of the health impact in scenario 1 minus that in scenario 2 (i.e., two PAFs) [@WHO2014_book]
Population impact fraction (PIF) [@WHO2003_report; @Murray2003_spbm; @Askari2020_ijph].
Note that the PIF comparison approach assumes the same baseline health data for scenarios 1 and 2 (e.g., comparison of two scenarios at the same time point), while in the delta comparison approach the difference between two scenarios is obtained by subtraction. Therefore, the delta approach is suited for the comparison of scenarios at different time points.
IMPORTANT: If your aim is to quantify the health impacts of a policy intervention, be aware that you should use the same year of analysis, and therefore the same baseline health data, in both scenarios. The only variable that should change in the second scenario is the exposure (which changes as a result of the intervention).
The Population Impact Fraction (PIF) is defined as the proportional change in disease or mortality when exposure to a risk factor is changed, for instance due to an intervention.
The most general equation describing this mathematically is an integral form [@WHO2003_report; @Murray2003_spbm]:
$$PIF = \frac{\int RR(x)\,PE(x)\,dx - \int RR(x)\,PE'(x)\,dx}{\int RR(x)\,PE(x)\,dx}$$
Where:
- $RR(x)$ = relative risk at exposure level $x$
- $PE(x)$ = population exposure distribution in the baseline scenario
- $PE'(x)$ = population exposure distribution in the alternative (counterfactual) scenario
If the population exposure is described as a categorical rather than continuous exposure, the integrals may be converted to sums [@WHO2003_report; @Murray2003_spbm]:
$$PIF = \frac{\sum_i RR_i \times PE_i - \sum_i RR_i \times PE'_i}{\sum_i RR_i \times PE_i}$$
Where:
- $RR_i$ = relative risk at the exposure level of category $i$
- $PE_i$, $PE'_i$ = proportion of the population in exposure category $i$ in the baseline and alternative scenarios, respectively
Finally, if the exposure is provided as the population weighted mean concentration, the equation for the PIF is reduced to:
$$PIF = \frac{RR - RR_{alt}}{RR}$$
Where:
- $RR$ = relative risk at the population-weighted mean exposure in the baseline scenario
- $RR_{alt}$ = relative risk at the population-weighted mean exposure in the alternative scenario
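The categorical and mean-exposure forms of the PIF can be sketched in a few lines of base R. All numbers below are hypothetical, and rr_at_exp() is an illustrative log-linear ERF, not a healthiar internal:

```r
# Illustrative log-linear ERF (assumed parameters)
rr_at_exp <- function(exposure, rr = 1.118, increment = 10, cutoff = 5) {
  exp(log(rr) / increment * pmax(exposure - cutoff, 0))
}

exp_levels <- c(6, 8, 10)       # exposure category midpoints
pe     <- c(0.5, 0.3, 0.2)      # population shares, baseline scenario
pe_alt <- c(0.7, 0.2, 0.1)      # population shares, alternative scenario

# Categorical form: sums over exposure categories
rr_i <- rr_at_exp(exp_levels)
pif_categorical <- (sum(rr_i * pe) - sum(rr_i * pe_alt)) / sum(rr_i * pe)

# Reduced form with population-weighted mean exposures
pif_mean <- (rr_at_exp(8.85) - rr_at_exp(6)) / rr_at_exp(8.85)
```

Shifting population shares toward lower exposure categories yields a small positive PIF, i.e. a proportional reduction in the health outcome.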
attribute_health() to calculate the burden of scenarios A & B.

scenario_A <- attribute_health(
  exp_central = 8.85, # EXPOSURE 1
  cutoff_central = 5,
  bhd_central = 25000,
  approach_risk = "relative_risk",
  erf_shape = "log_linear",
  rr_central = 1.118,
  rr_increment = 10
)
scenario_B <- attribute_mod( output_attribute = scenario_A, exp_central = 6 )
compare() to compare scenarios A & B.

results_comparison <- healthiar::compare(
  approach_comparison = "delta", # or "pif" (population impact fraction)
  output_attribute_scen_1 = scenario_A,
  output_attribute_scen_2 = scenario_B
)
The default value for the argument approach_comparison is "delta". The alternative is "pif" (population impact fraction). See the function documentation of compare() for more details.
results_comparison$health_main
results_comparison[["health_main"]] |>
  dplyr::select(
    impact, impact_rounded, impact_scen_1, impact_scen_2, bhd,
    dplyr::starts_with("exp_"),
    -dplyr::starts_with("exp_ci"), # remove col "exp_ci"
    dplyr::starts_with("rr_con")
  ) |>
  dplyr::slice_head() |>
  knitr::kable()
The compare() results contain two outputs in addition to those we have already seen:
health_detailed
scen_1 contains results of scenario 1 (scenario A in our case)
scen_2 contains results of scenario 2 (scenario B in our case)
rm(results_comparison, scenario_A, scenario_B)
E.g., to quantify the total health impact attributable to PM2.5 and NO2.
A methodological report of the EU project BEST-COST [@Strak2024_report] identified three approaches to add up attributable health impacts from correlated exposures:
$$PAF_{additive} = PAF_{exposure1} + PAF_{exposure2}$$
$$PAF_{multiplicative} = \frac{\sum PE \times (RR_{multiplicative} - 1)}{\sum PE \times (RR_{multiplicative} - 1) + 1}$$
$$RR_{multiplicative} = RR_{exposure1} \times RR_{exposure2}$$
$$PAF_{combined} = 1-[(1-PAF_{exposure1}) \times (1-PAF_{exposure2})]$$
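The three combination formulas can be illustrated in base R with hypothetical, mutually adjusted relative risks (all numbers are assumptions; for a single exposure category the PAF reduces to (RR − 1)/RR):

```r
paf_from_rr <- function(rr) (rr - 1) / rr   # single-category shortcut

rr_1 <- 1.05   # e.g. exposure 1, assumed already adjusted for exposure 2
rr_2 <- 1.02   # e.g. exposure 2, assumed already adjusted for exposure 1

# Additive
paf_additive <- paf_from_rr(rr_1) + paf_from_rr(rr_2)

# Multiplicative
rr_mult <- rr_1 * rr_2
paf_multiplicative <- (rr_mult - 1) / rr_mult

# Combined
paf_combined <- 1 - (1 - paf_from_rr(rr_1)) * (1 - paf_from_rr(rr_2))
```

Note that in this single-category case the combined and multiplicative approaches coincide, since (1 − PAF1)(1 − PAF2) = 1/(RR1 × RR2).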
Attention: to apply any of these approaches, the relative risks for one exposure must be adjusted for the second exposure and the other way round.
For this purpose, you can use the function multiexpose().
results_pm <- attribute_health(
  erf_shape = "log_linear",
  rr_central = 1.369,
  rr_increment = 10,
  exp_central = 8.85,
  cutoff_central = 5,
  bhd_central = 30747
)

results_no2 <- attribute_mod(
  output_attribute = results_pm,
  exp_central = 10.9,
  rr_central = 1.031
)

results_multiplicative <- multiexpose(
  output_attribute_exp_1 = results_pm,
  output_attribute_exp_2 = results_no2,
  exp_name_1 = "pm2.5",
  exp_name_2 = "no2",
  approach_multiexposure = "multiplicative"
)
rm(results_no2, results_pm)
results_multiplicative$health_main
results_multiplicative$health_main |> dplyr::select(impact_rounded) |> knitr::kable()
rm(results_multiplicative)
E.g., to obtain the age-standardized attributable health impacts of two age groups.
Age standardization involves adjusting the observed rates of a particular
outcome to a standard population with a specific age structure.
This is a technique used to allow the comparison of populations
with different age structures [@GBD2020_tldemo; @Ahmad2001_report].
In healthiar, the function standardize() applies the direct method,
where the age-specific rates observed in a study population
are applied to a standard (reference) population distribution.
The standardized health impact rate is computed as $$ \mathit{impact\_per\_100k\_inhab}_{std} = \sum_{i=1}^{k} \mathit{impact\_per\_100k\_inhab}_i \times \mathit{ref\_prop\_pop}_i $$
where:
- $i$ indexes the $k$ age groups
- $\mathit{impact\_per\_100k\_inhab}_i$ = impact rate observed in age group $i$
- $\mathit{ref\_prop\_pop}_i$ = proportion of age group $i$ in the reference population
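The direct standardization step itself is simple arithmetic; a base-R sketch with hypothetical age-specific rates (not healthiar output):

```r
# Hypothetical age-specific impact rates per 100,000 inhabitants
impact_per_100k <- c(below_40 = 50, above_40 = 400)

# Reference population structure (proportions must sum to 1)
ref_prop_pop <- c(below_40 = 0.5, above_40 = 0.5)

# Weighted sum of age-specific rates = age-standardized rate
impact_per_100k_std <- sum(impact_per_100k * ref_prop_pop)
impact_per_100k_std  # 225
```

Two populations with different age structures become comparable once both are weighted to the same reference structure.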
output_attribute <- attribute_health(
  rr_central = 1.063,
  rr_increment = 10,
  erf_shape = "log_linear",
  cutoff_central = 0,
  age_group = c("below_40", "above_40"),
  exp_central = c(8.1, 10.9),
  bhd_central = c(1000, 4000),
  population = c(100000, 500000)
)

results <- standardize(
  output_attribute = output_attribute,
  age_group = c("below_40", "above_40"),
  ref_prop_pop = c(0.5, 0.5)
)
Age-standardized impact rate:
print(results$health_main$impact_per_100k_inhab)
Age group-specific impact rate:
print(results$health_detailed$results_raw$impact_per_100k_inhab)
rm(results)
E.g., to determine the population-weighted mean PM2.5 exposure for several neighborhoods of Brussels (Belgium).
The healthiar function prepare_exposure() helps users who do not have
exposure data (needed for healthiar functions), but only spatial
concentration and population data.
The function calculates an average concentration value in each geographic unit,
weighted with population at each location.
$$ exp = \frac{\sum_{i=1}^{n} (C_i \times population_i)}{\sum_{i=1}^{n} population_i} $$
where:
- $C_i$ = concentration at location $i$
- $population_i$ = population at location $i$
- $n$ = number of locations within the geographic unit
If population is entered as a count by geographic sub-unit, the function calculates the mean concentration in each sub-unit and aggregates it to higher-level geographic units. If no population data are entered, the function calculates a simple spatial mean concentration as the exposure value.
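The weighting itself reduces to a weighted average; a base-R sketch with hypothetical concentrations and populations (not the prepare_exposure() implementation):

```r
# Hypothetical concentration and population at three locations
conc       <- c(7.2, 8.9, 10.1)    # concentration at each location
population <- c(1000, 5000, 2000)  # population at each location

# Population-weighted mean exposure
exp_pwm <- sum(conc * population) / sum(population)
```

Locations with more inhabitants pull the mean toward their concentration, which is the point of weighting by population rather than by area.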
The output of prepare_exposure() can be entered in the arguments
exp_mean, exp_lower and/or exp_upper of healthiar functions
such as attribute_health().
# exdat_pwm_1 = pollution grid data
exdat_pwm_1 <- terra::rast(system.file("extdata", "exdat_pwm_1.tif", package = "healthiar"))

# exdat_pwm_2 = data with the geo units and population data. This is pre-loaded in healthiar.
# If your raw data are in .gpkg format, you can use e.g. sf::st_read()

pwm <- healthiar::prepare_exposure(
  poll_grid = exdat_pwm_1, # formal class SpatRaster
  geo_units = exdat_pwm_2, # sf of the geographic sub-units
  population = sf::st_drop_geometry(exdat_pwm_2$population), # population per geographic sub-unit
  geo_id_macro = sf::st_drop_geometry(exdat_pwm_2$region) # higher-level IDs to aggregate
)
Within the function output, the list main contains
the population-weighted mean exposures for the (higher-level) geographic units
in the column exposure_mean and the total population in each unit in column population_total.
knitr::kable(pwm$main)
rm(exdat_pwm_1, exdat_pwm_2, pwm)
E.g., to quantify health impacts in the exposure group 55 dB+ (calculation threshold) that is affected by an exposure above the effect threshold of 45 dB (cut-off).
The function arguments erf_eq_... require a function as input. Instead of a splinefun(), this can also be a user-defined function(c), i.e. any object of type 'function'.
# Set up function parameters
threshold_effect <- 45
RR <- 1.055
threshold_calculation <- 55
rr_increment <- 10

# Define the categorical function; the ifelse condition enables the case distinction
erf_function <- function(c) {
  output <- ifelse(
    c < threshold_calculation,
    1,
    exp((log(RR) / rr_increment) * (c - threshold_effect))
  )
  return(output)
}

# attribute_health
results_catERF_different_calc_thesh <- healthiar::attribute_health(
  approach_risk = "relative_risk",
  erf_eq_central = erf_function,
  prop_pop_exp = c(300000, 200000, 150000, 120000, 100000, 70000, 60000) / 10000000,
  exp_central = c(47, 52, 57, 62, 67, 72, 77),
  cutoff_central = 0,
  bhd_central = 50000
)$health_main$impact_rounded
The function used is equal to $$ f(c) = \begin{cases} 1, & c < threshold_{calculation} \\ \exp\left( \frac{\log(RR)}{rr_{increment}} \, (c - threshold_{effect}) \right), & c \ge threshold_{calculation} \end{cases} $$
The categorical ERF curve created looks as follows.
# Evaluate exp_central points
x <- c(47, 52, 57, 62, 67, 72, 77) # central exposure values
y <- erf_function(x)

# Generate finer x-values
x_dense <- seq(min(x), max(x), length.out = 200)
# Evaluate the ERF on the finer grid
y_dense <- erf_function(x_dense)

## Plot
plot(x, y,
     # main = "User-defined ERF",
     main = "",
     xlab = "Exposure", ylab = "Relative risk",
     col = "blue", pch = 19)
lines(x_dense, y_dense, col = "red", lwd = 2)
legend("topleft",
       legend = c("Original points", "Categorical ERF curve"),
       col = c("blue", "red"),
       pch = c(19, NA), lty = c(NA, 1), lwd = c(NA, 2))
E.g., to monetize the attributable health impact of a policy that will have health benefits five years from now.
In health economic evaluations and economic burden of disease assessments,
health impacts may need to be converted into monetary values.
For this purpose, you can use monetize().
Several valuation metrics are available, depending on how outcomes are quantified in natural or health units (e.g. cases reduced, deaths prevented, reductions in mortality risk, life-years gained, QALYs gained, DALYs averted). Common metrics include the Value of a Statistical Life (VSL) [@OECD2025_book], the Value of a Life-Year (VOLY) [@Hammitt2007_reep] and the Value of a Quality-Adjusted Life-Year (VAQALY) [@Bobinac2010_vh].
Discounting is the practice of converting future costs (or health impacts, as a prior step to valuing them) into their present value. The underlying rationale is that the value placed on outcomes declines the further in the future they occur. Therefore, future costs and effects are converted into present-value terms to make them comparable over time [@Attema2018_p].
Discounting is implemented by selecting a discount rate, which is used to compute a discount factor for each time period. This factor is then multiplied by the corresponding future cost (or effect) to express it in present-value terms.
If you need the discounted values of a cost or health outcome,
you can call the healthiar function discount().
If you just need the discount factor, you can alternatively call get_discount_factor() (entering is_deflation = TRUE).
If you just need the inflation factor, you can call get_inflation_factor().
Different functional forms can be used to apply discounting. The most common is exponential discounting, also referred to as constant discounting, since outcomes are discounted proportionally as time increases. An alternative is hyperbolic discounting, which tends to better capture human behavior by discounting the near present more heavily than outcomes further in the future [@Lipman2024_jru].
See below the equations that are used behind these functions.
$$\mathit{monetized\_impact} = impact \times valuation \times \mathit{discount\_factor} \times \mathit{deflator\_factor} \times \mathit{real\_growth\_factor}$$
The factors discount_factor, deflator_factor and real_growth_factor
are only applied if a value is entered in the arguments
discount_rate, inflation_rate and real_growth_rate, respectively
(otherwise they are ignored).
Exponential discounting
As suggested by @Frederick2002_jel
$$\mathit{discount\_factor} = \frac{1}{(1 + \mathit{discount\_rate})^{\mathit{n\_years}}}$$
Hyperbolic discounting Harvey
As suggested by @Harvey1986_ms
$$\mathit{discount\_factor} = \frac{1}{(1 + \mathit{n\_years})^{\mathit{discount\_rate}}}$$
Hyperbolic discounting Mazur
As suggested by @Mazur1987_book
$$\mathit{discount\_factor} = \frac{1}{1 + \mathit{discount\_rate} \times \mathit{n\_years}}$$
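The three shapes can be compared numerically in base R (3% rate over 5 years; the values are purely illustrative):

```r
discount_rate <- 0.03
n_years <- 5

# Exponential (constant) discounting
df_exponential <- 1 / (1 + discount_rate)^n_years

# Hyperbolic discounting (Harvey)
df_harvey <- 1 / (1 + n_years)^discount_rate

# Hyperbolic discounting (Mazur)
df_mazur <- 1 / (1 + discount_rate * n_years)
```

At this horizon the exponential factor is the smallest, i.e. exponential discounting devalues year-5 outcomes the most, while the Harvey form discounts them only mildly.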
Inflation can be handled in monetize() by applying a deflator on future values,
projected in nominal terms, in order to convert them into real values
(i.e., express these values in terms of constant prices of a single base year).
Therefore, if the user of the function provides a value for the inflation_rate argument,
a deflator factor [@HMTreasury2022_greenbook; @Brealey2023_book; @Samuelson1937_res]
is applied according to the following formula
$$\mathit{deflator\_factor} = \frac{1}{(1 + \mathit{inflation\_rate})^{\mathit{n\_years}}}$$
A rising societal value of health over time can also be reflected in the monetization. Unlike inflation, this represents a "real" increase in value: as societies become wealthier, their willingness to pay to avoid mortality and morbidity risks tends to rise [@OECD2012_book].
For this use case, you have to enter a value in the argument real_growth_rate in monetize(),
which allows you to project this growth by applying a valuation growth factor to base-year unit values:
$$\mathit{real\_growth\_factor} = (1 + \mathit{real\_growth\_rate})^{\mathit{n\_years}}$$
Where $\mathit{real\_growth\_rate}$ represents the annual real growth rate in health valuation. This ensures that long-term environmental impacts are not undervalued.
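Putting the factors together, the monetization equation can be sketched in base R with hypothetical inputs (exponential discounting assumed; none of these numbers come from healthiar):

```r
impact           <- 1000   # attributable cases (hypothetical)
valuation        <- 50000  # monetary value per case, base year
discount_rate    <- 0.03
inflation_rate   <- 0.02
real_growth_rate <- 0.01
n_years          <- 5

discount_factor    <- 1 / (1 + discount_rate)^n_years   # exponential shape
deflator_factor    <- 1 / (1 + inflation_rate)^n_years
real_growth_factor <- (1 + real_growth_rate)^n_years

monetized_impact <- impact * valuation * discount_factor *
  deflator_factor * real_growth_factor
```

Discounting and deflation shrink the nominal value, while real growth in health valuation partially offsets them.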
monetized_pm_copd <- monetize(
  output_attribute = results_pm_copd,
  discount_shape = "exponential",
  discount_rate = 0.03,
  n_years = 5,
  valuation = 50000 # e.g. EURO
)
The outcome of the monetization is added to the object entered in the output_attribute argument, which is results_pm_copd in our case.
Two list elements are added:
monetization_main contains the central monetization estimate and the corresponding 95% confidence intervals obtained through the specified monetization.
monetization_detailed contains the monetized results for each unique combination of the input variable estimates that were provided to the initial attribute_health() call.
monetized_pm_copd$monetization_main
monetized_pm_copd$monetization_main |> dplyr::select(erf_ci, monetized_impact) |> knitr::kable()
We see that the monetized impact (discounted) is more than 160 million EURO.
Alternatively, you can also monetize (attributable) health impacts from a non-healthiar source.
results <- monetize( impact = 1151, valuation = 100 )
E.g., to perform an economic evaluation for an intervention by comparing its benefits and costs via Cost-Benefit Analysis (CBA).
The CBA is a type of economic evaluation that compares the costs and the benefits of an intervention, considering both measures expressed in monetary terms.
To perform a CBA, you can use the function cba().
This approach requires monetizing benefits so they can be directly compared with costs.
Since interventions typically generate costs and benefits over multi-year time horizons,
discounting is a common practice to obtain the present value of future costs and benefits.
Depending on the reference guidelines,
the discount rate can be specified as the same for costs and benefits or different across them.
The outputs of a Cost-Benefit Analysis can be expressed
as three main indicators [@Boardman2018_book]:
- Net Benefit: the difference between monetized benefits and costs
- Cost-Benefit Ratio (CBR): monetized benefits divided by costs
- Return on Investment (ROI): the return generated per unit of expenditure, relating net benefits to the intervention's costs
An intervention is recommended from a Cost-Benefit Analysis perspective,
if it yields a positive net benefit or a positive ROI, or equivalently,
a CBR greater than one, meaning that the intervention’s monetized benefits exceed its costs.
These three outputs are available when running cba()
and are calculated considering the following formulas.
Net Benefit $$\mathit{net\_benefit} = benefit - cost$$
Cost-Benefit Ratio (CBR) $$cbr = \frac{benefit}{cost}$$
Return on Investment (ROI) $$roi = \frac{benefit - cost}{cost} \times 100$$
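The three indicators are straightforward to compute; a base-R sketch with hypothetical, already-discounted monetary values:

```r
benefit <- 150e6   # monetized, discounted benefits (EUR, hypothetical)
cost    <- 100e6   # discounted costs (EUR, hypothetical)

net_benefit <- benefit - cost                  # EUR
cbr         <- benefit / cost                  # dimensionless ratio
roi         <- (benefit - cost) / cost * 100   # percent
```

Here all three indicators agree that the intervention is worthwhile: a positive net benefit, a CBR above one, and a positive ROI.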
Let's imagine we design a policy that would reduce air pollution to 5 $\mu g/m^3$, which is the concentration specified in the cutoff_central argument in the initial attribute_health() call. So we could avoid all COPD cases attributed to air pollution.
Considering the cost to implement the policy (estimated at 100 million EURO), what would be the monetary net benefit of such a policy? We can find out using the healthiar function cba().
cba <- cba(
  output_attribute = results_pm_copd,
  valuation = 50000,
  cost = 100000000,
  discount_shape = "exponential",
  discount_rate_benefit = 0.03,
  discount_rate_cost = 0.03,
  n_years_benefit = 5,
  n_years_cost = 5
)
The outcome of the CBA is contained in two list elements, which are added to the existing assessment:
cba_main contains the central estimate and the corresponding 95% confidence intervals
cba_detailed contains additional intermediate results for both cost and benefit
benefit contains results by_year and raw results health_raw
cost contains the costs of the policy at the end of the period specified in the n_years_cost argument
cba$cba_main |> dplyr::select(benefit, cost, net_benefit) |> knitr::kable()
rm(cba)
We see that the central and upper 95% confidence interval estimates of avoided attributable COPD cases result in a net monetary benefit of the policy, while the lower 95% confidence interval estimate results in a net cost!
E.g., to estimate the health impact that is theoretically attributable to the difference in degree of deprivation of the population exposed.
Taking into account socio-economic indicators, e.g. a multiple deprivation index [@Mogin2025_ejph], the differences in attributable health impacts across the study areas can be estimated [@Renard2019_bmc; @Otavova_2022_bmc].
Social inequalities are quantified as the difference between the least deprived areas (the last n-quantile) and
- the most deprived areas or
- the population overall.

These differences can be absolute or relative.
$$ \mathit{absolute\_quantile} = first - last $$ Where:
- $first$ = health impact in the most deprived quantile
- $last$ = health impact in the least deprived quantile

$$ \mathit{relative\_quantile} = \frac{\mathit{absolute\_quantile}}{last} $$

$$ \mathit{absolute\_overall} = overall - last $$ Where:
- $overall$ = health impact in the population overall
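A base-R sketch of these differences with hypothetical impact rates per deprivation quantile (illustrative numbers only):

```r
first   <- 120   # impact rate in the most deprived quantile
last    <- 80    # impact rate in the least deprived quantile
overall <- 95    # impact rate in the overall population

absolute_quantile <- first - last               # absolute gap between quantiles
relative_quantile <- absolute_quantile / last   # gap relative to least deprived
absolute_overall  <- overall - last             # gap of overall vs least deprived
```

In this toy example the most deprived areas carry a 50% higher impact rate than the least deprived ones.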
If you assume that the least deprived areas are similar to counter-factual cases (no exposure to deprivation), the relative difference regarding the overall average health impact could be interpreted as some kind of relative risk attributable to social inequalities.
First, quantify health impacts.
health_impact <- healthiar::attribute_health(
  age_group = exdat_socialize$age_group,
  exp_central = exdat_socialize$pm25_mean,
  cutoff_central = 0,
  rr_central = exdat_socialize$rr,
  erf_shape = "log_linear",
  rr_increment = 10,
  bhd_central = exdat_socialize$mortality,
  population = exdat_socialize$population,
  geo_id_micro = exdat_socialize$geo_unit
)
Second, use the function socialize() entering the whole output of attribute_health() in the argument output_attribute.
social_t <- healthiar::socialize(
  output_attribute = health_impact,
  age_group = exdat_socialize$age_group, # must be the same in socialize() and in attribute_health()
  ref_prop_pop = exdat_socialize$ref_prop_pop,
  # Population already provided in output_attribute
  geo_id_micro = exdat_socialize$geo_unit,
  social_indicator = exdat_socialize$score,
  n_quantile = 10,
  increasing_deprivation = TRUE
)
Alternatively, you can directly enter the health impact in the socialize() argument impact.
social <- healthiar::socialize(
  impact = health_impact$health_detailed$results_by_age_group$impact,
  age_group = exdat_socialize$age_group, # must be the same in socialize() and in attribute_health()
  ref_prop_pop = exdat_socialize$ref_prop_pop,
  geo_id_micro = exdat_socialize$geo_unit,
  social_indicator = exdat_socialize$score,
  population = exdat_socialize$population, # population must be provided because no output_attribute is entered
  n_quantile = 10,
  increasing_deprivation = TRUE
)
social$social_main |> dplyr::select(c(parameter, difference_type, difference_compared_with, difference_value, comment)) |> print()
E.g., to estimate the multiple deprivation index (MDI) to use it for the argument social_indicator in the function socialize().
Socio-economic indicators (e.g., education level, employment status and family structure) can be condensed into a multiple deprivation index (MDI) [@Mogin2025_ejph]. For this purpose, the indicators can be normalized using min-max scaling.
The reliability of the MDI can be assessed using Cronbach's alpha [@Cronbach1951_p].
$$ \alpha = \frac{k}{k - 1} \left( 1 - \frac{\sum_{i=1}^{k} \sigma^2_{y_i}}{\sigma^2_x} \right) $$ where:
- $k$ = number of indicators
- $\sigma^2_{y_i}$ = variance of indicator $i$
- $\sigma^2_x$ = variance of the total score (the sum of all indicators)
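Cronbach's alpha can be computed directly from an indicator matrix; a base-R sketch with a toy data set (this is not the healthiar implementation):

```r
# Toy matrix: 4 geo units x 3 normalized indicators
y <- cbind(ind1 = c(1, 2, 3, 4),
           ind2 = c(2, 4, 5, 8),
           ind3 = c(1, 3, 3, 5))

k     <- ncol(y)                 # number of indicators
total <- rowSums(y)              # total score per geo unit

# alpha = k/(k-1) * (1 - sum of indicator variances / variance of total score)
alpha <- k / (k - 1) * (1 - sum(apply(y, 2, var)) / var(total))
```

Because the three toy indicators move together across units, the total-score variance dominates the indicator variances and alpha comes out above 0.9 (excellent reliability on the scale below).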
To apply this approach, you should ensure that the data set is as complete as possible. Otherwise, you can try to impute missing data using:
- Time-based imputation: linear regression based on historical trends if prior years' data are complete.
- Indicator-based imputation: multiple linear regression if the missing indicator correlates strongly with others.
Imputation models should have an $R^2$ greater than or equal to 0.7. If $R^2$ is lower than 0.7, consider alternative data sources or methods.
mdi <- prepare_mdi(
  geo_id_micro = exdat_prepare_mdi$id,
  edu = exdat_prepare_mdi$edu,
  unemployed = exdat_prepare_mdi$unemployed,
  single_parent = exdat_prepare_mdi$single_parent,
  pop_change = exdat_prepare_mdi$pop_change,
  no_heating = exdat_prepare_mdi$no_heating,
  n_quantile = 10,
  verbose = FALSE
)
Note: verbose = FALSE suppresses any output to the console (default: verbose = TRUE, i.e. with printing turned on).
Function output includes:
mdi_main, a tibble containing the BEST-COST MDI

mdi$mdi_main |> dplyr::select(geo_id_micro, MDI, MDI_index)
mdi$mdi_main |> dplyr::select(geo_id_micro, MDI, MDI_index) |> knitr::kable()
The function assesses the reliability of the MDI based on the Cronbach's alpha value as follows:
- 0.9 and higher: excellent reliability
- between 0.8 (included) and 0.9: good reliability
- between 0.7 (included) and 0.8: acceptable reliability
- between 0.6 (included) and 0.7: questionable reliability
- lower than 0.6: poor reliability
mdi_detailed, containing:
- descriptive statistics
- Pearson's correlation coefficients
- Cronbach's α, including the reliability rating this value indicates
- code for boxplots of the single indicators
- code for a histogram of the MDI values of the geo units with a normal distribution curve
To reproduce the boxplots run
eval(mdi$mdi_detailed$boxplot)
Analogously, to reproduce the histogram run
eval(mdi$mdi_detailed$histogram)
rm(mdi)
healthiar can be used inside the native pipe |>. See the example below.
exdat_noise |>
  (\(df) {
    healthiar::attribute_health(
      approach_risk = df$risk_estimate_type,
      exp_central = df$exposure_mean,
      pop_exp = df$exposed,
      erf_eq_central = df$erf
    )$health_main$impact_rounded
  })()
A shorter version makes use of the base R function with().
exdat_noise |>
  (\(df) {
    with(df, healthiar::attribute_health(
      approach_risk = risk_estimate_type,
      exp_central = exposure_mean,
      pop_exp = exposed,
      erf_eq_central = erf
    ))$health_main$impact_rounded
  })()
healthiar can also be used inside magrittr pipes %>% as follows.
exdat_noise %>%
  {
    healthiar::attribute_health(
      approach_risk = .$risk_estimate_type,
      exp_central = .$exposure_mean,
      pop_exp = .$exposed,
      erf_eq_central = .$erf
    )$health_main$impact_rounded
  }
Exporting and visualizing results is out of the scope of healthiar. To export and visualize, you can make use of existing functions in other packages, as indicated below.
Export as .csv file
write.csv(x = results_pm_copd$health_main, file = "exported_results/results_pm_copd.csv")
Save as .Rdata file
save(results_pm_copd, file = "exported_results/results_pm_copd.Rdata")
Export to Excel (as .xlsx file)
openxlsx::write.xlsx(x = results_pm_copd$health_main, file = "exported_results/results_pm_copd.xlsx")
Visualization is out of the scope of healthiar. You can visualize with, e.g., ggplot2 [@Wickham2016_ggplot].

BHD/bhd = baseline health data
CI = confidence interval
CBA/cba = cost-benefit analysis
exp = exposure
ERF = exposure-response function
RR/rr = relative risk
WHO = World Health Organization
YLL/yll = years of life lost