knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(dplyr)
library(forcats)
library(ggplot2)
library(sf)
library(naomi)
#' areas
area_levels <- naomi::demo_area_levels
area_hierarchy <- naomi::demo_area_hierarchy
area_boundaries <- naomi::demo_area_boundaries

#' population
population_agesex <- naomi::demo_population_agesex
age_group_meta <- naomi::get_age_groups()

fertility <- data.frame(area_id = character(0),
                        time = numeric(0),
                        age_group = character(0),
                        calendar_quarter = character(0),
                        asfr = numeric(0))

#' surveys
survey_meta <- naomi::demo_survey_meta
survey_regions <- naomi::demo_survey_regions
survey_clusters <- naomi::demo_survey_clusters
survey_individuals <- naomi::demo_survey_individuals
survey_biomarker <- naomi::demo_survey_biomarker

survey_hiv_indicators <- naomi::demo_survey_hiv_indicators

#' programme
art_number <- naomi::demo_art_number
anc_testing <- naomi::demo_anc_testing

Data model diagram

[Figure: data model diagram]

Areas data

The fields center_x and center_y defined in area_hierarchy give the longitude/latitude coordinates of a point within the area. These fields are currently optional: the R package constructs the centers from the boundaries if they are not provided. There are two reasons to provide them explicitly:

  1. Offset centers might be provided to avoid overlapping centroids when creating bubble plots (e.g. Zomba and Zomba City).
  2. In future modelling we might rely on population-weighted centroids to estimate average distances between areas.
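
As a rough illustration of deriving centers from boundaries, the sketch below computes one interior point per polygon with sf::st_point_on_surface(). The source does not say which method the package uses, so this choice is an assumption; the toy squares, the helper square(), and the object names are hypothetical, and no coordinate reference system is set to keep the example planar.

```r
library(sf)

# Hypothetical toy boundaries: two unit squares side by side (not naomi data)
square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}
toy_boundaries <- st_sf(
  area_id = c("A", "B"),
  geometry = st_sfc(square(0), square(1))
)

# st_point_on_surface() returns a point that lies inside each polygon;
# st_coordinates() turns the points into a matrix with columns X and Y
centers <- st_coordinates(st_point_on_surface(st_geometry(toy_boundaries)))

toy_centers <- data.frame(
  area_id = toy_boundaries$area_id,
  center_x = centers[, "X"],
  center_y = centers[, "Y"]
)
toy_centers
```

A point on the surface is used here rather than a centroid because it always falls inside the polygon, which matches the description of center_x and center_y as coordinates within the area.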

From a conceptual perspective, area_hierarchy and area_boundaries each have one record per area_id, so it would make sense for them to share a single table schema. They are kept as separate schemas for convenience, so that area_hierarchy can be saved as a human-readable CSV file while area_boundaries is saved in .geojson format by default.

The code below generates a typical plot from the Areas schemas:

area_hierarchy %>%
  left_join(area_levels %>% select(area_level, area_level_label)) %>%
  mutate(area_level_label = area_level_label %>% fct_reorder(area_level)) %>%
  ggplot() +
  geom_sf(data = . %>% left_join(area_boundaries) %>% st_as_sf()) +
  geom_label(aes(center_x, center_y, label = area_sort_order), alpha = 0.5) +
  facet_wrap(~area_level_label, nrow = 1) +
  naomi:::th_map()

Population data

Time is identified by quarter_id, defined as the number of calendar quarters since the year 1900 (inspired by the DHS Century Month Code [CMC]): $$ \mathrm{quarter\_id} = (\mathrm{year} - 1900) \times 4 + \mathrm{quarter}. $$ The function interpolate_population_agesex() interpolates population estimates to the specified quarter_id values.
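
The quarter_id formula can be checked with a small base R sketch. The helper names quarter_id_from_year_quarter() and year_quarter_from_quarter_id() are hypothetical and written only for this illustration; they are not taken from the package.

```r
# Hypothetical helper: apply the quarter_id formula from the text
quarter_id_from_year_quarter <- function(year, quarter) {
  stopifnot(all(quarter %in% 1:4))
  (year - 1900) * 4 + quarter
}

# Hypothetical helper: invert the formula back to year and quarter
year_quarter_from_quarter_id <- function(quarter_id) {
  data.frame(
    year = 1900 + (quarter_id - 1) %/% 4,
    quarter = (quarter_id - 1) %% 4 + 1
  )
}

# Third quarter of 2018: (2018 - 1900) * 4 + 3 = 475
quarter_id_from_year_quarter(2018, 3)

# Round trip back to year = 2018, quarter = 3
year_quarter_from_quarter_id(475)
```

Subtracting 1 before the integer division is what keeps the fourth quarter in its own year: quarter 4 of 2018 has quarter_id 476, and (476 - 1) %/% 4 is still 118.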

naomi::get_age_groups()

Survey data

The remaining tables are harmonized survey microdatasets used for calculating the indicators dataset.

The table survey_hiv_indicators should also contain all survey HIV prevalence inputs required for Spectrum and EPP. It should be further extended to calculate other indicators required by Spectrum, e.g. HIV testing outcomes for shiny90, proportion ever had sex, breastfeeding duration, and fertility by HIV status.

Programme data

The model is currently specified to accept ART numbers by age 0-14 (age_group = `r filter(naomi::get_age_groups(), age_group_label == "0-14")$age_group`) and age 15+ (age_group = `r filter(naomi::get_age_groups(), age_group_label == "15+")$age_group`), either for both sexes together (sex = "both") or by sex (sex = "female" / sex = "male"). A possible extension would allow ART inputs with finer stratification.

For art_number it is important to distinguish between zero persons receiving ART (e.g. no ART available in the area) and missing data about the number on ART in an area. The current specification requires a value art_current = 0 for an area with no ART, whereas no entry for a given area is interpreted as missing data. This could be revised, for example to require explicit input for all areas with a code for missing data.
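
The distinction between an explicit zero and a missing entry can be sketched with toy data in base R. The data frames areas and art_toy and the status column are hypothetical and only illustrate the convention described above; they are not the package's validation logic.

```r
# Hypothetical toy inputs (not naomi data)
areas <- data.frame(area_id = c("A", "B", "C"))
art_toy <- data.frame(
  area_id = c("A", "B"),
  art_current = c(120, 0)  # B: explicit zero; C has no row at all
)

# all.x = TRUE keeps every area; areas without an ART row get NA
art_by_area <- merge(areas, art_toy, by = "area_id", all.x = TRUE)

# Classify each area under the convention described in the text
art_by_area$status <- ifelse(
  is.na(art_by_area$art_current), "missing",
  ifelse(art_by_area$art_current == 0, "no ART", "on ART")
)
art_by_area
```

Area B is reported with art_current = 0 and is treated as having no ART, while area C has no entry and surfaces as NA, i.e. missing data.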

The anc_testing data is currently input for all ages of pregnant women aggregated, that is, age_group = `r filter(naomi::get_age_groups(), age_group_label == "15-49")$age_group` for age 15-49.



mrc-ide/naomi documentation built on April 10, 2024, 2:13 p.m.