knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

In order to show the INLA modeling wrappers provided in augury, we will look at two datasets publicly available on the World Health Organization's Global Health Observatory. These can be accessed using the ghost package, which provides an R interface for the GHO OData API.

Time series modeling

The first indicator will be on safe sanitation. We will also use the billionaiRe package to quickly transform the GHO data into the simple format used by augury, billionaiRe, and other packages.

library(augury)

df <- ghost::gho_data("WSH_SANITATION_SAFELY_MANAGED",
                      query = "$filter=Dim1 eq 'TOTL'") %>%
  billionaiRe::wrangle_gho_data(source = "WHO GHO",
                                type = "estimated")

head(df)

Now that we have the input data available from the GHO in an easy to use format, we can now join up with the covariates_df available in augury and run a time series model to predict sanitation out to 2023. For simplicity, let's just look at Albania, with ISO3 code "ALB".

library(dplyr)

df <- left_join(covariates_df,
                df,
                by = c("iso3", "year")) %>%
  filter(iso3 == "ALB")

head(df)

Of course, the only "covariate" being used in this time series model is going to be year_n, but the rest are available if we want to expand to test other types of modeling. Let's run the modeling now. We are going to scale the data and probit transform it before and after the modeling. We will use the predict_inla_ts() to fit a time series model to the data.

modeled_df <- df %>%
  scale_transform("value") %>%
  probit_transform("value") %>%
  predict_inla_ts(type_col = "type",
                  source_col = "source",
                  source = "augury modeling") %>%
  probit_transform(c("value", "pred", "upper", "lower"), inverse = TRUE) %>%
  scale_transform(c("value", "pred", "upper", "lower"), divide = FALSE)

# Only look at recent years and relevant columns

modeled_df %>%
  filter(year > 2015) %>%
  select(iso3, year, value, pred, lower, upper, source, type)

And there we go, we have now fit a time series model to our data, provided new type and source, and merged this into our existing data frame. However, in this setup, the error calculations returned by predict_inla_ts() are calculated in the probit space. If we wanted to scale and probit transform the response variable prior to model fitting, but still calculate error metrics and automatically return the response and predicted values back in the original space, we can set scale = 100 and probit = TRUE within predict_inla_ts().

df %>%
  predict_inla_ts(scale = 100,
                  probit = TRUE,
                  type_col = "type",
                  source_col = "source",
                  source = "augury modeling") %>%
  filter(year > 2015) %>%
  select(iso3, year, value, pred, lower, upper, source, type)

And we can see that the results here are the same as manually scaling and probit transforming the variables.

Mixed-effects modeling

Now we will look at another indicator, a composite of 13 International Health Regulations core capacity scores, SPAR version. Since countries only have two data points at most, we will use mixed-effects modeling to infill and project the data.

df <- ghost::gho_data("SDGIHR2018") %>%
  billionaiRe::wrangle_gho_data(source = "Electronic State Parties Self-Assessment Annual Reporting Tool (e-SPAR)",
                                type = "reported")

head(df)

With this, let's go straight into the modeling like last time, except we will now use predict_inla_me() for mixed-effects modeling using covariates found in covariates_df. This time, we want to model a first order auto-regressive process across time rather than a second-order random walk, so we use the "ar1" model available in INLA.

modeled_df <- df %>%
  right_join(covariates_df, by = c("iso3", "year")) %>%
  scale_transform("value") %>%
  probit_transform("value") %>%
  predict_inla_me(model = "ar1",
                  type_col = "type",
                  source_col = "source",
                  source = "WHO DDI Preliminary infilling and projections") %>%
  probit_transform(c("value", "pred", "upper", "lower"), inverse = TRUE) %>%
  scale_transform(c("value", "pred", "upper", "lower"), divide = FALSE)

# Look at an example for Afghanistan

modeled_df %>%
  filter(year > 2017, iso3 == "AFG") %>%
  select(iso3, year, value, pred, lower, upper, source, type)

And exactly as we were able to do with the time series modeling, we now have infilled missing data for this indicator using mixed-effects modeling in INLA.

Building further

Building further on this work, you can tweak any of the arguments passed to these INLA models or use the base predict_inla() and other covariates to test and compare other models. There is much more functionality to test modeling accuracy and iteratively develop methods available in this package not shown here, so please continue to explore and play around.



caldwellst/augury documentation built on Oct. 10, 2024, 8:20 a.m.