knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
While we most often want to directly build models on our original dataset to generate predicted values, we might instead want to generate average trends across larger groups instead, and then apply this to our original data. For instance, generating trends by region, and then applying those regional trends back to the country level. The predict_...avg_trend()
functions in augury allow us to do just that, applying any of the models we are used to a grouped set of columns.
These work across specific groups, specified by average_cols
, and averaging numeric values specified as the response variable or variables extracted from a formula
. The specified model is then fit to this averaged data, and the predicted values are joined back up to the original data frame. Let's look at an example using blood pressure data, which has a comprehensive time series.
library(augury) df <- ghost::gho_data("BP_04", query = "$filter=Dim1 eq 'MLE' and Dim2 eq 'YEARS18-PLUS'") %>% billionaiRe::wrangle_gho_data() %>% dplyr::right_join(covariates_df) %>% dplyr::select(iso3, year, year_n, value) %>% dplyr::filter(whoville::is_who_member(iso3), # keep WHO member states year >= 2000, year <= 2023) %>% # get relevant years dplyr::mutate(who_region = whoville::iso3_to_regions(iso3)) # get WHO regions ur <- unique(df$who_region) ur
Alright, so, here we have r length(ur)
WHO regions. We will use these regions to fit a model to and use INLA to predict out to 2023, then apply these trends to input countries.
pred_df <- df %>% predict_inla_avg_trend(formula = value ~ f(year_n, model = "rw2"), average_cols = c("who_region", "year_n"), group_models = TRUE, group_col = "iso3", sort_col = "year_n") pred_df %>% dplyr::filter(iso3 == "AFG", year >= 2013)
Above, we can see we have a generated a model using 2nd order random walk with INLA, however, the model was generated by averaging data to WHO regions first, fitting the random walk to each reach (since group_models = TRUE
) and then fitting those trends to the original data. Note some specifics of what had to be set, as the predict_..._avg_trend()
functions are slightly more complex than others:
average_cols
must contain the sort_col
. So, since we use year_n
in the time series rather than year
, we will sort by that this time.average_cols
refers to the groupings used for averaging (we take the average for each WHO region and year in this case). Then, the model is fit to average_cols
that are NOT the sort_col
.group_col
is the groupings used on the original data frame, which is still necessary here when applying the trend back to the original data.formula
, it must either be in average_cols
or it must be a numeric column that can be averaged. This is because the formula is applied to the data frame after dplyr::group_by()
and dplyr::summarize()
have reduced it.To highlight this point, in the above example, what's actually happening is we are actually fitting the model on the summarized data:
df %>% dplyr::group_by(who_region, year_n) %>% dplyr::summarize(value = mean(value, na.rm = T)) %>% head()
Since average_cols = c("who_region", "year_n")
, we took the mean of all values in formula
not in average_cols
, in this case just value
. If for instance, we tried to specify a model using iso3
in the formula
:
predict_inla_avg_trend(df, formula = value ~ iso3 + f(year_n, model = "rw2"), average_cols = c("who_region", "year_n"), group_models = TRUE, group_col = "iso3", sort_col = "year_n")
We get an error message indicating that iso3 must be numeric or included in average_cols
for grouping. This is because without it being numeric or in the average_cols
, there's no way dplyr::group_by() %>% dplyr::summarize()
a non-numeric column automatically (how would we reduce country-level ISO3 codes to the regional level?).
While slightly complex, ensuring you follow the above means you should easily and successfully get out meaningful trend predictions for your data frames using trends generated on grouped data.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.