In this vignette, we describe a typical example of building a model in R, which uses the demography and templates available through the Montagu API, and submited burden estimates to it. We will proceed very naively and cover all of the important functionality on the way.

Identify the Montagu Server

Begin by identifying the Montagu server to connect to. You must have an account on this server; typically this will be our live production server, and you will have received login information on joining VIMC. Identify the server as follows:-

montagu::montagu_server_global_default_set(
  montagu::montagu_server("production", "montagu.vaccineimpact.org"))
knitr::opts_chunk$set(error = FALSE, fig.path = "figure/montagu_user_guide-")
options(montagu.username = "test.modeller@imperial.ac.uk",
        montagu.password = "password")
montagu::montagu_server_global_default_set(
  montagu::montagu_server("science", "support.montagu.dide.ic.ac.uk", 11443))
montagu::montagu_diseases()

The next time you query the server, you may be asked to login with a username and password.

Basic Information

For many of the montagu api functions, you will need to provide your modelling group id, and your disease id. Your group id will have been communicated on joining VIMC; the disease ids can be looked up with a simple function call like this:-

montagu::montagu_diseases()

Touchstones

A touchstone is an identifier that links:

Touchstone ids are referred to as a basename and a version together. Errata to the input data may be addressed with a new touchstone version. Generally, modelling groups will know which touchstone they are required to run models against. To lookup all the currently open touchstones for a group:-

montagu::montagu_touchstone_versions("IC-Garske", require_open = TRUE)

If we knew the basic touchstone name (without version) that we were interested in, then we could also have asked:

montagu::montagu_touchstone_versions("IC-Garske", "201710gavi", require_open = TRUE)

Scenarios

Within a touchstone, we may have various scenarios, describing different vaccination conditions. These vary across diseases due to the different types of vaccination approaches that have been employed. We look our the scenarios we are required to model for a touchstone with:

montagu::montagu_scenarios("IC-Garske", "201710gavi-5")

We can then look up the status a particular scenario within this touchstone:

montagu::montagu_scenario_status("IC-Garske", "201710gavi-5", "yf-routine-gavi")

and if that status were not valid, then we can query any problems with:

montagu::montagu_scenario_problems("IC-Garske", "201710gavi-5", "yf-routine-gavi")

Expectations - years, ages, countries and outcomes

For a given touchstone, modelling groups are required to produce burden estimates stratified by calendar year, and by age. The expectations about time and age will be consistent across scenarios. We can look up these expectations like so:

montagu::montagu_expectations("IC-Garske", "201710gavi-5")

Some modelling groups model more than one disease, in which case a row appears for each disease. And additionally, if there is some reason to have different expectations for different scenarios, then multiple lines for the same disease may appear, with a difference in the description column indicating the scenario.

If we already knew the expectation id, we could look up a single expectation with

montagu::montagu_expectation("IC-Garske", "201710gavi-5", 30)

The countries for which estimates are required can vary between disease, and between scenarios. To retrieve the list of countries that are required for a particular expectation:-

countries <- montagu::montagu_expectation_countries("IC-Garske", "201710gavi-5", 30)
head(countries)

To see which outcomes are required for an expectation:

montagu::montagu_expectation_outcomes("IC-Garske", "201710gavi-5", 30)

And to confirm which scnearios an expectation applies to (which should align with the description text for the expectation):

montagu::montagu_expectation_applicable_scenarios("IC-Garske", "201710gavi-5", 30)

Demography

Montagu provides standardised demographic data necessary for running vaccine models. In the early stages of VIMC, we surveyed the groups to ascertain what demographic data they required, and Montagu provides the superset of all those data. If, however, your model needs demographic data not provided by Montagu, get in touch, as we want to make sure that demography is always consistent between different models.

We can list the available demographic data associated with a touchstone like this:-

montagu::montagu_demographics_list("201710gavi-5")

We can then retrieve a particular demographic statistic as follows:-

cbr <- montagu::montagu_demographic_data("cbr", "201710gavi-5")
head(cbr)

Gender

The example above, the central birth rate, is not gender-specific. Other fields are gender-specific, where data can be downloaded for 'female', 'male', or the total of all people is called 'both'. By default 'both' is returned, but to select a specific gender:

female_pop <- montagu::montagu_demographic_data("tot_pop", "201710gavi-5",
                                                gender_code = "female")
head(female_pop)

Format

Data is also available in long format (the default), in which years and ages are reported in separate rows, or wide format, in which the years of time are represented as columns, making the resulting data shorter, but wider. For example:-

as_fert <- montagu::montagu_demographic_data("as_fert", "201710gavi-5")
names(as_fert)
nrow(as_fert)
as_fert_wide <- montagu::montagu_demographic_data("as_fert", "201710gavi-5",
                                                  wide = TRUE)
names(as_fert_wide)
nrow(as_fert_wide)

Templates

The above will give all the information required to build your own data frame for submitting the expected results. Alternatively, you can download burden estimate templates with the years, countries and ages already populated:-

df <- montagu::montagu_central_burden_estimate_template("IC-Garske", "201710gavi-5", 30)
head(df)

Coverage

We now need to obtain the coverage data relevant for our disease and touchstone, for each of the scenarios we're going to model. Let's look at the yf-routine-gavi coverage metadata.

montagu::montagu_coverage_info("IC-Garske", "201710gavi-5", "yf-routine-gavi")

And now let's retrieve the coverage data itself, and summarise it.

cov <- montagu::montagu_coverage_data("IC-Garske", "201710gavi-5", "yf-routine-gavi")
head(cov[cov$coverage > 0, ])
range(cov$year)
length(unique(cov$country_code))
nrow(cov)
ncol(cov)

More information on the coverage is available in the Montagu Web Portal, but briefly:

Long or Wide format

The data above is formatted in what we call long format, in which one year is reported per row. Alternatively, you can specify:

cov <- montagu::montagu_coverage_data("IC-Garske", "201710gavi-5", "yf-routine-gavi", format="wide")
names(cov)[1:20]
nrow(cov)
ncol(cov)

Here, we have one row per country, and separate columns for coverage_year and target_year.

Returning the full range of countries

By default, the coverage data returned is limited to those countries for which burden estimate data is required for the given scenario. HepB is a particular example, where different sets of countries are relevant in different scenarios, because the birth-dose vaccination is not applied in all countries where there is HepB vaccination. However, some groups prefer to model all of these scenarios similarly, with the same numbers of countries, hence they require the full coverage data for all countries. And for some diseases, the historic effects of vaccination in a country that is not required for estimates as present, may have relevance in their model to countries that are required.

For those cases, where all countries are required, the optional all_countries argument can be used.

Burden Estimate Sets

The burden estimate set defines the results of the models, which modelling groups submit back to Montagu. To view the burden estimate sets already uploaded, for a group, touchstone and scenario:-

head(montagu::montagu_burden_estimate_sets("IC-Garske", "201710gavi-5", "yf-no-vaccination"))

If we know the burden estimate set id, we can also get a list of information about a burden estimate set with:

montagu::montagu_burden_estimate_set_info("IC-Garske", "201710gavi-5", "yf-no-vaccination", 687)

If there are any reported problems with a burden estimate set, we can examine those with

montagu::montagu_burden_estimate_set_problems("IC-Garske", "201710gavi-5", "yf-no-vaccination", 687)

and we can retrieve the data for the burden estimate set with:

data <- montagu::montagu_burden_estimate_set_data("IC-Garske", "201710gavi-5", "yf-no-vaccination", 687)
head(data)

Creating a new burden estimate set

A new burden estimate set can be created as follows:-

bsid <- montagu::montagu_burden_estimate_set_create(
    "IC-Garske", "201710gavi-5", "yf-no-vaccination", "central-averaged",
    details = "Details about model")

The fourth parameter, the type, is either: central-averaged - for a central estimate that is calculated by the average of a number of runs central-single-run - for a stand-alone central estimate run.

The result of the burden estimate set creation is the id of the newly created burden estimate set, which must be given when we later upload results.

Uploading data

Suppose we have now populated the appropriate burden estimate template with results, and we want to upload it. If results is our completed template, then the simplest upload function call is:-

montagu::montagu_burden_estimate_set_upload(
    "IC-Garske", "201710gavi-5", "yf-routine-gavi", bsid, results)

This will upload the completed data, and close the burden estimate set.

Plotting previous burden estimate sets

One further function call useful for plotting, allows retrieving a particular burden outcome from a burden estimate set, summed across all countries, and dis-aggregated by either age or by year. The result will be a sequence of sets of (x,y) points, suitable for straightforward plotting. For example:

data <- montagu::montagu_burden_estimate_set_outcome_data("IC-Garske", "201710gavi-5", 
         "yf-no-vaccination", 687, "cases", group_by = "age")
head(data)
plot(data$x[data$age==20], data$y[data$age==20], xlab="year", ylab="cases", main="For age 20")

Or to dis-aggregate by year:-

data <- montagu::montagu_burden_estimate_set_outcome_data("IC-Garske", "201710gavi-5", 
         "yf-no-vaccination", 687, "cases", group_by = "year")
head(data)
plot(data$x[data$year==2040], data$y[data$year==2040], xlab="age", ylab="cases", main="For year 2040")

Stochastic Runs

In order to establish confidence intervals, modelling groups are sometimes asked to run a stochastic ensemble of models. This may be, for instance, 200 instances of a model, with different parameters for each instance. We would call the instance number, between 1 and 200, the run_id, and there would be a unique set of parameters for each.

It is important that for a given run_id, all the scenarios are run with the matching set of parameters for that run_id, so that we can calculate impacts by comparing scenarios.

It is also important that the spread of parameters chosen across the 200 runs significantly captures the range of sensible behaviour of the model, and that the central estimates submitted are somewhere near the centre of the spread of parameters.

Lastly, it is desirable to vary as few parameters as possible, so that with 200 runs, we can see how the parameter changes affect the model outcomes. If there are too many parameters, it may be difficult to ascertain which parameters are causing the changes, or whether the spread of parameters really does capture the significant behaviour of the model.

The R client cannot currently support the full upload process of stochastic runs. Because of their large volume, and the international locations of modelling groups, dropbox provides a more robust service for uploading. It is possible, however, to work with model run parameter sets, and using the Montagu client will make it easier to produce the stochastic results (perhaps directly into a local dropbox folder), in a systematic way.

Model Run Parameter Sets

The model run parameter set has a column for run_id, and another column for each parameter being varied. For example:

set.seed(12345)
params <- data.frame(run_id = 1:5, param_1 = sample(5), param_2 = sample(5))
params

This can then be uploaded as follows:-

id <- montagu::montagu_model_run_parameter_set_upload(
    "IC-Garske", "201710gavi-5", "YF", params)

The id of the newly created parameter set will be returned.

We can also list the existing model run parameter sets with:-

head(montagu::montagu_model_run_parameter_sets("IC-Garske", "201710gavi-5"))

or for a list of information about a single parameter set:

montagu::montagu_model_run_parameter_set_info("IC-Garske", "201710gavi-5", 20)

Finally, if we want to retrieve the parameter values for a previous set:-

data <- montagu::montagu_model_run_parameter_set_data("IC-Garske", "201710gavi-5", 20)
data[1:5,1:5]

Appendix

Models

Montagu allows querying of basic information about all models, past and present used in VIMC. These can be listed as follows:-

models <- montagu::montagu_models()
head(models)

and a list of information about a particular model by its id can be retrieved like this:-

montagu::montagu_model("YFIC")

Note that the model_id (id column above) is different from the modelling_group_id, and generally the modelling_group_id is the significant identifier required for other Montagu functions.

Demographic Sources

Occasionally, there may be multiple instances of the same demographic data field, but for different sources. For example, infant mortality is provided by both UNWPP, and IGME (ChildMortality.org). Neither sources are ideal, and Montagu provides a documented compromise between the two sources for infant, and especially neonatal mortality.

If an example arises where there are two instances of the same data field, then they will have a different data_source, which will need to be specified in the form:-

montagu::montagu_demographic_data("Modelling-Group", "Touchstone", 
                                  source_code = "The-Source-Code")

where the The-Source-Code is in the table returned by

montagu::montagu_demographics_list("Touchstone")


vimc/montagu-r documentation built on Oct. 10, 2019, 9:10 p.m.