knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ## set global chunk options options(formatR.arrow=TRUE,width=80)
## Make sure you attach the package !! library(FPEMglobal)
You can access this vignette from the R console by typing vignette("FPEMglobal_Intro", package = "FPEMglobal")
.
FPEMglobal (Family Planning Estimation Model global) implements a Bayesian hierarchical model for estimating and projecting the prevalence of contraceptive use and unmet need at the national level. This vignette concerns the application to all women of reproductive age (i.e., ages 15--49 years); for application to adolescent women aged 15--19 see the vignette "FPEMglobal_Age_1519".
The package consists of submodels which produce estimates and projections for married and unmarried women. Together, these are used to produce results for all women. The married women model is described in @alkema_national_2013. Updates to the model are described in @cahill_modern_2017; these are included in FPEMglobal. The unmarried women model, and the process of producing estimates for all women of reproductive age, is described in @kantorova18_contr. FPEMglobal is a direct descendant of ContraceptiveUse [@AlkemaContraceptiveUse2014].
cite_entry <- citation("FPEMglobal") cat(format(cite_entry, style = { if (knitr::is_html_output()) "HTML" else if (knitr::is_latex_output()) "latex" else "textVersion" }))
See also citation("FPEMglobal")
from within R.
Estimation and projection is done separately for married and unmarried women. The results of these separate runs are then combined to produce estimates for all women regardless of marital status. For each marital group, a Monte Carlo Markov chain (MCMC) sample from the joint posterior distribution is produced using JAGS. This is then transformed and summarized to yield estimates and projections of the indicators of interest. This procedure has the following structure:
There are separate user functions for each of these steps. For example, do_global_mcmc
implements steps 1a and 2a. There are also several wrapper functions that perform multiple steps and set defaults for the sub-functions. For example, do_global_run
implements all of steps 1 and 2. The function do_global_all_women_run
implements all steps from 1 to 3. The user-interface functions, and their actions, are summarized in the table below. The 'Level' indicates the level of abstraction from the multi-step process given above. Higher values correspond to more abstraction. Level 0 functions perform a single step in the process, higher levels perform multiple steps.
| Level | Function | Action |
|-------+---------------------------+-------------------------------------------------------------------------------------------------------------|
| 2 | do_global_all_women_run
| Perform a complete run for married, unmarried, and all women, post process results, create summary output. |
| 1 | do_global_run
| Perform a run for either married or unmarried women, post process results, create summary output. |
| 0 | combine_runs
| Combine married and unmarried women runs, as produced by do_global_run
, to get all women MCMC output. |
| 0 | do_global_mcmc
| Run the model for either married or unmarried women (generates MCMC output only). |
| 0 | add_global_mcmc
| Run an additional MCMC chain; must run do_global_mcmc
to get at least one chain first. |
| 0 | post_process_mcmc
| Post-process MCMC output produced by do_global_mcmc
(and add_global_mcmc
). |
| 0 | make_results
| Produce summary results from post-processed MCMC chains produced by post_process_mcmc
or combine_runs
. |
The following utility functions are also available:
| Function | Action |
|-------------------------+-------------------------------------------------------|
| validate_input_file
| Checks input data file for invalid entries |
| compare_runs_CI_plots
| Produces plots that compares two different model runs |
Each run is given a default run name which is used in the names of directories to hold outputs and the output files themselves. It is composed of the date-time the function was called (via format(Sys.time(), "%y%m%d_%H%M%S")
), the marital subpopulation of the run ("married", "unmarried", or "all_women"), the age group ("15-49") and an optional annotation supplied via the run_desc
argument accepted by all of the do_...
functions. For example, a married women run name might be '180802_144802_married_15-49' with no annotation, or '180802_144802_test_married_15-49' with run_desc = "test"
. The output of each function is a character string giving the run name for the run just performed.
For full all women runs done by do_global_all_women_run
, the date-time part of the run names is set to the time the whole run was initiated. Therefore, it is the same for married, unmarried, and all women outputs. This means they sort together in the file explorer.
More than one MCMC chain is required per run. This is controlled by the argument chain_nums
which is passed to the do_...
functions. This must be a sequence, not a single number. For example, if two chains are wanted, write chain_nums = 1:2
; if four are wanted, write chain_nums = 1:4
:
## Example use of 'chain_nums' argument do_global_all_women_run(..., #other arguments chain_nums = 1:4, ... #other arguments )
Each chain can take several hours (even days) so it is quicker to run as many of them as possible in parallel. You gain speed by running up to as many chains in parallel as there are cores on your computer. To use parallel running you must install the package doParallel [@corporation2017] on Windows, or doMC [@calaway2017], on MacOS and UNIX/Linux. Parallel processing is controlled by the argument run_in_parallel
. It is set to TRUE
by default, however if you do not have doParallel
or doMC
installed, the package will automatically switch to running the chains in serial.
## Example use of 'run_in_parallel' argument do_global_all_women_run(..., #other arguments chain_nums = 1:4, run_in_parallel = FALSE, ... #other arguments )
Note for Windows Users: If a run using parallel chains stalls or crashes, or you stop it manually, the 'child' processes may not terminate automatically. In such a situation it is recommended to check the Windows Task Manager for any stray 'Rscript.exe' processes. If present, they can be stopped manually. Alternatively, re-start the computer.
Required and optional input data are listed in the table below. The required input data sets are supplied with the package as '.csv' files. Type 'Est.' (for 'Estimate') indicates an input that consists of survey-based estimates of a demographic indicator; type 'Class.' (for 'Classification') indicates an input that consists of definitions, or classifications, that group countries together or select subsets of countries for particular operations. To access the input files in R see later. The arguments used in functions that require them are listed along with their defaults (second table).
| | Type | Input | Argument |
|----------+--------+--------------------------------------------+-----------------------------------------|
| Required | Est. | Prevalence estimates from national surveys | data_csv_filename
|
| | Est. | Number of women by marital status and age | denominator_counts_csv_filename
|
| | Class. | Country region information | region_information_csv_filename
|
| | Class. | Countries to include in aggregates | countries_for_aggregates_csv_filename
|
| Optional | Class. | Definitions of special aggregates | special_aggregates_name
|
| Argument | Default |
|-----------------------------------------|-------------------------------------|
| data_csv_filename
| data_cp_model_all_women_15-49.csv |
| denominator_counts_csv_filename
| number_of_women_15-49.csv |
| region_information_csv_filename
| country_and_area_classification.csv |
| countries_for_aggregates_csv_filename
| countries_mwra_195.csv |
| special_aggregates_name
| varies |
Sources and citations for the included files are in the next table. The default country region information file ('country-and-area-classification.csv') contains various columns that classify all countries according to certain criteria. These classifications are used during estimation and summarization.
| Input | Source | |--------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------| | Prevalence estimates from national surveys | World Contraceptive Use, 2022 [@un_desa_pd_WCUMA_2022] | | Number of women by marital status and age | Estimates and Projections of Women of Reproductive Age Who Are Married or in a Union 2022 [@undesa_pd2022_estim_projec_women_reprod_age]. | | Country region information | Areas and regions: Standard Country or Area Codes for Statistical Use (M49) [@UN_M49]; sexual activity among unmarried women: [@kantorova18_contr]. |
Default inputs included with the package can be accessed using the system.file()
command. For example:
## Find the file path in the package file tree data_csv_filename <- system.file("extdata", "data_cp_model_all_women_15-49.csv", package = "FPEMglobal") ## Read in the file data_df <- read.csv(data_csv_filename, header = TRUE, as.is = TRUE) ## View the first few columns str(data_df[1:5])
To list all data files included with the package use the following code:
## List of filenames (from table above) list_all_files <- dir(system.file("extdata", package = "FPEMglobal")) ## Compare with the table above list_all_files
New input files can be validated using the validate_input_file
function. By default, the input file included with the package will be validated.
## By default, the input file included with the package will be ## validated. Change 'input_data_folder_path' and 'data_csv_filename' ## to validate a user-supplied file. validate_input_file(marital_group = c("married", "unmarried")) # Both will be checked by default
The main output are files saved to disc. These include files created during model estimation and output files containing plots and tables. By default, these are stored in sub-directories of directory 'output'. For a full run, the sub-directories are named after the run name. For example, if a run is initiated on 17th July 2018 at 11:25:07AM, outputs for married women would be stored in the '20180717_112507_married_15-49' sub-directory of 'output'.
All plots are saved to the 'fig' subdirectory of the main output directory; all tables are saved to 'table'. The file names are constructed by appending a description of the figure or table to the run name.
The main results for family planning indicators are in the files:
The indicators are split across multiple tables. For example, 'output/[run-name]/[run-name]_Country_perc_Modern.csv' contains results for contraceptive prevalence (any modern method) at the country level.
A number of temporary files are produced during a run that can be deleted once it is finished. Some of these take up considerable space. Files in directories named 'temp.JAGSobjects' can be deleted.
| Name | Type | When to Delete |
|-------------------------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| temp.JAGSobjects
| Directory | After post_process_mcmc
has completed. Ensure mcmc.array.rda
has been created. |
| countrytrajectories
| Directory | If country-level MCMC trajectories are not needed, this can be deleted after make_results
has completed. |
| aggregatetrajectories
| Directory | If all women aggregate-level MCMC trajectories are not needed, this can be deleted after make_results
has completed. Only created when all women estimates are produced. |
The location of all input files is supplied via the input_data_folder_path
argument. This directory must contain all required input files; multiple input file directories are not supported.
After the runs are complete, the input files are logged by copying them to the directory 'output/run_name
/data'.
The default value of input_data_folder_path
is diffferent depending on the function. The do_...
functions use the directory supplied with the package: system.file("extdata", package = "FPEMglobal")
. However, post_process_mcmc
, combine_runs
and make_results
use the directory 'output/[run-name
]/data'. This ensures that post processing is done on the same inputs used to do the model run. You may need to change the defaults if you are re-processing outputs to, for example, add special aggregates (see below).
A full global run of the model can be done with just one wrapper function, do_global_all_women_run
. Alternatively, several separate functions can be called in sequence if finer control over the inputs and outputs is desired. For default behaviour, it is easiest to use the main wrapper.
The model is estimated using an MCMC algorithm. Recommended values for the number of iterations (total), the number to discard as burn-in, and how to thin the sample, are given in the help file for do_global_all_women_run
. As at 2021, in our experience, a full run comprising 20 chains run in parallel on a 20 core machine takes 36--48 hours.
A full run of the global model for using the default (built-in) input files can be done using a single function: do_global_all_women_run
.
## Read the documentation for 'do_global_all_women_run' ?FPEMglobal::do_global_all_women_run
## Do a global run for all women, including summarization. ## Make sure you attach the package !! library(FPEMglobal) ## Call the main function all_women_runs <- do_global_all_women_run(## Describe the run run_desc = "", age_group = "15-49", data_csv_filename = "data_cp_model_all_women_15-49.csv", ## MCMC parameters. CHANGE FOR FULL RUN! estimation_iterations = 2, burn_in_iterations = 1, steps_before_progress_report = 1, thinning = 1, chain_nums = 1:2, run_in_parallel = TRUE, set_seed_chains = 1 ) ## See the run name !! All women, married women, and unmarried women output ## folders have the same date-time in their names. all_women_runs
The output of the function is the run name as a character string; the results are saved to disc.
To run the global model for each marrital group separately, use the function do_global_run
. This function generates the MCMC samples, post-processes them, and produces output summaries.
### ---------------------------------------------------------------------------- ### Run for unmarried women only unmarried_run_name <- do_global_run(## Describe the run marital_group = "unmarried", age_group = "15-49", data_csv_filename = "data_cp_model_all_women_15-49.csv", ## MCMC parameters. CHANGE FOR FULL RUN! estimation_iterations = 2, burn_in_iterations = 1, steps_before_progress_report = 1, thinning = 1, chain_nums = 1:2, run_in_parallel = TRUE, set_seed_chains = 1 ) ## See the run name unmarried_run_name ### ---------------------------------------------------------------------------- ### Run for married women only married_run_name <- do_global_run(## Describe the run marital_group = "married", age_group = "15-49", data_csv_filename = "data_cp_model_all_women_15-49.csv", ## MCMC parameters. CHANGE FOR FULL RUN! estimation_iterations = 2, burn_in_iterations = 1, steps_before_progress_report = 1, thinning = 1, chain_nums = 1:2, run_in_parallel = TRUE, set_seed_chains = 1 ) ## See the run name married_run_name
To combine unmarried and married women runs to create all women results, use the funtion combine_runs
. This produces MCMC trajectories for all women from the married and unmarried women outputs specified in the arguments married_women_run_output_folder_path
and unmarried_women_run_output_folder_path
.
### ---------------------------------------------------------------------------- ### Combine to make all women results all_women_run_name <- combine_runs( ## Inputs married_women_run_output_folder_path = file.path("output", married_run_name), unmarried_women_run_output_folder_path = file.path("output", unmarried_run_name), ) ## See the run name ## !! This time, not the same date-time as married_run_name or ## !! unmarried_run_name. all_women_run_name make_results(all_women_run_name)
The lowest level of control over output generation is obtained by calling the Level 0 functions.
## Unmarried women MCMC only unmarried_run_name <- do_global_mcmc(marital_group = "unmarried", age_group = "15-49", #data_csv_filename = "data_cp_model_all_women_15-49.csv", ## MCMC parameters. CHANGE FOR FULL RUN! estimation_iterations = 2, burn_in_iterations = 1, steps_before_progress_report = 1, thinning = 1, chain_nums = 1:2, run_in_parallel = TRUE, set_seed_chains = 1 ) ## Look at the run name unmarried_run_name ## Post-process MCMC output post_process_mcmc(run_name = unmarried_run_name) ## Produce plots, tables make_results(unmarried_run_name)
In the following sections two additional types of outputs are discussed: special aggregates and adjusted medians. The nature of these outputs is explained in the respective sections. The table below gives a summary of the function call sequences needed to produce various combinations of outputs from a completed model run, including these two additional types. The completed run could have been obtained from any of the do_...
functions. In the example calls, default values of arguments are assumed and are not explicitly specified. It is further assumed that the appropriate run_name
s are passed as the first argument to all functions.
+-----------------------+--------------------------------+--------------------------------------------------------------------------------+
| Marital Group | Output | Function call sequence |
+=======================+================================+================================================================================+
| Married and unmarried | Standard aggregates | post_process_mcmc
\ |
| | | make_results
|
+-----------------------+--------------------------------+--------------------------------------------------------------------------------+
| | Special aggregates | post_process_mcmc(..., special_aggregates_name = "[name]")
\ |
| | | make_results(..., special_aggregates_name = "[name]")
|
+-----------------------+--------------------------------+--------------------------------------------------------------------------------+
| | Adj. medians, country level | post_process_mcmc
\ |
| | | make_results(..., adjust_medians = TRUE)
|
+-----------------------+--------------------------------+--------------------------------------------------------------------------------+
| | Adj. medians, std. aggregates | post_process_mcmc
\ |
| | | make_results(..., adjust_medians = TRUE)
|
+-----------------------+--------------------------------+--------------------------------------------------------------------------------+
| | Adj. medians, spec. aggregates | post_process_mcmc(..., special_aggregates_name = "[name]")
\ |
| | | make_results(..., adjust_medians = TRUE, special_aggregates_name = "[name]")
|
+-----------------------+--------------------------------+--------------------------------------------------------------------------------+
| All women | Standard aggregates | combine_runs
\ |
| | | make_results
|
+-----------------------+--------------------------------+--------------------------------------------------------------------------------+
| | Special aggregates | combine_runs(..., special_aggregates_name = "[name]")
\ |
| | | make_results(..., special_aggregates_name = "[name]")
|
+-----------------------+--------------------------------+--------------------------------------------------------------------------------+
| | Adj. medians, country level | Adj. medians for married and unmarried \ |
| | | combine_runs(..., adjust_medians = TRUE)
\ |
| | | make_results(..., adjust_medians = TRUE)
|
+-----------------------+--------------------------------+--------------------------------------------------------------------------------+
| | Adj. medians, std. aggregates | Adj. medians for married and unmarried \ |
| | | combine_runs(..., adjust_medians = TRUE)
\ |
| | | make_results(..., adjust_medians = TRUE)
|
+-----------------------+--------------------------------+--------------------------------------------------------------------------------+
| | Adj. medians, spec. aggregates | Adj. medians for married and unmarried \ \|
| | | combine_runs(..., adjust_medians = TRUE, special_aggregates_name = "[name]")
\|
| | | make_results(..., adjust_medians = TRUE, special_aggregates_name = "[name]")
|
+-----------------------+--------------------------------+--------------------------------------------------------------------------------+
Custom country aggregates can be requested if the ones produced by default are not adequate. Use the argument special_aggregates_name
. This must be a name for the aggregate that can be used in the filenames of the output plots and tables. There must be a '.csv' file for each aggregate requested, each file having the same name as the aggregate.
A '.csv' file describing an aggregate must have two columns: 'iso.country', and 'groupname'. As an example, the first seven rows of such a table are shown below. The first column ('A') lists country ISO codes, the second ('B') lists the group they belong to. The file should contain each ISO code only once and should be saved in '.csv' format.
| | A | B | |---+-------------+-----------------------| | 1 | iso.country | groupname | | 2 | 4 | Eastern Mediterranean | | 3 | 8 | Europe | | 4 | 12 | Africa | | 5 | 20 | Europe | | 6 | 24 | Africa | | 7 | 28 | Americas | | | ...etc | ...etc |
special_aggregates_name
can be passed to do_global_all_women_run
or do_global_run
. The '.csv' file(s) describing the aggregate must be in the folder pointed to by the argument input_data_folder_path
. You can specify more than one '.csv' it there are multiple aggregation schemes you need results for.
The code below illustrates how special aggregates for the WHO regions can be requested. The package comes with the file 'WHO_regions.csv' (access this with system.file("extdata", "WHO_regions.csv", package = "FPEMglobal")
) which defines WB country aggregates. The name of the aggregate to be passed to the functions is "WHO_regions".
## Special aggregates at model run time. all_women_runs <- do_global_all_women_run(## Describe the run run_desc = "", age_group = "15-49", data_csv_filename = "data_cp_model_all_women_15-49.csv", ## MCMC parameters. CHANGE FOR FULL RUN! estimation_iterations = 2, burn_in_iterations = 1, steps_before_progress_report = 1, thinning = 1, chain_nums = 1:2, run_in_parallel = TRUE, set_seed_chains = 1, ## Special aggregates name special_aggregates_name = "WHO_regions" #<<< no ".csv"; just the name )
If you want to add special aggregates to an already completed run you must do this separately for each marital group. For married and unmarried women runs, call post_process
and make_results
again on the output, but add the argument special_aggregates_name
, giving the names of the aggregates to add (you must pass this argment to both post_process
and make_results
). If you want to add special aggregates to an all women run, you must first add them to both the married and unmarried women runs, then call combine_runs
to combine them and make_results
to generate the results. Use the argument run_name_override
to specify the name of the existing all women output directory when calling combine_runs
. You must pass the special_aggregates_name
argument to combine_runs
and make_results
.
Note that the default value for input_data_folder_path
in post_process_mcmc
is the 'data' directory inside the run directory. For example, if the completed run is 'myrun_all_women', the default is 'myrun_all_women/data' directory. It is recommended to copy the aggregate '.csv' file(s) to this directory, rather than change the value of input_data_folder_path
.
See the example at the end of the section 'Adjusted Medians' for R
code. You can omit the adjusted medians in that example if you do not want them.
To overwrite aggregates in an already completed run, you must first delete the following files, and then call post_process
and make_results
:
See the example at the end of the section 'Adjusted Medians' for R
code. You can omit the adjusted medians and special aggregates if you do not want them.
If you want to use a revised set of denominator counts, you should follow the steps in the 'Overwriting Aggregates' section. However, you should delete the entire all women output directory rather than the few files specified.
In addition, you will need to place copies of the denominator counts '.csv' file in the 'data' subdirectories of all runs you want to re-process.
The posterior joint distributions for the estimates and projections that are produced by default are summarized using marginal quantiles, such as medians, 0.025, and 0.975 quantiles. "Demographic identities", for example the sum of modern and traditional prevalences equalling prevalence of any method, will not necessarily be satisfied by medians (or any other quantiles) of this type. If desired, additional copies of the output tables can be produced such that the medians do satisfy demographic identities. Moreover, aggregates of country-level indicators also satisfy the condition that the sum of constituent country medians equals the aggregate medians.
Median adjustment for married and unmarried women runs is done at the stage outputs are summarized as plots and tables. That is, make_results
produces adjusted medians, not post_process_mcmc
. For all women results, combine_runs
is also involved.
Adjusted medians can be requested via the argument adjust_medians
(default value is FALSE
). You can pass adjust_medians = TRUE
to do_global_all_women_run
, or do_global_run
if you are doing a new run, or make_results
and combine_runs
if you are adding them to a completed run.
## Do a full global run, and produce adjusted medians all_women_runs_adj_medians <- do_global_all_women_run(## Describe the run run_desc = "", age_group = "15-49", data_csv_filename = "data_cp_model_all_women_15-49.csv", ## MCMC parameters. CHANGE FOR FULL RUN! estimation_iterations = 2, burn_in_iterations = 1, steps_before_progress_report = 1, thinning = 1, chain_nums = 1:2, run_in_parallel = TRUE, set_seed_chains = 1, ## Adjust medians adjust_medians = TRUE )
To get adjusted medians for a completed married or unmarried women run you must re-create the outputs using make_results
and pass adjust_medians = TRUE
. The adjusted results will be in the subdirectory 'table/adj' of the output directory. Unadjusted results will be recreated in subdirectory 'table/orig'. Any exising results in 'table/' that are no longer needed will need to be deleted manually.
To get adjusted medians for all women, you must first create them for the married and unmarried runs, re-run combine_runs
with adjust_medians = TRUE
, and finally re-generate the all women results with make_results(..., all_women = TRUE)
.
## Add adjusted medians to a completed run (note that all plots and tables will be ## recreated). make_results(all_women_runs$married_run_name, #<<< Give the run name ## Adjust medians adjust_medians = TRUE ) make_results(all_women_runs$unmarried_run_name, #<<< Give the run name ## Adjust medians adjust_medians = TRUE ) ## All women ## You must re-run `combine_runs(..., adjust_medians = TRUE)`. ## Use 'run_name_override' to get them in the existing output folder. all_women_adj_med <- combine_runs(married_women_run_output_folder_path = file.path("output", all_women_runs$married_run_name), unmarried_women_run_output_folder_path = file.path("output", all_women_runs$unmarried_run_name), run_name_override = all_women_runs$all_women_run_name, adjust_medians = TRUE) ## The same all_women_adj_med all_women_runs$all_women_run_name make_results(all_women_adj_med, #<<< Give the run name ## Adjust medians adjust_medians = TRUE )
You can also get adjusted medians for special aggregates. Remember to run post_process_mcmc()
and combine_runs()
before make_results()
. Use run_name_override
in combine_runs
when creating the all women results, otherwise a new folder for all women will be created.
## Adjusted medians married and unmarried. ## The special aggregates file will be searched for in ## 'output/[run-name]/data'. Either copy it there or specify the ## location. post_process_mcmc(all_women_runs$married_run_name, input_data_folder_path = system.file("extdata", package = "FPEMglobal"), special_aggregates_name = "WHO_regions") make_results(all_women_runs$married_run_name, special_aggregates_name = "WHO_regions", adjust_medians = TRUE) post_process_mcmc(all_women_runs$unmarried_run_name, input_data_folder_path = system.file("extdata", package = "FPEMglobal"), special_aggregates_name = "WHO_regions") make_results(all_women_runs$unmarried_run_name, special_aggregates_name = "WHO_regions", adjust_medians = TRUE) ## Add adjusted medians to a completed run with special aggregates (note that ## all plots and tables will be recreated). ## You must re-run `combine_runs()` ## Use 'run_name_override' to get them in the existing output folder, ## otherwise a new folder for all women will be created. all_women_spec_agg <- combine_runs(married_women_run_output_folder_path = file.path("output", all_women_runs$married_run_name), unmarried_women_run_output_folder_path = file.path("output", all_women_runs$unmarried_run_name), run_name_override = all_women_runs$all_women_run_name, special_aggregates_name = "WHO_regions", adjust_medians = TRUE) make_results(all_women_spec_agg, special_aggregates_name = "WHO_regions", adjust_medians = TRUE, all_women = TRUE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.