```r
knitr::opts_chunk$set(
  message = FALSE,
  warning = FALSE,
  error = FALSE,
  collapse = TRUE,
  comment = "#>"
)

library(oldr)
```
The RAM-OP Workflow is summarised in the diagram below.
knitr::include_graphics("figures/ramOPworkflow.png")
The `oldr` package provides functions to use for all steps after data collection. These functions were developed specifically for the data structure created by the EpiData or the Open Data Kit collection tools. The data structure produced by these collection tools is shown by the dataset `testSVY` included in the `oldr` package.
```r
testSVY
```
Once RAM-OP data is collected, it needs to be processed and recoded based on the definitions of the various indicators included in RAM-OP. The `oldr` package provides a suite of functions to perform this processing and recoding. These functions and their syntax can be easily remembered as the `create_op_` functions: their names start with the verb `create_`, followed by the `op_` label, and then an indicator- or indicator set-specific identifier or short name. Finally, an additional `males` or `females` tag can be added to the main function name to produce sex-specific outputs.
Currently, a standard RAM-OP provides results for 13 indicators or indicator sets for older people. The following table shows these indicators/indicator sets alongside their related functions:
Indicator / Indicator Set | Related Functions
:--- | :---
Demography and situation | `create_op_demo`; `create_op_demo_males`; `create_op_demo_females`
Food intake | `create_op_food`; `create_op_food_males`; `create_op_food_females`
Severe food insecurity | `create_op_hunger`; `create_op_hunger_males`; `create_op_hunger_females`
Disability | `create_op_disability`; `create_op_disability_males`; `create_op_disability_females`
Activities of daily living | `create_op_adl`; `create_op_adl_males`; `create_op_adl_females`
Mental health and well-being | `create_op_mental`; `create_op_mental_males`; `create_op_mental_females`
Dementia | `create_op_dementia`; `create_op_dementia_males`; `create_op_dementia_females`
Health and health-seeking behaviour | `create_op_health`; `create_op_health_males`; `create_op_health_females`
Sources of income | `create_op_income`; `create_op_income_males`; `create_op_income_females`
Water, sanitation, and hygiene | `create_op_wash`; `create_op_wash_males`; `create_op_wash_females`
Anthropometry and anthropometric screening coverage | `create_op_anthro`; `create_op_anthro_males`; `create_op_anthro_females`
Visual impairment | `create_op_visual`; `create_op_visual_males`; `create_op_visual_females`
Miscellaneous | `create_op_misc`; `create_op_misc_males`; `create_op_misc_females`
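For example, to process and recode only the demography and situation indicators for male respondents, the corresponding sex-specific function from the table above can be called directly. This is a sketch only: the `svy` argument name mirrors the `create_op()` call shown below and is an assumption for the sex-specific variants.

```r
## Process and recode demography and situation indicators for
## male respondents in the test dataset
demo_males <- create_op_demo_males(svy = testSVY)
```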
A final function in the processing and recoding set, `create_op()`, performs the processing and recoding of all indicators or indicator sets. This function allows the user to specify which indicators or indicator sets to process and recode, which is useful when not all of them have been collected or when only specific ones need to be analysed or reported. The function can also restrict processing to a sex-specific subset of the data.
For a standard RAM-OP implementation, this step is performed in R as follows:
```r
## Process and recode all standard RAM-OP indicators in the testSVY dataset
create_op(svy = testSVY)
```
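If only specific indicators or a sex-specific subset are needed, `create_op()` can, as described above, be restricted accordingly. The sketch below assumes the selection arguments are named `indicators` and `sex`; consult the package documentation (`?create_op`) for the exact interface.

```r
## Hedged sketch: process and recode only the demography and food intake
## indicator sets for male respondents; the argument names `indicators`
## and `sex` are assumptions for this example
create_op(svy = testSVY, indicators = c("demo", "food"), sex = "males")
```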
Once data has been processed and appropriate recoding for indicators has been performed, indicator estimates can now be calculated.
It is important to note that estimation procedures need to account for the sample design. All major statistical analysis software can do this (details vary). There are two things to note:

* The RAM-OP sample is a two-stage sample: subjects are sampled from a small number of primary sampling units (PSUs).
* The RAM-OP sample is not prior weighted. This means that per-PSU sampling weights are needed; these are usually the populations of the PSUs.
This sample design will need to be specified to whatever statistical analysis software is being used. If no weights are provided, the analysis may place undue weight on observations from smaller communities and produce confidence intervals with lower-than-nominal coverage (i.e. they will be too narrow).
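As an illustration, with R's survey package such a design could be declared as follows. This is a sketch only; the `svy_data` object and the `psu` and `pop` column names are hypothetical and not part of `oldr`:

```r
library(survey)

## Declare a PSU-clustered design with per-PSU population weights;
## `svy_data`, `psu`, and `pop` are hypothetical names for this sketch
design <- svydesign(ids = ~psu, weights = ~pop, data = svy_data)
```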
The `oldr` package uses a blocked weighted bootstrap estimation approach:

* **Blocked**: The block corresponds to the PSU or cluster.
* **Weighted**: The RAM-OP sampling procedure does not use population proportional sampling to weight the sample prior to data collection, as is done with SMART-type surveys. This means that a posterior weighting procedure is required. The standard RAM-OP software uses a "roulette wheel" algorithm to weight (i.e. by population) the selection probability of PSUs in bootstrap replicates.
A total of $m$ PSUs are sampled with replacement from the survey dataset, where $m$ is the number of PSUs in the survey sample. Individual records within each PSU are then sampled with replacement: a total of $n$ records are sampled with replacement from each selected PSU, where $n$ is the number of individual records in that PSU. The resulting collection of records replicates the original survey in terms of both sample design and sample size.

A large number of replicate surveys are taken (the standard RAM-OP software uses $r = 399$ replicate surveys, but this can be changed). The required statistic (e.g. the mean of an indicator value) is applied to each replicate survey. The reported estimate consists of the 50th (point estimate), 2.5th (lower 95% confidence limit), and 97.5th (upper 95% confidence limit) percentiles of the distribution of the statistic observed across all replicate surveys. The blocked weighted bootstrap procedure is outlined in the figure below.
```r
knitr::include_graphics(path = "https://rapidsurveys.io/ramOPmanual/figures/bbw.png")
```
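To make the resampling procedure concrete, below is a minimal base R sketch of a single replicate. The `psu` and `pop` column names are illustrative assumptions, and in practice `oldr` delegates this resampling to the bbw package rather than to code like this:

```r
## Illustrative sketch of one blocked weighted bootstrap replicate
boot_replicate <- function(svy, psu_weights) {
  ## m is the number of PSUs in the survey sample
  m <- length(unique(svy$psu))

  ## "Roulette wheel" step: select m PSUs with replacement, with selection
  ## probability proportional to PSU population
  selected <- sample(
    x = psu_weights$psu, size = m, replace = TRUE, prob = psu_weights$pop
  )

  ## Within each selected PSU, resample its n records with replacement;
  ## sample.int avoids the sample() length-one pitfall
  resampled <- lapply(selected, function(p) {
    rows <- which(svy$psu == p)
    svy[rows[sample.int(length(rows), replace = TRUE)], , drop = FALSE]
  })

  do.call(rbind, resampled)
}
```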
The principal advantages of using a bootstrap estimator are:

* Bootstrap estimators work well with small sample sizes.
* The method is non-parametric and uses empirical rather than theoretical distributions. There are no assumptions of things like normality to worry about.
* The method allows estimation of the sampling distribution of almost any statistic using only simple computational methods.
The prevalences of global acute malnutrition (GAM), moderate acute malnutrition (MAM), and severe acute malnutrition (SAM) are estimated using a PROBIT estimator. This type of estimator provides better precision than a classic estimator at small sample sizes, as discussed in the following literature:
* World Health Organisation. *Physical Status: The use and interpretation of anthropometry. Report of a WHO Expert Committee.* WHO Technical Report Series 854. Geneva: WHO, 1995.
* Dale NM, Myatt M, Prudhon C, Briend A. "Assessment of the PROBIT approach for estimating the prevalence of global, moderate and severe acute malnutrition from population surveys." *Public Health Nutrition*, 2012, 1–6. https://doi.org/10.1017/s1368980012003345
* Blanton CJ, Bilukha OO. "The PROBIT approach in estimating the prevalence of wasting: revisiting bias and precision." *Emerging Themes in Epidemiology*, 10(1), 2013, p. 8.
An estimate of GAM prevalence can be made using a classic estimator:
$$ \text{prevalence} ~ = ~ \frac{\text{Number of respondents with MUAC} < 210~\text{mm}}{\text{Total number of respondents}} $$
On the other hand, the estimate of GAM prevalence made from the RAM-OP survey data uses a PROBIT estimator. The PROBIT function is also known as the inverse cumulative distribution function. This function converts parameters of the distribution of an indicator (e.g. the mean and standard deviation of a normally distributed variable) into cumulative percentiles. This means that the normal PROBIT function can be used with estimates of the mean and standard deviation of indicator values in a survey sample to predict (or estimate) the proportion of the population falling below a given threshold. For example, for data with a mean MUAC of 256 mm and a standard deviation of 28 mm, the output of the normal PROBIT function at a threshold of 210 mm is 0.0502, meaning that 5.02% of the population is predicted (or estimated) to fall below the 210 mm threshold.
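In R, this worked example can be reproduced directly with the normal cumulative distribution function:

```r
## Proportion of the population predicted to fall below a MUAC of 210 mm,
## given a sample mean MUAC of 256 mm and a standard deviation of 28 mm
pnorm(q = 210, mean = 256, sd = 28)  ## approximately 0.0502, i.e. 5.02%
```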
Both the classic and the PROBIT methods can be thought of as estimating area:
```r
knitr::include_graphics(path = "https://rapidsurveys.io/ramOPmanual/figures/indicators26.png")
```
The principal advantage of the PROBIT approach is that the required sample size is usually smaller than that required to estimate prevalence with a given precision using the classic method.
The PROBIT method assumes that MUAC is a normally distributed variable. If this is not the case, then the distribution of MUAC is transformed towards normality.
The prevalence of SAM is estimated in a similar way to GAM. The prevalence of MAM is estimated as the difference between the GAM and SAM prevalence estimates:
$$ \widehat{\text{MAM prevalence}} ~ = ~ \widehat{\text{GAM prevalence}} ~ - ~ \widehat{\text{SAM prevalence}} $$
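Continuing the PROBIT worked example above, the subtraction can be illustrated as follows. The 185 mm SAM cut-off used here is purely illustrative and not taken from the RAM-OP specification:

```r
## Illustrative MAM prevalence by subtraction (185 mm is an assumed cut-off)
gam <- pnorm(q = 210, mean = 256, sd = 28)  ## proportion below GAM threshold
sam <- pnorm(q = 185, mean = 256, sd = 28)  ## proportion below assumed SAM threshold
mam <- gam - sam
mam
```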
The `estimate_classic` function in `oldr` implements the blocked weighted bootstrap classic estimator of RAM-OP. This function uses the `bootClassic` statistic to estimate indicator values.
The `estimate_classic` function is used for all the standard RAM-OP indicators except anthropometry. The function is used as follows:
```r
## Process and recode RAM-OP data (testSVY)
df <- create_op(svy = testSVY)

## Perform classic estimation on recoded data using appropriate weights
## provided by testPSU
classicDF <- estimate_classic(x = df, w = testPSU)
```
This results in (using limited replicates to reduce computing time):
```r
## Process and recode RAM-OP data (testSVY)
df <- create_op(svy = testSVY)

## Perform classic estimation on recoded data using appropriate weights
## provided by testPSU
classicDF <- estimate_classic(x = df, w = testPSU, replicates = 9)

## Return results
classicDF
```
The `estimate_probit` function in `oldr` implements the blocked weighted bootstrap PROBIT estimator of RAM-OP. This function uses the `probit_GAM` and `probit_SAM` statistics to estimate indicator values.
The `estimate_probit` function is used only for the anthropometric indicators. The function is used as follows:
```r
## Process and recode RAM-OP data (testSVY)
df <- create_op(svy = testSVY)

## Perform probit estimation on recoded data using appropriate weights
## provided by testPSU
probitDF <- estimate_probit(x = df, w = testPSU)
```
This results in (using limited replicates to reduce computing time):
```r
## Process and recode RAM-OP data (testSVY)
df <- create_op(svy = testSVY)

## Perform probit estimation on recoded data using appropriate weights
## provided by testPSU
probitDF <- estimate_probit(x = df, w = testPSU, replicates = 9)

## Return results
probitDF
```
The two sets of estimates are then merged using the `merge_op` function as follows:
```r
## Merge classicDF and probitDF
resultsDF <- merge_op(x = classicDF, y = probitDF)
resultsDF
```
Once indicators have been estimated, the outputs can be used to create charts that visualise the results. A set of functions whose names start with the prefix `chart_op_`, followed by an indicator-specific identifier, is provided for this purpose. Each function outputs a PNG file whose name is the specified filename with the indicator identifier appended, saved either in the current working directory or in a specified directory path.

The following shows how to produce the chart for activities of daily living (ADL), saved inside a temporary directory with a filename starting with test:
```r
chart_op_adl(x = create_op(testSVY), filename = file.path(tempdir(), "test"))
```
The resulting PNG file can be found in the temporary directory

```r
file.exists(file.path(tempdir(), "test.ADL.png"))
```
and will look something like this:
```r
chart_op_adl(x = create_op(testSVY), save_chart = FALSE)
```
Finally, estimates can be reported through report tables. The `report_op_table` function facilitates this through the following syntax:
```r
report_op_table(estimates = resultsDF, filename = file.path(tempdir(), "TEST"))
```
The resulting CSV file is found in the temporary directory

```r
file.exists(file.path(tempdir(), "TEST.report.csv"))
```
and will look something like this:
```r
read.csv(
  file = file.path(tempdir(), "TEST.report.csv"),
  stringsAsFactors = FALSE
)
```
The `oldr` package functions were designed so that they can be piped into each other to produce the desired output. Below we use the base R pipe operator `|>` together with `estimate_op()`, which combines the estimation steps described above into a single call.
```r
testSVY |>
  create_op() |>
  estimate_op(w = testPSU, replicates = 9) |>
  report_op_table(filename = file.path(tempdir(), "TEST"))
```
This results in a CSV file `TEST.report.csv` in the temporary directory

```r
file.exists(file.path(tempdir(), "TEST.report.csv"))
```
with the following structure:
```r
read.csv(
  file = file.path(tempdir(), "TEST.report.csv"),
  stringsAsFactors = FALSE
)
```
If the preferred output is a report with combined charts and tables of results, the following piped operations can be performed:
```r
testSVY |>
  create_op() |>
  estimate_op(w = testPSU, replicates = 9) |>
  report_op_html(
    svy = testSVY,
    filename = file.path(tempdir(), "ramOPreport")
  )
```
which results in an HTML file saved in the specified output directory that looks something like this:
knitr::include_graphics("figures/htmlReport.png")