In R4EPI/msfdict: Epidemiology data dictionaries and random data generators

The goal of {epidict} is to provide standardized data dictionaries for use in epidemiological data analysis templates. Currently it supports standardised dictionaries from MSF OCA. This is a product of the R4EPIs project; learn more at https://r4epis.netlify.com

Installation

You can install {epidict} from CRAN:

install.packages("epidict")

Click here for alternative installation options

If there is a bugfix or feature that is not yet on CRAN, you can install it via the {drat} package:

You can also install the in-development version from GitHub using the {remotes} package (but there's no guarantee that it will be stable):

# install.packages("remotes")
remotes::install_github("R4EPI/epidict")

Accessing dictionaries

There are four MSF outbreak dictionaries available in {epidict} based on DHIS2 exports:

Acute Jaundice Syndrome (often suspected to be Hepatitis E) ("AJS")
Cholera/Acute watery diarrhea ("Cholera")
Measles/Rubella ("Measles")
Meningitis ("Meningitis")

You can read more about the outbreak dictionaries at https://r4epis.netlify.com/outbreaks

The dictionary can be obtained via the msf_dict() function, which specifies a dictionary that describes recorded variables (data_element_shortname) in rows and their possible options (if categorical):

Click here for code examples

library("epidict")
msf_dict("AJS")
msf_dict("Cholera")
msf_dict("Measles")
msf_dict("Meningitis")

In addition, there are four MSF survey dictionaries available:

Retrospective mortality and access to care ("Mortality")
Malnutrition ("Nutrition")
Vaccination coverage long form ("vaccination_long")
Vaccination coverage short form ("vaccination_short")

You can read more about the survey dictionaries at https://r4epis.netlify.com/surveys

These are accessible via msf_dict_survey() where the variables are in name. You can also read in your own Kobo (ODK) dictionaries by specifying tempalte = FALSE and then setting name = <path to your .xlsx>.

Click here for code examples

msf_dict_survey("Mortality")
msf_dict_survey("Nutrition")
msf_dict_survey("Vaccination_long")
msf_dict_survey("Vaccination_short")

Generating data

The {epidict} package has a function for generating data that's called gen_data(), which takes three arguments: The dictionary, which column describes the variable names, and how many rows are needed in the output.

Click here for code examples

gen_data("Measles", varnames = "data_element_shortname", numcases = 100, org = "MSF")
gen_data("Vaccination_long", varnames = "name", numcases = 100, org = "MSF")

Cleaning data with the dictionaries

You can use the dictionaries to clean the data via the {matchmaker} package:

Click here for code examples

library("matchmaker")
library("dplyr")

dat <- gen_data(dictionary = "Cholera", 
  varnames = "data_element_shortname",
  numcases = 20,
  org = "MSF"
)
print(dat)

# We want the expanded dictionary, so we will select `compact = FALSE`
dict <- msf_dict(disease = "Cholera", 
  long    = TRUE,
  compact = FALSE,
  tibble  = TRUE
)
print(dict)

# Now we can use matchmaker to filter the data
dat_clean <- matchmaker::match_df(dat, dict, 
  from  = "option_code",
  to    = "option_name",
  by    = "data_element_shortname",
  order = "option_order_in_set"
)
print(dat_clean)