gen_data: Generate random linelist or survey data
In R4EPI/epidict: Epidemiology data dictionaries and random data generators

Description

Using the dictionaries returned by generators such as msf_dict() or msf_dict_survey(), this function generates a randomized data set from the values defined in those dictionaries. The result is meant to mimic an Excel export from DHIS2 for outbreak linelists and a Kobo export for surveys.

Usage

gen_data(
  dictionary,
  varnames = "data_element_shortname",
  numcases = 300,
  org = "MSF"
)

Arguments

`dictionary`	A character string naming the dictionary to use (e.g. "Cholera").
`varnames`	The name of the column that contains the variable names. If `dictionary` is a survey, `varnames` must be "name".
`numcases`	The number of cases to generate (default: 300).
`org`	The organization the dictionary belongs to. Currently, only "MSF" is available; dictionaries from WHO and other organizations may be added in the future.
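The `varnames` requirement for surveys is easy to miss, so here is a minimal sketch of generating survey data. It assumes epidict is installed and that "Mortality" is a valid survey dictionary name for gen_data(); that name is an assumption, not taken from this page, so check msf_dict_survey() for the dictionaries actually available.

```r
# Sketch only: assumes epidict is installed and that "Mortality" is an
# accepted survey dictionary name (an assumption, not confirmed by this page).
library(epidict)

set.seed(42)  # make the random draw reproducible

# Survey (Kobo) dictionaries keep variable names in the "name" column,
# so varnames must be "name" rather than the default "data_element_shortname".
survey_dat <- gen_data(
  dictionary = "Mortality",
  varnames = "name",
  numcases = 10,
  org = "MSF"
)

# Inspect the generated columns and their types
str(survey_dat)
```

Leaving `varnames` at its default for a survey dictionary would point gen_data() at a column that survey dictionaries do not use as their variable-name column.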

Value

A data frame with cases in rows and variables in columns. The number of columns varies from dictionary to dictionary, so use the corresponding dictionary function (msf_dict() or msf_dict_survey()) to obtain the matching dictionary.
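The following is a minimal sketch of checking the returned shape against its dictionary. It assumes epidict is installed and reuses the "Cholera" outbreak dictionary from the Examples section. It passes `compact = TRUE` explicitly to get one dictionary row per variable; whether every generated column appears in the dictionary is not guaranteed here, so the comparison is printed rather than asserted.

```r
# Sketch only: assumes epidict is installed; "Cholera" is the outbreak
# dictionary used in the Examples section of this page.
library(epidict)

set.seed(1)  # make the random draw reproducible

dat <- gen_data(dictionary = "Cholera", numcases = 20)

# One row per generated case
print(nrow(dat))

# The matching dictionary, compacted to one row per variable
dict <- msf_dict(disease = "Cholera", compact = TRUE)

# Columns in the generated data that the dictionary does not describe
print(setdiff(names(dat), dict$data_element_shortname))
```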

Examples


if (require("dplyr") && require("matchmaker")) {
  withAutoprint({

    # MSF dictionaries are often used to translate codes into human-readable
    # values. Here, we generate a data set of 20 cases:
    dat <- gen_data(
      dictionary = "Cholera",
      varnames = "data_element_shortname",
      numcases = 20,
      org = "MSF"
    )
    print(dat)

    # We want the expanded dictionary, so we set `compact = FALSE`
    dict <- msf_dict(disease = "Cholera", long = TRUE, compact = FALSE, tibble = TRUE)
    print(dict)

    # Now we can use matchmaker to translate the option codes into option names:
    dat_clean <- matchmaker::match_df(dat, dict,
      from = "option_code",
      to = "option_name",
      by = "data_element_shortname",
      order = "option_order_in_set"
    )
    print(dat_clean)

  })
}

R4EPI/epidict documentation built on June 14, 2025, 7:44 a.m.