gen_data: Generate random linelist or survey data

View source: R/gen_data.R

gen_dataR Documentation

Generate random linelist or survey data

Description

Based on a dictionary generator like msf_dict() or msf_dict_survey(), this function will generate a randomized data set based on values defined in the dictionaries. The randomized dataset produced should mimic an excel export from DHIS2 for outbreaks and a Kobo export for surveys.

Usage

gen_data(
  dictionary,
  varnames = "data_element_shortname",
  numcases = 300,
  org = "MSF"
)

Arguments

dictionary

Specify which dictionary you would like to use.

varnames

Specify name of column that contains variable names. If dictionary is a survey, varnames needs to be "name"'.

numcases

Specify the number of cases you want (default is 300)

org

the organization the dictionary belongs to. Currently, only MSF exists. In the future, dictionaries from WHO and other organizations may become available.

Value

a data frame with cases in rows and variables in columns. The number of columns will vary from dictionary to dictionary, so please use the dictionary functions to generate a corresponding dictionary.

Examples


if (require("dplyr") & require("matchmaker")) {
  withAutoprint({

    # You will often want to use MSF dictionaries to translate codes to human-
    # readable variables. Here, we generate a data set of 20 cases:
    dat <- gen_data(
      dictionary = "Cholera",
      varnames = "data_element_shortname",
      numcases = 20,
      org = "MSF"
    )
    print(dat)

    # We want the expanded dictionary, so we will select `compact = FALSE`
    dict <- msf_dict(disease = "Cholera", long = TRUE, compact = FALSE, tibble = TRUE)
    print(dict)

    # Now we can use matchmaker to filter the data:
    dat_clean <- matchmaker::match_df(dat, dict,
      from = "option_code",
      to = "option_name",
      by = "data_element_shortname",
      order = "option_order_in_set"
    )
    print(dat_clean)

  })
}

R4EPI/epidict documentation built on Aug. 31, 2022, 5:34 a.m.