In mgunther87/ipumsPMA: Common functions for IPUMS PMA staff

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(ipumsPMA)
library(kableExtra)
inv <- paste0(
  py$project_to_path("pma"),
  "/admin/ODK_files",
  "/ODK_inventory.csv")%>%
  read_csv()%>%
  as_tibble()

PMA enumerator documents contain the full text of each questionnaire, plus XML markup tags that are used to identify the text of each question recorded in the DDOC1, DTAG1, JDOC1, and JTAG1 columns of each sample's data dictionary.

We create these documents with the function enum_make, which reads an R object created by another function, odk_get. Using the operator %>% allows us to pass the results of the latter to the former:

odk_get("bf2018a_nh")%>%enum_make()

This function creates a .txt file in the same location as the ODK file referenced by odk_get. Now, all that's left for you to do is:

1) Manually save the file as a .doc file
2) In Word, use the IPUMS macro format from tags to make it pretty (and create numbered XML tags for each question)
3) Change the file name as desired
4) Move the file to the enumerator documents folder
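
Steps 3 and 4 can also be scripted once the Word formatting is done. The sketch below uses only base R and temporary folders; the file names and folder layout are hypothetical stand-ins, not the actual PMA paths.

```r
# Hypothetical folders standing in for the ODK folder and the
# enumerator documents folder (real PMA paths will differ).
odk_dir  <- file.path(tempdir(), "ODK_files")
enum_dir <- file.path(tempdir(), "enumerator_documents")
dir.create(odk_dir, showWarnings = FALSE)
dir.create(enum_dir, showWarnings = FALSE)

# Stand-in for the formatted .doc file saved from Word (step 2).
formatted <- file.path(odk_dir, "bf2018a_nh_odk.doc")
writeLines("placeholder", formatted)

# Steps 3 and 4: copy under the new name into the destination folder,
# and remove the original only if the copy succeeded.
target <- file.path(enum_dir, "bf2018a_nh_enum.doc")
moved <- file.copy(formatted, target, overwrite = FALSE)
if (moved) file.remove(formatted)
```

Copying and then removing is used here instead of file.rename because a rename may not work when the two folders sit on different drives; overwrite = FALSE keeps an existing enumerator document from being replaced by accident.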

More info on ODK files

ODK file inventory

ODK files are Excel files containing the programming logic responsible for rendering the survey on devices used by enumerators in the field. We store them in the PMA admin folder, but this folder has sprouted a number of subfolders as the project has grown. We maintain an inventory of the contents of our ODK subfolders at pma/admin/ODK_files/ODK_inventory.csv

It looks like this:

inv%>%
  kable("html")%>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    fixed_thead = TRUE
  )%>%
  scroll_box(height = "500px")

Notice that there is a default Path for every sample, but some samples have additional paths listed in columns to the right. The function odk_get takes an argument, survey, that can specify one of these other paths.

For example, suppose we're working with the sample bf2019a_hh. By default, odk_get returns the ODK file located in the Path column. In this case, we're working with a person-level sample (as opposed to, say, a service delivery point sample), so the default Path points to the household questionnaire:

odk_get(sample = "bf2019a_hh")

We can get the female questionnaire associated with bf2019a_hh by specifying a different column by name (argument names are shown here for readability, but are not required):

odk_get(sample = "bf2019a_hh", survey = "female")

Some samples contain multiple survey rounds, which are stored in columns r1, r2, and so on. For example, the Ethiopia MNH sample et2016a_mn:

odk_get(sample = "et2016a_mn", survey = "r1")
odk_get(sample = "et2016a_mn", survey = "r2")

New columns can be added to the ODK inventory spreadsheet at any time, and they will become immediately available to odk_get.
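
To make the lookup concrete, here is a minimal sketch of how a path could be chosen from an inventory by column name. It is not the implementation of odk_get: the helper lookup_path, the column name sample, and every path in the toy inventory are assumptions made for illustration.

```r
# Toy inventory: one row per sample, a default `Path` column, and
# optional extra columns. The `sample` column and all paths are made up.
inv_toy <- data.frame(
  sample = c("bf2019a_hh", "et2016a_mn"),
  Path   = c("bf/2019/hh.xlsx", "et/2016/mnh.xlsx"),
  female = c("bf/2019/female.xlsx", NA),
  r1     = c(NA, "et/2016/mnh_r1.xlsx"),
  r2     = c(NA, "et/2016/mnh_r2.xlsx"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the path stored in column `survey`
# for the requested sample, defaulting to the `Path` column.
lookup_path <- function(inventory, sample, survey = "Path") {
  if (!survey %in% names(inventory)) {
    stop("No column named '", survey, "' in the inventory")
  }
  path <- inventory[inventory$sample == sample, survey]
  if (length(path) != 1 || is.na(path)) {
    stop("No '", survey, "' path listed for sample ", sample)
  }
  path
}

lookup_path(inv_toy, "bf2019a_hh")                    # default Path
lookup_path(inv_toy, "bf2019a_hh", survey = "female") # named column
lookup_path(inv_toy, "et2016a_mn", survey = "r2")     # survey round
```

Because the helper looks columns up by name at call time, a column added to the toy inventory later (for example inv_toy$male) can be requested without changing the function.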

Using ODK files in R

Sometimes, it may be useful to reference ODK files in R for reasons other than creating enumerator documents. For example, on the sheet called survey in each file, there should be a column called relevant that shows code reflecting the universe logic for each question: this is very handy if you're drafting universe statements.

One way to access this information is to use odk_get to open the file in Excel (saving you the trouble of digging through the ODK file folder):

odk_get("bf2018a_hh", open = TRUE)

But, if you're going through several ODK files all at the same time, dealing with multiple open Excel files can be tedious. Instead, you can use odk_get to reference the information within R. Access the survey sheet with the $ operator:

odk <- odk_get("bf2018a_hh")
surv <- odk$survey

The result is a tibble, which you can query just like any other dataset. Suppose you want to know the universe for the mnemonic handwashing_place_observations: use the function filter to find the row for this variable, and then select the column relevant to see the universe logic:

surv%>%
  filter(name == "handwashing_place_observations")%>%
  select(relevant)

It looks like a respondent only received this question if the prior question, handwashing_place_rw, was answered with observed_fixed or observed_mobile. If you'd like more explanation of what these values mean, you can usually find it on the choices sheet: it's the second tibble returned by odk_get. You'll find a connection between the contents of the type column on the survey sheet and the list_name column on the choices sheet:

surv%>%
  filter(name == "handwashing_place_rw")%>%
  select(type)

The text "select_one" tells us that one choice could have been selected from a list of choices, and the text "handwash_list" refers to the name of a particular list of choices on the choices sheet.

odk$choices%>%
  filter(list_name == "handwash_list")%>%
  select(name, label..English)

This tells us still more about the universe. Rather than writing that handwashing_place_observations was given to "any household where either a fixed or mobile place for handwashing was observed", we can see that these two options comprise all of the "observed" options, so a better choice would be "any household where the interviewer observed a place for handwashing".
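
The two lookups above can be wrapped into one step. The sketch below runs on toy stand-ins for the survey and choices sheets, so it works without any ODK file; the helper choices_for, the toy rows, and the relevant expression are invented for illustration, and it assumes the list name is always the second word of the type column.

```r
library(dplyr)
library(tibble)

# Toy stand-ins for the two sheets returned by odk_get.
surv_toy <- tibble(
  type = c("select_one handwash_list", "select_multiple soap_list"),
  name = c("handwashing_place_rw", "handwashing_place_observations"),
  relevant = c(
    NA,
    "${handwashing_place_rw} = 'observed_fixed' or ${handwashing_place_rw} = 'observed_mobile'"
  )
)
choices_toy <- tibble(
  list_name = c("handwash_list", "handwash_list", "handwash_list", "soap_list"),
  name = c("observed_fixed", "observed_mobile", "not_observed", "soap"),
  label..English = c("Observed, fixed place", "Observed, mobile place",
                     "Not observed", "Soap")
)

# Hypothetical helper: list the answer choices for one question by
# taking the second word of `type` as the list name.
choices_for <- function(survey, choices, question) {
  q_type <- survey %>% filter(name == question) %>% pull(type)
  stopifnot(length(q_type) == 1)  # the question must appear exactly once
  q_list <- strsplit(q_type, "\\s+")[[1]][2]
  choices %>%
    filter(list_name == q_list) %>%
    select(name, label..English)
}

choices_for(surv_toy, choices_toy, "handwashing_place_rw")
```

With a real file, the same call would take odk$survey and odk$choices in place of the toy tibbles, provided the column names match those shown above.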

Working with multiple ODK files

odk_get is most powerful when you'd otherwise find yourself working with multiple open Excel files. If you're planning to look for a mnemonic in lots of different samples at once (perhaps looking for the universe logic for each sample), try using the map function to iterate through all of your samples in a single call:

my_samples <- c("bf2017a_nh", 
                "bf2018a_nh", 
                "ke2017a_nh", 
                "ke2018a_nh")

my_odks <- map(my_samples, odk_get)%>%
  set_names(my_samples)

map(my_odks, ~{
  .x$survey%>%
    filter(name == "handwashing_place")%>%
    select(relevant)
})

Here, my_odks is a list containing the results of odk_get for each of my 4 samples. Any one of the items can be referenced by name (as in my_odks$bf2017a_nh) and also sub-referenced (as in my_odks$bf2017a_nh$survey). Instead of doing that, I use map a second time to apply a lambda function to each of the 4 members of my_odks: the lambda function is written inside ~{}, and each list item is passed to it as .x. Otherwise, the filter and select process works the same as before.

map returns the result of each lambda function in a handy list. Notice that the mnemonic handwashing_place only appears verbatim in 2 samples, but they both have the same universe logic. (FYI: it is possible that the 2018 samples have the same question, but used a slightly different name.)
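
When the list gets long, it can be easier to compare samples in a single table. One possible approach, sketched here on a toy stand-in for my_odks (the question names and universe logic are invented), is to stack the per-sample results with bind_rows and record the sample name in its own column.

```r
library(dplyr)
library(purrr)
library(tibble)

# Toy stand-in for my_odks: a named list whose elements each hold a
# `survey` tibble.
odks_toy <- list(
  bf2017a_nh = list(survey = tibble(
    name = c("handwashing_place", "water_source"),
    relevant = c("${consent} = 'yes'", NA)
  )),
  bf2018a_nh = list(survey = tibble(
    name = c("handwashing_place_rw", "water_source"),
    relevant = c("${consent} = 'yes'", NA)
  ))
)

# Filter each survey sheet, then stack the results; `.id` records
# which sample each row came from.
universe_tbl <- odks_toy %>%
  map(~ .x$survey %>%
        filter(name == "handwashing_place") %>%
        select(name, relevant)) %>%
  bind_rows(.id = "sample")

# Samples with no matching row contribute nothing to the stacked
# table, so list them separately.
not_found <- setdiff(names(odks_toy), universe_tbl$sample)
```

Here universe_tbl holds one row per sample where the mnemonic was found, and not_found names the samples worth checking for a renamed question.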

mgunther87/ipumsPMA documentation built on Aug. 1, 2020, 12:22 a.m.
