This is a template which can be used to create a report from a retrospective nutrition survey.
Feedback and suggestions are welcome at the GitHub issues page
Text within <! > will not show in your final document. These comments are used to explain the template. You can delete them if you want.
We used the following definitions for the analysis of the survey results for Weight for Height z-scores (WHZ):
We used the following definitions for the analysis of the survey results for MUAC measurements:
In order to estimate stunting in the surveyed population, we looked at Height for Age z-scores (HAZ) and used the following definitions:
In order to estimate underweight in the surveyed population, we looked at Weight for Age z-scores (WAZ) and used the following definitions:
Exclusion of z-scores from Observed mean SMART flags included:
WHZ: <-5 or >5;
HAZ: <-6 or >6;
* WAZ: <-6 or >5.
## hide all code chunks in the output, but show errors knitr::opts_chunk$set(echo = FALSE, error = TRUE, fig.width = 6*1.25, fig.height = 6) # Ensures the package "pacman" is installed if (!require("pacman")) { install.packages("pacman") } # Install and load required packages for this template pacman::p_load( knitr, # create output docs here, # find your files rio, # for importing data janitor, # clean/shape data dplyr, # clean/shape data tidyr, # clean/shape data forcats, # manipulate and rearrange factors stringr, # manipulate texts ggplot2, # create plots and charts apyramid, # plotting age pyramids sitrep, # MSF field epi functions anthro, # WHO Child Growth Standards (wrapper of survey) survey, # for survey functions srvyr, # dplyr wrapper for survey package gtsummary, # produce tables flextable, # for styling output tables labelled, # add labels to variables matchmaker, # recode datasets with dictionaries parsedate # guessing dates ) ## set default text size to 16 for plots ## give classic black/white axes for plots ggplot2::theme_set(theme_classic(base_size = 18))
## generates MSF standard dictionary for Kobo study_data_dict <- msf_dict_survey("Nutrition") ## generates MSF standard dictionary for Kobo in long format (for recoding) study_data_dict_long <- msf_dict_survey(disease = "Nutrition", compact = FALSE) ## generates a fake dataset for use as an example in this template ## this dataset already has household and individual levels merged study_data_raw <- gen_data(dictionary = "Nutrition", varnames = "name", numcases = 1000)
### Read in data --------------------------------------------------------------- ## Excel file ------------------------------------------------------------------ ## read in household data sheet # study_data_hh <- rio::import(here::here("nutrition_survey.xlsx"), # which = "nutrition_survey_erb", na = "") ## read in individual level data sheet # study_data_indiv <- rio::import(here::here("nutrition_survey.xlsx", # which = "r1", na = "") ## Excel file with password ---------------------------------------------------- ## Use this section if your Excel has a password. # install.packages(c("excel.link", "askpass")) # library(excel.link) # study_data_hh <- xl.read.file(here::here("nutrition_survey.xlsx"), # xl.sheet = "nutrition_survey_erb", # password = askpass::askpass(prompt = "please enter file password")) # study_data_indiv <- xl.read.file(here::here("nutrition_survey.xlsx"), # xl.sheet = "r1", # password = askpass::askpass(prompt = "please enter file password"))
## Excel file ------------------------------------------------------------------ ## to read in a specific sheet use "which" # study_data_hh <- rio::import(here::here("nutrition_survey.xlsx"), which = "Sheet1") ## Excel file -- Specific range ------------------------------------------------ ## you can specify a range in an excel sheet. # study_data_hh <- rio::import(here::here("nutrition_survey.xlsx"), range = "B2:J102") ## Excel file with password ---------------------------------------------------- ## use this section if your Excel has a password. # install.packages(c("excel.link", "askpass")) # library(excel.link) # study_data_hh <- xl.read.file(here::here("nutrition_survey.xlsx"), # xl.sheet = "Sheet1", # password = askpass::askpass(prompt = "please enter file password")) ## CSV file -------------------------------------------------------------------- # study_data_hh <- rio::import(here::here("nutrition_survey.csv")) ## Stata data file ------------------------------------------------------------- # study_data_hh <- rio::import(here::here("nutrition_survey.dat"))
## join the individual and household data to form a complete data set #study_data_raw <- left_join(study_data_hh, study_data_indiv, by = c("_index" = "_parent_index"))
## Data dictionary ------------------------------------------------------------- ## read in a kobo data dictionary ## important to specify template is FALSE (otherwise you get the generic dict) # study_data_dict <- msf_dict_survey(name = here::here("nutrition_survey_dict.xlsx"), # template = FALSE) ## look at the dictionary by uncommenting the line below # View(study_data_dict) ## Clean column names ---------------------------------------------------------- ## This step fixes the column names so they are easy to use in R. ## make a copy of your orginal dataset and name it study_data_cleaned study_data_cleaned <- study_data_raw ## Sometimes you want to rename specific variables ## The formula for this is rename(data, NEW_NAME = OLD_NAME). ## You can add multiple column name recodes by simply separating them with a comma # study_data_cleaned <- rename(study_data_cleaned, # violence_nature_other_details = violence_nature_other) ## define clean variable names using clean_names from the janitor package. study_data_cleaned <- janitor::clean_names(study_data_cleaned) ## create a unique identifier by combining indeces of the two levels ## x and y are added on to the end of variables that are duplicated ## (in this case, parent and child index) # study_data_cleaned <- study_data_cleaned %>% # mutate(uid = str_glue("{index}_{index_y}"))
## MSF Survey Dictionary ---------------------------------------------------------- ## generates MSF standard dictionary for Kobo # study_data_dict <- msf_dict_survey("Nutrition") ## generates MSF standard dictionary for Kobo in long format (for recoding) # study_data_dict_long <- msf_dict_survey(disease = "Nutrition", compact = FALSE) ## look at the standard dictionary by uncommenting the line below # View(study_data_dict) ## You will need to recode your variables to match the data dictionary. This is ## addressed below. ## Clean column names ---------------------------------------------------------- ## This step fixes the column names so they are easy to use in R. ## make a copy of your orginal dataset and name it study_data_cleaned # study_data_cleaned <- study_data_raw ## define clean variable names using clean_names from the janitor package. # study_data_cleaned <- janitor::clean_names(study_data_cleaned) ## Match column names --------------------------------------------------------- ## This step helps you match your variables to the standard variables. ## This step will require some patience. Courage! ## Use the function msf_dict_rename_helper() to create a template based on the ## standard dictionary. This will copy a rename command like the one above to your ## clipboard. # msf_dict_rename_helper("nutrition", varnames = "name") ## Paste the result below and your column names to the matching variable. ## Be careful! You still need to be aware of what each variable means and what ## values it takes. ## If there are any variables that are in the MSF dictionary that are not in ## your data set, then you should comment them out, but be aware that some ## analyses may not run because of this. ## PASTE HERE ## Here is an EXAMPLE for changing a few specific names. function. In this ## example, we have the columns "gender" and "age" that we want to rename as ## "sex" and "age_years". ## The formula for this is rename(data, NEW_NAME = OLD_NAME). # study_data_cleaned <- rename(study_data_cleaned, # sex = gender, # TEXT # age_years = age # INTEGER_POSITIVE # )
## Read data ------------------------------------------------------------------- ## This step reads in your population data from Excel. ## You may need to rename your columns. # population_data <- rio::import(here::here("population.xlsx"), which = "Sheet1") ## repeat preparation steps as appropriate ## Enter counts directly ------------------------------------------------------- ## Below is an example of how to enter population counts by groups. # population_data_age <- gen_population( # groups = c("0-2", "3-14", "15-29", "30-44", "45+"), # counts = c(3600, 18110, 13600, 8080, 6600), # strata = NULL) %>% # rename(age_group = groups, # population = n) ## Create counts from proportions ---------------------------------------------- ## This step helps you estimate sub-group size with proportions. ## You need to replace the total_pop and proportions. You can change the groups ## to fit your needs. ## Here we repeat the steps for two regions (district A and B) then bind the two ## together ## generate population data by age groups in years for district A population_data_age_district_a <- gen_population(total_pop = 10000, # set total population groups = c("6-11", "12-23", "24-35", "36-47", "48-60"), # set groups proportions = c(0.0164, 0.0164, 0.015, 0.015, 0.015), # set proportions for each group strata = c("Male", "Female")) %>% # stratify by gender rename(age_group = groups, # rename columns (NEW NAME = OLD NAME) sex = strata, population = n) %>% mutate(health_district = "district_a") # add a column to identify region ## generate population data by age groups in years for district B population_data_age_district_b <- gen_population(total_pop = 10000, # set total population groups = c("6-11", "12-23", "24-35", "36-47", "48-60"), # set groups proportions = c(0.0164, 0.0164, 0.015, 0.015, 0.015), # set proportions for each group strata = c("Male", "Female")) %>% # stratify by gender rename(age_group = groups, # rename columns (NEW NAME = OLD NAME) sex = strata, population = n) %>% mutate(health_district = "district_b") # add a column to identify region ## bind region population data together to get overall population population_data_age <- bind_rows(population_data_age_district_a, population_data_age_district_b)
cluster_counts <- tibble(cluster = c("Village 1", "Village 2", "Village 3", "Village 4", "Village 5", "Village 6", "Village 7", "Village 8", "Village 9", "Village 10"), households = c(700, 400, 600, 500, 300, 800, 700, 400, 500, 500))
## view the first ten rows of data head(study_data_raw, n = 10) ## view your whole dataset interactivley (in an excel style format) View(study_data_raw) ## overview of variable types and contents str(study_data_raw) ## gives mean, median and max values of variables ## gives counts for categorical variables ## also gives number of NAs summary(study_data_raw) ## view unique values contained in variables ## you can run this for any column -- just replace the column name unique(study_data_raw$sex) ## check for logical inconsistencies ## for example check age <6 months and height >100 cm and return corresponding IDs study_data_raw %>% filter(age_month < 6 & height > 100) %>% select("uid") ## use the dfSummary function in combination with view ## note that view is not capitalised with this package # summarytools::dfSummary(study_data_cleaned) %>% # summarytools::view()
## Kobo standard data -------------------------------------------------------- ## If you got your data from Kobo, use this portion of the code. ## If not, comment this section out and use the below. ## make sure all date variables are formatted as dates ## get the names of variables which are dates DATEVARS <- study_data_dict %>% filter(type == "date") %>% filter(name %in% names(study_data_cleaned)) %>% ## filter to match the column names of your data pull(name) # select date vars ## find if there are date variables which are completely empty EMPTY_DATEVARS <- purrr::map(DATEVARS, ~all( is.na(study_data_cleaned[[.x]]) )) %>% unlist() %>% which() ## remove exclude the names of variables which are completely emptys DATEVARS <- DATEVARS[-EMPTY_DATEVARS] ## change to dates ## use the parse_date() function to make a first pass at date variables. ## parse_date() produces a complicated format - so we use as.Date() to simplify study_data_cleaned <- study_data_cleaned %>% mutate( across(.cols = all_of(DATEVARS), .fns = ~parsedate::parse_date(.x) %>% as.Date())) ## Non-Kobo data ------------------------------------------------------------- ## Use this section if you did not have Kobo data. ## use the parse_date() function to make a first pass at date variables. ## parse_date() produces a complicated format - so we use as.Date() to simplify # study_data_cleaned <- study_data_cleaned %>% # mutate( # across(.cols = matches("date|Date"), # .fns = ~parsedate::parse_date(.x) %>% as.Date())) ## Fix wrong dates ------------------------------------------------------------- ## Some dates will be unrealistic or wrong. ## Here is an example of how to manually fix dates. ## Look at your data and edit as needed. ## set specific unrealistic dates to NA # study_data_cleaned <- mutate(study_data_cleaned, # date < as.Date("2017-11-01") ~ as.Date(NA), # date == as.Date("2081-01-01") ~ as.Date("2018-01-01"))
## Age group variables ---------------------------------------------------------- ## This step shows you how to create categorical variables from numeric variables. ## We have some intermediate steps on the way. ## make sure age is an integer study_data_cleaned <- study_data_cleaned %>% mutate(age_months = as.integer(age_months), age_years = as.integer(age_months)) ## add convert years to months (for those over 1 year) study_data_cleaned <- study_data_cleaned %>% mutate(age_months = if_else( age_years >= 1, as.integer(age_years * 12), as.integer(age_months) )) ## create an age group variable (combining those over 24 months and those under) study_data_cleaned <- study_data_cleaned %>% mutate(age_group_bin = factor( if_else(age_months >= 24, "24-60", "0-23" ) )) ## create age group variable for based on months study_data_cleaned <- study_data_cleaned %>% mutate(age_group = age_categories(study_data_cleaned$age_months, breakers = c(6, 12, 24, 36, 48, 60), ceiling = TRUE)) ## alternatively, create an age group variable specify a sequence # study_data_cleaned$age_group <- age_categories(study_data_cleaned$age, # lower = 0, # upper = 100, # by = 10) ## If you already have an age group variable defined, you should manually ## arrange the categories # study_data_cleaned$age_group <- factor(study_data_cleaned$age_group, # c("0-4y", "5-14y", "15-29y", "30-44y", "45+y")) ## to combine different age categories use the following function ## this prioritises the smaller unit, i.e. if given months and years, will return months first # study_data_cleaned <- group_age_categories(study_data_cleaned, # years = age_group, # months = age_group_mon)
## recode values with matchmaker package (value labels cant be used for analysis) study_data_cleaned <- match_df(study_data_cleaned, study_data_dict_long, from = "option_name", to = "option_label_english", by = "name", order = "option_order_in_set")
## Change a yes/no variable in to TRUE/FALSE ## create a new variable called consent ## where the old one is yes place TRUE in the new one ## (this could also be done with the if_else or case_when function - ## but this option is shorter) study_data_cleaned <- study_data_cleaned %>% mutate(consent = consent == "Yes") ## Change a character to factor - set the levels of a factor study_data_cleaned <- study_data_cleaned %>% mutate(sex = factor(sex, levels = c("Female", "Male")) ) ## recode the levels of a factor ## put these in a second variable called sex_brief ## (this is necessary because the anthro package doesnt accept "Male"/"female") study_data_cleaned <- study_data_cleaned %>% mutate(sex_brief = fct_recode(sex, "F" = "Female", "M" = "Male") ) # ## change the order of levels # study_data_cleaned <- study_data_cleaned %>% # mutate(soap = fct_relevel(soap, # "Distribution", # "Healthcentre", # "Never") # ) # ## create a new grouping for soap variable # ## simplified, distribution or health centre then covered otherwise not covered # study_data_cleaned <- study_data_cleaned %>% # mutate(coverage = if_else(soap %in% c("Distribution", "Healthcentre"), # "Covered", # "Not covered") # ) ## Recode character variables ## This step shows how to fix misspellings in the geographic region variable. ## Ideally, you want these values to match and population data! # study_data_cleaned <- study_data_cleaned %>% # mutate(village_name = case_when( # village_name == "Valliages 1" ~ "village 1", # village_name == "Village1" ~ "village 1", # village_name == "Town 3" ~ "village 3" # village_name == "Town3" ~ "village 3", # TRUE ~ as.character(village_name)) # )) ## create a character variable based off groups of a different variable study_data_cleaned <- study_data_cleaned %>% mutate(health_district = case_when( cluster_number %in% c(1:5) ~ "district_a", TRUE ~ "district_b" )) ## explicitly replace NA of a factor # study_data_cleaned <- study_data_cleaned %>% # mutate(coverage = fct_explicit_na(coverage, na_level = "Not Applicable")) ## if you had a multi-choice question and the missing values were given as a "." ## replace missing values in multi-choice questions to "" so that we can ## filter them out later. # study_data_cleaned <- study_data_cleaned %>% # mutate( # across( # .cols = contains("violence"), # all variables that contain the word "violence" # .fns = ~fct_explicit_na(as.character(.), "") # replace missing values with "" # ) # ) ## fix factor levels ----------------------------------------------------------- ## make sure there are the appropriate levels (names) study_data_cleaned$no_consent_reason <- fct_recode(study_data_cleaned$no_consent_reason, "No time" = "I do not have time", "Previous negative experience" = "I have had a negative experience with a previous survey.", "No perceived benefit" = "There is no benefit to me and my family" )
## anthro zscores -------------------------------------------------------------- ## retrieve zscores zscore_results <- with(study_data_cleaned, anthro_zscores( sex = as.character(sex_brief), age = age_months, is_age_in_month = TRUE, weight = weight, lenhei = height, oedema = as.numeric(oedema), armc = muac / 10 # convert to cm )) ## add columns from zscore_results columns to study dataset study_data_cleaned <- bind_cols(study_data_cleaned, zscore_results) ## categorise children --------------------------------------------------------- ## weight for height z-scores ## * Global acute malnutrition (GAM): a WHZ score of less than (<) -2 and/or oedema; ## * Moderate acute malnutrition: WHZ score <-2 and ≥ -3 and no oedema; ## * Severe acute malnutrition (SAM): WHZ score <-3 and/or oedema. study_data_cleaned <- study_data_cleaned %>% mutate( gam_whz = zwfl < -2 | oedema == "Yes", mam_whz = zwfl >= -3 & zwei < -2, sam_whz = zwfl < -3 | oedema == "Yes" ) %>% ## set flagged children to NA for each of the indicator variables ## if the flag for this indicator is not missing and is 1 then set indicator vars to NA ## else leave the indicator variable as it was mutate( across( .cols = c(gam_whz, mam_whz, sam_whz), # select the indicator variables .fns = ~if_else(!is.na(fwfl) & fwfl == 1, NA, .)) # NA if flagged, else leave as is ) ## stunting according to height for age z-scores ## * Stunting: HAZ score <-2; ## * Moderate stunting: HAZ score >=-3 and <-2; ## * Severe stunting: HAZ score <-3. study_data_cleaned <- study_data_cleaned %>% mutate( stunting_haz = zlen < -2, moderate_stunting_haz = zlen >= -3 & zlen < -2, severe_stunting_haz = zlen < -3 ) %>% ## set flagged children to NA for each of the indicator variables ## if the flag for this indicator is not missing and is 1 then set indicator vars to NA ## else leave the indicator variable as it was mutate( across( .cols = c(stunting_haz, moderate_stunting_haz, severe_stunting_haz), # select the indicator variables .fns = ~if_else(!is.na(flen) & flen == 1, NA, .))) # NA if flagged, else leave as is ## underweight according to height for age z-scores ## * Underweight: WAZ score < -2 OR oedema; ## * moderate underweight: WAZ score >=-3 and <-2; ## * Severe underweight: WAZ < -3 OR oedema study_data_cleaned <- study_data_cleaned %>% mutate( underweight_waz = zwei < -2 | oedema == "Yes", moderate_underweight_waz = zwei >= -3 & zwei < -2, severe_underweight_waz = zwei < -3 | oedema == "Yes" ) %>% ## set flagged children to NA for each of the indicator variables ## if the flag for this indicator is not missing and is 1 then set indicator vars to NA ## else leave the indicator variable as it was mutate( across( .cols = c(underweight_waz, moderate_underweight_waz, severe_underweight_waz), # select the indicator variables .fns = ~if_else(!is.na(fwei) & fwei == 1, NA, .))) # NA if flagged, else leave as is ## according to MUAC ## * Global acute malnutrition (GAM): MUAC of <125mm and/or oedema; ## * Moderate acute malnutrirtion: MUAC <125mm and >= 115mm and no oedema; ## * Severe acute malnutrition (SAM): MUAC <115mm and/or oedema. study_data_cleaned <- study_data_cleaned %>% mutate( gam_muac = muac < 125 | oedema == "Yes", mam_muac = muac < 125 & muac >= 115 & oedema == "No", sam_muac = muac < 115 | oedema == "Yes" ) ## group indicator variables --------------------------------------------------- ## define all the indicators of interest ## we will use this to run the same function over all variables later on WHZ <- c("gam_whz", "mam_whz", "sam_whz") HAZ <- c("stunting_haz", "moderate_stunting_haz", "severe_stunting_haz") WAZ <- c("underweight_waz", "moderate_underweight_waz", "severe_underweight_waz") MUAC <- c("gam_muac", "mam_muac", "sam_muac") indicators <- c(WHZ, HAZ, WAZ, MUAC) ## turn all indicators in to factors study_data_cleaned <- study_data_cleaned %>% mutate( across( .cols = all_of(indicators), # all variables named in indicators .fns = factor # turn into a factor )) ## turn all flags in to TRUE/FALSE variables study_data_cleaned <- study_data_cleaned %>% mutate( across( .cols = all_of(c("fwei", "flen", "fwfl")), # names of flag variables .fns = as.logical # turn in to logical ))
## create variable names with labelled ## define a list of names and labels based on the dictionary var_labels <- setNames( ## add variable labels as a list as.list(study_data_dict$short_name), ## name the list with the current variable names study_data_dict$name) ## add the variable labels to dataset study_data_cleaned <- study_data_cleaned %>% set_variable_labels( ## set the labels with the object defined above .labels = var_labels, ## do not throw an error if some names dont match ## not all names in the dictionary appear in the dataset .strict = FALSE) ## it is possible to update individual variables manually too study_data_cleaned <- study_data_cleaned %>% set_variable_labels( ## variable name = variable label age_years = "Age (years)", age_months = "Age (months)", age_group = "Age group (months)", age_group_bin = "Age category (months)", flen = "HAZ", fwei = "WAZ", fwfl = "WHZ", gam_muac = "GAM", mam_muac = "MAM" , sam_muac = "SAM", moderate_stunting_haz = "Moderate stunting", severe_stunting_haz = "Severe stunting", stunting_haz = "Stunting", moderate_underweight_waz = "Moderate underweight", severe_underweight_waz = "Severe underweight", underweight_waz = "Underweight", gam_whz = "GAM", mam_whz = "MAM", sam_whz = "SAM" )
## Drop unused rows ----------------------------------------------------------- ## store the cases that you drop so you can describe them (e.g. non-consenting) dropped <- study_data_cleaned %>% filter(!consent | age_months < 6 | age_months > 60 | ## note that whatever you use for weights cannot be missing! village_name == "Other"| is.na(age_years) | is.na(sex)) ## drop the unused rows from the survey data set study_data_cleaned <- anti_join(study_data_cleaned, dropped, by = names(dropped)) ## Drop columns ---------------------------------------------------------------- ## OPTIONAL: This step shows you how you can remove certain variables. ## study_data_cleaned <- select(study_data_cleaned, -c("age_years", "sex")) ## OPTIONAL: if you want to inspect certain variables, you can select these by ## name or column number. This example creates a reduced dataset for the first ## three columns, age_years, and sex. # study_data_reduced <- select(study_data_cleaned, c(1:3, "age_years", "sex")
## option 1: only keep the first occurrence of duplicate case study_data_cleaned <- study_data_cleaned %>% ## find duplicates based on case number, sex and age group ## only keep the first occurrence distinct(uid, sex, age_group, .keep_all = TRUE) # ## option 2: create flagging variables for duplicates (then use to browse) # # study_data_cleaned <- study_data_cleaned %>% # ## choose which variables to use for finding unique rows # group_by(uid, sex, age_group) %>% # mutate( # ## get the number of times duplicate occurs # num_dupes = n(), # duped = if_else(num_dupes > 1 , TRUE, FALSE) # ) # # ## browse duplicates based on flagging variables # study_data_cleaned %>% # ## only keep rows that are duplicated # filter(duped) %>% # ## arrange by variables of interest # arrange(uid, sex, age_group) %>% # View() # # ## filter duplicates to only keep the row with the earlier entry # study_data_cleaned %>% # ## choose which variables to use for finding unique rows # group_by(uid, sex, age_group) %>% # ## sort to have the earliest date by person first # arrange(as.Date(start)) %>% # ## only keep the earliest row # slice(1)
## stratified ------------------------------------------------------------------ ## create a variable called "surv_weight_strata" ## contains weights for each individual - by age group, sex and health district study_data_cleaned <- add_weights_strata(x = study_data_cleaned, p = population_data_age, surv_weight = "surv_weight_strata", surv_weight_ID = "surv_weight_ID_strata", age_group, sex, health_district) ## cluster --------------------------------------------------------------------- ## get the number of individuals interviewed per household ## adds a variable with counts of the household (parent) index variable study_data_cleaned <- study_data_cleaned %>% add_count(index, name = "interviewed") ## create cluster weights study_data_cleaned <- add_weights_cluster(x = study_data_cleaned, cl = cluster_counts, eligible = number_children, interviewed = interviewed, cluster_x = village_name, cluster_cl = cluster, household_x = index, household_cl = households, surv_weight = "surv_weight_cluster", surv_weight_ID = "surv_weight_ID_cluster", ignore_cluster = FALSE, ignore_household = FALSE) ## stratified and cluster ------------------------------------------------------ ## create a survey weight for cluster and strata study_data_cleaned <- study_data_cleaned %>% mutate(surv_weight_cluster_strata = surv_weight_strata * surv_weight_cluster)
## simple random --------------------------------------------------------------- survey_design_simple <- study_data_cleaned %>% as_survey_design(ids = 1, # 1 for no cluster ids weights = NULL, # No weight added strata = NULL # sampling was simple (no strata) ) ## stratified ------------------------------------------------------------------ survey_design_strata <- study_data_cleaned %>% as_survey_design(ids = 1, # 1 for no cluster ids weights = surv_weight_strata, # weight variable created above strata = health_district # sampling was stratified by district ) ## cluster --------------------------------------------------------------------- survey_design_cluster <- study_data_cleaned %>% as_survey_design(ids = village_name, # cluster ids weights = surv_weight_cluster, # weight variable created above strata = NULL # sampling was simple (no strata) ) ## stratified cluster ---------------------------------------------------------- survey_design <- study_data_cleaned %>% as_survey_design(ids = village_name, # cluster ids weights = surv_weight_cluster_strata, # weight variable created above strata = health_district # sampling was stratified by district )
rio::export(study_data_cleaned, here::here("data", str_glue("study_data_cleaned_{Sys.Date()}.xlsx")))
## get counts of number of clusters num_clus <- study_data_cleaned %>% ## trim data to unique clusters distinct(cluster_number) %>% ## get number of rows (count how many unique) nrow() ## get counts of number households num_hh <- study_data_cleaned %>% ## get unique houses by cluster distinct(cluster_number, household_number) %>% ## get number of rounds (count how many unique) nrow()
We included r num_hh
households across r num_clus
clusters in this survey analysis.
Among the r nrow(dropped)
individuals excluded from the survey analysis,
r fmt_count(dropped, consent)
individuals were excluded for not being between
6 and 60 months and r fmt_count(dropped, !consent)
were excluded
for lack of consent. The reasons for no consent are shown below.
## using the dataset with dropped individuals dropped %>% ## only keep reasons for no consent select(no_consent_reason) %>% ## make NAs a factor level called missing ## to make the proportion of the total, including the missings mutate(no_consent_reason = fct_explicit_na(no_consent_reason, na_level = "Missing")) %>% ## get counts and proportions tbl_summary() %>% ## make variable names bold bold_labels() %>% # change to flextable format as_flex_table() %>% # make header text bold (using {flextable}) bold(part = "header") %>% # make your table fit to the maximum width of the word document set_table_properties(layout = "autofit")
## get counts of the number of households per cluster clustersize <- study_data_cleaned %>% ## trim data to only unique households within each cluster distinct(cluster_number, household_number) %>% ## count the number of households within each cluster count(cluster_number) %>% pull(n) ## get the median number of households per cluster clustermed <- median(clustersize) ## get the min and max number of households per cluster ## paste these together seperated by a dash clusterrange <- str_c(range(clustersize), collapse = "--") ## get counts of children per household ## do this by cluster as household IDs are only unique within clusters hhsize <- study_data_cleaned %>% count(cluster_number, household_number) %>% pull(n) ## get median number of children per household hhmed <- median(hhsize) ## get the min and max number of children per household ## paste these together seperated by a dash hhrange <- str_c(range(hhsize), collapse = "--") ## get standard deviation hhsd <- round(sd(hhsize), digits = 1)
The median number of households per cluster was
r clustermed
, with a range of r clusterrange
. The median number of children
per household was r hhmed
(range: r hhrange
, standard deviation: r hhsd
).
In total we included r nrow(study_data_cleaned)
in the survey analysis.
The age break down and a comparison with the source population is shown below.
## counts and props of the study population ag <- tabyl(study_data_cleaned, age_group, show_na = FALSE) %>% mutate(n_total = sum(n), age_group = fct_inorder(age_group)) ## counts and props of the source population propcount <- population_data_age %>% group_by(age_group) %>% tally(population) %>% mutate(percent = n / sum(n)) ## bind together the columns of two tables, group by age, and perform a ## binomial test to see if n/total is significantly different from population ## proportion. ## suffix here adds to text to the end of columns in each of the two datasets left_join(ag, propcount, by = "age_group", suffix = c("", "_pop")) %>% group_by(age_group) %>% ## broom::tidy(binom.test()) makes a data frame out of the binomial test and ## will add the variables p.value, parameter, conf.low, conf.high, method, and ## alternative. We will only use p.value here. You can include other ## columns if you want to report confidence intervals mutate(binom = list(broom::tidy(binom.test(n, n_total, percent_pop)))) %>% unnest(cols = c(binom)) %>% # important for expanding the binom.test data frame mutate( across(.cols = contains("percent"), .fns = ~.x * 100)) %>% ## Adjusting the p-values to correct for false positives ## (because testing multiple age groups). This will only make ## a difference if you have many age categories mutate(p.value = p.adjust(p.value, method = "holm")) %>% ## Only show p-values over 0.001 (those under report as <0.001) mutate(p.value = ifelse(p.value < 0.001, "<0.001", as.character(round(p.value, 3)))) %>% ## rename the columns appropriately select( "Age group" = age_group, "Study population (n)" = n, "Study population (%)" = percent, "Source population (n)" = n_pop, "Source population (%)" = percent_pop, "P-value" = p.value ) %>% # produce styled output table with auto-adjusted column widths with {flextable} qflextable() %>% # make header text bold (using {flextable}) bold(part = "header") %>% # make your table fit to the maximum width of the word document set_table_properties(layout = "autofit") %>% ## set to only show 1 decimal place colformat_double(digits = 1)
## compute the median age medage <- median(study_data_cleaned$age_months) ## paste the lower and uper quartile together iqr <- str_c( # basically copy paste together the following ## calculate the 25% and 75% of distribution, with missings removed quantile( study_data_cleaned$age_months, c(0.25, 0.75), na.rm = TRUE), ## between lower and upper place an en-dash collapse = "--") ## compute overall sex ratio sex_ratio <- study_data_cleaned %>% count(sex) %>% pivot_wider(names_from = sex, values_from = n) %>% mutate(ratio = round(Male/Female, digits = 3)) %>% pull(ratio) ## compute sex ratios by age group sex_ratio_age <- study_data_cleaned %>% count(age_group, sex) %>% pivot_wider(names_from = sex, values_from = n) %>% mutate(ratio = round(Male/Female, digits = 3)) %>% select(age_group, ratio) ## sort table by ascending ratio then select the lowest (first) min_sex_ratio_age <- arrange(sex_ratio_age, ratio) %>% slice(1)
Among the r nrow(study_data_cleaned)
surveyed individuals, there were
r fmt_count(study_data_cleaned, sex == "Female")
females and
r fmt_count(study_data_cleaned, sex == "Male")
males (unweighted). The male to
female ratio was r sex_ratio
in the surveyed population. The lowest male to
female ratio was r min_sex_ratio_age$ratio
in the r min_sex_ratio_age$age_group
month age group.
The median age of surveyed individuals was r medage
years (Q1-Q3 of r iqr
years). Children under two years of age made up
r fmt_count(study_data_cleaned, age_group_bin == "0-23")
of the surveyed
individuals.
The highest number of surveyed individuals (unweighted) were in the
r table(study_data_cleaned$age_group) %>% which.max() %>% names()
year age group.
Unweighted age distribution of household population by year age group and gender.
# get cross tabulated counts and proportions study_data_cleaned %>% tbl_cross( row = age_group, col = sex, percent = "cell") %>% # change to flextable format as_flex_table() %>% # make header text bold (using {flextable}) bold(part = "header") %>% # make your table fit to the maximum width of the word document set_table_properties(layout = "autofit")
There were r fmt_count(dropped, is.na(sex))
cases missing
information on sex and
r fmt_count(dropped, is.na(age_group))
missing age group.
Unweighted age and gender distribution of household population covered by the survey.
age_pyramid(study_data_cleaned, age_group, split_by = sex, proportional = TRUE) + labs(y = "Proportion", x = "Age group (months)") + # change axis labels theme(legend.position = "bottom", # move legend to bottom legend.title = element_blank(), # remove title text = element_text(size = 18) # change text size )
Unweighted age and gender distribution, stratified by health district, of household population covered by the survey. <!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /// age_pyramid_strata \\
This chunk creates an unweighted (using study_data_cleaned) age/sex pyramid of your cases, stratified by health district.
If you have a stratified survey design this may be useful for visualising if you have an excess of representation in either sex or gender in any of your survey strata. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
age_pyramid(study_data_cleaned, age_group, split_by = sex, stack_by = health_district, proportional = TRUE, pal = c("red", "blue")) + labs(y = "Proportion", x = "Age group (months)") + # change axis labels theme(legend.position = "bottom", # move legend to bottom legend.title = element_blank(), # remove title text = element_text(size = 18) # change text size )
Weighted age and gender distribution of household population covered by the survey.
age_pyramid(survey_design, age_group, split_by = sex, proportion = TRUE) + labs(y = "Proportion (weighted)", x = "Age group (months)") + # change axis labels theme(legend.position = "bottom", # move legend to bottom legend.title = element_blank(), # remove title text = element_text(size = 18) # change text size )
Box plot of Zscores for height-for-age, weight-for-age and weight-for-height
## pull variables together for plotting temp_data <- study_data_cleaned %>% ## stack variable names in one column and values for vars in another ## call first column indicator and second column zscore pivot_longer(cols = c(zwfl, zlen, zwei), names_to = "Indicator", values_to = "Zscore") %>% ## only keep variables of interest select(Indicator, Zscore) %>% ## rename indicators to something more understandable mutate( Indicator = fct_recode(Indicator, "HAZ" = "zlen", "WAZ" = "zwei", "WHZ" = "zwfl" )) ## use indicators on the x-axis and z-score on the y-axis ggplot(temp_data, aes(x = Indicator, y = Zscore)) + ## plot as a box plot geom_boxplot() + ## add a dotted horizontal line at 0 for reference geom_hline(yintercept = 0, linetype = "dashed")
Box plot of Zscores for height-for-age, weight-for-age and weight-for-height by sex
## pull variables together for plotting temp_data <- study_data_cleaned %>% ## stack variable names in one column and values for vars in another ## call first column indicator and second column zscore pivot_longer(cols = c(zwfl, zlen, zwei), names_to = "Indicator", values_to = "Zscore") %>% ## only keep variables of interest select(sex, Indicator, Zscore) %>% ## rename indicators to something more understandable mutate( Indicator = fct_recode(Indicator, "HAZ" = "zlen", "WAZ" = "zwei", "WHZ" = "zwfl" )) ## use indicators on the x-axis and z-score on the y-axis ## colour according to sex ggplot(temp_data, aes(x = Indicator, y = Zscore, fill = sex)) + ## plot as a boxplot geom_boxplot() + ## add a dotted horizontal line at 0 for reference geom_hline(yintercept = 0, linetype = "dashed") + ## remove the title of legend theme(legend.title = element_blank())
Box plot of Zscores for height-for-age by age group
## use age group on the x-axis and height z-score on the y-axis ggplot(study_data_cleaned, aes(x = age_group, y = zlen)) + ## plot as a boxplot geom_boxplot() + ## add a dotted horizontal line at 0 for reference geom_hline(yintercept = 0, linetype = "dashed") + ## change axis titles labs(x = "Age group (Months)", y = "HAZ score")
Box plot of Zscores for weight-for-age by age group
## use age group on the x-axis and weight z-score on the y-axis ggplot(study_data_cleaned, aes(x = age_group, y = zwei)) + ## plot as a boxplot geom_boxplot() + ## add a dotted horizontal line at 0 for reference geom_hline(yintercept = 0, linetype = "dashed") + ## change axis titles labs(x = "Age group (Months)", y = "WAZ score")
Box plot of Zscores for weight-for-height by age group
## use age group on the x-axis and weight-height z-score on the y-axis ggplot(study_data_cleaned, aes(x = age_group, y = zwfl)) + ## plot as a boxplot geom_boxplot() + ## add a dotted horizontal line at 0 for reference geom_hline(yintercept = 0, linetype = "dashed") + ## change axis titles labs(x = "Age group (Months)", y = "WHZ score")
Unweighted z-curve of height-for-age among non-flagged children for this indicator <!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /// zcurve_haz \\
This chunk creates a z-curve of height-for-age compared to the WHO reference population ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
study_data_cleaned %>% ## only consider z-scores that are not flagged filter(flen == FALSE) %>% ## plot zcurve zcurve(zlen)
Unweighted z-curve of weight-for-age among non-flagged children for this indicator <!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /// zcurve_haz \\
This chunk creates a z-curve of weight-for-age compared to the WHO reference population ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
study_data_cleaned %>% ## only consider z-scores that are not flagged filter(fwei == FALSE) %>% ## plot zcurve zcurve(zwei)
Unweighted z-curve of weight-for-height among non-flagged children for this indicator <!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /// zcurve_whz \\
This chunk creates a z-curve of weighted-for-height compared to the WHO reference population ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
study_data_cleaned %>% ## only consider z-scores that are not flagged filter(fwfl == FALSE) %>% ## plot zcurve zcurve(zwfl)
Unweighted counts and proportions of children missing or flagged for indicators <!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /// missing_flagged_indicators \\
This chunk creates an unweighted table of number children with missing or flagged indicators ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
study_data_cleaned %>% select(flen, fwei, fwfl) %>% ## mutate each of the flags be easier to enterprate mutate( across(everything(), ~case_when( .x == TRUE ~ "Flagged", .x == FALSE ~ "Included", TRUE ~ "Missing" )) ) %>% # Variable labels are removed by mutate(across(..)) for some reason set_variable_labels( ## variable name = variable label flen = "HAZ", fwei = "WAZ", fwfl = "WHZ" ) %>% tbl_summary() %>% # change to flextable format as_flex_table() %>% # make header text bold (using {flextable}) bold(part = "header") %>% # make your table fit to the maximum width of the word document set_table_properties(layout = "autofit")
In addition the number of children missing MUAC
was r fmt_count(study_data_cleaned, is.na(muac))
.
Weighted prevalence of malnutrition based on MUAC, by age group (months) and overall
overall <- survey_design %>% ## tabulate multiple variables with same values select(all_of(MUAC)) %>% ## only show the row with TRUE tbl_svysummary(missing = "no", value = everything() ~ TRUE, statistic = everything() ~ "{n} ({p}%) [{deff}]") %>% ## add confidence intervals add_ci() age_strat <- survey_design %>% ## tabulate multiple variables with same values select(all_of(MUAC), age_group) %>% ## only show the row with TRUE tbl_svysummary(missing = "no", value = everything() ~ TRUE, ## stratify by age group by = age_group) ## combine the overall and stratified tables tbl_merge(list(overall, age_strat)) %>% modify_spanning_header( list( ## rename the spanning header ## you can see what the columns are called by putting in an object and inspecting table_body c(stat_0_1, ci_stat_0_1) ~ "**Overall**", c(stat_1_2, stat_2_2, stat_3_2, stat_4_2, stat_5_2) ~ "**Age group (months)**")) %>% # change to flextable format as_flex_table() %>% # make header text bold (using {flextable}) bold(part = "header") %>% # make your table fit to the maximum width of the word document set_table_properties(layout = "autofit")
Weighted prevalence of malnutrition based on MUAC among children <87cm, by age group (months) and overall <!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /// muac_age_group_filter \\
This chunk creates a weighted table of prevalence based on MUAC for age groups and for the overall population among children under 87cm tall (using filter).
This happens in three stages: - Table for overall - Table for age groups - Bind the two together as rows ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
overall <- survey_design %>% ## only keep children less than 87cm filter(height < 87) %>% ## tabulate multiple variables with same values select(all_of(MUAC)) %>% ## only show the row with TRUE tbl_svysummary(missing = "no", value = everything() ~ TRUE) %>% ## add confidence intervals add_ci() age_strat <- survey_design %>% ## only keep children less than 87cm filter(height < 87) %>% ## tabulate multiple variables with same values select(all_of(MUAC), age_group) %>% ## only show the row with TRUE tbl_svysummary(missing = "no", value = everything() ~ TRUE, ## stratify by age group by = age_group) ## combine the overall and stratified tables tbl_merge(list(overall, age_strat)) %>% modify_spanning_header( list( ## rename the spanning header ## you can see what the columns are called by putting in an object and inspecting table_body c(stat_0_1, ci_stat_0_1) ~ "**Overall**", c(stat_1_2, stat_2_2, stat_3_2, stat_4_2, stat_5_2) ~ "**Age group (months)**")) %>% # # change to flextable format as_flex_table() %>% # make header text bold (using {flextable}) bold(part = "header") %>% # make your table fit to the maximum width of the word document set_table_properties(layout = "autofit")
Weighted prevalence of malnutrition based on height-for-age z-score categories, by age group (months) and overall
overall <- survey_design %>% ## tabulate multiple variables with same values select(all_of(HAZ)) %>% ## only show the row with TRUE tbl_svysummary(missing = "no", value = everything() ~ TRUE) %>% ## add confidence intervals add_ci() age_strat <- survey_design %>% ## tabulate multiple variables with same values select(all_of(HAZ), age_group) %>% ## only show the row with TRUE tbl_svysummary(missing = "no", value = everything() ~ TRUE, ## stratify by age group by = age_group) ## combine the overall and stratified tables tbl_merge(list(overall, age_strat)) %>% modify_spanning_header( list( ## rename the spanning header ## you can see what the columns are called by putting in an object and inspecting table_body c(stat_0_1, ci_stat_0_1) ~ "**Overall**", c(stat_1_2, stat_2_2, stat_3_2, stat_4_2, stat_5_2) ~ "**Age group (months)**")) %>% # change to flextable format as_flex_table() %>% # make header text bold (using {flextable}) bold(part = "header") %>% # make your table fit to the maximum width of the word document set_table_properties(layout = "autofit")
Weighted prevalence of malnutrition based on weight-for-age z-score categories, by age group (months) and overall <!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /// waz_age_group \\
This chunk creates a weighted table of prevalence based on weight-for-age z-scores for age groups and for the overall population.
This happens in three stages: - Table for overall - Table for age groups - Bind the two together as rows ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
overall <- survey_design %>% ## tabulate multiple variables with same values select(all_of(WAZ)) %>% ## only show the row with TRUE tbl_svysummary(missing = "no", value = everything() ~ TRUE) %>% ## add confidence intervals add_ci() age_strat <- survey_design %>% ## tabulate multiple variables with same values select(all_of(WAZ), age_group) %>% ## only show the row with TRUE tbl_svysummary(missing = "no", value = everything() ~ TRUE, ## stratify by age group by = age_group) ## combine the overall and stratified tables tbl_merge(list(overall, age_strat)) %>% modify_spanning_header( list( ## rename the spanning header ## you can see what the columns are called by putting in an object and inspecting table_body c(stat_0_1, ci_stat_0_1) ~ "**Overall**", c(stat_1_2, stat_2_2, stat_3_2, stat_4_2, stat_5_2) ~ "**Age group (months)**")) %>% # change to flextable format as_flex_table() %>% # make header text bold (using {flextable}) bold(part = "header") %>% # make your table fit to the maximum width of the word document set_table_properties(layout = "autofit")
Weighted prevalence of malnutrition based on weight-for-height z-score categories, by age group (months) and overall <!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /// whz_age_group \\
This chunk creates a weighted table of prevalence based on weight-for-height z-scores for age groups and for the overall population.
This happens in three stages: - Table for overall - Table for age groups - Bind the two together as rows ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
overall <- survey_design %>% ## tabulate multiple variables with same values select(all_of(WHZ)) %>% ## only show the row with TRUE tbl_svysummary(missing = "no", value = everything() ~ TRUE) %>% ## add confidence intervals add_ci() age_strat <- survey_design %>% ## tabulate multiple variables with same values select(all_of(WHZ), age_group) %>% ## only show the row with TRUE tbl_svysummary(missing = "no", value = everything() ~ TRUE, ## stratify by age group by = age_group) ## combine the overall and stratified tables tbl_merge(list(overall, age_strat)) %>% modify_spanning_header( list( ## rename the spanning header ## you can see what the columns are called by putting in an object and inspecting table_body c(stat_0_1, ci_stat_0_1) ~ "**Overall**", c(stat_1_2, stat_2_2, stat_3_2, stat_4_2, stat_5_2) ~ "**Age group (months)**")) %>% # change to flextable format as_flex_table() %>% # make header text bold (using {flextable}) bold(part = "header") %>% # make your table fit to the maximum width of the word document set_table_properties(layout = "autofit")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.