knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
options(width = 110)
library(fetchdhs)
library(tidyverse)
The Demographic and Health Surveys (DHS) Program has conducted more than 400 surveys in over 90 countries since 1984. It remains a critical resource in global health research and analytics as it provides nationally representative data on fertility, family planning, maternal and child health, gender, HIV/AIDS, malaria, and nutrition. The objective of this package is to enable R users to plug into the DHS API and retrieve tidy survey data.
# install.packages("devtools")
devtools::install_github("murphy-xq/fetchdhs")
Let's say you quickly need DHS survey data on immunization indicators for India and Nigeria, covering 2000 through 2017.
First, use fetch_countries() to locate the two-letter DHS country codes for India and Nigeria, which the API call requires:
fetch_countries() %>% filter(country_name %in% c("India", "Nigeria"))
Next, make use of the DHS API tags that categorize survey indicators by topic. In this example, we search for immunization-related tags with fetch_tags() and find that tag 32 is the one we need:
fetch_tags() %>% filter(str_detect(tag_name, "[Ii]mmunization"))
Finally, use fetch_data() to call the DHS API with the parameters just identified. It returns a tidy data frame along with the API call that produced it:
fetch_data(countries = c("IA","NG"), tag = 32, years = 2000:2017)
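To make the returned API call easier to read, here is a minimal sketch of how such a query URL could be assembled by hand. It is not how fetchdhs builds its requests. The helper build_dhs_url() is hypothetical, and the endpoint and parameter names (countryIds, tagIds, surveyYearStart, surveyYearEnd, f) are assumptions that should be checked against the DHS API documentation.

```r
# Sketch only: assemble a DHS API query URL with base R.
# build_dhs_url() is a hypothetical helper, not part of fetchdhs.
# The endpoint and parameter names are assumptions about the DHS API.
build_dhs_url <- function(countries, tag, years,
                          base_url = "https://api.dhsprogram.com/rest/dhs/data") {
  query <- c(
    countryIds      = paste(countries, collapse = ","),  # e.g. "IA,NG"
    tagIds          = as.character(tag),
    surveyYearStart = as.character(min(years)),
    surveyYearEnd   = as.character(max(years)),
    f               = "json"                             # response format
  )
  # Join name=value pairs with "&" and append them to the endpoint
  paste0(base_url, "?", paste0(names(query), "=", query, collapse = "&"))
}

url <- build_dhs_url(countries = c("IA", "NG"), tag = 32, years = 2000:2017)
print(url)
```

Comparing a hand-built URL like this with the call returned by fetch_data() can help when debugging an unexpected result.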
For specific indicators, we can peek at a data frame of all available indicators to identify which indicator_id codes to pass to fetch_data(). Let's try pulling only the DPT3 and measles indicators:
fetch_indicators() %>% filter(str_detect(definition, "Measles|DPT3"))
After inspecting the DPT3 and measles indicators and their attributes, we see that we need CH_VACC_C_DP3 and CH_VACC_C_MSL:
fetch_data(countries = c("IA","NG"), indicators = c("CH_VACC_C_DP3", "CH_VACC_C_MSL"), years = 2000:2017)
So far we have used the default level of disaggregation, which returns national-level data only. To pull subnational, background characteristic, or all available data, specify breakdown_level in fetch_data(). In this example, the result grows from 205 records with breakdown_level == "national" to 3,270 records with breakdown_level == "all".
# national (default)
fetch_data(countries = c("IA","NG"), tag = 32, years = 2000:2017, breakdown_level = "national")
# subnational
fetch_data(countries = c("IA","NG"), tag = 32, years = 2000:2017, breakdown_level = "subnational")
# background
fetch_data(countries = c("IA","NG"), tag = 32, years = 2000:2017, breakdown_level = "background")
# all
fetch_data(countries = c("IA","NG"), tag = 32, years = 2000:2017, breakdown_level = "all")
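Since breakdown_level accepts a small fixed set of values, it can be worth checking the value before a request goes out. The following is a minimal sketch using base R's match.arg(). The helper check_breakdown() is hypothetical and is not part of fetchdhs, which may validate this argument differently or not at all.

```r
# Hypothetical helper, not part of fetchdhs: validate a breakdown level up front.
check_breakdown <- function(level = c("national", "subnational", "background", "all")) {
  # match.arg() returns the first choice when no value is supplied
  # and raises an error for values outside the allowed set
  match.arg(level)
}

default_level <- check_breakdown()               # falls back to the first choice
chosen_level  <- check_breakdown("subnational")  # passes through a valid value
print(c(default_level, chosen_level))
```

A check like this fails fast on a typo such as "regional" instead of sending a request that the API would reject or silently ignore.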
Return fields are the dimensions of survey data that the API can return. set_return_fields() lets the user specify which fields make up the data frame returned from the API:
set_return_fields(c("Indicator", "CountryName", "SurveyYear", "SurveyType", "Value"))
fetch_data(countries = c("IA","NG"), tag = 32, years = 2000:2017)
We can also request polygon coordinates with our call by setting add_geometry = TRUE:
fetch_data(countries = c("IA","NG"), tag = 32, years = 2000:2017, add_geometry = TRUE)
Authenticated users can query more records per page: a maximum of 5,000 versus 1,000. See the DHS API documentation for authentication details.
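To see what the per-page limit means in practice, the arithmetic can be sketched as follows, using the 3,270-record result from the breakdown example. The helper pages_needed() is hypothetical and only illustrates the calculation; it says nothing about how fetchdhs handles paging internally.

```r
# Hypothetical helper: how many pages are needed to retrieve n_records
# when each page holds at most per_page records.
pages_needed <- function(n_records, per_page) {
  ceiling(n_records / per_page)
}

pages_unauthenticated <- pages_needed(3270, per_page = 1000)
pages_authenticated   <- pages_needed(3270, per_page = 5000)
print(c(unauthenticated = pages_unauthenticated,
        authenticated   = pages_authenticated))
```

Under these limits the same query takes four pages without a key and a single page with one.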
Users can supply their API key with set_api_key() so that it is included in any subsequent fetch_data() calls:
set_api_key("YOURKEY-GOESHERE")
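Rather than writing the key into a script, one option is to keep it in an environment variable and read it at run time. This is a minimal sketch of that pattern. The helper read_dhs_key() and the variable name DHS_API_KEY are assumptions made for illustration and are not defined by fetchdhs.

```r
# Hypothetical helper: read an API key from an environment variable.
# The variable name DHS_API_KEY is an assumption, not something fetchdhs defines.
read_dhs_key <- function(var = "DHS_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set", call. = FALSE)
  }
  key
}

# Usage, once the variable is set (for example in ~/.Renviron) and fetchdhs is loaded:
# set_api_key(read_dhs_key())
```

Keeping the key out of the source file makes it less likely to end up in version control.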