```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  message = FALSE,
  warning = FALSE
)
devtools::load_all()
```
healthdatacsv lets users query the healthdata.gov API catalog and returns the results as tidy data frames. The package focuses on the `data.json` endpoint and downloads datasets that are available as file downloads, namely in CSV format.
You can install the development version of healthdatacsv from GitHub with:
```{r, eval = FALSE}
install.packages("remotes")
remotes::install_github("iecastro/healthdatacsv")
```

## Examples

```r
library(healthdatacsv)
```
Basic examples which show you how to use healthdatacsv:
```r
fetch_catalog(keyword = "alcohol|drugs")
```

```r
fetch_catalog("Centers for Disease Control and Prevention",
              keyword = "built environment") %>%
  fetch_data()
```

```r
library(dplyr) # for table manipulation verbs

# query catalog
fetch_catalog(keyword = "alcohol")

# fetch data
fetch_catalog(keyword = "alcohol") %>%
  dplyr::slice(1) %>% # keep only the first product
  fetch_data()
```
`fetch_data()` wraps the `vroom` function, which reads relatively large delimited files quickly:
```r
# PRAMS data
# CDC surveillance for reproductive health

# query, filter, and fetch
prams <- fetch_catalog(keyword = "reproductive health") %>%
  mutate(year = readr::parse_number(product)) %>%
  filter(year >= 2010) %>%
  arrange(year) %>%
  fetch_data()

prams %>%
  select(product, data_tbl)
```
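The `mutate()` step in the PRAMS example relies on `readr::parse_number()` to pull a year out of each product title. A minimal sketch of that idea on a hand-built tibble follows; the product titles are hypothetical and only stand in for real catalog entries.

```r
library(dplyr)
library(readr)
library(tibble)

# Hypothetical product titles standing in for catalog entries.
catalog <- tibble(
  product = c("Example Survey Data for 2009", "Example Survey Data for 2011")
)

# parse_number() keeps the first number it finds in each string.
recent <- catalog %>%
  mutate(year = parse_number(product)) %>%
  filter(year >= 2010) %>%
  arrange(year)

recent
```

Only the 2011 row survives the filter. Because `parse_number()` uses the first number in the string, a title with another number ahead of the year would need different handling.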
Say you're interested in searching for CDC datasets related to the built environment. You can use healthdatacsv to fetch the catalog of available data products:
```r
cdc_built_env <- fetch_catalog("Centers for Disease Control and Prevention",
                               keyword = "built environment")

cdc_built_env
```
In this case, there is only one product available. To learn more about the dataset, you can simply pull the description:
```r
cdc_built_env %>%
  dplyr::pull(description)
```
This dataset relates to state legislation on nutrition, physical activity, and obesity during 2001-2017. Data only includes enacted legislation.
To download the data, we pass the catalog object to the `fetch_data` function. Since there is only one dataset to download, we can unnest in the same pipe. If the catalog has more than one product that you'd like to keep, it is recommended to unnest each product separately. If the catalog consists of several time points of the same dataset, you could unnest in one pipe, provided all column names are the same.
```r
data_raw <- cdc_built_env %>%
  fetch_data() %>%
  tidyr::unnest(data_tbl)

data_raw
```
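To make the advice about unnesting several time points in one pipe concrete, here is a small sketch that does not touch the API. The nested tibble is a hypothetical stand-in for what `fetch_data()` returns (one row per product, with the downloaded data in a `data_tbl` list column); the product names and values are made up.

```r
library(tibble)
library(tidyr)

# Hypothetical stand-in for fetch_data() output: one row per product,
# with each downloaded dataset stored in the data_tbl list column.
nested <- tibble(
  product = c("survey 2016", "survey 2017"),
  data_tbl = list(
    tibble(state = c("NY", "CA"), value = c(1, 2)),
    tibble(state = c("NY", "CA"), value = c(3, 4))
  )
)

# Column names match across products, so a single unnest() stacks them.
flat <- unnest(nested, data_tbl)

flat
```

The result has one row per observation, with `product` repeated alongside the `state` and `value` columns. When products do not share column names, unnesting each one separately keeps unrelated columns from being mixed into one wide table.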
`get_agencies()` queries the catalog and returns the names of the agencies listed in it. You can enter a partial name or the initials of an agency of interest to check whether it has data cataloged in the API. The argument is treated as a regular expression when detecting string matches.
```r
get_agencies("NIH|CDC|FDA")

get_agencies("Institute|Drug")
```
The resulting data frame depends on the matches to the string supplied. To pull all agencies cataloged, simply use `get_agencies` with the default argument:
```r
get_agencies()
```
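Since the argument to `get_agencies` is a regular expression, `|` acts as "or" between alternatives. The sketch below uses base R's `grepl()` as a stand-in to show how such patterns behave; it is not the package's internal code, and the vector of agency names is a hand-written assumption about how names might appear in the catalog.

```r
# Hypothetical agency names; real catalog entries may be formatted differently.
agencies <- c(
  "National Institutes of Health (NIH)",
  "Centers for Disease Control and Prevention",
  "Food and Drug Administration"
)

# Initials only match when they literally appear in the name.
by_initials <- grepl("NIH|CDC|FDA", agencies)
agencies[by_initials]

# Partial words match anywhere in the name.
by_partial <- grepl("Institute|Drug", agencies)
agencies[by_partial]
```

In this sketch the initials pattern matches only the first name, because the other two do not spell out their initials, while the partial-word pattern matches the first and third. If a search by initials returns nothing, trying part of the full agency name is a reasonable next step.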
`get_keywords()` extracts all keywords cataloged. The function accepts as its argument a full agency name (in proper title case), which can be obtained with `get_agencies`.
```r
get_keywords("National Institutes of Health (NIH)")
```
Alternatively, this function can be run with no argument to return a data frame of all keywords and agencies cataloged.
Development of this package was partly supported by a research grant from the National Institute on Alcohol Abuse and Alcoholism - NIH Grant #R34AA026745. This product is neither endorsed nor certified by healthdata.gov or NIH/NIAAA.
Please note that the 'healthdatacsv' project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.