Welcome to the R client library for accessing the COVID County Data database.
As of right now, this library is a wrapper around the covidcountydata.py Python client. For more examples and documentation, please see that library. If you are an R programmer and are willing to contribute to making a native library, please reach out at our repository!
Also, please see the project website for more information.
Covid County Data (CCD) is a project funded by Schmidt Futures that seeks to simplify the data ingestion process for researchers and policy makers who are working to enact and understand COVID-19 related policies. We accomplish this goal in several ways:
More information about our project and what data is collected can be found on our website.
We are always looking to hear from both those who would like to help us build CCD and those who would like to use CCD. Please reach out to us!
Please install this package using devtools::install_github as follows:
devtools::install_github("CovidCountyData/covidcountydataR")
After installing the package, make sure that the underlying Python package is installed.
To do this, use:
covidcountydataR::install_ccdPY()
During the installation process, R will check whether you have an existing Python installation that can be used.
You may be prompted to accept the installation of a dedicated Python (via miniconda) for R to use.
We recommend that you accept this request, but if you are comfortable managing your own Python installation you can say no.
Once the package is installed, the first step is to create an API client:
library(covidcountydataR)
cl <- client()
You can see a list of currently available datasets using:
datasets(cl)
Each dataset has an associated function.
You can get detailed information on a specific dataset using the info method. For example:
info(cl)
info(cl, "demographics")
info(cl, "covid_historical")
Requesting a dataset has three parts:
1. Create a client
2. Build a request with the desired datasets
3. fetch the datasets
To create a client, use the client function as shown above:
cl <- client()
You can optionally pass in an API key if you have one (see the section on API keys below):
cl <- client("my api key")
If you have previously registered for an API key (again, see below) on your current machine, it will be loaded and used automatically for you.
In practice you should rarely need to pass the apikey by hand unless you are loading the key from an environment variable or another source.
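If you do keep your key in an environment variable, a small helper can look it up before you create the client. The sketch below is only an illustration in base R: the variable name CCD_API_KEY and the helper read_api_key are hypothetical and are not part of the package.

```r
# Hypothetical helper: look up an API key stored in an environment variable.
# The variable name "CCD_API_KEY" is an assumption made for this example.
read_api_key <- function(var = "CCD_API_KEY") {
  key <- Sys.getenv(var, unset = NA_character_)
  # Treat an unset or empty variable as "no key available"
  if (is.na(key) || !nzchar(key)) {
    return(NULL)
  }
  key
}

key <- read_api_key()
if (is.null(key)) {
  message("No key found in the environment.")
} else {
  message("Found a key; it could be passed on as client(key).")
}
```

When the helper returns a key, it can be handed to client() as shown above; when it returns NULL, calling client() without arguments falls back to the behaviour already described.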
Each of the datasets in the API has an associated function.
To add a dataset to the current request, call its datasetName(client) function:
covid_us(cl, state="CA")
demographics(cl)
cl
You can see that the printed form of the client is updated to show you what the current request looks like.
To clear the current request, use reset(cl):
reset(cl)
#> CCD Client
Each dataset function builds up a request for the client and returns the client itself.
This allows us to use the pipe operator (%>%) to do the above as:
cl %>% covid_us(state="CA") %>% demographics()
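Chaining works because every dataset function returns the client it was given. The toy sketch below mimics that pattern in base R. The names toy_client and toy_dataset and the request field are made up for illustration and say nothing about how the package is actually implemented. It uses R's native pipe (|>, available from R 4.1) so that it runs without loading magrittr.

```r
# Toy stand-in for a client: an environment holding the pending request.
# None of these names come from covidcountydataR.
toy_client <- function() {
  cl <- new.env()
  cl$request <- list()
  cl
}

# A toy "dataset function": records the dataset name and its filters,
# then returns the client (invisibly) so that calls can be chained.
toy_dataset <- function(cl, name, ...) {
  entry <- setNames(list(list(...)), name)
  cl$request <- c(cl$request, entry)
  invisible(cl)
}

cl <- toy_client()
cl |>
  toy_dataset("covid_us", state = "CA") |>
  toy_dataset("demographics")

# Both datasets are now part of the pending request
names(cl$request)
```

An environment is used here because it is modified in place, which is what lets a call such as toy_dataset(cl, ...) update cl without reassigning it.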
Each of the dataset functions has a number of filters that can be applied.
These allow you to select certain rows and/or columns.
For example, above we had covid_us(state="CA"). This instructs the client to only fetch data for counties in the state of California.
Refer to the info for each dataset’s function for more information on which filters can be passed.
Also, check out the examples section at the end for more examples.
NOTE: If a filter is passed to one dataset in the request but is applicable to other datasets in the request, it will be applied to all datasets.
For example, in cl %>% covid_us(state="CA") %>% demographics() we only specify a state filter on the covid_us dataset.
However, when the data is collected, the filter will also be applied to demographics.
We do this because we end up doing an inner join on all requested datasets, so when we filter the state in covid_us, it is also filtered in demographics.
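To see why an inner join makes a filter on one dataset carry over to the others, consider a small base R sketch with made-up data frames. The column names and values are invented for illustration and are not taken from the API.

```r
# Made-up stand-ins for two datasets keyed by a location code
covid <- data.frame(
  location = c(6001, 6003, 48001),
  state    = c("CA", "CA", "TX"),
  cases    = c(10, 20, 30)
)
demographics <- data.frame(
  location   = c(6001, 6003, 48001),
  population = c(1000, 2000, 3000)
)

# Filter only the first dataset by state
covid_ca <- covid[covid$state == "CA", ]

# merge() performs an inner join by default, so rows of `demographics`
# without a match in `covid_ca` are dropped as well
joined <- merge(covid_ca, demographics, by = "location")
joined$location
```

Only the two California locations survive the join, even though demographics itself was never filtered.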
Now for the easy part!
When your current request is ready, call the fetch function on the client to fetch the data:
df <- fetch(cl)
df
names(df)
Notice that after each successful request, the client is reset so there are no “built-up” requests:
cl
Our API is and always will be free for unlimited public use.
However, we have an API key system in place to help us understand the needs of our users.
We kindly request that you register for an API key so we can understand how to prioritize future work.
To do so, use the register function:
register(cl)
By default, the function will prompt you to input an email address.
You can also pass the email address as the second argument for non-interactive use:
register(cl, "me@me.com")
After you register for an API key, it will be added to the client. All future requests with this client will use the API key.
We also save the key to a file at ~/.covidcountydata/apikey.
If this file exists, each time you call client and do not explicitly pass an apikey, we will read the key from ~/.covidcountydata/apikey and automatically apply it for you.
Thus, to use the key in future sessions you just need to do cl <- client() and we’ll handle the key for you!
Due to the urgency of the COVID-19 crisis and the need for researchers, modelers, and policy makers to have accurate data quickly, this project moves fast!
We have created this library so that as we add new datasets to our backend, they automatically appear here and are accessible via this library.
Please check back often to see what has been updated.
# Single dataset, all data
cl %>% mobility_devices() %>% fetch()
# Single dataset, filter on deaths
cl %>% covid_us(location="<100", variable="deaths_total", value=">100") %>% fetch()
# Single dataset, single state with all counties
# OR: `cl %>% mobility_devices(state=as.integer(48)) %>% fetch()`
# OR: `cl %>% mobility_devices(state="TX") %>% fetch()`
cl %>% mobility_devices(state="48") %>% fetch()
# Single dataset, multiple states with all counties
cl %>% mobility_devices(state=c("CA", "TX")) %>% fetch()
# Single dataset, variable select
cl %>% demographics(variable = c("Total population", "Fraction of population over 65", "Median age")) %>% fetch()
# Multiple datasets, all data
cl %>% demographics() %>% covid_us(dt=">2020-07-20") %>% fetch()
# Multiple datasets, states only
cl %>% demographics() %>% covid_us(location="<100", dt=">2020-07-20") %>% fetch()
# Multiple datasets, counties only
cl %>% demographics() %>% covid_us(location=">1000", dt=">2020-07-20") %>% fetch()