knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = nzchar(Sys.getenv("IS_DEVELOPMENT_MACHINE"))
)
zoltr is an R package that simplifies access to the zoltardata.com API. This vignette takes you through the package's main features. So that you can experiment without needing a Zoltar account, we use the example project from docs.zoltardata.com, which should always be available for public read-only access.
You need to have an account on Zoltar and be authenticated to the server in order to access data from the API. Once you have an account, we recommend storing your Zoltar username and password in your .Renviron file. In practice this means having a file named .Renviron in your home directory. (You can read more about R and environment variables here.) The lines of code in this vignette will work if you have the following two lines somewhere in your .Renviron file (where you replace your username and password in the appropriate locations). Note there is no space around the = sign:
Z_USERNAME=insert-your-username-here
Z_PASSWORD=insert-your-password-here
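Before connecting, it can help to confirm that R actually sees both variables. The following is a minimal sketch using only base R; the helper name credentials_available() is hypothetical and not part of zoltr. R reads .Renviron at startup, so restart your session after editing the file.

```r
# Hypothetical helper (not part of zoltr): returns TRUE only if every named
# environment variable is set to a non-empty string.
credentials_available <- function(vars = c("Z_USERNAME", "Z_PASSWORD")) {
  all(nzchar(Sys.getenv(vars)))
}

if (!credentials_available()) {
  message("Z_USERNAME and/or Z_PASSWORD is not set; check your .Renviron file.")
}
```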
Note that the Zoltar service uses a "token"-based scheme for authentication. These tokens have a five-minute expiration for security, which requires re-authentication after that period of time. The zoltr library takes care of re-authenticating as needed by passing your username and password back to the server to get another token. Note that the connection object returned by the new_connection function stores a token internally, so be careful if saving that object into a file.
The starting point for working with Zoltar's API is a ZoltarConnection object, obtained via the new_connection function. Most zoltr functions take a ZoltarConnection along with the API URL of the thing of interest, e.g., a project, model, or forecast. API URLs look like https://www.zoltardata.com/api/project/3/, which is that of the "Docs Example Project". An important note regarding URLs:
zoltr's convention for URLs is to require a trailing slash character ('/') on all URLs. The only exception is the optional `host` parameter passed to `new_connection()`. Thus, `https://www.zoltardata.com/api/project/3/` is valid, but `https://www.zoltardata.com/api/project/3` is not.
You can obtain a URL using some of the *_info functions, and you can always use the web interface to navigate to the item of interest and look at its URL in the browser address field. Keep in mind that you'll need to add api to the browsable address, along with the trailing slash character. For example, if you browsed the Docs Example Project at (say) https://www.zoltardata.com/project/3 then its API URL for use in zoltr would be https://www.zoltardata.com/api/project/3/.
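The conversion from a browsable address to an API URL can be sketched in base R. The helper to_api_url() below is hypothetical (it is not part of zoltr) and assumes the only differences are the api path segment and the trailing slash, as described above.

```r
# Hypothetical helper (not part of zoltr): turn a browsable Zoltar address
# into the API form that zoltr expects.
to_api_url <- function(browser_url) {
  # Ensure exactly one trailing slash.
  url <- paste0(sub("/+$", "", browser_url), "/")
  # Insert "api/" after the host unless the path already starts with it.
  if (!grepl("^https?://[^/]+/api/", url)) {
    url <- sub("^(https?://[^/]+/)", "\\1api/", url)
  }
  url
}

to_api_url("https://www.zoltardata.com/project/3")
# "https://www.zoltardata.com/api/project/3/"
```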
library(httr)  # otherwise devtools::check() gets `could not find function "POST"` error
library(zoltr)
zoltar_connection <- new_connection(host = Sys.getenv("Z_HOST"))
zoltar_authenticate(zoltar_connection, Sys.getenv("Z_USERNAME"), Sys.getenv("Z_PASSWORD"))
zoltar_connection
library(zoltr)
zoltar_connection <- new_connection()
zoltar_authenticate(zoltar_connection, Sys.getenv("Z_USERNAME"), Sys.getenv("Z_PASSWORD"))
zoltar_connection
Now that you have a connection, you can use the projects() function to get all projects as a data.frame. Note that it will only list those that you are authorized to access, i.e., all public projects plus any private ones that you own or for which you are a model owner.
the_projects <- projects(zoltar_connection)
str(the_projects)
Let's start by getting a public project to work with. We will search the projects list for it by name. Then we will pass its URL to the project_info() function to get a list of details, and then pass it to the models() function to get a data.frame of its models.
project_url <- the_projects[the_projects$name == "Docs Example Project", "url"]
the_project_info <- project_info(zoltar_connection, project_url)
names(the_project_info)
the_project_info$description
the_models <- models(zoltar_connection, project_url)
str(the_models)
There is other project-related information that you can access, such as its configuration (zoltar_units(), targets(), and timezeros(), concepts that are explained at docs.zoltardata.com) and truth().
You can query a project's forecast data using the submit_query() function. Keep in mind that Zoltar enqueues long operations like querying and uploading forecasts, which keeps the site responsive but makes the Zoltar API a little more complicated. Rather than having the submit_query() function block until the query is done, you instead get a quick response in the form of a Job URL that you can pass to the job_info() function to check its status and find out whether the job is pending, has finished successfully, or has failed. (This is called polling the host to ask the status.) Here we poll every second using the busy_poll_job() helper function. Then we use the job_data() function when the query is successfully completed to get the results as a data.frame.
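To make the polling idea concrete, here is a generic sketch in base R. It is not zoltr's implementation of busy_poll_job(): the function poll_until_done() and its get_status argument are hypothetical, and the "FAILED" status name and the polling limit are assumptions.

```r
# Generic polling sketch. `get_status` is a hypothetical zero-argument
# function that returns the job's current status as a string. The "FAILED"
# status name and the max_polls limit are assumptions for illustration.
poll_until_done <- function(get_status, interval_secs = 1, max_polls = 60) {
  for (i in seq_len(max_polls)) {
    status <- get_status()
    if (identical(status, "SUCCESS")) {
      return(invisible(status))  # job finished; results can now be fetched
    }
    if (identical(status, "FAILED")) {
      stop("job failed")
    }
    Sys.sleep(interval_secs)  # wait before asking the host again
  }
  stop("gave up after ", max_polls, " polls")
}
```

With zoltr, get_status might wrap job_info(), for example function() job_info(zoltar_connection, job_url)$status, assuming the returned list exposes a status element.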
Note: You may find the do_zoltar_query() function helpful, which combines submit_query(), busy_poll_job(), and job_data() in one call.
Putting it together, we'll show the long way to do it (for reference) but use do_zoltar_query() to actually run the example:
query <- list("targets" = list("pct next week", "cases next week"), "types" = list("point"))
job_url <- submit_query(zoltar_connection, project_url, "forecasts", query)
busy_poll_job(zoltar_connection, job_url)
the_job_data <- job_data(zoltar_connection, job_url)
the_job_data
forecast_data <- do_zoltar_query(
  zoltar_connection, project_url, "forecasts", "docs_mod", c("loc1", "loc2"),
  c("pct next week", "cases next week"), c("2011-10-02", "2011-10-09", "2011-10-16"),
  types = c("point", "quantile"))
forecast_data
Hopefully you'll see "SUCCESS" eventually printed and then the resulting data itself.
Note: Zoltar returns a 404 Not Found error if job_data() is called on a Job that has no underlying data file (Zoltar saves query results as temporary files on the server). This can happen for two reasons: 1) 24 hours have passed (the expiration time for temporary files) or 2) the Job is not complete and therefore there is no data file yet. As noted above, you can avoid the latter condition by using busy_poll_job() to ensure the job is done.
Note: Zoltar limits the number of rows a query can return, giving you an error if the limit is exceeded. The job's failure message will indicate whether this has happened.
Similarly, querying truth is done by passing a query_type of "truth". Further, only the units, targets, timezeros, and as_of args are allowed:
truth_data <- do_zoltar_query(
  zoltar_connection, project_url, "truth", NULL, c("loc1", "loc2"),
  c("pct next week", "cases next week"), c("2011-10-02", "2011-10-09", "2011-10-16"),
  "2020-12-18 12:00:00 UTC")
truth_data
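If you need to build an as_of value from an R date-time rather than typing it by hand, a small base R sketch follows. The helper as_of_string() is hypothetical (not part of zoltr), and it assumes the API accepts timestamps written in the same layout as the example string above.

```r
# Hypothetical helper (not part of zoltr): format a POSIXct time in UTC using
# the same layout as the as_of string in the truth query example.
as_of_string <- function(time = Sys.time()) {
  format(time, "%Y-%m-%d %H:%M:%S UTC", tz = "UTC")
}

as_of_string(as.POSIXct("2020-12-18 12:00:00", tz = "UTC"))
# "2020-12-18 12:00:00 UTC"
```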
The latest_forecasts() function is somewhat specialized: it returns the ID and source of the latest versions of a project's forecasts. (Later we may generalize to allow passing specific columns to retrieve, such as 'forecast_model_id', 'time_zero_id', 'issued_at', 'created_at', 'source', and 'notes'.)
the_latest_forecasts <- latest_forecasts(zoltar_connection, project_url)
the_latest_forecasts
Now let's work with a particular model, getting its URL by name and then passing it to the model_info() function to get details. Then use the forecasts() function to get a data.frame of that model's forecasts (there is only one). Note that obtaining the model's URL is straightforward because it is provided in the url column of the_models.
model_url <- the_models[the_models$name == "docs forecast model", "url"]
the_model_info <- model_info(zoltar_connection, model_url)
names(the_model_info)
the_model_info$name
the_forecasts <- forecasts(zoltar_connection, model_url)
str(the_forecasts)
You can get forecast data using the download_forecast() function, which returns a nested list format that corresponds to Zoltar's native JSON one. That format can be converted to a CSV-friendly data.frame via data_frame_from_forecast_data(), which can represent all prediction types, or quantile_data_frame_from_forecast_data() for users who are mainly interested in point and quantile data. Please see docs.zoltardata.com for forecast format details.
forecast_url <- the_forecasts[1, "url"]
forecast_info <- forecast_info(zoltar_connection, forecast_url)
forecast_data <- download_forecast(zoltar_connection, forecast_url)
length(forecast_data$predictions)
As a data.frame:
forecast_data_frame <- data_frame_from_forecast_data(forecast_data)
str(forecast_data_frame)
And just quantile data:
forecast_data_frame <- quantile_data_frame_from_forecast_data(forecast_data)
str(forecast_data_frame)