covidcast_signal: Obtain a data frame for one COVIDcast signal

Description

Obtains data for selected date ranges for all geographic regions of the United States. Available data sources and signals are documented in the COVIDcast signal documentation. Most (but not all) data sources are available at the county level; the API can also return data aggregated to metropolitan statistical areas, hospital referral regions, or states, selected with the geo_type argument.

Usage

covidcast_signal(
  data_source,
  signal,
  start_day = NULL,
  end_day = NULL,
  geo_type = c("county", "hrr", "msa", "dma", "state", "hhs", "nation"),
  geo_values = "*",
  as_of = NULL,
  issues = NULL,
  lag = NULL,
  time_type = c("day", "week")
)

Arguments

data_source

String identifying the data source to query. See https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html for a list of available data sources.

signal

String identifying the signal from that source to query. Again, see https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html for a list of available signals.

start_day

Query data beginning on this date. Date object, or string in the form "YYYY-MM-DD". If start_day is NULL, defaults to the first day for which data is available for this signal.

end_day

Query data up to this date, inclusive. Date object, or string in the form "YYYY-MM-DD". If end_day is NULL, defaults to the most recent day for which data is available for this signal.

geo_type

The geography type for which to request this data, such as "county" or "state". Defaults to "county". See https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html for details on which types are available.

geo_values

Which geographies to return. The default, "*", fetches all geographies. To fetch specific geographies, specify their IDs as a vector or list of strings. See https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html for details on how to specify these IDs.

as_of

Fetch only data that was available on or before this date, provided as a Date object or string in the form "YYYY-MM-DD". If NULL, the default, return the most recent available data. Note that only one of as_of, issues, and lag should be provided; it does not make sense to specify more than one. For more on data revisions, see "Issue dates and revisions" below.

issues

Fetch only data that was published or updated ("issued") on these dates. Provided as either a single Date object (or string in the form "YYYY-MM-DD"), indicating a single date to fetch data issued on, or a vector specifying two dates, start and end. In the latter case, return all data issued in this range; there may be multiple rows for each observation, indicating several updates to its value. If NULL, the default, return the most recently issued data.

lag

Integer. If, for example, lag = 3, then we fetch only data that was published or updated exactly 3 days after the date. For example, a row with time_value of June 3 will only be included in the results if its data was issued or updated on June 6. If NULL, the default, return the most recently issued data regardless of its lag.

time_type

The temporal resolution at which to request this data. Most signals are available at the "day" resolution (the default); some are only available at the "week" resolution, representing an MMWR week ("epiweek").

Details

For data on counties, metropolitan statistical areas, and states, this package provides the county_census, msa_census, and state_census datasets. These include each area's unique identifier, used in the geo_values argument to select specific areas, and basic information on population and other Census data.

Downloading large amounts of data may be slow, so this function prints messages for each chunk of data it downloads. To suppress these, use base::suppressMessages(), as in suppressMessages(covidcast_signal("fb-survey", ...)).

Value

A covidcast_signal object with matching data. The object is a data frame with additional metadata attached. Each row is one observation of one signal on one day in one geographic location. It contains the following columns:

data_source

Data source from which this observation was obtained.

signal

Signal from which this observation was obtained.

geo_value

String identifying the location, such as a state name or county FIPS code.

time_value

Date object identifying the date of this observation. For data with time_type = "week", this is the first day of the corresponding epiweek.

issue

Date object identifying the date this estimate was issued. For example, an estimate with a time_value of June 3 might have been issued on June 5, after the data for June 3rd was collected and ingested into the API.

lag

Integer giving the difference between issue and time_value, in days.

value

Signal value being requested. For example, in a query for the "confirmed_cumulative_num" signal from the "usa-facts" source, this would be the cumulative number of confirmed cases in the area, as of the given time_value.

stderr

Associated standard error of the signal value, if available.

sample_size

Integer indicating the sample size available in that geography on that day; sample size may not be available for all signals, due to privacy or other constraints, in which case it will be NA.

Consult the signal documentation for more details on how values and standard errors are calculated for specific signals.

The returned data frame has a metadata attribute containing metadata about the signal contained within; see "Metadata" below for details.
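As a rough illustration of the columns described in this section, the following sketch builds a small stand-in data frame by hand and derives lag from issue and time_value. The rows and values are invented for illustration and do not come from the API; a real result would be produced by covidcast_signal() itself.

```r
# Mock rows shaped like a covidcast_signal result; all values are invented.
obs <- data.frame(
  data_source = "fb-survey",
  signal = "smoothed_cli",
  geo_value = c("pa", "nj"),
  time_value = as.Date(c("2020-06-03", "2020-06-03")),
  issue = as.Date(c("2020-06-05", "2020-06-08")),
  value = c(0.5, 0.7),
  stderr = c(0.1, NA),
  sample_size = c(1200, NA),
  stringsAsFactors = FALSE
)

# lag is the number of days between issue and time_value.
obs$lag <- as.integer(obs$issue - obs$time_value)

# Locations whose sample size is unavailable appear as NA.
missing_n <- obs$geo_value[is.na(obs$sample_size)]
```

Checking for NA in stderr and sample_size before using them avoids silently propagating missing values into later calculations.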

Metadata

The returned object has a metadata attribute attached containing basic information about the signal. Use attributes(x)$metadata to access this metadata. The metadata is stored as a data frame of one row, and contains the same information that covidcast_meta() would return for a given signal.

Note that not all covidcast_signal objects may have all fields of metadata attached; for example, an object created with as.covidcast_signal() using data from another source may only contain the geo_type variable, along with data_source and signal. Before using the metadata of a covidcast_signal object, always check for the presence of the attributes you need.
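One way to perform the check recommended above is sketched here. The helper has_metadata_fields() is hypothetical and not part of the covidcast package, and the object x is a hand-built stand-in rather than a real API result.

```r
# Hypothetical helper (not part of the covidcast package): returns TRUE only
# if a metadata attribute is attached and contains every requested field.
has_metadata_fields <- function(x, fields) {
  meta <- attributes(x)$metadata
  !is.null(meta) && all(fields %in% names(meta))
}

# Stand-in object carrying only partial metadata, as might happen for data
# that did not originate from the API.
x <- data.frame(geo_value = "pa", value = 1)
attr(x, "metadata") <- data.frame(
  data_source = "my-source",
  signal = "my-signal",
  geo_type = "state"
)

has_geo <- has_metadata_fields(x, c("geo_type", "signal"))
has_max_time <- has_metadata_fields(x, "max_time")  # field absent in this stand-in
```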

Issue dates and revisions

The COVIDcast API tracks updates and changes to its underlying data, and records the first date each observation became available. For example, a data source may report its estimate for June 3rd for a specific state on June 5th, once records become available. This data is considered "issued" on June 5th. Later, the data source may update its estimate for June 3rd based on revised data, creating a new issue on June 8th. By default, covidcast_signal() returns the most recent issue available for every observation. The as_of, issues, and lag parameters allow the user to select specific issues instead, or to see all updates to observations. These options are mutually exclusive, so specify only one; if you specify more than one, you may get an error or confusing results.

Note that the API only tracks the initial value of an estimate and changes to that value. If a value was first issued on June 5th and never updated, asking for data issued on June 6th (using issues or lag) would not return that value, though asking for data as_of June 6th would. See vignette("covidcast") for examples.
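The difference between the three options can be simulated locally. The sketch below applies the selection rules described above to a hand-made issue history. The functions select_as_of(), select_issues(), and select_lag() are hypothetical illustrations of the semantics, not the API's or the package's implementation, and the data is invented.

```r
# Invented issue history: "pa" was issued on June 5th and revised on June 8th;
# "nj" was issued on June 5th and never updated.
history <- data.frame(
  geo_value = c("pa", "pa", "nj"),
  time_value = as.Date(c("2020-06-03", "2020-06-03", "2020-06-03")),
  issue = as.Date(c("2020-06-05", "2020-06-08", "2020-06-05")),
  value = c(1.0, 1.2, 2.0),
  stringsAsFactors = FALSE
)
history$lag <- as.integer(history$issue - history$time_value)

# as_of: for each observation, keep the latest issue on or before the date.
select_as_of <- function(h, date) {
  h <- h[h$issue <= as.Date(date), ]
  h <- h[order(h$geo_value, h$time_value, h$issue), ]
  key <- paste(h$geo_value, h$time_value)
  h[!duplicated(key, fromLast = TRUE), ]
}

# issues: keep every row issued within the given date or date range.
select_issues <- function(h, start, end = start) {
  h[h$issue >= as.Date(start) & h$issue <= as.Date(end), ]
}

# lag: keep only rows issued exactly `lag` days after their time_value.
select_lag <- function(h, lag) {
  h[h$lag == lag, ]
}

as_of_jun6 <- select_as_of(history, "2020-06-06")    # both June 5th issues
issued_jun6 <- select_issues(history, "2020-06-06")  # nothing issued that day
lag_5 <- select_lag(history, 5)                      # only the June 8th revision
```

In this simulation, asking for data as of June 6th returns both values first issued on June 5th, while asking for data issued on June 6th returns no rows, matching the behavior described above.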

Note also that the API enforces a maximum result row limit; results beyond the maximum limit are truncated. This limit is sufficient to fetch observations in all counties in the United States on one day. This client automatically splits queries for multiple days across multiple API calls. However, if data for one day has been issued many times, using the issues argument may return more results than the query limit. A warning will be issued in this case. To see all results, split your query across multiple calls with different issues arguments.
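Splitting a query by issue date, as suggested above, might look like the following sketch. The helper split_issue_range() and its chunk_days parameter are hypothetical and not part of the package; a suitable chunk size depends on how often the signal is revised.

```r
# Hypothetical helper (not part of the covidcast package): split an issue date
# range into consecutive chunks of at most `chunk_days` days each.
split_issue_range <- function(start, end, chunk_days = 7) {
  start <- as.Date(start)
  end <- as.Date(end)
  starts <- seq(start, end, by = chunk_days)
  ends <- starts + chunk_days - 1
  ends[ends > end] <- end  # the final chunk may be shorter
  lapply(seq_along(starts), function(i) c(starts[i], ends[i]))
}

chunks <- split_issue_range("2020-06-01", "2020-06-17", chunk_days = 7)

# Each chunk could then be passed as the `issues` argument of a separate call.
# Untested sketch; requires the covidcast package and network access:
# results <- lapply(chunks, function(rng) {
#   covidcast::covidcast_signal("fb-survey", "smoothed_cli",
#                               start_day = "2020-06-01",
#                               end_day = "2020-06-01",
#                               geo_type = "state", issues = rng)
# })
```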

API keys

By default, covidcast_signal() submits queries to the API anonymously. All the examples in the package documentation are compatible with anonymous use of the API, but there are some limits on anonymous queries, including a rate limit. If you regularly query large amounts of data, please consider registering for a free API key, which lifts these limits. Even if your usage falls within the anonymous usage limits, registration helps us understand who is using the Delphi Epidata API and how, which may in turn inform future research, data partnerships, and funding.

If you have an API key, you can use it by setting the covidcast.auth option once before calling covidcast_signal() or covidcast_signals():

options(covidcast.auth = "your_api_key")

cli <- covidcast_signal(data_source = "fb-survey", signal = "smoothed_cli",
                        start_day = "2020-05-01", end_day = "2020-05-07",
                        geo_type = "state")

References

COVIDcast API documentation: https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html

Documentation of all COVIDcast sources and signals: https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html

COVIDcast public dashboard: https://delphi.cmu.edu/covidcast/

See Also

plot.covidcast_signal(), covidcast_signals(), as.covidcast_signal(), county_census, msa_census, state_census

Examples

## Not run: 
## Fetch all counties from 2020-05-10 to the most recent available data
covidcast_signal("fb-survey", "smoothed_cli", start_day = "2020-05-10")
## Fetch all counties on just 2020-05-10 and no other days
covidcast_signal("fb-survey", "smoothed_cli", start_day = "2020-05-10",
                 end_day = "2020-05-10")
## Fetch all states on 2020-05-10, 2020-05-11, 2020-05-12
covidcast_signal("fb-survey", "smoothed_cli", start_day = "2020-05-10",
                 end_day = "2020-05-12", geo_type = "state")
## Fetch all available data for just Pennsylvania and New Jersey
covidcast_signal("fb-survey", "smoothed_cli", geo_type = "state",
                 geo_values = c("pa", "nj"))
## Fetch all available data in the Pittsburgh metropolitan area
covidcast_signal("fb-survey", "smoothed_cli", geo_type = "msa",
                 geo_values = name_to_cbsa("Pittsburgh"))

## End(Not run)


covidcast documentation built on July 26, 2023, 5:29 p.m.