aggregate_signals: Aggregate 'covidcast_signal' objects into one data frame

View source: R/wrangle.R

aggregate_signals    R Documentation

Aggregate covidcast_signal objects into one data frame

Description

Aggregates covidcast_signal objects into one data frame, in either "wide" or "long" format. In "wide" aggregation, only the latest issue from each data frame is retained, and several columns, including data_source and signal, are dropped; see the details below. See vignette("multi-signals", package = "covidcast") for examples.

Usage

aggregate_signals(x, dt = NULL, format = c("wide", "long"))

Arguments

x

Single covidcast_signal data frame, or a list of such data frames, such as is returned by covidcast_signals().

dt

Vector of shifts to apply to the values in the data frame x. Negative shifts translate into a lag value and positive shifts into a lead value; for example, if dt = -1, then the value reported on June 2 is the original value on June 1; if dt = 0, then the values are left as is. When x is a list of data frames, dt can be either a single vector of shifts, which applies the same shifts to every data frame in x, or a list of vectors of shifts with the same length as x, which applies a different set of shifts to each data frame in x.

format

Either "wide" or "long". The default is "wide".
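
The sign convention for dt can be illustrated with a small base R sketch. This is not the package's implementation: the toy data frame and the helper shift_value() are hypothetical, and returning NA where the shifted date is not observed is an assumption made for the illustration.

```r
# Hypothetical toy data: one location, three consecutive days.
toy <- data.frame(
  geo_value = "ca",
  time_value = as.Date(c("2020-06-01", "2020-06-02", "2020-06-03")),
  value = c(10, 20, 30)
)

# Illustrative helper (not part of covidcast): the value reported on day t
# is the original value on day t + dt, or NA if that day is not in the data.
shift_value <- function(df, dt) {
  idx <- match(df$time_value + dt, df$time_value)
  df$value[idx]
}

lagged    <- shift_value(toy, -1)  # June 2 reports the June 1 value
unshifted <- shift_value(toy, 0)   # values left as is
leading   <- shift_value(toy, 1)   # June 2 reports the June 3 value
```

With dt = -1, the entry for June 2 is 10, the original June 1 value, which matches the lag described for the dt argument.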

Details

This function can be thought of as having three use cases. In all three cases, the result is a new data frame in either "wide" or "long" format, depending on format.

The first use case is to apply time-shifts to the values in a given covidcast_signal object. In this use case, x is a covidcast_signal data frame and dt is a vector of shifts.

The second use case is to bind together, into one data frame, signals that are returned by covidcast_signals(). In this use case, x is a list of covidcast_signal data frames, and dt is NULL.

The third use case is a combination of the first two: to bind together signals returned by covidcast_signals() and, simultaneously, apply time-shifts to their values. In this use case, x is a list of covidcast_signal data frames, and dt is either a vector of shifts (to apply the same shifts to each signal in x) or a list of vectors of shifts (to apply different shifts to each signal in x).
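
The way the three use cases pair x with dt can be sketched in base R. The helper expand_dt() below is hypothetical and is not taken from the package source; treating dt = NULL as a single shift of 0 is an assumption made for the illustration.

```r
# Illustrative helper (not part of covidcast): return one vector of shifts
# per input data frame, mirroring the three use cases described above.
expand_dt <- function(x, dt = NULL) {
  # A single data frame is treated as a list of length one.
  if (is.data.frame(x)) x <- list(x)
  # Assumption for this sketch: no shifts requested means a shift of 0.
  if (is.null(dt)) dt <- 0
  # A single vector of shifts is reused for every data frame in x.
  if (!is.list(dt)) dt <- rep(list(dt), length(x))
  # A list of shift vectors must have one entry per data frame.
  stopifnot(length(dt) == length(x))
  dt
}

# Hypothetical stand-ins for covidcast_signal data frames.
sig1 <- data.frame(value = 1:3)
sig2 <- data.frame(value = 4:6)

case1 <- expand_dt(sig1, dt = c(-1, 0))              # one signal, shifted
case2 <- expand_dt(list(sig1, sig2))                 # bind only, no shifts
case3_same <- expand_dt(list(sig1, sig2), dt = c(-1, 0))
case3_diff <- expand_dt(list(sig1, sig2), dt = list(0, c(-2, -1)))
```

In the actual function, the corresponding calls would take the form aggregate_signals(x, dt = ...) with x and dt shaped as in each case.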

Value

Data frame of aggregated signals in "wide" or "long" form, depending on format. In "long" form, an extra column dt is appended to indicate the value of the time-shift. In "wide" form, only the latest issue of data is retained; the returned data frame is formed via full joins of the input data frames (on geo_value and time_value as the join key), and the columns data_source, signal, issue, lag, stderr, sample_size are all dropped from the output. Each unique signal—defined by a combination of data source name, signal name, and time-shift—is given its own column, whose name indicates its defining quantities. For example, the column name "value+2:usa-facts_confirmed_incidence_num" corresponds to a signal defined by data_source = "usa-facts", signal = "confirmed_incidence_num", and dt = 2.
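
The "wide" column naming and the full join can be mimicked in base R as follows. This is a sketch under stated assumptions rather than the package's code: the helper wide_name(), the toy data frames, and the second signal name "deaths_incidence_num" are hypothetical, and the name pattern is inferred from the single example given above.

```r
# Illustrative helper (not part of covidcast): build a wide-format column
# name from the time-shift, data source name, and signal name.
wide_name <- function(dt, data_source, signal) {
  sprintf("value%+d:%s_%s", dt, data_source, signal)
}

# Hypothetical toy signals that cover different locations on the same day.
sig_a <- data.frame(
  geo_value = c("ca", "ny"),
  time_value = as.Date("2020-06-01"),
  value = c(1, 2)
)
sig_b <- data.frame(
  geo_value = c("ny", "tx"),
  time_value = as.Date("2020-06-01"),
  value = c(5, 6)
)

# Give each signal's value column its own descriptive name.
name_a <- wide_name(2, "usa-facts", "confirmed_incidence_num")
name_b <- wide_name(0, "usa-facts", "deaths_incidence_num")
names(sig_a)[names(sig_a) == "value"] <- name_a
names(sig_b)[names(sig_b) == "value"] <- name_b

# Full join on geo_value and time_value: rows present in either input are
# kept, and missing combinations are filled with NA.
wide <- merge(sig_a, sig_b, by = c("geo_value", "time_value"), all = TRUE)
```

Because the join is a full join, locations present in only one input still appear in the result, with NA in the column of the signal that lacks them.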

See Also

covidcast_wider(), covidcast_longer()


covidcast documentation built on July 26, 2023, 5:29 p.m.