epi_archive | R Documentation |
epi_archive object
The second main data structure for storing time series in epiprocess. It is
similar to epi_df in that it is fundamentally a table with a few required
columns that stores epidemiological time series data. An epi_archive requires
geo_value, time_value, and version columns (and possibly other key columns)
along with measurement values. In brief, an epi_archive is a history of the
time series data, where the version column tracks the time at which the data
was available. This allows for version-aware forecasting.
new_epi_archive is the constructor for epi_archive objects that assumes all
arguments have been validated. Most users should use as_epi_archive.
new_epi_archive(
  x,
  geo_type,
  time_type,
  other_keys,
  compactify,
  clobberable_versions_start,
  versions_end,
  compactify_tol = .Machine$double.eps^0.5
)

validate_epi_archive(
  x,
  other_keys,
  compactify,
  clobberable_versions_start,
  versions_end
)

as_epi_archive(
  x,
  geo_type = deprecated(),
  time_type = deprecated(),
  other_keys = character(),
  compactify = NULL,
  clobberable_versions_start = NA,
  .versions_end = max_version_with_row_in(x),
  ...,
  versions_end = .versions_end
)
x | A data.frame, data.table, or tibble, with columns
geo_type | DEPRECATED. Has no effect. The geo value type is inferred from the location column and set to "custom" if not recognized.
time_type | DEPRECATED. Has no effect. The time value type is inferred from the time column and set to "custom" if not recognized. Unpredictable behavior may result if the time type is not recognized.
other_keys | Character vector specifying the names of variables in
compactify | Optional; Boolean.
clobberable_versions_start | Optional;
versions_end | Optional; length-1, same
compactify_tol | Double. The tolerance used to detect approximate equality for compactification.
.versions_end | location based versions_end, used to avoid prefix
... | used for specifying column names, as in
An epi_archive contains a data.table object DT (from the {data.table}
package), with (at least) the following columns:
geo_value: the geographic value associated with each row of measurements,
time_value: the time value associated with each row of measurements,
version: the time value specifying the version for each row of measurements.
For example, if in a given row the version is January 15, 2022 and time_value
is January 14, 2022, then this row contains the measurements of the data for
January 14, 2022 that were available one day later.
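To make the role of the version column concrete, here is a minimal base R sketch that rebuilds what a small, hand-made version history looked like as of a given version. The data frame history, its values, and the helper snapshot_as_of() are hypothetical and exist only for this illustration; this is not how the package implements it, and in practice the epi_archive methods (for example epix_as_of) should be used instead.

```r
# Hypothetical version history for one location: the value for 2022-01-14
# is first reported on 2022-01-15 and revised on 2022-01-17.
history <- data.frame(
  geo_value  = c("ca", "ca", "ca"),
  time_value = as.Date(c("2022-01-14", "2022-01-14", "2022-01-15")),
  version    = as.Date(c("2022-01-15", "2022-01-17", "2022-01-16")),
  value      = c(10, 12, 20)
)

# Hand-rolled illustration (an assumption, not the package implementation):
# keep rows with version <= as_of, then keep the most recent version for
# each (geo_value, time_value) pair.
snapshot_as_of <- function(history, as_of) {
  visible <- history[history$version <= as_of, ]
  visible <- visible[order(visible$geo_value, visible$time_value, visible$version), ]
  is_latest <- !duplicated(visible[, c("geo_value", "time_value")], fromLast = TRUE)
  visible[is_latest, c("geo_value", "time_value", "value")]
}

snap_16 <- snapshot_as_of(history, as.Date("2022-01-16"))
snap_17 <- snapshot_as_of(history, as.Date("2022-01-17"))
```

In this toy history, the snapshot as of January 16 still shows the originally reported value for January 14, while the snapshot as of January 17 shows the revised one.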
The variables geo_value, time_value, version serve as key variables for the
data table (in addition to any other keys specified in the metadata). There
can only be a single row per unique combination of key variables. The keys for
an epi_archive can be viewed with key(epi_archive$DT).
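Because only one row is allowed per key combination, it can help to check an input table for duplicated keys before converting it. The following base R sketch shows one way to do that; the data frame updates and the choice to keep the last row per key are assumptions made for illustration, and the right resolution depends on your data source.

```r
# Hypothetical input table in which rows 1 and 2 share the same key combination.
updates <- data.frame(
  geo_value  = c("ca", "ca", "hi"),
  time_value = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01")),
  version    = as.Date(c("2020-01-02", "2020-01-02", "2020-01-02")),
  value      = c(1, 2, 3)
)
key_cols <- c("geo_value", "time_value", "version")

# anyDuplicated() returns the index of the first duplicated row, or 0 if none.
first_dup <- anyDuplicated(updates[, key_cols])

# One possible resolution (an assumption, not a package rule): keep the last
# row seen for each key combination.
deduped <- updates[!duplicated(updates[, key_cols], fromLast = TRUE), ]
```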
By default, an epi_archive will compactify the data table to remove redundant
rows. A row is redundant if all of its values, other than version, are the
same as in the preceding version of the same observation; such rows are not
stored (this is essentially last observation carried forward, LOCF, but along
the version index). This saves space and improves performance. If you do not
want to compactify the data, you can set compactify = FALSE in
as_epi_archive().
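The idea behind compactification can be sketched in base R as follows. This is a hand-rolled illustration under simplifying assumptions: a single measurement column named value, exact comparison instead of the compactify_tol tolerance, and a hypothetical helper compactify_sketch(). It is not the package's implementation.

```r
# Hypothetical version history: the 2020-01-03 version repeats the value
# already reported in the 2020-01-02 version, so it adds no information.
history <- data.frame(
  geo_value  = "ca",
  time_value = as.Date("2020-01-01"),
  version    = as.Date(c("2020-01-02", "2020-01-03", "2020-01-04")),
  value      = c(5, 5, 7)
)

# Drop rows whose value equals the value in the preceding version of the
# same (geo_value, time_value) observation. Comparisons involving NA are
# treated as "not redundant", so those rows are kept.
compactify_sketch <- function(history) {
  history <- history[order(history$geo_value, history$time_value, history$version), ]
  n <- nrow(history)
  if (n < 2) {
    return(history)
  }
  same_obs <- history$geo_value[-1] == history$geo_value[-n] &
    history$time_value[-1] == history$time_value[-n]
  same_value <- history$value[-1] == history$value[-n]
  redundant <- c(FALSE, (same_obs & same_value) %in% TRUE)
  history[!redundant, ]
}

compact <- compactify_sketch(history)
```

Only the versions in which the value first appears or changes remain in compact; the dropped version can be recovered by carrying the previous version's value forward.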
Note that in some data scenarios, LOCF may not be appropriate. For instance,
if you expected data to be updated on a given day, but your data source did
not update, then it could be reasonable to code the data as NA for that day,
instead of assuming LOCF.
NAs can also be introduced by epi_archive methods for other reasons, e.g., in
epix_fill_through_version and epix_merge, if requested, to represent potential
update data that we do not yet have access to; or in epix_merge to represent
the "value" of an observation before the version in which it was first
released, or if no version of that observation appears in the archive data at
all.
The following pieces of metadata are included as fields in an epi_archive
object:
geo_type: the type for the geo values.
time_type: the type for the time values.
other_keys: any additional keys as a character vector. Typical examples are
"age" or sub-geographies.
While this metadata is not protected, it is generally recommended to treat it
as read-only, and to use the epi_archive methods to interact with the data
archive. Unexpected behavior may result from modifying the metadata directly.
An epi_archive object.
epix_as_of
epix_merge
epix_slide
library(epiprocess)
library(magrittr) # provides %>% in case it is not already attached

# Simple ex. with necessary keys
tib <- tibble::tibble(
  geo_value = rep(c("ca", "hi"), each = 5),
  time_value = rep(seq(as.Date("2020-01-01"),
    by = 1, length.out = 5
  ), times = 2),
  version = rep(seq(as.Date("2020-01-02"),
    by = 1, length.out = 5
  ), times = 2),
  value = rnorm(10, mean = 2, sd = 1)
)

toy_epi_archive <- tib %>% as_epi_archive()
# Print the resulting archive
toy_epi_archive

# Ex. with an additional key for county
df <- data.frame(
  geo_value = c(replicate(2, "ca"), replicate(2, "fl")),
  county = c(1, 3, 2, 5),
  time_value = c(
    "2020-06-01",
    "2020-06-02",
    "2020-06-01",
    "2020-06-02"
  ),
  version = c(
    "2020-06-02",
    "2020-06-03",
    "2020-06-02",
    "2020-06-03"
  ),
  cases = c(1, 2, 3, 4),
  cases_rate = c(0.01, 0.02, 0.01, 0.05)
)

# Declare county as an extra key column alongside geo_value, time_value, version
x <- df %>% as_epi_archive(other_keys = "county")