enw_preprocess_data | R Documentation |
This function preprocesses raw observations, assuming they are reported as
cumulative counts by reference date and report date, and assigns groups. It
also constructs the data objects used by the visualisation and modelling
functions. These include the observed empirical probability of a report on a
given day, the cumulative probability of report, the latest available
observations, the incidence of observations, and metadata about the reference
and report dates (used to construct models). This function wraps other
preprocessing functions, which may instead be used individually if required.
Note that reports beyond the user-specified maximum delay are dropped
internally for modelling purposes. The cum_prop_reported and max_confirm
variables let the user check the impact this may have: if cum_prop_reported
is significantly below 1, a longer max_delay may be appropriate. Also note
that if missing reference or report dates are suspected in your data, they
need to be completed with enw_complete_dates().
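The bookkeeping behind max_confirm and cum_prop_reported can be illustrated with a small base R sketch. It assumes, for illustration only, that max_confirm is the cumulative count at the latest report date for each reference date, that cum_prop_reported is confirm divided by max_confirm, and that max_delay keeps delays 0 to max_delay - 1. The toy data are made up, and this is not the package's implementation.

```r
# Toy cumulative counts (made-up values); column names mirror the example data
obs <- data.frame(
  reference_date = as.Date(c("2021-10-01", "2021-10-01", "2021-10-01",
                             "2021-10-02", "2021-10-02")),
  report_date = as.Date(c("2021-10-01", "2021-10-02", "2021-10-04",
                          "2021-10-02", "2021-10-03")),
  confirm = c(5, 8, 10, 3, 6)
)

# Zero-indexed delay: a report made on the reference date has delay 0
obs$delay <- as.integer(obs$report_date - obs$reference_date)

# Sort so the last row within each reference date is the latest report
obs <- obs[order(obs$reference_date, obs$report_date), ]
last_value <- function(x) x[length(x)]

# Assumed definitions: latest cumulative count and the share reported so far
obs$max_confirm <- ave(obs$confirm, obs$reference_date, FUN = last_value)
obs$cum_prop_reported <- obs$confirm / obs$max_confirm

# Drop reports beyond the maximum delay (delays 0 .. max_delay - 1 kept)
max_delay <- 2
kept <- obs[obs$delay < max_delay, ]

# Share of the latest count captured at the longest modelled delay
captured <- aggregate(cum_prop_reported ~ reference_date, data = kept,
                      FUN = last_value)
captured
```

In this toy case the first reference date captures only 80% of its latest count within the modelled delays, which is the situation in which a longer max_delay may be worth considering.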
enw_preprocess_data(
obs,
by = NULL,
max_delay,
timestep = "day",
set_negatives_to_zero = TRUE,
...,
copy = TRUE
)
obs |
A |
by |
A character vector describing the stratification of observations. This defaults to no grouping. This should be used when modelling multiple time series in order to identify them for downstream modelling |
max_delay |
The maximum number of days to model in the delay distribution. If not specified, the maximum observed delay is assumed to be the true maximum delay in the model. Otherwise, an integer greater than or equal to 1 can be specified. Observations with delays larger than the maximum delay will be dropped. If the specified maximum delay is too short, nowcasts can be biased because important parts of the true delay distribution are cut off. At the same time, computational cost scales non-linearly with this setting, so the maximum delay should be as long as necessary, but not much longer. Steps to take to determine the maximum delay:
Note that delays are zero indexed and so include the reference date and
|
timestep |
The timestep to use in the process model (i.e. the
reference date model). This can be a string ("day", "week", "month") or a
whole number representing the number of days. If your data does not
have this timestep then you may wish to make use of
|
set_negatives_to_zero |
Logical, defaults to TRUE. Should negative
counts (in the calculated incidence of observations) be set to zero?
Downstream modelling does not currently support negative counts, so this must
be TRUE if intending to use |
... |
Other arguments to |
copy |
A logical; if |
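Negative incidence can arise when cumulative counts are revised downwards between reports. A minimal base R sketch of what set_negatives_to_zero = TRUE is described as doing, assuming incidence is taken as successive differences of the cumulative counts for a single reference date (made-up values; this is not the package's code):

```r
# Made-up cumulative counts for one reference date on successive report
# dates; the drop from 12 to 11 mimics a downward data revision
confirm <- c(5, 9, 12, 11, 13)

# Incidence by report date: first report as-is, then successive differences
new_confirm <- diff(c(0, confirm))
new_confirm
# 5 4 3 -1 2

# Clamp negative increments to zero
new_confirm_clamped <- pmax(new_confirm, 0)
new_confirm_clamped
# 5 4 3 0 2
```

One consequence worth keeping in mind: the clamped increments sum to 14, so they no longer match the latest cumulative count of 13.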
If max_delay is numeric, it will be internally coerced to integer (using
as.integer()).
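One common heuristic for picking max_delay can be sketched in base R: find the smallest delay by which a chosen share of reports has arrived (95% here, an arbitrary assumption), then convert it to a number of modelled delays. The sketch assumes that max_delay = d + 1 covers delays 0 to d, and uses made-up incident counts pooled over reference dates. It also shows that as.integer() truncates rather than rounds.

```r
# Made-up incident counts by zero-indexed delay, pooled over reference dates
delay <- 0:9
new_confirm <- c(40, 25, 12, 8, 5, 4, 3, 2, 1, 0)

# Empirical cumulative proportion reported by each delay
cdf <- cumsum(new_confirm) / sum(new_confirm)

# Smallest delay by which at least 95% of reports have arrived
target <- 0.95
d <- delay[which(cdf >= target)[1]]

# Delays are zero indexed, so covering delays 0..d needs d + 1 of them
max_delay <- d + 1
max_delay
# 7

# A non-integer value is truncated (not rounded) by as.integer()
as.integer(7.9)
# 7
```

In practice the counts should come from reference dates old enough to be (nearly) fully reported. Recent reference dates are right truncated and would pull this estimate towards shorter delays.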
A data.table containing processed observations as a series of nested data.frames, as well as variables containing metadata. These are:
obs: Observations with the addition of empirical reporting proportions, restricted to the specified maximum delay.
new_confirm: Incidence of notifications by reference and report date. Empirical reporting distributions are also added.
latest: The latest available observations.
missing_reference: Observations missing reference dates.
reporting_triangle: Incident observations by report and reference date in the standard reporting triangle matrix format.
metareference: Metadata for reference dates derived from observations.
metareport: Metadata for report dates.
metadelay: Metadata for reporting delays produced using enw_metadata_delay().
max_delay: Maximum delay to be modelled by epinowcast.
time: Numeric, number of timepoints in the data.
snapshots: Numeric, number of available data snapshots to use for nowcasting.
groups: Numeric, number of groups/strata in the supplied observations (set using by).
max_date: The maximum available report date.
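The reporting_triangle element holds incident observations laid out by reference date and delay. A base R sketch of that layout, using made-up counts and stats::reshape() purely for illustration (the package builds this object with its own functions):

```r
# Made-up incident counts (new_confirm) by reference date and zero-indexed delay
inc <- data.frame(
  reference_date = as.Date("2021-10-01") + c(0, 0, 0, 1, 1, 2),
  delay = c(0, 1, 2, 0, 1, 0),
  new_confirm = c(5, 3, 2, 4, 1, 6)
)

# Wide reporting triangle: one row per reference date, one column per delay.
# Cells that cannot have been observed yet (recent dates, long delays) are NA.
triangle <- reshape(
  inc, idvar = "reference_date", timevar = "delay", direction = "wide"
)
triangle
```

Using NA rather than 0 for the unobserved cells keeps "not yet reportable" distinct from "reported zero", which matters when the lower-right part of the triangle is what a nowcast has to fill in.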
Preprocessing functions:
enw_add_delay(), enw_add_max_reported(), enw_add_metaobs_features(),
enw_assign_group(), enw_complete_dates(), enw_construct_data(),
enw_extend_date(), enw_filter_delay(), enw_filter_reference_dates(),
enw_filter_report_dates(), enw_flag_observed_observations(),
enw_impute_na_observations(), enw_latest_data(), enw_metadata(),
enw_metadata_delay(), enw_missing_reference(), enw_reporting_triangle(),
enw_reporting_triangle_to_long()
library(epinowcast)
library(data.table)
# Filter example hospitalisation data to be national and over all ages
nat_germany_hosp <- germany_covid19_hosp[location == "DE"]
nat_germany_hosp <- nat_germany_hosp[age_group == "00+"]
# Preprocess with default settings
pobs <- enw_preprocess_data(nat_germany_hosp)
pobs