View source: R/revision_analysis.R
revision_analysis | R Documentation |
revision_summary removes all missing values (if requested), and then
computes some basic statistics about the revision behavior of an archive,
returning a tibble summarizing the revisions per time_value+epi_key
features. If print_inform is true, it prints a concise summary. The
columns returned are:
n_revisions: the total number of revisions for that entry
min_lag: the minimum time to any value (if drop_nas=FALSE, this includes NA's)
max_lag: the amount of time until the final (new) version (same caveat for drop_nas=FALSE, though it is far less likely to matter)
min_value: the minimum value across revisions
max_value: the maximum value across revisions
median_value: the median value across revisions
spread: the difference between the smallest and largest values (this always excludes NA values)
rel_spread: spread divided by the largest value (so for non-negative values it will be at most 1). Note that the largest value need not be the final value. It will be NA whenever spread is 0.
lag_near_latest: the time taken for the revisions to settle to within within_latest (default 20%) of the final value and stay there. For example, consider the series (0, 20, 99, 150, 102, 100); then lag_near_latest is 5, since even though 99 is within 20% of the final value, the series leaves the window afterwards at 150.
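The "settle and stay there" rule for lag_near_latest can be checked by hand on the example series. The following is a minimal base R sketch, not the package's implementation: settle_index is a hypothetical helper, and it assumes the six values arrive at lags 1 through 6 so that a position in the series doubles as a lag.

```r
# Hypothetical helper (not part of the package): position of the first value
# after which every later value stays within `within_latest` of the final one.
settle_index <- function(values, within_latest = 0.2) {
  final <- values[length(values)]
  near <- abs(values - final) <= within_latest * abs(final)
  # Last position that is NOT near the final value; settling starts right after.
  last_far <- max(c(0, which(!near)))
  last_far + 1
}

values <- c(0, 20, 99, 150, 102, 100)
settle_index(values)
# 5: position 3 (99) is near the final value, but position 4 (150) is not

max(values) - min(values)
# 150: the spread of the same series
```

Scanning backwards for the last value outside the window, rather than forwards for the first value inside it, is what captures the "and stay there" part of the definition.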
revision_analysis(
epi_arch,
...,
drop_nas = TRUE,
min_waiting_period = as.difftime(60, units = "days"),
within_latest = 0.2,
compactify = TRUE,
compactify_abs_tol = 0,
return_only_tibble = FALSE
)
## S3 method for class 'revision_analysis'
print(
x,
quick_revision = as.difftime(3, units = "days"),
few_revisions = 3,
abs_spread_threshold = NULL,
rel_spread_threshold = 0.1,
...
)
revision_summary(
epi_arch,
...,
drop_nas = TRUE,
min_waiting_period = as.difftime(60, units = "days"),
within_latest = 0.2,
compactify = TRUE,
compactify_abs_tol = 0,
return_only_tibble = FALSE
)
epi_arch | an epi_archive to be analyzed |
... | < |
drop_nas | bool, drop any |
min_waiting_period | |
within_latest | double between 0 and 1. Determines the threshold used for the |
compactify | bool. If |
compactify_abs_tol | length-1 double, used if |
return_only_tibble | boolean to return only the simple |
x | a |
quick_revision | Difftime or integer (integer is treated as days). The amount of time between the final revision and the actual time_value to consider the revision quickly resolved. Default of 3 days. This will be rounded up to the appropriate |
few_revisions | Integer. The upper bound on the number of revisions to consider "few". Default is 3. |
abs_spread_threshold | Scalar numeric. The maximum spread used to characterize revisions which don't actually change very much. Default is 5% of the maximum value in the dataset, but this is the most unit-dependent of the values, and it likely needs to be chosen to suit the scale of the dataset. |
rel_spread_threshold | Scalar between 0 and 1. The relative spread fraction used to characterize revisions which don't actually change very much. Default is 0.1, or 10% of the final value. |
Applies to epi_archives with time_types of "day", "week", and "yearmonth".
It can also work with a time_type of "integer" if the possible time_values
are all consecutive integers; you will need to manually specify the
min_waiting_period and quick_revision, though. Using a time_type of
"integer" with week numbers like 202501 will produce incorrect results for
some calculations, since week numbering contains jumps at year boundaries.
An S3 object with class revision_behavior. This function is typically
called for the purposes of inspecting the printed output. The results of
the computations are available in revision_analysis(...)$revision_behavior.
If you only want to access the internal computations, use
return_only_tibble = TRUE.
library(epiprocess)
library(dplyr)
revision_example <- revision_analysis(archive_cases_dv_subset, percent_cli)
revision_example$revision_behavior %>% arrange(desc(spread))