revision_summary: A function to describe revision behavior for an archive.

View source: R/revision_analysis.R

revision_summary    R Documentation

Description

revision_summary removes all missing values (if requested) and then computes basic statistics about the revision behavior of an archive, returning a tibble that summarizes the revisions for each combination of time_value and epi_key. If print_inform is TRUE, it also prints a concise summary. The columns returned are:

  1. n_revisions: the total number of revisions for that entry

  2. min_lag: the minimum time to any value (if drop_nas = FALSE, this includes NA values)

  3. max_lag: the amount of time until the final (most recent) version (the same caveat applies when drop_nas = FALSE, though it is far less likely to matter)

  4. min_value: the minimum value across revisions

  5. max_value: the maximum value across revisions

  6. median_value: the median value across revisions

  7. spread: the difference between the smallest and largest values (this always excludes NA values)

  8. rel_spread: spread divided by the largest value (so for non-negative values it never exceeds 1). Note that the largest value need not be the final value. rel_spread will be NA whenever spread is 0.

  9. time_near_latest: the time taken for the revisions to settle to within within_latest (default 20%) of the final value and stay there. For example, consider the series (0, 20, 99, 150, 102, 100), whose final value is 100; then time_near_latest is 5, since even though 99 is within 20% of the final value, the series leaves that window afterwards at 150.
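The value-based columns above can be illustrated in base R on the example series. This is only a sketch under stated assumptions, not the package's implementation: the helper summarize_revisions() is hypothetical, the six values are assumed to arrive at lags 1 through 6 (which reproduces the time_near_latest of 5 quoted above), and the NA handling of rel_spread for a zero spread is not reproduced.

```r
# Hypothetical helper: summarize the revision history of a single entry.
# `values` are the successive reported values; `lags` is the time from the
# time_value to each version (assumed here to be 1, 2, ..., n).
summarize_revisions <- function(values, lags = seq_along(values),
                                within_latest = 0.2) {
  final <- values[length(values)]
  spread <- max(values) - min(values)
  # Relative distance of each version from the final value.
  rel_dist <- abs(values - final) / abs(final)
  outside <- rel_dist > within_latest
  # The series has settled right after the last version outside the window.
  last_outside <- if (any(outside)) max(which(outside)) else 0
  list(
    min_value = min(values),
    max_value = max(values),
    median_value = median(values),
    spread = spread,
    rel_spread = spread / max(values),
    time_near_latest = lags[last_outside + 1]
  )
}

res <- summarize_revisions(c(0, 20, 99, 150, 102, 100))
res$spread           # 150
res$rel_spread       # 1
res$time_near_latest # 5
```

In this series the smallest value is 0, so spread equals the largest value and rel_spread is exactly 1.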

Usage

revision_summary(
  epi_arch,
  ...,
  drop_nas = TRUE,
  print_inform = TRUE,
  min_waiting_period = as.difftime(60, units = "days"),
  within_latest = 0.2,
  quick_revision = as.difftime(3, units = "days"),
  few_revisions = 3,
  abs_spread_threshold = NULL,
  rel_spread_threshold = 0.1,
  compactify_tol = .Machine$double.eps^0.5,
  should_compactify = TRUE
)

Arguments

epi_arch

an epi_archive to be analyzed

...

<tidyselect>, used to choose the column to summarize. If empty, it chooses the first. Currently only implemented for one column at a time.

drop_nas

bool; if TRUE, drop any NA values from the archive. After the NA values are dropped, compactify is run again to make sure there are no duplicate values left over from occasions when the signal is revised to NA and then back to its immediately preceding value.
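Why a second compactify pass is needed can be seen on a single entry's version history. The sketch below is a base R stand-in, not the package's compactify: the helper drop_na_and_compactify() is hypothetical, and it simply removes values that repeat their predecessor to within a tolerance.

```r
# Hypothetical helper: drop NAs, then remove consecutive repeats.
drop_na_and_compactify <- function(values, tol = .Machine$double.eps^0.5) {
  kept <- values[!is.na(values)]
  if (length(kept) < 2) return(kept)
  # Keep the first value, plus every value that differs from its
  # predecessor by more than `tol`.
  changed <- c(TRUE, abs(diff(kept)) > tol)
  kept[changed]
}

# A signal revised to NA and then back to its immediately preceding value:
versions <- c(10, NA, 10, 12)
compacted <- drop_na_and_compactify(versions)
compacted # 10 12
```

Without the second pass, dropping the NA would leave the value 10 recorded twice in a row, which would look like a revision even though nothing changed.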

print_inform

bool, determines whether to print summary information, or only return the full summary tibble

min_waiting_period

difftime, integer, or NULL. Sets a cutoff: any time_value that is not earlier than versions_end minus min_waiting_period is removed. min_waiting_period should characterize the typical time during which revisions occur. The default of 60 days corresponds to a typical final value for case counts as reported in the context of insurance. To avoid this filtering, set it to either NULL or 0.
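The cutoff can be sketched with base R dates. The weekly time_values and the versions_end date below are made up for illustration, and the strict comparison follows the wording above rather than the package source.

```r
# Hypothetical data: one time_value per week and an assumed versions_end.
time_values <- as.Date("2024-01-01") + seq(0, 91, by = 7)
versions_end <- as.Date("2024-04-01")
min_waiting_period <- as.difftime(60, units = "days")

# time_values on or after the cutoff have not had the full waiting period
# in which to be revised, so they are removed.
cutoff <- versions_end - min_waiting_period
kept <- time_values[time_values < cutoff]

cutoff       # "2024-02-01"
length(kept) # 5
```

Only the five January time_values survive; the later ones are treated as possibly still subject to revision.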

within_latest

double between 0 and 1. Determines the threshold used for time_near_latest: the revisions are considered settled once they stay within this fraction of the final value.

quick_revision

difftime or integer (an integer is treated as days). For the printed summary, the amount of time between the final revision and the actual time_value within which the revision is considered quickly resolved. Default is 3 days.

few_revisions

integer, for the printed summary, the upper bound on the number of revisions to consider "few". Default is 3.

abs_spread_threshold

numeric. For the printed summary, the maximum spread used to characterize revisions which don't actually change very much. If NULL (the default), 5% of the maximum value in the dataset is used, but this is the most unit-dependent of the values and likely needs to be chosen to suit the scale of the dataset.

rel_spread_threshold

float between 0 and 1. For the printed summary, the relative spread fraction used to characterize revisions which don't actually change very much. Default is 0.1, i.e. a spread of 10% of the largest value.
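How the printed-summary thresholds (quick_revision, few_revisions, abs_spread_threshold, rel_spread_threshold) partition the entries can be sketched on a toy table. Everything below is an assumption made for illustration: the data frame toy is invented, abs_spread_threshold is set by hand instead of being derived from the data, and whether each boundary is inclusive is a guess, not taken from the package.

```r
# Hypothetical per-entry summaries, shaped like the columns described above.
toy <- data.frame(
  n_revisions = c(0, 2, 5, 8),
  max_lag     = c(1, 3, 10, 40), # in days
  spread      = c(0, 1, 12, 30),
  rel_spread  = c(NA, 0.02, 0.15, 0.5)
)

few_revisions <- 3
quick_revision <- 3 # days
abs_spread_threshold <- 5
rel_spread_threshold <- 0.1

# Boundary handling (<= versus <) is an assumption of this sketch.
share_few   <- mean(toy$n_revisions <= few_revisions)
share_quick <- mean(toy$max_lag <= quick_revision)
share_abs   <- mean(toy$spread < abs_spread_threshold)
# Entries with an NA rel_spread (zero spread) are left out of this share.
share_rel   <- mean(toy$rel_spread < rel_spread_threshold, na.rm = TRUE)

share_few   # 0.5
share_quick # 0.5
share_abs   # 0.5
```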

compactify_tol

float, used if drop_nas = TRUE. Determines the tolerance within which two floats are considered identical.

should_compactify

bool. Compactify if TRUE.

Examples

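library(dplyr) # provides %>%, arrange(), and desc()
revision_example <- revision_summary(archive_cases_dv_subset, percent_cli)
# Sort so that the entries with the largest spread come first
revision_example %>% arrange(desc(spread))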


cmu-delphi/epiprocess documentation built on Oct. 29, 2024, 5:37 p.m.