inspect.data: Audit input data for common issues


inspect.data R Documentation

Audit input data for common issues

Description

Inspect input data and look for common mistakes. The function does not return any object but yields warnings upon detecting possible inconsistencies, along with suggestions on how to clean inputs before running estimation routines.

Usage

inspect.data(incid, GT = NULL, t = NULL)

Arguments

incid

An object (vector, data.frame, list) storing incidence.

GT

Generation time distribution from generation.time().

t

Vector of dates at which incidence was observed (optional).

Details

inspect.data() looks for common issues that could affect estimation routines. Such issues include incidence counts that are too low, leading or trailing zeros, and non-integer values.

Before any checks are conducted, the data are passed to check.incid() to try to guess the format of the data.

A fairly common issue is to provide non-integer counts for incidence, for example when working with aggregated data that represent the average number of cases across different communities. This, however, does not agree well with parametric likelihoods that assume exponential growth over the early stage of an epidemic or a Poisson distribution of cases, where non-integer values will cause calculations to fail.
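The check described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: the incidence vector is made up, and rounding is shown as one possible cleaning step rather than a recommended default.

```r
# Hypothetical incidence vector of averaged counts (illustrative data only)
incid <- c(1, 2.5, 4, 7.25, 12)

# Flag values that are not whole numbers (missing values are ignored here)
non_integer <- !is.na(incid) & (incid != round(incid))

if (any(non_integer)) {
  warning(sum(non_integer), " non-integer incidence value(s) found")
}

# One possible cleaning step; whether rounding is appropriate is a
# modelling decision left to the analyst
incid_clean <- round(incid)
```

Rounding discards information, so it may be preferable to go back to the raw counts when they are available.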

Missing values may cause issues if not handled properly. By default, check.incid() recasts missing values to zero. Leading and trailing NAs should be omitted entirely from the input. Gaps between available data points may also cause issues if they span a period longer than the total generation time. A warning is raised to flag these possible issues.
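A minimal base R sketch of how leading and trailing NAs might be trimmed, and how the longest internal gap could be compared with the generation time, is given below. The data and the value of gt_length are hypothetical, and the sketch assumes that at least one value is observed.

```r
# Hypothetical incidence series with missing values (illustrative data only)
incid <- c(NA, NA, 3, 5, NA, NA, NA, 8, 13, NA)

# Assumed total length of the generation time, in the same time unit
gt_length <- 2

# Drop leading and trailing NAs, keeping everything between the first
# and last observed values
observed <- which(!is.na(incid))
trimmed <- incid[min(observed):max(observed)]

# Length of the longest run of consecutive NAs left inside the series
runs <- rle(is.na(trimmed))
longest_gap <- max(c(0, runs$lengths[runs$values]))

if (longest_gap > gt_length) {
  warning("gap of ", longest_gap, " time units exceeds the generation time")
}
```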

Likewise, leading and trailing zeros would cause similar issues. By default, begin is set to the first value and end to the peak value. As a precaution, these are inspected here too. A sequence of zeros exceeding the length of the generation time will also yield a warning.
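The zero-related checks can be illustrated along the same lines. The sketch below uses made-up data and a hypothetical gt_length. It counts leading and trailing zeros, locates the peak, and measures the longest run of zeros. It is not taken from the package source.

```r
# Hypothetical incidence series (illustrative data only)
incid <- c(0, 0, 1, 3, 0, 0, 0, 0, 6, 9, 4, 0)

# Assumed length of the generation time, in the same time unit
gt_length <- 3

# Number of zeros before the first case and after the last case
nonzero <- which(incid > 0)
n_leading <- min(nonzero) - 1
n_trailing <- length(incid) - max(nonzero)

# Position of the peak (first maximum of the series)
peak_index <- which.max(incid)

# Longest run of consecutive zeros anywhere in the series
runs <- rle(incid == 0)
longest_zero_run <- max(c(0, runs$lengths[runs$values]))

if (longest_zero_run > gt_length) {
  warning("run of ", longest_zero_run, " zeros exceeds the generation time")
}
```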

Scarce data may also cause errors when optimizing likelihood functions. A time series of incidence spanning a duration shorter than that of the generation time distribution is likely to correspond to an index case that has not yet infected all of its offspring. This would bias estimates downwards and should be taken into account when interpreting results.
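As a rough illustration, the length of the series can be compared with the span of a discretized generation time distribution. Both vectors below are hypothetical, and representing the distribution as a plain probability vector is an assumption made for this sketch.

```r
# Hypothetical short incidence series (illustrative data only)
incid <- c(1, 0, 2, 1)

# Hypothetical discretized generation time distribution, one probability
# per time unit; not produced by generation.time()
gt_probs <- c(0, 0.1, 0.3, 0.3, 0.2, 0.1)

# Number of time units covered by the generation time distribution
gt_span <- length(gt_probs)

# TRUE when the series is shorter than the generation time span
too_short <- length(incid) < gt_span

if (too_short) {
  warning("incidence covers ", length(incid),
          " time units, fewer than the generation time span of ", gt_span)
}
```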

Value

No object is returned. Instead, warnings are thrown upon detecting inconsistencies.
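Because findings are reported only as warnings, a script that needs to act on them has to capture those warnings. The sketch below shows one way to do so with base R's withCallingHandlers(). Here audit_stub() is a hypothetical stand-in written for this example, so that it runs without the package; it is not inspect.data() itself.

```r
# Hypothetical stand-in that, like the documented function, returns nothing
# useful and reports findings through warnings
audit_stub <- function(incid) {
  if (any(incid != round(incid))) warning("non-integer incidence values")
  if (incid[1] == 0) warning("leading zero")
  invisible(NULL)
}

# Collect every warning message instead of letting it print
collected <- character(0)
withCallingHandlers(
  audit_stub(c(0, 1.5, 3)),
  warning = function(w) {
    collected <<- c(collected, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

collected
```

The same handler could presumably be wrapped around a real call to inspect.data() to log its messages or to stop a pipeline when any are raised.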

Author(s)

Pierre-Yves Boelle, Thomas Obadia


tobadia/R0 documentation built on Sept. 24, 2023, 5:16 p.m.