epidata: Continuous-Time SIR Event History of a Fixed Population
In jimhester/surveillance: Temporal and Spatio-Temporal Modeling and Monitoring of Epidemic Phenomena


The function as.epidata is used to generate objects of class "epidata". Objects of this class are specific data frames containing the event history of an epidemic together with some additional attributes. These objects are the basis for fitting spatio-temporal epidemic intensity models with the function twinSIR. Note that the spatial information itself, i.e., the positions of the individuals, is assumed to be constant over time. Besides epidemics following the SIR compartmental model, data from SI, SIRS and SIS epidemics may also be supplied. Inference for the infectious process works as usual, and simulation of such epidemics is also possible.

as.epidata(data, ...)

## S3 method for class 'data.frame'
as.epidata(data, t0,
           tE.col, tI.col, tR.col, id.col, coords.cols,
           f = list(), w = list(), D = dist, keep.cols = TRUE, ...)
## Default S3 method:
as.epidata(data, id.col, start.col, stop.col,
           atRiskY.col, event.col, Revent.col, coords.cols,
           f = list(), w = list(), D = dist, ...)

## S3 method for class 'epidata'
print(x, ...)
## S3 method for class 'epidata'
x[i, j, drop]
## S3 method for class 'epidata'
update(object, f = list(), w = list(), D = dist, ...)

`data`	For the `data.frame`-method, a data frame with as many rows as there are individuals in the population and time columns indicating when each individual became exposed (optional), infectious (mandatory, but can be `NA` for non-affected individuals) and removed (optional). Note that this data format does not allow for re-infection (SIRS) or time-varying covariates. The `data.frame`-method converts the individual-indexed data frame to the long event history start/stop format and then feeds it into the default method. If the generic function `as.epidata` is called on a `data.frame` and the `t0` argument is missing, the default method is called directly. For the default method, `data` can be a `matrix` or a `data.frame`. It must contain the observed event history in a form similar to `Surv(, type="counting")` with additional information (variables) along the process. Rows will be sorted automatically during conversion. The observation period is split up into consecutive intervals of constant state, and thus constant infection intensities. The data frame consists of a block of N (number of individuals) rows for each of those time intervals, with one row per individual in the block; all rows in a block share the same start and stop values, hence the name “block”. Each row describes the (fixed) state of the individual during the interval given by the start and stop columns `start.col` and `stop.col`. Note that there must not be more than one event (infection or removal) in a single block. Thus, in a single block, at most one entry across `event.col` and `Revent.col` may be 1; all others are 0. This rule follows from the point process property that there are no concurrent events (infections or removals).
`t0`	start time of the observation period. Will be subtracted from the time columns `tE.col`, `tI.col`, `tR.col`. Individuals that have already been removed prior to `t0`, i.e., rows with `tR <= t0`, will be dropped.
`tE.col, tI.col, tR.col`	single numeric or character indexes of the time columns in `data`, which specify when the individuals became exposed, infectious and removed, respectively. `tE.col` and `tR.col` can be missing, corresponding to SIR, SEI, or SI data. `NA` entries mean that the respective event has not (yet) occurred. Note that `is.na(tE)` implies `is.na(tI)` and `is.na(tR)`, and `is.na(tI)` implies `is.na(tR)` (and this is checked for the provided data).
`id.col`	single numeric or character index of the `id` column in `data`. The `id` column identifies the individuals in the data frame. It is converted to a factor by calling `factor`, i.e., unused levels are dropped if it already was a factor.
`start.col`	single index of the `start` column in `data`. Can be numeric (by column number) or character (by column name). The `start` column contains the (numeric) time points of the beginnings of the consecutive time intervals of the event history. The minimum value in this column, i.e., the start of the observation period, should be 0.
`stop.col`	single index of the `stop` column in `data`. Can be numeric (by column number) or character (by column name). The `stop` column contains the (numeric) time points of the ends of the consecutive time intervals of the event history. The stop value must always be greater than the start value of a row.
`atRiskY.col`	single index of the `atRiskY` column in `data`. Can be numeric (by column number) or character (by column name). The `atRiskY` column indicates if the individual was “at-risk” of becoming infected during the time interval (start, stop]. This variable must be logical or in 0/1-coding. Individuals with `atRiskY == 0` in the first time interval (normally the rows with `start == 0`) are taken as initially infectious.
`event.col`	single index of the `event` column in `data`. Can be numeric (by column number) or character (by column name). The `event` column indicates if the individual became infected at the `stop` time of the interval. This variable must be logical or in 0/1-coding.
`Revent.col`	single index of the `Revent` column in `data`. Can be numeric (by column number) or character (by column name). The `Revent` column indicates if the individual recovered (was removed) at the `stop` time of the interval. This variable must be logical or in 0/1-coding.
`coords.cols`	indexes of the `coords` columns in `data`. Can be numeric (by column number), character (by column name), or `NULL` (no coordinates, e.g., if `D` is a pre-specified distance matrix). These columns contain the individuals' coordinates, which determine the distance matrix for the distance-based components of the force of infection (see argument `f`). By default, Euclidean distance is used (see argument `D`). Note that the functions related to `twinSIR` currently assume fixed positions of the individuals during the whole epidemic. Thus, an individual has the same coordinates in every block. For simplicity, the coordinates are derived from the first time block only (normally the rows with `start == 0`). The `animate`-method requires coordinates.
`f`	a named list of vectorized functions for a distance-based force of infection. The functions must operate elementwise on a (distance) matrix `D` so that `f[[m]](D)` results in a matrix. A simple example is `function(u) {u <= 1}`, which indicates if the Euclidean distance between the individuals is smaller than or equal to 1. The names of the functions determine the names of the epidemic variables in the resulting data frame, so they should not coincide with names of other covariates. The distance-based weights are computed as follows: Let I(t) denote the set of infectious individuals just before time t. Then, for individual i at time t, the m'th covariate has the value ∑_{j ∈ I(t)} f[[m]](d[i,j]), where d[i,j] denotes entries of the distance matrix (by default this is the Euclidean distance ‖s_i - s_j‖ between the individuals' coordinates, but see argument `D`).
`w`	a named list of vectorized functions for extra covariate-based weights w_ij in the epidemic component. Each function operates on a single time-constant covariate in `data`, which is determined by the name of the first argument: The two function arguments should be named `varname.i` and `varname.j`, where `varname` is one of `names(data)`. Similar to the components in `f`, `length(w)` epidemic covariates will be generated in the resulting `"epidata"` named according to `names(w)`. So, the names should not coincide with names of other covariates. For individual i at time t, the m'th such covariate has the value ∑_{j ∈ I(t)} w_m(z^{(m)}_i, z^{(m)}_j), where z^{(m)} denotes the variable in `data` associated with `w[[m]]`.
`D`	either a function to calculate the distances between the individuals, with locations taken from `coords.cols` and the result converted to a matrix via `as.matrix` (the default is Euclidean distance via the function `dist`), or a pre-computed distance matrix with `dimnames` containing the individual ids.
`keep.cols`	logical indicating if all columns in `data` should be retained (and not only the obligatory `"epidata"` columns), in particular any additional columns with time-constant individual-specific covariates. Alternatively, `keep.cols` can be a numeric or character vector indexing columns of `data` to keep.
`x,object`	an object of class `"epidata"`.
`...`	arguments passed to `print.data.frame`. Currently unused in the `as.epidata`-methods.
`i,j,drop`	arguments passed to `[.data.frame`.
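
The conversion performed by the `data.frame`-method, from one row per individual to blocks of constant state, can be illustrated with a minimal base-R sketch. It is not the package's implementation: the data frame `ind`, its values and all helper names are hypothetical, `t0` is taken to be 0, exposure times are ignored, and no final block up to the end of the observation period is added.

```r
# Hypothetical individual-indexed data: infection (tI) and removal (tR) times.
# "a" is initially infectious (tI = 0), "c" is never affected (NA).
ind <- data.frame(id = c("a", "b", "c"),
                  tI = c(0, 2, NA),
                  tR = c(3, 5, NA))

# Every distinct event time after t0 = 0 closes one block
stops  <- sort(unique(na.omit(c(ind$tI, ind$tR))))
stops  <- stops[stops > 0]
starts <- c(0, head(stops, -1))

# One block of nrow(ind) rows per interval (start, stop]
long <- do.call(rbind, lapply(seq_along(stops), function(b) {
  data.frame(
    BLOCK   = b,
    id      = ind$id,
    start   = starts[b],
    stop    = stops[b],
    # at risk during the interval if not infected before its end
    atRiskY = as.integer(is.na(ind$tI) | ind$tI >= stops[b]),
    # infection or removal exactly at the stop time of the interval
    event   = as.integer(!is.na(ind$tI) & ind$tI == stops[b]),
    Revent  = as.integer(!is.na(ind$tR) & ind$tR == stops[b])
  )
}))

# The "no concurrent events" rule: at most one event per block
events_per_block <- tapply(long$event + long$Revent, long$BLOCK, sum)
print(long)
print(events_per_block)
```

In this toy data all event times are distinct, so every block ends with exactly one infection or removal.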

The print method for objects of class "epidata" simply prints the data frame with a small header containing the time range of the observed epidemic and the number of infected individuals. Usually, the data frames are quite long, so the summary method summary.epidata might be useful. Indexing/subsetting "epidata" works exactly as for data.frames, but a dedicated method ensures consistency of the resulting "epidata" or drops this class if necessary. The update-method can be used to add or replace distance-based (f) or covariate-based (w) epidemic variables in an existing "epidata" object.
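
What a covariate-based (w) epidemic variable measures can be sketched in base R, following the sum over infectious individuals given for argument `w`. The covariate values, the infectious set and the helper `w_covariate` are hypothetical and only serve as illustration; this is not the code used by the update-method.

```r
# Hypothetical time-constant covariate z1 for four individuals
z1 <- c("1" = "farm", "2" = "farm", "3" = "town", "4" = "town")

# Weight function in the documented form: arguments named varname.i, varname.j
samez1 <- function(z1.i, z1.j) z1.i == z1.j

# Assumed set I(t) of infectious individuals just before time t
infectious <- c("1", "3", "4")

# For individual i, sum w(z_i, z_j) over all infectious j
w_covariate <- function(i) sum(samez1(z1[[i]], z1[infectious]))

# Individual "2" shares its z1 value with one infectious individual ("1")
print(w_covariate("2"))
```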

SIS epidemics are implemented as SIRS epidemics where the length of the removal period equals 0. This means that an individual who experiences an R-event will be at risk again immediately afterwards, i.e., in the following time block. Therefore, data of SIS epidemics have to be provided in this form, containing “pseudo-R-events”.
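
As an illustration of such pseudo-R-events, the following base-R sketch writes down a tiny, entirely hypothetical SIS event history for two individuals in the start/stop block format. Column names follow the reserved names of "epidata" objects; the values are made up.

```r
# Hypothetical SIS event history: three blocks, individuals "A" and "B"
sis <- data.frame(
  id      = rep(c("A", "B"), times = 3),
  start   = rep(c(0, 1, 2), each = 2),
  stop    = rep(c(1, 2, 3), each = 2),
  atRiskY = c(0, 1,   0, 0,   1, 0),
  event   = c(0, 1,   0, 0,   1, 0),  # B infected at 1, A re-infected at 3
  Revent  = c(0, 0,   1, 0,   0, 0)   # pseudo-R-event of A at time 2
)

# After its pseudo-R-event, A is at risk again in the following block
a <- sis[sis$id == "A", ]
r_block <- which(a$Revent == 1)
print(a$atRiskY[r_block + 1])

# Still at most one event (infection or removal) per block
print(tapply(sis$event + sis$Revent, sis$start, sum))
```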

A data.frame with the obligatory columns "BLOCK", "id", "start", "stop", "atRiskY", "event", "Revent" and the coordinate columns (with the original names from data). These columns are followed by any remaining columns of the input data. Finally, the newly generated columns with epidemic variables corresponding to the functions in the lists f and w are appended, if any were supplied.
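
How the values of a distance-based epidemic variable arise can be sketched in base R, following the sum over infectious individuals given for argument `f`. The coordinates and the infectious set below are hypothetical, and the sketch only evaluates susceptible individuals; it is not the package's own code.

```r
# Hypothetical coordinates of four individuals with ids "1" to "4"
coords <- cbind(x = c(0, 0.5, 2, 2), y = c(0, 0, 0, 3))
rownames(coords) <- as.character(1:4)

# Default choice of D: Euclidean distances via dist(), converted to a matrix
D <- as.matrix(dist(coords))

# Two distance-based components, as in the indicator example for 'f'
f <- list(B1 = function(u) u <= 1, B2 = function(u) u > 1)

# Assumed state just before time t: "1" and "3" are infectious
infectious  <- c("1", "3")
susceptible <- setdiff(rownames(coords), infectious)

# For each susceptible i and each f[[m]]: sum of f[[m]](d[i, j]) over infectious j
epi_covariates <- sapply(f, function(fm) {
  rowSums(fm(D)[susceptible, infectious, drop = FALSE])
})
print(epi_covariates)
```

Here B1 counts the infectious individuals within distance 1 and B2 those further away, so the two columns add up to the number of infectious individuals for every susceptible.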

The data.frame is given the additional attributes

`"eventTimes"`	numeric vector of infection time points (sorted chronologically).
`"timeRange"`	numeric vector of length 2: `c(min(start), max(stop))`.
`"coords.cols"`	numeric vector containing the column indices of the coordinate columns in the resulting data frame.
`"f"`	this equals the argument `f`.
`"w"`	this equals the argument `w`.
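
The first two attributes can be related to the event history columns with a short base-R sketch. The data frame `eh` is hypothetical, and deriving the attributes this way is an illustration of their documented meaning rather than the package's code.

```r
# Hypothetical event history: two individuals, three blocks
eh <- data.frame(
  id    = rep(c("A", "B"), times = 3),
  start = rep(c(0, 2, 3), each = 2),
  stop  = rep(c(2, 3, 7), each = 2),
  event = c(0, 1,   0, 0,   1, 0)
)

# Infection time points, sorted chronologically (cf. "eventTimes")
event_times <- sort(eh$stop[eh$event == 1])

# Start and end of the observation period (cf. "timeRange")
time_range <- c(min(eh$start), max(eh$stop))

print(event_times)
print(time_range)
```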

The column name "BLOCK" is a reserved name. This column will be added automatically at conversion and the resulting data frame will be sorted by this column and by id. Also the names "id", "start", "stop", "atRiskY", "event" and "Revent" are reserved for the respective columns only.

Sebastian Meyer

The hagelloch data for a “real” "epidata" object. The code for the conversion from the simple data frame to the SIR event history using as.epidata.data.frame is given in example(hagelloch).

The plot and the summary method for class "epidata". Furthermore, the function animate.epidata for the animation of epidemics.

Function twinSIR for fitting spatio-temporal epidemic intensity models to epidemic data.

Function simEpidata for the simulation of epidemic data.

# see example(hagelloch)

# here is an artificial event history
data("foodata")
str(foodata)

# convert the data to an object of class "epidata",
# also generating some epidemic covariates
myEpidata <- as.epidata(foodata,
  id.col = 1, start.col = "start", stop.col = "stop",
  atRiskY.col = "atrisk", event.col = "infected", Revent.col = "removed",
  coords.cols = c("x","y"),
  f = list(B1 = function(u) u <= 1, B2 = function(u) u > 1))

# this is how data("fooepidata") has been generated
data("fooepidata")
stopifnot(all.equal(myEpidata, fooepidata))

# add covariate-based weight for the force of infection, e.g.,
# to model an increased force if i and j have the same value in z1
myEpidata2 <- update(fooepidata,
                     w = list(samez1 = function(z1.i, z1.j) z1.i == z1.j))

str(fooepidata)
subset(fooepidata, BLOCK == 1)

summary(fooepidata)          # see 'summary.epidata'
plot(fooepidata)             # see 'plot.epidata' and also 'animate.epidata'
stateplot(fooepidata, "15")  # see 'stateplot'