epidataCS: Continuous Space-Time Marked Point Patterns with Grid-Based...

Description

Data structure for continuous spatio-temporal event data, e.g. individual case reports of an infectious disease. Apart from the actual events, the class simultaneously holds a spatio-temporal grid of endemic covariates (similar to disease mapping) and a representation of the observation region.

The "epidataCS" class is the basis for fitting spatio-temporal epidemic intensity models with the function twinstim.

Usage

as.epidataCS(events, stgrid, W, qmatrix = diag(nTypes),
             nCircle2Poly = 32L, T = NULL,
             clipper = c("polyclip", "rgeos"), verbose = interactive())

## S3 method for class 'epidataCS'
print(x, n = 6L, digits = getOption("digits"), ...)

## S3 method for class 'epidataCS'
nobs(object, ...)
## S3 method for class 'epidataCS'
head(x, n = 6L, ...)
## S3 method for class 'epidataCS'
tail(x, n = 6L, ...)
## S3 method for class 'epidataCS'
x[i, j, ..., drop = TRUE]
## S3 method for class 'epidataCS'
subset(x, subset, select, drop = TRUE, ...)

## S3 method for class 'epidataCS'
marks(x, coords = TRUE, ...)

## S3 method for class 'epidataCS'
summary(object, ...)
## S3 method for class 'summary.epidataCS'
print(x, ...)

## S3 method for class 'epidataCS'
as.stepfun(x, ...)

Arguments

events

a "SpatialPointsDataFrame" of cases with the following obligatory columns (in the events@data data.frame):

time

time point of event. Will be converted to a numeric variable by as.numeric. There should be no concurrent events (but see untie for an ex post adjustment) and the event times must be covered by stgrid, i.e. belong to the time interval (t_0,T], where t_0 is min(stgrid$start) and T is described below.

tile

the spatial region (tile) where the event is located. This links to the tiles of stgrid.

type

optional type of event in a marked twinstim model. Will be converted to a factor variable dropping unused levels. If missing, all events will be attributed the single type "1".

eps.t

maximum temporal influence radius (e.g. length of infectious period, time to culling, etc.); must be positive and may be Inf.

eps.s

maximum spatial influence radius (e.g. 100 [km]); must be positive and may be Inf. A compact influence region mainly has computational advantages, but might also be plausible for specific applications.

The data.frame may contain columns with further marks of the events, e.g. sex, age of infected individuals, which may be used as epidemic covariates influencing infectiousness. Note that some auxiliary columns will be added at conversion whose names are reserved: ".obsInfLength", ".bdist", ".influenceRegion", and ".sources", as well as "start", "BLOCK", and all endemic covariates' names from stgrid.
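The requirements on the obligatory columns can be checked on a plain data.frame before it is promoted to a "SpatialPointsDataFrame". The following is a minimal base-R sketch and not part of the package: the toy values, the names ev, t0, T_end, reserved and checks, and the selection of checks are assumptions made for illustration only.

```r
## toy event data with the obligatory columns (hypothetical values)
ev <- data.frame(
  time  = c(2.5, 7.1, 7.9, 12.0),
  tile  = c("A", "B", "A", "B"),
  type  = factor(c("1", "1", "2", "1")),
  eps.t = c(30, 30, 30, Inf),
  eps.s = c(100, 100, Inf, 100),
  x = c(0.2, 1.4, 0.7, 1.9),
  y = c(0.5, 0.3, 0.8, 0.6)
)
t0 <- 0; T_end <- 14  # assumed bounds of the period covered by 'stgrid'

## names reserved for the auxiliary columns added at conversion
## (the endemic covariates' names from 'stgrid' would be reserved as well)
reserved <- c(".obsInfLength", ".bdist", ".influenceRegion", ".sources",
              "start", "BLOCK")

checks <- c(
  no_ties        = !anyDuplicated(ev$time),               # no concurrent events
  times_covered  = all(ev$time > t0 & ev$time <= T_end),  # times in (t_0,T]
  eps_t_positive = all(ev$eps.t > 0),                     # Inf is allowed
  eps_s_positive = all(ev$eps.s > 0),
  no_reserved    = !any(names(ev) %in% reserved)
)
print(checks)
```

If the no_ties check fails, the untie function mentioned above is the documented way to adjust the event times ex post.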

stgrid

a data.frame describing endemic covariates on a full spatio-temporal region x interval grid (e.g., district x week), which is a decomposition of the observation region W and period (t_0,T]. This means that for every combination of spatial region and time interval there must be exactly one row in this data.frame, that the union of the spatial tiles equals W, the union of the time intervals equals (t_0,T], and that regions (and intervals) are non-overlapping. There are the following obligatory columns:

tile

ID of the spatial region (e.g., district ID). It will be converted to a factor variable (dropping unused levels if it already was one).

start, stop

columns describing the consecutive temporal intervals (converted to numeric variables by as.numeric). The start time of an interval must be equal to the stop time of the previous interval. The stop column may be missing, in which case it will be auto-generated from the set of start values and T.

area

area of the spatial region (tile). Be aware that the unit of this area (e.g., square km) must be consistent with the units of W and events (as specified in their proj4strings, if they have projected coordinates).

The remaining columns are endemic covariates. Note that the column name "BLOCK" is reserved (a column which will be added automatically for indexing the time intervals of stgrid).
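A grid that meets these requirements can be assembled in base R. The sketch below is only an illustration under assumed toy values: the tile IDs, the interval breaks, the covariate name popdensity and the helper names (breaks, ivals, is_full, is_consecutive) are hypothetical, and the way stop is derived here merely mimics the auto-generation described above.

```r
tiles  <- c("A", "B", "C")   # hypothetical tile IDs
starts <- c(0, 7, 14, 21)    # start times of consecutive intervals
T_end  <- 28                 # end of the observation period

## one row per (tile, interval) combination
grid <- expand.grid(tile = tiles, start = starts, stringsAsFactors = FALSE)

## derive 'stop' from the sorted set of start values and T_end
breaks <- c(sort(unique(grid$start)), T_end)
grid$stop <- breaks[match(grid$start, breaks) + 1]

## hypothetical tile areas and one endemic covariate
grid$area       <- unname(c(A = 10, B = 25, C = 40)[grid$tile])
grid$popdensity <- unname(c(A = 120, B = 80, C = 45)[grid$tile])

## full grid: exactly one row for every tile x interval combination
is_full <- all(table(grid$tile, grid$start) == 1)

## consecutive: each start equals the stop of the previous interval
ivals <- unique(grid[c("start", "stop")])
is_consecutive <- all(ivals$start[-1] == ivals$stop[-nrow(ivals)])

head(grid)
```

Both is_full and is_consecutive should be TRUE for a valid grid. A missing tile x interval combination or a gap between intervals would show up as FALSE.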

W

an object of class "SpatialPolygons" representing the observation region. It must have the same proj4string as events and all events must be within W. The function simplify.owin from package spatstat may be useful if polygonal operations take too long or memory is limited (see also the “Note” section below).

qmatrix

a square indicator matrix (0/1 or FALSE/TRUE) for possible transmission between the event types. The matrix will be internally converted to logical. Defaults to an independent spread of the event types, i.e. the identity matrix.
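As an illustration, the following base-R sketch builds the default matrix and one alternative for two hypothetical event types "B" and "C". It assumes that rows index the type of the source event and columns the type of the triggered event, which is not stated on this page. The last two statements only mimic the conversion described above and are not the package's code.

```r
types <- c("B", "C")  # hypothetical event types

## default: independent spread of the types, i.e. the identity matrix
q_indep <- diag(length(types))

## alternative: type "B" may additionally trigger events of type "C"
## (assumption: rows = source type, columns = target type)
q_cross <- matrix(c(1, 1,
                    0, 1), nrow = 2, byrow = TRUE)

## mimic the described conversion: logical storage, types as dimnames
storage.mode(q_cross) <- "logical"
dimnames(q_cross) <- list(types, types)
q_cross
```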

nCircle2Poly

accuracy (number of edges) of the polygonal approximation of a circle, see discpoly.
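To get a feeling for this accuracy, the base-R sketch below approximates a disc by a regular polygon with n edges and compares the areas. It assumes that the vertices lie on the circle (an inscribed polygon). How discpoly actually places the vertices is not described here, so this shows the general idea rather than the package's method.

```r
n <- 32L   # number of edges, the default of nCircle2Poly
r <- 1     # radius (arbitrary unit)

## vertices of a regular n-gon inscribed in the circle
theta <- 2 * pi * (seq_len(n) - 1) / n
x <- r * cos(theta)
y <- r * sin(theta)

## polygon area via the shoelace formula
area_poly <- 0.5 * abs(sum(x * c(y[-1], y[1]) - c(x[-1], x[1]) * y))

## fraction of the true disc area covered by the polygon
area_poly / (pi * r^2)
```

With 32 edges the inscribed polygon covers a little over 99% of the disc. Halving the number of edges (as in the example at the end of this page, which uses nCircle2Poly = 16) makes the approximation coarser but reduces the number of vertices to handle.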

T

end of observation period (i.e. last stop time of stgrid). Must be specified if the start but not the stop times are supplied in stgrid (=> auto-generation of stop times).

clipper

polygon clipping engine to use for calculating the .influenceRegions of events (see the Value section below). Default is the polyclip package (called via intersect.owin from package spatstat). In surveillance <= 1.6-0, package gpclib was used, which has a restrictive license. This is no longer supported.

verbose

logical indicating if status messages should be printed during input checking and "epidataCS" generation. The default is to do so in interactive R sessions.

x

an object of class "epidataCS" or "summary.epidataCS", respectively.

n

a single integer. If positive, the first (head, print) / last (tail) n events are extracted. If negative, all but the last (head, print) / first (tail) |n| events are extracted, as for the default head and tail methods.

digits

minimum number of significant digits to be printed in values.

i,j,drop

arguments passed to the [-method for SpatialPointsDataFrames for subsetting the events while retaining stgrid and W.
If drop=TRUE (the default), event types that completely disappear due to i-subsetting will be dropped, which reduces qmatrix and the factor levels of the type column.
By the j index, epidemic covariates can be removed from events.

...

unused (arguments of the generics) with a few exceptions: The print method for "epidataCS" passes ... to the print.data.frame method, and the print method for "summary.epidataCS" passes additional arguments to print.table.

subset, select

arguments used to subset the events from an "epidataCS" object like in subset.data.frame.

coords

logical indicating if the data frame of event marks returned by marks(x) should have the event coordinates appended as last columns. This defaults to TRUE.

object

an object of class "epidataCS".

Details

The function as.epidataCS is used to generate objects of class "epidataCS", which is the data structure required for twinstim models.

The extraction method for class "epidataCS" ensures that the subsetted object will be valid; for instance, it updates the auxiliary list of potential transmission paths stored in the object. This [-method is also the basis for the subset.epidataCS-method, which is implemented similarly to the subset.data.frame-method.

The print method for "epidataCS" prints some metadata of the epidemic, e.g., the observation period, the dimensions of the spatio-temporal grid, the types of events, and the total number of events. By default, it also prints the first n = 6 rows of the events.

Value

An object of class "epidataCS" is a list containing the following components:

events

a "SpatialPointsDataFrame" (see the description of the argument). The input events are checked for requirements and sorted chronologically. The columns are in the following order: obligatory event columns, event marks, the columns BLOCK, start and endemic covariates copied from stgrid, and finally, hidden auxiliary columns. The added auxiliary columns are:

.obsInfLength

observed length of the infectious period (being part of [0,T]), i.e. pmin(T-time, eps.t).

.sources

a list of numeric vectors of potential sources of infection (wrt the interaction ranges eps.s and eps.t) for each event. Row numbers are used as index.

.bdist

minimal distance of the event locations to the polygonal boundary W.

.influenceRegion

a list of influence regions represented by objects of the spatstat class "owin". For each event, this is the intersection of W with a (polygonal) circle of radius eps.s centered at the event's location, shifted such that the event location becomes the origin. The list has nCircle2Poly set as an attribute.
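The first two auxiliary columns can be illustrated in base R. The sketch below uses hypothetical toy events of a single type and assumes that event i is a potential source of event j if j occurs after i, within eps.t of i in time and within eps.s of i in space. This is one reading of "wrt the interaction ranges" and not necessarily how the package computes .sources (which, for several types, would also need qmatrix).

```r
## toy events, sorted chronologically (hypothetical values)
ev <- data.frame(time = c(1, 4, 6, 40),
                 x = c(0, 3, 50, 1), y = c(0, 4, 0, 1),
                 eps.t = 30, eps.s = 10)
T_end <- 45  # assumed end of the observation period

## observed length of the infectious period, as in the formula above
ev$.obsInfLength <- pmin(T_end - ev$time, ev$eps.t)

## pairwise lags: element [i, j] refers to source i and target j
dt <- outer(ev$time, ev$time, function(ti, tj) tj - ti)
ds <- as.matrix(dist(ev[c("x", "y")]))

## i is a potential source of j if j occurs after i and within i's ranges
## (eps.t and eps.s are recycled along the rows, i.e. per source i)
can_infect <- dt > 0 & dt <= ev$eps.t & ds <= ev$eps.s

## list of row numbers of potential sources for each event
sources <- lapply(seq_len(nrow(ev)),
                  function(j) unname(which(can_infect[, j])))
str(sources)
```

In this toy data only the second event has a potential source (the first event). The last event is close to the first two in space but occurs more than eps.t = 30 time units later.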

stgrid

a data.frame (see description of the argument). The spatio-temporal grid of endemic covariates is sorted by time interval (indexed by the added variable BLOCK) and region (tile). It is a full BLOCK x tile grid.

W

a "SpatialPolygons" object representing the observation region.

qmatrix

see the above description of the argument. The storage.mode of the indicator matrix is set to logical and the dimnames are set to the levels of the event types.

The nobs-method returns the number of events.

The head and tail methods subset the epidemic data using the extraction method ([), i.e. they return an object of class "epidataCS", which only contains (all but) the first/last n events.

For the "epidataCS" class, the method of the generic function marks defined by the spatstat package returns a data.frame of the event marks (actually also including time and location of the events), disregarding endemic covariates and the auxiliary columns from the events component of the "epidataCS" object.

The summary method (which again has a print method) returns a list of metadata, event data, the tables of tiles and types, a step function of the number of infectious individuals over time ($counter), i.e., the result of the as.stepfun-method for "epidataCS", and the number of potential sources of transmission for each event ($nSources), which is based on the given maximum interaction ranges eps.t and eps.s.
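The idea behind such a counter can be sketched with stats::stepfun in base R. The example assumes hypothetical event times and a common finite infectious period, and treats an individual as infectious from its event time up to (but excluding) time + eps.t. The exact convention at the jump points used by the as.stepfun-method is not specified here, so this is an illustration only.

```r
time  <- c(1, 3, 4)   # hypothetical event times
eps.t <- 2            # assumed common (finite) length of the infectious period

start <- time
end   <- time + eps.t

## jump points and the number of infectious individuals from each one on
knots  <- sort(unique(c(start, end)))
counts <- vapply(knots, function(k) sum(start <= k & k < end), integer(1))

## right-continuous step function, zero before the first event
counter <- stepfun(knots, c(0, counts))
counter(c(0, 2, 4.5, 7))
```

Evaluating counter at the times 0, 2, 4.5 and 7 gives 0, 1, 2 and 0 infectious individuals, respectively. plot(counter) draws the corresponding time course.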

Note

The more detailed the observation region W is, the longer it will take to fit a twinstim. It is often advisable to sacrifice some shape detail for speed by reducing polygon complexity using, e.g., the Douglas and Peucker (1973) reduction method available at MapShaper.org (Harrower and Bloch, 2006) or as function thinnedSpatialPoly in package maptools, or by using spatstat's simplify.owin procedure.

Author(s)

Sebastian Meyer with documentation contributions by Michael Höhle and Mayeul Kauffmann.

References

Douglas, D. H. and Peucker, T. K. (1973): Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica: The International Journal for Geographic Information and Geovisualization, 10, 112-122.

Harrower, M. and Bloch, M. (2006): MapShaper.org: A Map Generalization Web Service. IEEE Computer Graphics and Applications, 26(4), 22-27.
DOI-Link: http://dx.doi.org/10.1109/MCG.2006.85

Meyer, S., Elias, J. and Höhle, M. (2012): A space-time conditional intensity model for invasive meningococcal disease occurrence. Biometrics, 68, 607-616.
DOI-Link: http://dx.doi.org/10.1111/j.1541-0420.2011.01684.x

Meyer, S. (2010): Spatio-Temporal Infectious Disease Epidemiology based on Point Processes. Master's Thesis, Ludwig-Maximilians-Universität München.
Available as http://epub.ub.uni-muenchen.de/11703/

See Also

plot.epidataCS for plotting, and animate.epidataCS for the animation of such an epidemic. There is also an update method for the "epidataCS" class. Models for "epidataCS" can be fitted with twinstim. It is also possible to convert the data to epidata objects (discrete space) for analysis with twinSIR (see as.epidata.epidataCS).

Examples

## load "imdepi" example data (which is an object of class "epidataCS")
data("imdepi")

## print and summary
print(imdepi, n=5, digits=2)
print(s <- summary(imdepi))
plot(s$counter,  # same as 'as.stepfun(imdepi)'
     xlab = "Time [days]", ylab="Number of infectious individuals",
     main=paste("Time course of the number of infectious individuals",
                "assuming an infectious period of 30 days", sep="\n"))
plot(table(s$nSources), xlab="Number of \"close\" infective individuals",
     ylab="Number of events",
     main=paste("Distribution of the number of potential sources",
                "assuming an interaction range of 200 km and 30 days",
                sep="\n"))
## the summary object contains further information
str(s)

## internal structure of an "epidataCS"-object
str(imdepi, max.level=4)
## see help("imdepi") for more info on the data set

## extraction methods subset the 'events' component
## (thereby taking care of the validity of the epidataCS object,
## for instance the hidden auxiliary column .sources)
imdepi[101:200,]
tail(imdepi, n=4)           # reduce the epidemic to the last 4 events
subset(imdepi, type=="B")   # only consider event type B

## see help("plot.epidataCS") for convenient plot-methods for "epidataCS"


###
### reconstruct the "imdepi" object from its components
###

## events
events <- marks(imdepi)
coordinates(events) <- c("x", "y")  # promote to a "SpatialPointsDataFrame"
proj4string(events) <- proj4string(imdepi$events)       # ETRS89 projection
summary(events)

## endemic covariates
head(stgrid <- imdepi$stgrid[,-1])

## (Simplified) observation region (as SpatialPolygons)
load(system.file("shapes", "districtsD.RData", package="surveillance"),
     verbose = TRUE)

## plot observation region with events
plot(stateD, axes=TRUE); title(xlab="x [km]", ylab="y [km]")
points(events, pch=unclass(events$type), cex=0.5, col=unclass(events$type))
legend("topright", legend=levels(events$type), title="Type", pch=1:2, col=1:2)

## reconstruct the "imdepi" object from its components
myimdepi <- as.epidataCS(events = events, stgrid = stgrid,
                         W = stateD, qmatrix = diag(2), nCircle2Poly = 16)
## -> equal to 'imdepi' as long as the internal structures of the embedded
##    classes ("owin", "SpatialPolygons", ...), and the calculation of the
##    influence regions by "polyclip" do not change:
##all.equal(imdepi, myimdepi, tolerance=1E-6)

jimhester/surveillance documentation built on May 19, 2019, 10:33 a.m.