ngkd: Implements The Approach of Kulldorff (2001)
In ClementCalenge/scankd: Space-Time Scan Statistics

Description Usage Arguments Details Value Author(s) References Examples

View source: R/kulldorff.R

Implements The Approach of Kulldorff (2001)

ngkd(pts, radius, time)

## S3 method for class 'ngkd'
print(x, ...)

Kulldorff2001(observed, theoretical, ngkd, loglambda = TRUE)

identifyClusters(spixdf, ngkd, selection = c("nclust", "lambda"),
  nclustMax = nrow(spixdf), lambda = NULL, plot = TRUE)

circleskd(x, text = NA)

`pts`	an object inheriting the class SpatialPixels of the package sp, containing the grid of pixels used in the scan (each pixel being the possible center of a circular cluster).
`radius`	a numeric vector of positive and increasing values corresponding to the possible cluster radiuses (ordered increasingly) tested in the scan. The units should be the same as those used in `pts`.
`time`	a numeric vector of positive and increading values corresponding to the possible cluster durations tested in the scan (ordered increasingly). The time units are not important, as these values are not used in the calculations (only for display purposes).
`x`	for `prind.ngkd`, an object of class "ngkd" returned by the function `ngkd`. For `circleskd`, an object of class `SpatialPointsDataFrame` indicating the location of the cluster centers, returned by `identifyClusters`.
`observed`	a data.frame with `Nt` columns and `Np` rows (where `Nt` is the number of time units considered in the scan, and `Np` the number of pixels in `pts`), containing the number of cases observed in each pixel during each time unit.
`theoretical`	a data.frame with `Nt` columns and `Np` rows (where `Nt` is the number of time units considered in the scan, and `Np` the number of pixels in `pts`), containing the number of cases expected in each pixel during each time unit under "normal" conditions.
`ngkd`	an object of class "ngkd" returned by the function `ngkd`.
`loglambda`	logical. Should the actual value of the likelihood ratio be returned, or its logarithm (the latter often results in clearer maps).
`spixdf`	an object of class `SpatialPixelsDataFrame` returned by the function `Kulldorff2001`.
`selection`	a character string indicating how to identify the clusters: if `"nclust"`, a fixed number of clusters is identified (this number corresponds to the argument `nclustMax`. If `"lambda"`, all the clusters characterized by a likelihood ratio (or log-likelihood ratio, according to what is stored in `spixdf`) greater than the parameter `lambda` are identified.
`nclustMax`	the maximum number of clusters that should be returned by the function, if `selection="nclust"`.
`lambda`	the threshold of the (log-) likelihood ratio, above which the clusters are considered significant and are returned by the function, if `selection="lambda"`.
`plot`	logical. If `TRUE`, the function plots the location of the most likely clusters on a map of the (log-)likelihood ratios for each pixel.
`text`	Either a missing value, in which case the function indicates the order of the clusters (first, second, etc.) with their duration at the center of the cluster; or a character string indicating the variable in `x` that should be used to plot the text at the center of the cluster (e.g. "radius", "time", "index", "loglambda"); or a character vector with the same number of elements as `nrow(x)` indicating for each cluster what text to write at the center of the cluster.
`...`	additional arguments to be passed to or from other functions.

ngkd prepares the data for the scan (identification of neighbourhood for each pixel. Kulldorff2001 implements the scan statistics, and calculates the most likely cluster (size, duration and likelihood ratio) associated to each pixel of the map. identifyClusters identifies and plots the most likely non-overlapping clusters on the map. circleskd can be used to plot the clusters to an already existing image of the area.

ngkd returns a list of class "ngkd" required by all the functions of the packages (stores all the spatial and temporal information about the scan). Kulldorff2001 implements the scan statistics of Kulldorff (2001) and return, for each pixel of the map, the (log-)likelihood ratio associated to the best cluster centered on this pixel, as well as the diameter and duration of the pixel, and the observed and expected number of cases in the best cluster (these variables are returned as a SpatialPixelsDataFrame). identifyClusters returns a SpatialPointsDataFrame containing the most likely nonoverlapping clusters. Each row corresponds to the center of a cluster, and the data slot contains the pixel index in the map used to create the ngkd object corresponding to this point, the size and duration of the cluster, as well as the (log-) likelihood ratio corresponding to this pixel.

Clement Calenge, clement.calenge@oncfs.gouv.fr

Kulldorff, M. (2001) Prospective time periodic geographical disease surveillance using a scan statistic. Journal of the Royal Statistical Society: Series A, 164, 61-72.

## We have a map of France
image(francekd)


##############################################
##
## Simulation of a dataset. Beginning of the simulations
##
## We simulate the following situation:
## (i) a "normal noise" of cases during 52 weeks (in average, 0.001
##     cases expected per pixel):
set.seed(9809)
cases <- as.data.frame(matrix(as.double(rpois(nrow(francekd)*52,0.001)),
                              nrow=nrow(francekd)))

## (ii) we add a cluster of "unusual" cases in the North-east of
## France for the last 8 weeks
df <- structure(list(gg = c(160L, 123L, 230L, 368L, 333L, 231L,
271L, 307L, 161L, 159L, 126L, 157L, 270L, 235L, 336L, 333L, 193L,
303L, 307L), gi = c(3L, 1L, 6L, 2L, 8L, 4L, 4L, 7L, 4L, 6L, 2L, 6L,
2L, 2L, 2L, 1L, 6L, 8L, 8L)), .Names = c("gg", "gi"), row.names =
c(NA, -19L), class = "data.frame")
for (i in 1:nrow(df)) {
   cases[df[i,1], df[i,2]] <- cases[df[i,1], df[i,2]]+1
}

## End of the simulations
##############################################

## Show these data (sum over the last year)
su <- apply(cases,1,sum)
francekd$totalCases <- su
image(francekd[,"totalCases"])
## Note the cluster in North-eastern France

## The object cases contains the number of cases recorded in each pixel
## (rows) for each week (column) with the first column corresponding to
## the most recent week.

## Suppose, for example, that we expect in theory 0.001 cases per
## pixel for each week (the model might be more complex, but is
## good enough for this demonstration). Therefore, the expected
## number of cases is:
expect <- as.data.frame(matrix(rep(0.001,nrow(francekd)*52),
                               nrow=nrow(francekd)))

## Now, we scan with the following radiuses (in meters)
(rad <- c(1:6)*50000)

## And every week, for one year, i.e.
(weekno <- 1:52)
## each element of weekno corresponds to a column in cases or in expect
## the most recent week is stored first in this data.frame

## Prepare the data:
ng <- ngkd(francekd, rad, weekno)

## Scan:
kd <- Kulldorff2001(cases, expect, ng)

## identify Clusters
clust <- identifyClusters(kd, ng)

## Note that the function identifies correctly both the size and
## the duration on he cluster.

## or, to show these clusters on the map of all cases:
image(francekd[,"totalCases"])
circleskd(clust, c("Best", "Second", "Third"))

## One remark: look at the values of loglambda, Nobs and Nth in clust:
clust

## The second cluster is characterized by a Nobs=6, whereas the expectation
## for this number is 0.764. The observed value seems much higher that
## the expectation. But remember that this cluster corresponds to a maximum
## value of log-lambda, calculated over clusters with 901 different possible
## centres (number of pixels in francekd), 6 different radiuses and 52
## possible duration. Look at the maximum number generated by a Poisson
## distribution with mean 0.764 when we draw 901*6*52 numbers.
## We simulate 100 times this maximum:
hist(sapply(1:100,function(r) max(rpois(900*6*52, 0.764))),
     xlab="Expected maximum number of obs. in a cluster", col="red",
     main="")
## 8 obs can be reasonably expected.


## Therefore: alternative way to identify the clusters: from simulations
## Imagine that some simulations indicate that all clusters with
## a log likelihood ratio greater than 20 are significant (see
## Kulldorf 2001 for a possible way to perform these simulations).

## We keep the clusters above 20
clust2 <- identifyClusters(kd, selection="lambda", lambda=20)
## There is one cluster identified:
clust2

## plot this cluster
image(francekd[,"totalCases"])
circleskd(clust2, "Best")