scan_eb_zip | R Documentation |
Calculates the expectation-based zero-inflated Poisson (ZIP) scan statistic. See details below.
scan_eb_zip(
  counts,
  zones,
  baselines = NULL,
  probs = NULL,
  population = NULL,
  n_mcsim = 0,
  gumbel = FALSE,
  max_only = FALSE,
  rel_tol = 0.001
)
counts |
Either:
|
zones |
A list of integer vectors. Each vector corresponds to a single zone; its elements are the numbers of the locations in that zone. |
baselines |
Optional. A matrix of the same dimensions as |
probs |
Optional. A matrix of the same dimensions as |
population |
Optional. A matrix or vector of populations for each
location. Only needed if |
n_mcsim |
A non-negative integer; the number of replicate scan statistics to generate in order to calculate a P-value. |
gumbel |
Logical: should a Gumbel P-value be calculated? Default is FALSE. |
max_only |
Boolean. If |
rel_tol |
A positive scalar. If the relative change in the incomplete information likelihood is less than this value, then the EM algorithm is deemed to have converged. |
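As an illustration of the structure expected for the zones argument, the following minimal sketch builds a list of integer vectors by hand for a hypothetical study area with three locations. The particular zones chosen are an assumption made for this example only.

```r
# Hypothetical example: three locations, numbered 1 to 3.
# Each list element is one zone, given as the integer numbers of its locations.
zones <- list(
  1L, 2L, 3L,          # single-location zones
  c(1L, 2L),           # locations 1 and 2 together
  c(2L, 3L),           # locations 2 and 3 together
  c(1L, 2L, 3L)        # the whole study area
)

# Check that every zone is an integer vector.
all_integer <- all(vapply(zones, is.integer, logical(1)))
n_zones <- length(zones)
```

Here n_zones counts the zones that would be scanned for each duration.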
For the expectation-based zero-inflated Poisson scan statistic (Allévius & Höhle 2017), the null hypothesis of no anomaly holds that the count observed at each location i and duration t (the number of time periods before present) has a zero-inflated Poisson distribution with expected value parameter μ_{it} and structural zero probability p_{it}:
H_0 : Y_{it} \sim \textrm{ZIP}(μ_{it}, p_{it}).
This holds for all locations i = 1, …, m and all durations t = 1, …, T, with T being the maximum duration considered. Under the alternative hypothesis, there is a space-time window W consisting of a spatial zone Z \subset \{1, …, m\} and a time window D \subseteq \{1, …, T\} such that the counts in that window have their Poisson expected value parameters inflated by a factor q_W > 1 compared to the null hypothesis:
H_1 : Y_{it} \sim \textrm{ZIP}(q_W μ_{it}, p_{it}), ~~(i,t) \in W.
For locations and durations outside of this window, counts are assumed to be distributed as under the null hypothesis. The sets Z considered are those specified in the argument zones, while the maximum duration T is taken as the maximum value in the column duration of the input table.
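For reference, the hypotheses above can be read against the usual probability mass function of the zero-inflated Poisson distribution. The form below assumes the standard parametrization, in which μ_{it} is the mean of the Poisson component and p_{it} is the probability of a structural zero; it is not quoted from the package documentation.

```latex
P(Y_{it} = 0) = p_{it} + (1 - p_{it}) e^{-\mu_{it}}, \qquad
P(Y_{it} = y) = (1 - p_{it}) \frac{e^{-\mu_{it}} \mu_{it}^{y}}{y!}, \quad y = 1, 2, \ldots
```

Under the alternative hypothesis, μ_{it} is replaced by q_W μ_{it} for (i,t) in W, while p_{it} is unchanged.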
For each space-time window W considered, (the log of) a likelihood ratio is computed using the distributions under the alternative and null hypotheses, and the expectation-based ZIP scan statistic is calculated as the maximum of these quantities over all space-time windows. The expectation-maximization (EM) algorithm is used to obtain maximum likelihood estimates of the relative risk q_W.
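The computation for a single window can be sketched in base R as follows. This is a minimal illustration and not the package's implementation: the helper names zip_loglik and em_window are hypothetical, the convergence check uses the relative change in the observed-data log-likelihood as a stand-in for the criterion described under rel_tol, and the inputs are assumed to satisfy mu > 0 and 0 <= p < 1.

```r
# Hypothetical helper: log-likelihood of counts y under ZIP(mu, p).
zip_loglik <- function(y, mu, p) {
  sum(ifelse(
    y == 0,
    log(p + (1 - p) * exp(-mu)),
    log(1 - p) + dpois(y, mu, log = TRUE)
  ))
}

# Hypothetical helper: EM estimate of the relative risk q for ONE window,
# given the counts, baselines and structural zero probabilities in it.
em_window <- function(y, mu, p, rel_tol = 1e-3, max_iter = 1000) {
  q <- 1
  loglik_null <- zip_loglik(y, mu, p)
  loglik_old <- loglik_null
  for (iter in seq_len(max_iter)) {
    # E-step: probability that each observed zero is a structural zero.
    delta <- ifelse(y == 0, p / (p + (1 - p) * exp(-q * mu)), 0)
    # M-step: update q, constrained to q >= 1.
    q <- max(1, sum(y) / sum(mu * (1 - delta)))
    loglik_new <- zip_loglik(y, q * mu, p)
    converged <- abs(loglik_new - loglik_old) <= rel_tol * abs(loglik_old)
    loglik_old <- loglik_new
    if (converged) break
  }
  # Score: log-likelihood ratio of alternative versus null for this window.
  list(relative_risk = q, score = loglik_new - loglik_null, n_iter = iter)
}

# Usage on simulated data with the Poisson means doubled (q = 2).
set.seed(1)
n <- 30
mu <- rexp(n, 1 / 5)
p <- runif(n) / 4
structural_zero <- rbinom(n, 1, p) == 1
y <- ifelse(structural_zero, 0, rpois(n, 2 * mu))
fit <- em_window(y, mu, p)
```

The scan statistic would then be the largest such score over all zones and durations.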
A list which, in addition to the information about the type of scan statistic, has the following components:
A list containing the number of the zone of the most likely cluster (MLC), the locations in that zone, the duration of the MLC, the calculated score, the relative risk, and the number of iterations until convergence for the EM algorithm. In order, the elements of this list are named zone_number, locations, duration, score, relative_risk, n_iter.
A data frame containing, for each combination of zone and duration investigated, the zone number, duration, score, relative risk, and number of EM iterations. The table is sorted by score, with the top-scoring window first. If max_only = TRUE, it contains only a single row, corresponding to the MLC.
A data frame of the Monte Carlo replicates of the scan statistic (if any), and the corresponding zones and durations.
The Monte Carlo P-value.
A P-value obtained by fitting a Gumbel distribution to the replicate scan statistics.
The number of zones scanned.
The number of locations.
The maximum duration considered.
The number of Monte Carlo replicates made.
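To make the two P-values concrete, the sketch below computes them from a vector of replicate scan statistics. Both helper names are hypothetical. The Monte Carlo formula is the conventional one and is assumed, not confirmed, to match the package. The Gumbel fit uses the method of moments for simplicity, which may differ from the fitting method the package uses.

```r
# Hypothetical helper: conventional Monte Carlo P-value.
mc_pvalue <- function(observed, replicates) {
  (1 + sum(replicates > observed)) / (1 + length(replicates))
}

# Hypothetical helper: Gumbel P-value from a method-of-moments fit
# (swapped in here for illustration).
gumbel_pvalue_mom <- function(observed, replicates) {
  euler_gamma <- 0.5772157
  scale <- sd(replicates) * sqrt(6) / pi
  loc <- mean(replicates) - euler_gamma * scale
  # Upper tail probability of the fitted Gumbel distribution.
  1 - exp(-exp(-(observed - loc) / scale))
}

# Made-up replicate values, for illustration only.
replicates <- c(1.2, 0.8, 2.5, 1.9, 3.1, 0.4, 1.1, 2.2, 1.6)
p_mc <- mc_pvalue(4, replicates)
p_gumbel <- gumbel_pvalue_mom(4, replicates)
```

With few replicates the Monte Carlo P-value cannot fall below 1 / (n_mcsim + 1), which is one motivation for the Gumbel approximation.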
Allévius, B. and Höhle, M. (2017), An expectation-based space-time scan statistic for ZIP-distributed data (Technical report).
if (require("gamlss.dist")) {
  set.seed(1)
  # Create location coordinates, calculate nearest neighbors, and create zones
  n_locs <- 50
  max_duration <- 5
  n_total <- n_locs * max_duration
  geo <- matrix(rnorm(n_locs * 2), n_locs, 2)
  knn_mat <- coords_to_knn(geo, 15)
  zones <- knn_zones(knn_mat)

  # Simulate data
  baselines <- matrix(rexp(n_total, 1/5), max_duration, n_locs)
  probs <- matrix(runif(n_total) / 4, max_duration, n_locs)
  counts <- matrix(gamlss.dist::rZIP(n_total, baselines, probs),
                   max_duration, n_locs)

  # Inject outbreak/event/anomaly
  ob_dur <- 3
  ob_cols <- zones[[10]]
  ob_rows <- max_duration + 1 - seq_len(ob_dur)
  counts[ob_rows, ob_cols] <- gamlss.dist::rZIP(
    ob_dur * length(ob_cols),
    2 * baselines[ob_rows, ob_cols],
    probs[ob_rows, ob_cols])

  res <- scan_eb_zip(counts = counts,
                     zones = zones,
                     baselines = baselines,
                     probs = probs,
                     n_mcsim = 9,
                     max_only = FALSE,
                     rel_tol = 1e-3)
}