In jpfrench81/smerc: Statistical Methods for Regional Counts

edmst.test

R Documentation

Early Stopping Dynamic Minimum Spanning Tree spatial scan test

Description

edmst.test implements the early stopping dynamic Minimum Spanning Tree scan test of Costa et al. (2012). Starting with a single region as the current zone, a new candidate zone is constructed by combining the current zone with the connected region that maximizes the resulting likelihood ratio test statistic. This procedure is repeated until adding a connected region no longer increases the test statistic, or until the population or distance upper bound is reached. The same procedure is then repeated using each region as the starting zone. The clusters returned are non-overlapping and ordered from most to least significant, so the first cluster is the most likely cluster. If no significant clusters are found, the most likely cluster is returned along with a warning.
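The zone-growing step described above can be sketched in base R. This is a minimal illustration, not smerc's implementation: it assumes the Poisson likelihood ratio statistic used in Kulldorff's spatial scan, enforces only the `ubpop` population bound (the `ubd` distance bound and the Monte Carlo p-values are omitted), and the helper names `poisson_llr`, `grow_zone`, and `select_nonoverlapping` are hypothetical.

```r
# Hypothetical helper: Poisson log-likelihood ratio statistic for a zone with
# cz observed cases, ez expected cases, and C total cases (assumed Kulldorff form)
poisson_llr <- function(cz, ez, C) {
  if (cz <= ez) return(0)  # only elevated-risk zones get a positive statistic
  stat <- cz * log(cz / ez)
  # guard against 0 * log(0) when the zone holds every case
  if (C > cz) stat <- stat + (C - cz) * log((C - cz) / (C - ez))
  stat
}

# Hypothetical helper: greedily grow a zone from region `start`, stopping early
grow_zone <- function(start, cases, pop, w, ex, ubpop = 0.5) {
  C <- sum(cases)
  max_pop <- ubpop * sum(pop)
  zone <- start
  best <- poisson_llr(sum(cases[zone]), sum(ex[zone]), C)
  repeat {
    # regions adjacent (per w) to any zone member but not yet in the zone
    nb <- setdiff(which(colSums(w[zone, , drop = FALSE]) > 0), zone)
    # keep only candidates that respect the population upper bound
    nb <- nb[sum(pop[zone]) + pop[nb] <= max_pop]
    if (length(nb) == 0) break
    stats <- vapply(nb, function(j) {
      z <- c(zone, j)
      poisson_llr(sum(cases[z]), sum(ex[z]), C)
    }, numeric(1))
    # early stopping: quit when the best addition does not increase the statistic
    if (max(stats) <= best) break
    zone <- c(zone, nb[which.max(stats)])
    best <- max(stats)
  }
  list(zone = zone, stat = best)
}

# Hypothetical helper: keep non-overlapping zones, largest statistic first
select_nonoverlapping <- function(results) {
  stats <- vapply(results, `[[`, numeric(1), "stat")
  results <- results[order(stats, decreasing = TRUE)]
  kept <- list()
  used <- integer(0)
  for (r in results) {
    if (length(intersect(r$zone, used)) == 0) {
      kept[[length(kept) + 1]] <- r
      used <- c(used, r$zone)
    }
  }
  kept
}

# Toy usage: four regions in a line, regions 1 and 2 have elevated counts
w <- matrix(0, 4, 4)
w[cbind(1:3, 2:4)] <- 1
w <- w + t(w)
cases <- c(10, 8, 1, 1)
pop <- rep(100, 4)
ex <- sum(cases) / sum(pop) * pop
res <- lapply(1:4, grow_zone, cases = cases, pop = pop, w = w, ex = ex, ubpop = 0.5)
kept <- select_nonoverlapping(res)
kept[[1]]$zone  # the zone grown from region 1: regions 1 and 2
```

With these toy values, the zone started at region 1 absorbs region 2 and then stops, because adding region 3 would exceed half of the total population.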

Usage

edmst.test(
  coords,
  cases,
  pop,
  w,
  ex = sum(cases)/sum(pop) * pop,
  nsim = 499,
  alpha = 0.1,
  ubpop = 0.5,
  ubd = 1,
  longlat = FALSE,
  cl = NULL
)

Arguments

`coords`	An n × 2 matrix of centroid coordinates for the regions, in the form (x, y), or (longitude, latitude) if using great circle distance.
`cases`	The number of cases observed in each region.
`pop`	The population size associated with each region.
`w`	A binary spatial adjacency matrix for the regions.
`ex`	The expected number of cases for each region. The default is calculated under the constant risk hypothesis.
`nsim`	The number of simulations from which to compute the p-value.
`alpha`	The significance level used to determine whether a cluster is significant. Default is 0.10.
`ubpop`	The upper bound on the proportion of the total population to consider for a cluster.
`ubd`	A proportion in (0, 1]. The distance of potential clusters must be no more than `ubd * m`, where `m` is the maximum intercentroid distance between all coordinates.
`longlat`	The default is `FALSE`, which specifies that Euclidean distance should be used. If `longlat` is `TRUE`, then the great circle distance is used to calculate the intercentroid distance.
`cl`	A cluster object created by `makeCluster`, or an integer indicating the number of child processes (integer values are ignored on Windows), for parallel evaluation. It can also be `"future"` to use a future backend. `NULL` (the default) specifies sequential evaluation.
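The default `ex` and the `ubpop` constraint can be illustrated with a few lines of base R. The toy `cases` and `pop` values are made up, and the admissibility check below is one reading of the `ubpop` description rather than smerc's internal code.

```r
# Toy data (hypothetical): four regions with equal populations
cases <- c(10, 8, 1, 1)
pop <- c(100, 100, 100, 100)

# Default expected counts under constant risk: overall rate times each population
ex <- sum(cases) / sum(pop) * pop
ex  # 5 5 5 5

# Under constant risk the expected counts sum to the observed total
isTRUE(all.equal(sum(ex), sum(cases)))  # TRUE

# Assumed reading of ubpop: a zone's population may not exceed ubpop * total
ubpop <- 0.5
zone <- c(1, 2)
sum(pop[zone]) <= ubpop * sum(pop)  # TRUE
```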

Details

The maximum intercentroid distance can be found by executing the command `max(gedist(as.matrix(coords), longlat = longlat))`, based on the specified values of `coords` and `longlat`.
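For Euclidean coordinates (`longlat = FALSE`), base R's `dist()` should give the same maximum as the command above; this sketch does not reproduce the great circle option. The toy coordinates are made up, and applying `ubd * m` as a radius around the starting region is an assumption about how the distance bound is used.

```r
# Toy centroids (x, y) for three regions; Euclidean distance only
coords <- cbind(x = c(0, 3, 0), y = c(0, 0, 4))

d <- as.matrix(dist(coords))  # full intercentroid distance matrix
m <- max(d)                   # maximum intercentroid distance
m                             # 5

ubd <- 0.9
max_dist <- ubd * m           # distance upper bound, 4.5

# Assumption: regions eligible for a zone started at region 1 are those whose
# centroids lie within max_dist of region 1's centroid
unname(which(d[1, ] <= max_dist))  # 1 2 3
```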

Value

Returns a smerc_cluster object.

Author(s)

Joshua French

References

Costa, M.A., Assunção, R.M., and Kulldorff, M. (2012). Constrained spanning tree algorithms for irregularly-shaped spatial clustering. Computational Statistics & Data Analysis, 56(6), 1771-1783. <doi:10.1016/j.csda.2011.11.001>

Examples

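library(smerc)
data(nydf)
data(nyw)
# centroid coordinates as (longitude, latitude), so longlat = TRUE below
coords <- with(nydf, cbind(longitude, latitude))
# nsim is kept small so the example runs quickly
out <- edmst.test(
  coords = coords, cases = floor(nydf$cases),
  pop = nydf$pop, w = nyw,
  alpha = 0.12, longlat = TRUE,
  nsim = 5, ubpop = 0.1, ubd = 0.2
)
# plot the region boundaries colored by detected cluster (requires sf)
if (require("sf", quietly = TRUE)) {
  data(nysf)
  plot(st_geometry(nysf), col = color.clusters(out))
}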

jpfrench81/smerc documentation built on Oct. 28, 2024, 12:21 a.m.