mst.seq | R Documentation |
mst.seq
finds the sequence of connected regions
that maximize the spatial scan statistic (the likelihood
ratio test statistic) from a starting region. The set of
connected regions at each step is a candidate zone. The
zone continues to grow until no region should be added to
the zone due to relevant constraints (size, connectivity,
or other stopping criteria). This function is not
intended to be called directly by users, so it
prioritizes efficiency over user friendliness. It can
nonetheless be instructive for seeing how a cluster
spreads.
mst.seq(
start,
neighbors,
cases,
pop,
w,
ex,
ty,
max_pop,
type = "maxonly",
nlinks = "one",
early = FALSE
)
start |
The initial region to start the candidate zone. |
neighbors |
A vector containing the neighbors for the starting region (in ascending order of distance from the region). The starting region itself is included among the neighbors. |
cases |
The number of cases observed in each region. |
pop |
The population size associated with each region. |
w |
A binary spatial adjacency matrix for the regions. |
ex |
The expected number of cases for each region. The default is calculated under the constant risk hypothesis. |
ty |
The total number of cases in the study area. |
max_pop |
The upper bound on the total population of a candidate zone. |
type |
One of "maxonly", "pruned", or "all". See Details. |
nlinks |
A character vector. The options are "one", "two", or "max". See Details. |
early |
A logical value indicating whether the
"early" stopping criterion should be used. If
|
The function can be used to construct candidate zones for the dynamic minimum spanning tree (dmst), early stopping dynamic minimum spanning tree (edmst), double connection spatial scan test (dc), and maximum linkage spatial scan test (mlink).
type is a character vector indicating what should
be returned by the function. If type = "maxonly",
then only the maximum of the log likelihood ratio test
statistic across all candidate zones is returned. If
type = "pruned", the function returns a list that
includes the location ids, test statistic, total cases,
expected cases, and total population for the zone with
the maximum test statistic. If type = "all", the
same information is returned for the entire sequence of
zones.
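The statistic that is maximized across the sequence of zones can be illustrated with a minimal sketch. It assumes the Poisson form of the spatial scan statistic and that the expected counts sum to the total number of cases (as they do under the constant risk hypothesis). The helper scan_stat_sketch, the toy data, and the order in zone_order are hypothetical and are not part of the package; the package's internal computation may differ.

```r
# Hypothetical helper (not part of the package): Poisson log likelihood
# ratio statistic for one candidate zone.
# yin = cases inside the zone, ein = expected cases inside the zone,
# ty  = total cases in the study area (assumed equal to the total expected).
scan_stat_sketch <- function(yin, ein, ty) {
  stopifnot(ein > 0, ein < ty)
  yout <- ty - yin # cases outside the zone
  eout <- ty - ein # expected cases outside the zone
  # only zones with elevated risk inside contribute
  if (yin / ein <= yout / eout) {
    return(0)
  }
  inside <- if (yin > 0) yin * log(yin / ein) else 0
  outside <- if (yout > 0) yout * log(yout / eout) else 0
  inside + outside
}

# toy data: four regions, expected counts sum to the total cases
cases <- c(5, 1, 4, 0)
ex <- c(2, 2, 3, 3)
ty <- sum(cases)

# assumed order in which regions join the zone (start region first)
zone_order <- c(1, 3, 2)
yin <- cumsum(cases[zone_order])
ein <- cumsum(ex[zone_order])

# statistic for each zone in the sequence, and the position of the maximum
stats <- mapply(scan_stat_sketch, yin, ein, MoreArgs = list(ty = ty))
best <- which.max(stats)
```

In this sketch, max(stats) plays the role of the value returned for type = "maxonly", the zone zone_order[seq_len(best)] corresponds to what type = "pruned" describes, and the full vector stats corresponds to the per-zone information described for type = "all".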
If nlinks = "one", then a region only needs to be
connected to one other region in the current zone to be
considered for inclusion in the next zone. If
nlinks = "two", then the region must be connected
to at least two other regions in the current zone. If
nlinks = "max", then only regions with the maximum
number of connections to the current zone are considered
for inclusion in the next zone.
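The three nlinks rules can be illustrated with a small base R sketch. The helper candidates_sketch and the toy adjacency matrix are hypothetical and only mirror the description above; how the package itself handles edge cases (for example, a zone too small for any region to have two links) is not covered here.

```r
# Hypothetical helper (not part of the package): regions eligible to join
# the current zone under each nlinks rule.
# zone = ids in the current zone, neighbors = ids that may be considered,
# w = binary spatial adjacency matrix.
candidates_sketch <- function(zone, neighbors, w, nlinks = "one") {
  outside <- setdiff(neighbors, zone)
  if (length(outside) == 0) {
    return(outside)
  }
  # number of adjacency links from each outside region into the zone
  links <- rowSums(w[outside, zone, drop = FALSE])
  keep <- switch(nlinks,
    one = links >= 1,
    two = links >= 2,
    max = links >= 1 & links == max(links),
    stop("unknown nlinks option")
  )
  outside[keep]
}

# toy symmetric adjacency matrix for five regions
w_toy <- matrix(0, 5, 5)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(2, 4), c(3, 4), c(4, 5))
w_toy[edges] <- 1
w_toy[edges[, 2:1]] <- 1

# current zone is regions 1 and 2; all five regions are neighbors
one_link <- candidates_sketch(c(1, 2), 1:5, w_toy, "one")
two_link <- candidates_sketch(c(1, 2), 1:5, w_toy, "two")
max_link <- candidates_sketch(c(1, 2), 1:5, w_toy, "max")
```

Here region 3 touches both regions in the zone, region 4 touches only region 2, and region 5 touches neither, so the stricter rules shrink the candidate set.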
Returns a list of relevant information. See Details.
Joshua French
# load data
data(nydf)
data(nyw)
# create relevant data
coords <- nydf[, c("longitude", "latitude")]
cases <- floor(nydf$cases)
pop <- nydf$population
w <- nyw
ex <- sum(cases) / sum(pop) * pop
ubpop <- 0.5
ubd <- 0.5
ty <- sum(cases) # total number of cases
# intercentroid distances
d <- gedist(as.matrix(coords), longlat = TRUE)
# upperbound for population in zone
max_pop <- ubpop * sum(pop)
# upperbound for distance between centroids in zone
max_dist <- ubd * max(d)
# create list of neighbors for each region (inclusive of region itself)
all_neighbors <- nndist(d, ubd)
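# find the dmst max zone (default type = "maxonly": maximum statistic only)
mst.seq(
  start = 1, all_neighbors[[1]], cases, pop, w, ex,
  ty, max_pop
)
# return details for the zone with the maximum test statistic
mst.seq(
  start = 1, all_neighbors[[1]], cases, pop, w, ex,
  ty, max_pop, type = "pruned"
)
# return details for the entire sequence of candidate zones
bigout <- mst.seq(
  start = 1, all_neighbors[[1]], cases, pop,
  w, ex, ty, max_pop, type = "all"
)
head(bigout)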