treespatial_scan: Tree-Spatial Scan Statistic

View source: R/treespatial_scan.R

treespatial_scan    R Documentation

Tree-Spatial Scan Statistic

Description

Performs the tree-spatial scan statistic, which combines Kulldorff's circular scan statistic (for spatial clusters) with the tree-based scan statistic (for hierarchical data mining). It searches all combinations of candidate spatial zones z and tree branches g to identify (zone, branch) pairs (z, g) with significantly more cases than expected.
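The search over (zone, branch) pairs can be sketched in base R as below. This is a hypothetical illustration, not the treeSS implementation, and every function name in it (poisson_llr, circular_zones, tree_branches, tree_spatial_max) is made up for this example. It assumes that circular zones are grown around each centroid by adding the nearest regions until the max_pop_pct cap would be exceeded, that a branch is the set of leaves below a tree node, that the expected count for a pair is E = C_g * P_z / P (branch total times the zone's population share), and that the statistic is Kulldorff's Poisson log-likelihood ratio conditioned on the grand total C. See Cancado et al. (2025) for the exact definitions.

```r
# Hypothetical sketch (not the treeSS internals). Assumptions, labelled:
#  - circular zones grow around each centroid until the population cap;
#  - a branch is the set of leaves under a tree node;
#  - expected cases for (z, g) are E = C_g * P_z / P, and the LLR is
#    Kulldorff's Poisson form conditioned on the grand total C.

poisson_llr <- function(c, E, C) {
  if (c <= E) return(0)                  # only excesses count as clusters
  inside  <- c * log(c / E)
  outside <- if (C > c) (C - c) * log((C - c) / (C - E)) else 0
  inside + outside
}

circular_zones <- function(x, y, pop, max_pop_pct = 0.5) {
  cap <- max_pop_pct * sum(pop)
  zones <- list()
  for (i in seq_along(x)) {
    ord <- order(sqrt((x - x[i])^2 + (y - y[i])^2))  # nearest regions first
    k_max <- sum(cumsum(pop[ord]) <= cap)            # largest zone within cap
    for (k in seq_len(k_max)) zones[[length(zones) + 1]] <- sort(ord[seq_len(k)])
  }
  unique(zones)                                      # drop duplicate zones
}

tree_branches <- function(node_id, parent_id) {
  children <- split(node_id, parent_id)  # rows with NA parent (roots) dropped
  leaves_under <- function(g) {
    kids <- children[[as.character(g)]]
    if (is.null(kids)) return(g)         # a leaf is its own branch
    unlist(lapply(kids, leaves_under))
  }
  setNames(lapply(node_id, leaves_under), node_id)
}

tree_spatial_max <- function(cases, population, region_id, x, y, node_id,
                             tree, max_pop_pct = 0.5) {
  regions <- sort(unique(region_id))
  first   <- match(regions, region_id)   # first row of each region
  r_idx   <- match(region_id, regions)   # region index of every row
  P_r     <- population[first]
  zones    <- circular_zones(x[first], y[first], P_r, max_pop_pct)
  branches <- tree_branches(tree$node_id, tree$parent_id)
  C <- sum(cases)
  P <- sum(P_r)
  best <- list(llr = 0, zone = NULL, branch = NULL)
  for (g in names(branches)) {
    in_g <- node_id %in% branches[[g]]
    C_g  <- sum(cases[in_g])
    for (z in zones) {
      c_zg <- sum(cases[in_g & r_idx %in% z])
      E_zg <- C_g * sum(P_r[z]) / P
      llr  <- poisson_llr(c_zg, E_zg, C)
      if (llr > best$llr) best <- list(llr = llr, zone = regions[z], branch = g)
    }
  }
  best
}
```

This sketch only locates the maximizing pair; it omits the Monte Carlo step (nsim replicates) that treespatial_scan uses to judge significance.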

Usage

treespatial_scan(
  cases,
  population,
  region_id,
  x,
  y,
  node_id,
  tree = NULL,
  tree_node_id = NULL,
  tree_parent_id = NULL,
  max_pop_pct = 0.5,
  nsim = 999L,
  alpha = 0.05,
  model = c("poisson", "binomial"),
  seed = NULL,
  n_cores = 1L
)

Arguments

cases

Numeric vector. Number of cases observed for each (region, leaf) pair. Length n.

population

Numeric vector. Population (or denominator) of the region for each row. The same value should be repeated across all rows of a given region; if it varies, the first occurrence per region is used and a warning is issued.

region_id

Vector of region identifiers. Length n.

x, y

Numeric vectors of region centroid coordinates. Like population, these should be constant within each region.

node_id

Vector of tree leaf identifiers. Length n. Each value must match a leaf of the tree.

tree

A data.frame with columns node_id and parent_id. The root node(s) must have parent_id = NA. Alternatively, pass tree_node_id and tree_parent_id as parallel vectors instead of this argument.

tree_node_id, tree_parent_id

Optional. Parallel vectors describing the tree edges, used as an alternative to tree. If both tree and these vectors are supplied, an error is raised.

max_pop_pct

Numeric. Maximum proportion of the total population allowed inside a zone. Default 0.5.

nsim

Integer. Number of Monte Carlo simulations. Default 999.

alpha

Numeric. Significance level. Default 0.05.

model

Character. "poisson" (default) or "binomial".

seed

Integer or NULL. Random seed for the Monte Carlo loop. When non-NULL, the user's pre-existing RNG state is saved on entry and restored on exit, so the seed argument affects only this call and does not leak into subsequent draws in the user's session.

n_cores

Integer. Number of OpenMP threads used for the Monte Carlo loop. Default 1L (serial).
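The seed and nsim arguments can be illustrated with two small base-R helpers. with_local_seed() and mc_pvalue() are hypothetical names, not treeSS functions. The first mirrors the save-and-restore behaviour described for seed: it records the caller's .Random.seed, sets the requested seed, evaluates the code, and restores the old state on exit. The second is the conventional Monte Carlo p-value estimator used with scan statistics, (1 + number of simulated statistics >= observed) / (nsim + 1). We assume, without having checked the package source, that treespatial_scan reports p-values of this form.

```r
# Hypothetical helper (not part of treeSS): evaluate `code` under a fixed seed,
# then restore whatever RNG state the caller had before the call.
with_local_seed <- function(seed, code) {
  if (is.null(seed)) return(code)        # NULL seed: use the session RNG as-is
  env <- globalenv()
  had_seed <- exists(".Random.seed", envir = env, inherits = FALSE)
  if (had_seed) old_seed <- get(".Random.seed", envir = env, inherits = FALSE)
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = env)    # put the old state back
    } else if (exists(".Random.seed", envir = env, inherits = FALSE)) {
      rm(".Random.seed", envir = env)                  # there was no state before
    }
  }, add = TRUE)
  set.seed(seed)
  code                                   # lazily evaluated after set.seed()
}

# Assumed p-value form: rank of the observed statistic among itself plus
# the nsim statistics simulated under the null hypothesis.
mc_pvalue <- function(observed, simulated) {
  (1 + sum(simulated >= observed)) / (length(simulated) + 1)
}
```

With nsim = 999, the smallest attainable p-value under this estimator is 1/1000 = 0.001, which is why values such as 999 or 9999 are customary.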

Details

Inputs are passed as parallel vectors of equal length n, with one entry per (region, tree-leaf) observation. The user chooses which column to pass as population (e.g., total population, live births, or person-years), which makes the choice of denominator explicit.
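Because region-level quantities (population, x, y) are repeated on every (region, leaf) row, they have to be reduced to one value per region. A minimal base-R sketch of the documented rule (keep the first occurrence per region, warn if the value varies) follows; collapse_by_region() is a hypothetical helper written for this illustration, not a treeSS function, and it does not attempt any special handling of NA values.

```r
# Hypothetical helper: reduce row-level (region, leaf) inputs to one value per
# region, keeping the first occurrence and warning when a region is inconsistent.
collapse_by_region <- function(value, region_id, what = "population") {
  first <- !duplicated(region_id)
  per_region <- setNames(value[first], region_id[first])
  # Rows whose value differs from their region's first value (NA comparisons ignored)
  bad <- which(value != per_region[as.character(region_id)])
  if (length(bad) > 0) {
    warning(sprintf("%s varies within region(s): %s; using the first occurrence",
                    what, paste(unique(region_id[bad]), collapse = ", ")))
  }
  per_region
}
```

The same helper could be applied to the coordinates, e.g. collapse_by_region(x, region_id, what = "x").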

Secondary clusters. The returned object contains the most likely cluster as well as the full set of evaluated (zone, branch) pairs in secondary_clusters. To obtain the distinct secondary clusters as described in Section 5.1.1 of Cancado et al. (2025) (filtering out pairs that overlap in regions or branches with already-retained clusters), use filter_clusters or get_cluster_regions with n_clusters > 1.
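The filtering idea can be sketched as a greedy pass over candidates sorted by log-likelihood ratio. This is a hypothetical illustration (greedy_distinct() is not a treeSS function and is not the implementation of filter_clusters). It follows the rule as worded above literally: a candidate is dropped if it shares any region, or any tree leaf, with a cluster already retained. Two tree branches share leaves exactly when one is an ancestor of (or equal to) the other.

```r
# Hypothetical sketch: greedily keep the highest-LLR (zone, branch) pairs,
# skipping any candidate that shares a region or a leaf with a kept pair.
# Each candidate is assumed to be list(llr = , regions = , leaves = ).
greedy_distinct <- function(candidates, n_clusters = 3) {
  ord <- order(vapply(candidates, `[[`, numeric(1), "llr"), decreasing = TRUE)
  kept <- list()
  for (i in ord) {
    cand <- candidates[[i]]
    overlaps <- vapply(kept, function(k)
      any(cand$regions %in% k$regions) || any(cand$leaves %in% k$leaves),
      logical(1))
    if (!any(overlaps)) kept[[length(kept) + 1]] <- cand  # disjoint: retain it
    if (length(kept) >= n_clusters) break
  }
  kept
}
```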

Value

An object of class "treespatial_scan".

References

Cancado, A. L. F., Oliveira, G. S., Quadros, A. V. C., & Duczmal, L. (2025). A tree-spatial scan statistic. Environmental and Ecological Statistics, 32, 953–978. doi:10.1007/s10651-025-00670-w

See Also

circular_scan, tree_scan, aggregate_tree, filter_clusters, get_cluster_regions, iterative_scan

Examples

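set.seed(123)
n_regions <- 10
# Tree: root 1 has children 2 and 3; leaves 4, 5 descend from 2 and 6, 7 from 3
tree <- data.frame(
  node_id   = c(1, 2, 3, 4, 5, 6, 7),
  parent_id = c(NA, 1, 1, 2, 2, 3, 3)
)
# Build vectors: one row per (region, leaf) combination
grid <- expand.grid(region_id = 1:n_regions, node_id = c(4, 5, 6, 7))
# Random centroids, indexed by region so they are constant within each region
xs   <- runif(n_regions, 0, 10)[grid$region_id]
ys   <- runif(n_regions, 0, 10)[grid$region_id]
# Background counts, plus an injected excess for leaf 4 in regions 1-3
cs   <- rpois(nrow(grid), lambda = 5)
cs[grid$node_id == 4 & grid$region_id %in% 1:3] <- rpois(3, 30)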

result <- treespatial_scan(
  cases       = cs,
  population  = rep(1000, nrow(grid)),
  region_id   = grid$region_id,
  x           = xs,
  y           = ys,
  node_id     = grid$node_id,
  tree        = tree,
  nsim        = 99
)
print(result)

treeSS documentation built on May 16, 2026, 1:08 a.m.