treespatial_scan: Tree-Spatial Scan Statistic
In treeSS: Tree-Spatial Scan Statistic for Cluster Detection

Tree-Spatial Scan Statistic

Description

Computes the tree-spatial scan statistic, which combines Kulldorff's circular scan (spatial clusters) with the tree-based scan (hierarchical data mining). The function searches all combinations of spatial zones and tree branches to identify (zone, branch) pairs (z, g) with significantly more cases than expected.
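As a rough illustration of what "more cases than expected" means for one candidate pair, the sketch below implements the standard Kulldorff-style Poisson log-likelihood ratio and the usual Monte Carlo rank p-value. The function names `poisson_llr` and `mc_pvalue` are hypothetical, and it is an assumption that the package uses exactly these forms; how the expected count for a (z, g) pair is derived is not shown.

```r
# Poisson log-likelihood ratio for a single candidate cluster (sketch).
# c_in:  observed cases inside the candidate
# e_in:  expected cases inside under the null hypothesis (must be > 0)
# c_tot: total number of cases
# Only excesses count: the ratio is 0 when c_in <= e_in.
poisson_llr <- function(c_in, e_in, c_tot) {
  if (c_in <= e_in) return(0)
  c_out <- c_tot - c_in
  e_out <- c_tot - e_in
  inside  <- c_in * log(c_in / e_in)
  outside <- if (c_out > 0) c_out * log(c_out / e_out) else 0
  inside + outside
}

# Monte Carlo rank p-value: the observed maximum is ranked among the
# maxima obtained from the simulated data sets.
mc_pvalue <- function(observed_max, simulated_max) {
  (1 + sum(simulated_max >= observed_max)) / (length(simulated_max) + 1)
}

poisson_llr(c_in = 20, e_in = 10, c_tot = 100)
```

With this rank-based form, `nsim = 999` gives a smallest attainable p-value of 1/1000.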

Usage

treespatial_scan(
  cases,
  population,
  region_id,
  x,
  y,
  node_id,
  tree = NULL,
  tree_node_id = NULL,
  tree_parent_id = NULL,
  max_pop_pct = 0.5,
  nsim = 999L,
  alpha = 0.05,
  model = c("poisson", "binomial"),
  seed = NULL,
  n_cores = 1L
)

Arguments

`cases`	Numeric vector. Number of cases observed for each (region, leaf) pair. Length `n`.
`population`	Numeric vector. Population (or denominator) of the region for each row. The same value should be repeated across all rows of a given region; if it varies, the first occurrence per region is used and a warning is issued.
`region_id`	Vector of region identifiers. Length `n`.
`x`, `y`	Numeric vectors of region centroid coordinates. Like `population`, these should be constant within region.
`node_id`	Vector of tree leaf identifiers. Length `n`. Each value must match a leaf of the tree.
`tree`	A `data.frame` with columns `node_id` and `parent_id`. The root node(s) must have `parent_id = NA`. As an alternative, pass `tree_node_id` and `tree_parent_id` as parallel vectors instead of this argument.
`tree_node_id`, `tree_parent_id`	Optional. Parallel vectors describing the tree edges, used as an alternative to `tree`. If both `tree` and these vectors are supplied, an error is raised.
`max_pop_pct`	Numeric. Maximum proportion of total population allowed inside a zone. Default `0.5`.
`nsim`	Integer. Number of Monte Carlo simulations. Default `999`.
`alpha`	Numeric. Significance level. Default `0.05`.
`model`	Character. `"poisson"` (default) or `"binomial"`.
`seed`	Integer or `NULL`. Random seed for the Monte Carlo loop. When non-`NULL`, the user's pre-existing RNG state is saved on entry and restored on exit, so the seed argument affects only this call and does not leak into subsequent draws in the user's session.
`n_cores`	Integer. OpenMP threads for the Monte Carlo loop. Default `1L` (serial).
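The behaviour described for `seed` (seeding one call without disturbing the session's random number stream) can be sketched in base R as follows. The helper `with_local_seed` is hypothetical and only illustrates the save-and-restore pattern; it is not the package's internal code.

```r
# Run `fn()` under a fixed seed, then put the caller's RNG state back.
with_local_seed <- function(seed, fn) {
  # .Random.seed only exists once the RNG has been used or seeded.
  had_state <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (had_state) old_state <- get(".Random.seed", envir = globalenv())
  on.exit({
    if (had_state) {
      assign(".Random.seed", old_state, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      rm(".Random.seed", envir = globalenv())
    }
  })
  set.seed(seed)
  fn()
}

set.seed(1)
draws_a <- with_local_seed(42, function() runif(3))
next_value <- runif(1)  # continues the stream started by set.seed(1)
```

Here `next_value` is the same number that `set.seed(1); runif(1)` would produce, because the seeded call leaves the outer stream untouched.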

Details

Inputs are passed as parallel vectors of equal length `n`, with one entry per (region, tree-leaf) observation. The caller chooses which quantity serves as the population (e.g., total population, live births, person-years), so the choice of denominator is always explicit.
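The input layout described above can be checked before calling the function. The following is a minimal base R sketch on toy data; the helper `constant_within` and all values are hypothetical, and the checks only mirror the documented requirements (leaf identifiers must match tree leaves; population and coordinates should be constant within region).

```r
# Toy tree: node 1 is the root, nodes 4-7 are leaves.
tree <- data.frame(
  node_id   = c(1, 2, 3, 4, 5, 6, 7),
  parent_id = c(NA, 1, 1, 2, 2, 3, 3)
)
# Leaves are the nodes that never appear as a parent.
leaves <- setdiff(tree$node_id, tree$parent_id)

# Long format: one row per (region, leaf) pair.
dat <- expand.grid(region_id = 1:3, node_id = leaves)
dat$population <- c(1000, 2500, 800)[dat$region_id]
dat$x <- c(0.5, 2.0, 4.1)[dat$region_id]
dat$y <- c(1.0, 3.3, 0.2)[dat$region_id]

# TRUE if `value` takes a single value within each level of `group`.
constant_within <- function(value, group) {
  all(tapply(value, group, function(v) length(unique(v)) == 1L))
}

stopifnot(
  all(dat$node_id %in% leaves),
  constant_within(dat$population, dat$region_id),
  constant_within(dat$x, dat$region_id),
  constant_within(dat$y, dat$region_id)
)
```

The columns of `dat` can then be passed one by one as `region_id`, `node_id`, `population`, `x`, and `y`, together with a `cases` vector of the same length.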

Secondary clusters. The returned object contains the most likely cluster as well as the full set of evaluated (zone, branch) pairs in `secondary_clusters`. To obtain the distinct secondary clusters as described in Section 5.1.1 of Cancado et al. (2025) (filtering out pairs that overlap in regions or branches with already-retained clusters), use `filter_clusters` or `get_cluster_regions` with `n_clusters > 1`.
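The idea of keeping only clusters that do not overlap with already-retained ones can be illustrated with a greedy pass over candidates sorted by test statistic. This is a sketch under stated assumptions, not the logic of `filter_clusters`: the candidate structure, the helper `greedy_distinct`, and the overlap rule (any shared region, or the same branch identifier) are all hypothetical simplifications.

```r
# Hypothetical candidates, one per (zone, branch) pair.
candidates <- list(
  list(regions = c(1, 2, 3), branch = 2, llr = 25.1),
  list(regions = c(2, 3),    branch = 4, llr = 19.4),
  list(regions = c(7, 8),    branch = 3, llr = 12.7),
  list(regions = c(9),       branch = 2, llr = 8.2)
)

# Visit candidates from highest to lowest statistic and keep one only if
# it shares no region and no branch with any candidate kept so far.
greedy_distinct <- function(cands, n_clusters = Inf) {
  llr <- vapply(cands, function(cl) cl$llr, numeric(1))
  cands <- cands[order(-llr)]
  kept <- list()
  for (cl in cands) {
    if (length(kept) >= n_clusters) break
    clashes <- vapply(kept, function(k) {
      any(cl$regions %in% k$regions) || identical(cl$branch, k$branch)
    }, logical(1))
    if (!any(clashes)) kept[[length(kept) + 1L]] <- cl
  }
  kept
}

kept <- greedy_distinct(candidates)
vapply(kept, function(cl) cl$llr, numeric(1))
```

In this toy input the second candidate is dropped because it shares regions with the first, and the fourth because it shares a branch with the first. A fuller treatment of branch overlap would also consider ancestor and descendant branches, which this sketch ignores.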

Value

An object of class "treespatial_scan".

References

Cancado, A. L. F., Oliveira, G. S., Quadros, A. V. C., & Duczmal, L. (2025). A tree-spatial scan statistic. Environmental and Ecological Statistics, 32, 953–978. doi:10.1007/s10651-025-00670-w

Examples

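set.seed(123)
n_regions <- 10
# Binary tree: node 1 is the root, nodes 2-3 are internal, nodes 4-7 are leaves
tree <- data.frame(
  node_id   = c(1, 2, 3, 4, 5, 6, 7),
  parent_id = c(NA, 1, 1, 2, 2, 3, 3)
)
# Build vectors: one row per (region, leaf) combination
grid <- expand.grid(region_id = 1:n_regions, node_id = c(4, 5, 6, 7))
# Draw one centroid per region, then repeat it across that region's rows
xs   <- runif(n_regions, 0, 10)[grid$region_id]
ys   <- runif(n_regions, 0, 10)[grid$region_id]
# Baseline counts, plus an injected excess for leaf 4 in regions 1-3
cs   <- rpois(nrow(grid), lambda = 5)
cs[grid$node_id == 4 & grid$region_id %in% 1:3] <- rpois(3, 30)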

result <- treespatial_scan(
  cases       = cs,
  population  = rep(1000, nrow(grid)),
  region_id   = grid$region_id,
  x           = xs,
  y           = ys,
  node_id     = grid$node_id,
  tree        = tree,
  nsim        = 99
)
print(result)

treeSS documentation built on May 16, 2026, 1:08 a.m.