tree_scan: Tree-Based Scan Statistic

tree_scan    R Documentation

Tree-Based Scan Statistic

Description

Computes the tree-based scan statistic to detect clusters in hierarchical data. Supports a Poisson or binomial model, with significance assessed by Monte Carlo simulation (implemented in C++ via Rcpp).

Usage

tree_scan(
  tree = NULL,
  cases,
  population = NULL,
  nsim = 999L,
  alpha = 0.05,
  model = c("poisson", "binomial"),
  seed = NULL,
  n_cores = 1L,
  tree_node_id = NULL,
  tree_parent_id = NULL
)

Arguments

tree

A data.frame with columns node_id and parent_id. Root node(s) must have parent_id = NA. Alternatively, pass the tree as parallel vectors via tree_node_id and tree_parent_id.
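The layout described above can be checked with base R before calling tree_scan. The sketch below is illustrative only: it confirms that at least one root exists, identifies leaves as nodes that never appear as a parent, and derives the parallel-vector form. How tree_scan matches the entries of cases to leaves is not stated here, so the leaf order shown is an assumption.

```r
# Illustrative tree in the documented layout: roots have parent_id = NA.
tree <- data.frame(
  node_id   = c(1, 2, 3, 4, 5, 6, 7, 8),
  parent_id = c(NA, 1, 1, 2, 2, 3, 3, 3)
)

# Basic structural checks (not the package's own validation).
roots <- tree$node_id[is.na(tree$parent_id)]
stopifnot(
  length(roots) >= 1,                      # at least one root
  !anyDuplicated(tree$node_id),            # node ids are unique
  all(tree$parent_id[!is.na(tree$parent_id)] %in% tree$node_id)  # parents exist
)

# Leaves are nodes that never appear as a parent.
leaves <- setdiff(tree$node_id, tree$parent_id)

# The same tree as parallel vectors, suitable for tree_node_id / tree_parent_id.
node_id   <- tree$node_id
parent_id <- tree$parent_id
```

Here leaves contains nodes 4 to 8, so a matching cases vector would have five entries.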

cases

A numeric vector of case counts at the leaf level.

population

A numeric vector of population sizes at the leaf level, or a single value. For the binomial model, population is required and gives the number of trials (cases + controls) per leaf. For the Poisson model, it defaults to 1 per leaf if NULL.

nsim

Integer. Number of Monte Carlo simulations. Default is 999.
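As a rough guide to choosing nsim, Monte Carlo p-values are conventionally obtained by ranking the observed statistic among the simulated ones. The helper below, mc_p_value, is hypothetical and not part of the package. It assumes the common (r + 1) / (nsim + 1) convention, which may differ from what tree_scan does internally.

```r
# Hypothetical helper: rank-based Monte Carlo p-value.
# observed:  the test statistic computed on the real data
# simulated: statistics computed on nsim datasets generated under the null
mc_p_value <- function(observed, simulated) {
  (1 + sum(simulated >= observed)) / (length(simulated) + 1)
}

# Placeholder simulated statistics, all below the observed value.
sim <- rep(0, 999)

# Under this convention the smallest attainable p-value is 1 / (nsim + 1).
p_min <- mc_p_value(observed = 5, simulated = sim)
```

Under this convention, nsim = 999 limits the resolution of the p-value to 0.001, while nsim = 99 limits it to 0.01.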

alpha

Numeric. Significance level. Default is 0.05.

model

Character. Likelihood model: either "poisson" (default) or "binomial".
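For intuition about the Poisson option, the following sketch evaluates a conditional Poisson log-likelihood ratio of the kind described in Kulldorff et al. (2003) for a single node. The function poisson_llr and its arguments are hypothetical, and whether tree_scan uses exactly this form is an assumption; the binomial case is not shown.

```r
# Hypothetical helper: conditional Poisson log-likelihood ratio for one node.
# c_node:  observed cases in the node (summed over its leaves)
# e_node:  expected cases in the node under the null
# c_total: total cases in the whole tree
poisson_llr <- function(c_node, e_node, c_total) {
  if (c_node <= e_node) return(0)  # only an excess of cases counts
  xlogy <- function(x, y) if (x == 0) 0 else x * log(y)  # treat 0 * log(0) as 0
  inside  <- xlogy(c_node, c_node / e_node)
  outside <- xlogy(c_total - c_node, (c_total - c_node) / (c_total - e_node))
  inside + outside
}

cases <- c(50, 5, 3, 2, 4)
pop   <- c(100, 100, 100, 100, 100)

# Expected cases per leaf, proportional to population.
expected <- sum(cases) * pop / sum(pop)

# Ratio for the first leaf, which carries most of the cases.
llr_leaf1 <- poisson_llr(cases[1], expected[1], sum(cases))
```

A leaf with fewer cases than expected, such as the second one here, gets a ratio of 0 under this one-sided form.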

seed

Integer or NULL. Random seed for the Monte Carlo loop. When non-NULL, the user's pre-existing RNG state is saved on entry and restored on exit, so the seed argument affects only this call and does not leak into subsequent draws in the user's session.
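The save-and-restore behaviour described above can be illustrated in plain R. The wrapper with_local_seed below is hypothetical and only mimics the documented effect; the package's own mechanism, which drives a C++ loop, may be implemented differently.

```r
# Hypothetical wrapper: evaluate expr under a fixed seed, then restore
# whatever RNG state the session had before the call.
with_local_seed <- function(seed, expr) {
  genv <- globalenv()
  had_state <- exists(".Random.seed", envir = genv, inherits = FALSE)
  old_state <- if (had_state) get(".Random.seed", envir = genv) else NULL
  on.exit({
    if (had_state) {
      assign(".Random.seed", old_state, envir = genv)
    } else if (exists(".Random.seed", envir = genv, inherits = FALSE)) {
      rm(list = ".Random.seed", envir = genv)
    }
  })
  set.seed(seed)
  force(expr)  # expr is evaluated lazily, so it runs after set.seed()
}

set.seed(1)
a <- runif(1)

set.seed(1)
x1 <- with_local_seed(42, runif(3))  # seeded draws inside the wrapper
b  <- runif(1)                       # session stream is unaffected
x2 <- with_local_seed(42, runif(3))  # same seed, same draws
```

In this sketch a and b are identical, as are x1 and x2, which is the property the seed argument is documented to provide.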

n_cores

Integer. Number of OpenMP threads for the Monte Carlo loop. Default is 1L (serial). Set to a value greater than 1 to run the simulations in parallel.

tree_node_id, tree_parent_id

Optional parallel vectors describing the tree as an alternative to the tree data.frame. Both must have the same length, and the root node(s) must have tree_parent_id = NA. Ignored when tree is supplied.

Value

An object of class "tree_scan" (see package help for details).

References

Kulldorff, M., Fang, Z., & Walsh, S. J. (2003). A tree-based scan statistic for database disease surveillance. Biometrics, 59(2), 323–331.

See Also

circular_scan, treespatial_scan, aggregate_tree

Examples

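# Tree with 8 nodes: node 1 is the root, nodes 2 and 3 are internal,
# and nodes 4 to 8 are the five leaves.
tree <- data.frame(
  node_id   = c(1, 2, 3, 4, 5, 6, 7, 8),
  parent_id = c(NA, 1, 1, 2, 2, 3, 3, 3)
)
# Case counts and populations at the leaf level (five leaves).
cases <- c(50, 5, 3, 2, 4)
pop   <- c(100, 100, 100, 100, 100)

# Poisson model (the default) with 99 Monte Carlo simulations.
result <- tree_scan(tree, cases, population = pop, nsim = 99)
print(result)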

treeSS documentation built on May 16, 2026, 1:08 a.m.