build_hon: Build a Higher-Order Network (HON)
In Nestimate: Network Estimation, Bootstrap, and Higher-Order Analysis

Description

Constructs a Higher-Order Network from sequential data, faithfully implementing the BuildHON algorithm (Xu, Wickramarathne & Chawla, 2016). The default method uses its parameter-free successor, BuildHON+ (Saebi et al., 2020).

The algorithm detects when a first-order Markov model is insufficient to capture the sequential dependencies in the data and automatically creates higher-order nodes where needed. It uses the Kullback-Leibler (KL) divergence to decide whether extending a node's history yields a significantly different distribution of next states.
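The extension test can be sketched in base R as follows. This is an illustration, not Nestimate's internal code: it assumes base-2 logarithms and an adaptive threshold of order / log2(1 + support), where support is the number of times the extended context was observed (a form taken, as an assumption, from the original BuildHON code). All function names are hypothetical.

```r
# Hypothetical sketch of BuildHON's extension test (not Nestimate's code).

# Base-2 KL divergence D(p || q) between two named probability vectors.
# Returns Inf when p puts mass on a state to which q assigns zero probability.
kl_divergence <- function(p, q) {
  p <- p[p > 0]
  q_aligned <- q[names(p)]
  q_aligned[is.na(q_aligned)] <- 0
  if (any(q_aligned == 0)) return(Inf)
  sum(p * log2(p / q_aligned))
}

# ASSUMED adaptive threshold: stricter for higher orders,
# looser for contexts that were observed many times.
kl_threshold <- function(order, support) {
  order / log2(1 + support)
}

# counts_short: next-state counts after the current context (e.g. "B")
# counts_ext:   next-state counts after the extended context (e.g. "A -> B")
# ext_order:    order of the extended context
extends_history <- function(counts_short, counts_ext, ext_order) {
  p_short <- counts_short / sum(counts_short)
  p_ext <- counts_ext / sum(counts_ext)
  kl_divergence(p_ext, p_short) > kl_threshold(ext_order, sum(counts_ext))
}

# "A -> B" is always followed by C, while "B" alone splits 50/50:
# KL = 1 bit > 2 / log2(21), about 0.455, so the extension is kept.
extends_history(c(C = 50, D = 50), c(C = 20), ext_order = 2)        # TRUE
# A 60/40 split seen only 5 times is weak evidence: KL ~ 0.029 < ~ 0.774.
extends_history(c(C = 50, D = 50), c(C = 3, D = 2), ext_order = 2)  # FALSE
```

Because the threshold shrinks as support grows, a small divergence can still justify a higher-order node when the extended context is very frequent, while a rare context needs a large divergence.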

Usage

build_hon(
  data,
  max_order = 5L,
  min_freq = 1L,
  collapse_repeats = FALSE,
  method = "hon+"
)

Arguments

`data`	One of:
  `data.frame`: rows are trajectories and columns are time steps. Trailing `NA`s are stripped, and all non-NA values are coerced to character.
  `list`: each element is a character (or coercible) vector representing one trajectory.
  `tna`: a tna object with sequence data. Numeric state IDs are automatically converted to their label names.
  `netobject`: a netobject with sequence data.
`max_order`	Integer. Maximum order of the HON. Default 5. The algorithm may produce lower-order nodes if the data do not justify higher orders.
`min_freq`	Integer. Minimum frequency for a transition to be considered. Transitions observed fewer than `min_freq` times are treated as zero. Default 1.
`collapse_repeats`	Logical. If `TRUE`, adjacent duplicate states within each trajectory are collapsed before analysis. Default `FALSE`.
`method`	Character. Algorithm to use: `"hon+"` (default, parameter-free BuildHON+ with lazy observation building and MaxDivergence pruning) or `"hon"` (original BuildHON with eager observation building).
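The `data.frame` and `collapse_repeats` preprocessing rules above can be mimicked in base R. The helper `df_to_trajectories()` below is hypothetical, written only to illustrate those rules; it is not part of Nestimate. How Nestimate treats `NA`s in the middle of a row is not documented here, so this sketch simply leaves them in place.

```r
# Hypothetical helper (not part of Nestimate) mirroring the documented
# data.frame rules: one trajectory per row, trailing NAs stripped, values
# coerced to character, and optional collapsing of adjacent repeats.
df_to_trajectories <- function(df, collapse_repeats = FALSE) {
  lapply(seq_len(nrow(df)), function(i) {
    # as.character() on each cell also handles factor columns correctly
    x <- unname(vapply(df, function(col) as.character(col[i]), character(1)))
    observed <- which(!is.na(x))
    # Keep everything up to the last non-NA value (inner NAs are left as-is)
    x <- if (length(observed) == 0) character(0) else x[seq_len(max(observed))]
    # rle()$values keeps one state per run of identical neighbours
    if (collapse_repeats && length(x) > 0) x <- rle(x)$values
    x
  })
}

df <- data.frame(T1 = c("A", "A"), T2 = c("A", "B"), T3 = c("B", NA))
df_to_trajectories(df)                           # list(c("A", "A", "B"), c("A", "B"))
df_to_trajectories(df, collapse_repeats = TRUE)  # list(c("A", "B"), c("A", "B"))
```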

Details

Node naming convention: Higher-order nodes use readable arrow notation. A first-order node is simply "A". A second-order node representing the context "came from A, now at B" is "A -> B". A third-order node is "A -> B -> C", and so on.
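A minimal sketch of the naming convention, using two hypothetical helpers that are not part of Nestimate: `node_name()` joins a context with " -> ", and `node_order()` recovers a node's order as the number of states in its name. The round trip assumes that no state label itself contains " -> ".

```r
# Hypothetical helpers illustrating the arrow-notation convention.
node_name <- function(states) paste(states, collapse = " -> ")

# Order of a node = number of states in its arrow-notation name.
node_order <- function(node_names) lengths(strsplit(node_names, " -> ", fixed = TRUE))

node_name(c("A", "B", "C"))                  # "A -> B -> C"
node_order(c("A", "A -> B", "A -> B -> C"))  # 1 2 3
```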

Algorithm overview:

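1. Count all subsequence transitions of up to max_order + 1 states, that is, every context of up to max_order states together with the state that follows it.
2. Convert the counts into conditional probability distributions, treating transitions observed fewer than min_freq times as zero.
3. For each first-order source, recursively test whether extending its history (adding more context) produces a significantly different next-state distribution, by comparing the KL divergence against an adaptive threshold.
4. Build the network from the accepted rules, rewiring edges so that higher-order nodes are properly connected.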
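Steps 1 and 2 can be sketched in base R as below. The helpers `count_transitions()` and `to_probabilities()` are hypothetical illustrations, not Nestimate functions, and they count every observation eagerly, which is closer to the original `"hon"` method than to the lazy observation building of `"hon+"`.

```r
# Hypothetical sketch of steps 1-2 (not Nestimate's code).
# count_transitions(): tabulate next-state counts for every context of
# 1..max_order consecutive states, naming contexts in arrow notation.
count_transitions <- function(trajs, max_order) {
  ctx <- character(0)
  nxt <- character(0)
  for (x in trajs) {
    n <- length(x)
    if (n < 2) next                              # no transitions to count
    for (k in seq_len(min(max_order, n - 1))) {  # k = context length
      starts <- seq_len(n - k)
      labels <- vapply(starts, function(i) paste(x[i:(i + k - 1)], collapse = " -> "),
                       character(1))
      ctx <- c(ctx, labels)
      nxt <- c(nxt, x[starts + k])
    }
  }
  table(ctx, nxt)  # rows = contexts, columns = next states
}

# to_probabilities(): zero out transitions seen fewer than min_freq times,
# drop contexts left with no transitions, and normalise each row to sum to 1.
to_probabilities <- function(tab, min_freq = 1) {
  tab[tab < min_freq] <- 0
  tab <- tab[rowSums(tab) > 0, , drop = FALSE]
  tab / rowSums(tab)
}

trajs <- list(c("A", "B", "C", "D", "A"),
              c("A", "B", "D", "C", "A"),
              c("A", "B", "C", "D", "A"))
tab <- count_transitions(trajs, max_order = 2)
probs <- to_probabilities(tab)
probs[c("B", "A -> B"), c("C", "D")]
```

With these three trajectories, the context "A -> B" has the same next-state distribution as "B" (C with probability 2/3, D with 1/3), so their KL divergence is zero and extending the history of B with A adds no information.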

Value

An S3 object of class "net_hon" containing:

matrix: Weighted adjacency matrix (rows = from, cols = to). Rows and columns use readable arrow notation (e.g., "A -> B").
edges: Data frame with columns: path (full state sequence, e.g., "A -> B -> C"), from (context/conditioning states), to (predicted next state), count (raw frequency), probability (transition probability), from_order, to_order.
nodes: Data frame with columns id, label, and name, one row per HON node (label and name both hold the arrow-notation node name). Stored as a data frame for compatibility with cograph_network.
n_nodes: Number of HON nodes.
n_edges: Number of edges.
first_order_states: Character vector of unique original states.
max_order_requested: The max_order parameter used.
max_order_observed: Highest order actually present.
min_freq: The min_freq parameter used.
n_trajectories: Number of trajectories after parsing.
directed: Logical. Always TRUE.

References

Xu, J., Wickramarathne, T. L., & Chawla, N. V. (2016). Representing higher-order dependencies in networks. Science Advances, 2(5), e1600028.

Saebi, M., Xu, J., Kaplan, L. M., Ribeiro, B., & Chawla, N. V. (2020). Efficient modeling of higher-order dependencies in networks: from algorithm to application for anomaly detection. EPJ Data Science, 9(1), 15.

Examples

library(Nestimate)

# Minimal example
seqs <- list(c("A","B","C","D"), c("A","B","C","A"), c("B","C","D","A"))
hon <- build_hon(seqs, max_order = 2)

# From list of trajectories
trajs <- list(
  c("A", "B", "C", "D", "A"),
  c("A", "B", "D", "C", "A"),
  c("A", "B", "C", "D", "A")
)
hon <- build_hon(trajs, max_order = 3, min_freq = 1)
print(hon)
summary(hon)

# From data.frame (rows = trajectories)
df <- data.frame(T1 = c("A", "A"), T2 = c("B", "B"),
                 T3 = c("C", "D"), T4 = c("D", "C"))
hon <- build_hon(df, max_order = 2)

Nestimate documentation built on July 11, 2026, 1:09 a.m.