| build_hon | R Documentation |
Constructs a Higher-Order Network from sequential data, faithfully implementing the BuildHON algorithm (Xu, Wickramarathne & Chawla, 2016).
The algorithm detects when a first-order Markov model is insufficient to capture sequential dependencies and automatically creates higher-order nodes. Uses KL-divergence to determine whether extending a node's history provides significantly different transition distributions.
build_hon(
data,
max_order = 5L,
min_freq = 1L,
collapse_repeats = FALSE,
method = "hon+"
)
data |
One of:
|
max_order |
Integer. Maximum order of the HON. Default 5. The algorithm may produce lower-order nodes if the data do not justify higher orders. |
min_freq |
Integer. Minimum frequency for a transition to be
considered. Transitions observed fewer than |
collapse_repeats |
Logical. If |
method |
Character. Algorithm to use: |
Node naming convention: Higher-order nodes use readable arrow
notation. A first-order node is simply "A". A second-order node
representing the context "came from A, now at B" is "A -> B".
Third-order: "A -> B -> C", etc.
Algorithm overview:
Count all subsequence transitions up to max_order + 1.
Build probability distributions, filtering by min_freq.
For each first-order source, recursively test whether extending the history (adding more context) produces a significantly different distribution (via KL-divergence vs. an adaptive threshold).
Build the network from the accepted rules, rewiring edges so higher-order nodes are properly connected.
An S3 object of class "net_hon" containing:
Weighted adjacency matrix (rows = from, cols = to).
Rows and columns use readable arrow notation (e.g., "A -> B").
Data frame with columns: path (full state sequence,
e.g., "A -> B -> C"), from (context/conditioning states),
to (predicted next state), count (raw frequency),
probability (transition probability), from_order,
to_order.
Character vector of HON node names in arrow notation.
Number of HON nodes.
Number of edges.
Character vector of unique original states.
The max_order parameter used.
Highest order actually present.
The min_freq parameter used.
Number of trajectories after parsing.
Logical. Always TRUE.
Xu, J., Wickramarathne, T. L., & Chawla, N. V. (2016). Representing higher-order dependencies in networks. Science Advances, 2(5), e1600028.
Saebi, M., Xu, J., Kaplan, L. M., Ribeiro, B., & Chawla, N. V. (2020). Efficient modeling of higher-order dependencies in networks: from algorithm to application for anomaly detection. EPJ Data Science, 9(1), 15.
seqs <- list(c("A","B","C","D"), c("A","B","C","A"), c("B","C","D","A"))
hon <- build_hon(seqs, max_order = 2)
# From list of trajectories
trajs <- list(
c("A", "B", "C", "D", "A"),
c("A", "B", "D", "C", "A"),
c("A", "B", "C", "D", "A")
)
hon <- build_hon(trajs, max_order = 3, min_freq = 1)
print(hon)
summary(hon)
# From data.frame (rows = trajectories)
df <- data.frame(T1 = c("A", "A"), T2 = c("B", "B"),
T3 = c("C", "D"), T4 = c("D", "C"))
hon <- build_hon(df, max_order = 2)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.