knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 12,
  fig.height = 5,
  fig.align = "center",
  out.width = "100%",
  dpi = 96,
  message = FALSE,
  warning = FALSE
)
library(Nestimate)
set.seed(20260413)
human_long is a bundled dataset in Nestimate containing coded action sequences from 429 human-AI coding sessions across 34 projects. Every row is a single action taken during a session, with a cluster label that groups actions into six broad types: Action, Communication, Directive, Evaluative, Metacognitive, and Repair.
data(human_long, package = "Nestimate")
dat <- as.data.frame(human_long)
cat("rows:", nrow(dat),
    "| sessions:", length(unique(dat$session_id)),
    "| projects:", length(unique(dat$project)), "\n\n")
print(table(dat$cluster))
For each session, the first half of its actions is labeled "early" and the second half "late". Base R ave() handles both the per-session count and the per-session position, and a single ifelse() then writes the label. Because the cutoff is n_per %/% 2, the middle action of an odd-length session falls in "late".
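# Sort so that positions within each session follow the recorded order
dat <- dat[order(dat$session_id, dat$order_in_session), ]
# Number of actions in each session, repeated on every row of that session
n_per <- ave(dat$order_in_session, dat$session_id, FUN = length)
# 1-based position of each action within its session
pos <- ave(dat$order_in_session, dat$session_id, FUN = seq_along)
# Integer division: positions up to floor(n / 2) are "early"
dat$half <- ifelse(pos <= n_per %/% 2, "early", "late")
print(table(dat$half))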
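To make the split rule concrete, here is a minimal sketch on invented data. The toy data frame and its two session IDs are hypothetical; only the ave() and ifelse() logic mirrors the code above.

```r
# Hypothetical toy data: session "s1" has 5 actions, session "s2" has 4
toy <- data.frame(
  session_id = c(rep("s1", 5), rep("s2", 4)),
  order_in_session = c(1:5, 1:4)
)

# Per-session action count and per-session position
n_per <- ave(toy$order_in_session, toy$session_id, FUN = length)
pos <- ave(toy$order_in_session, toy$session_id, FUN = seq_along)

# Same rule as above: positions up to floor(n / 2) are "early"
toy$half <- ifelse(pos <= n_per %/% 2, "early", "late")

# Cross-tabulate sessions against halves
print(table(toy$session_id, toy$half))
```

With five actions, s1 gets two "early" and three "late" rows; with four actions, s2 splits evenly.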
build_network() is the canonical entry point. Passing group = "half" produces a netobject_group with one netobject per half. Each netobject's $data field holds the session-half sequences.
net <- build_network(
  data = dat,
  actor = "session_id",
  action = "cluster",
  group = "half",
  method = "relative"
)
net
sequence_compare() accepts a netobject_group directly: group labels are read from the list names, so no separate group argument is needed. The call below uses pattern lengths 3 to 5, a minimum frequency of 25, and a chi-square test with FDR correction.
res <- sequence_compare(
  net,
  sub = 3:5,
  min_freq = 25L,
  test = "chisq",
  adjust = "fdr"
)
res
head(res$patterns, 10)
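As a rough illustration of what a pattern of length 3 means here, the sketch below slides a window over one made-up sequence of cluster labels and tabulates the resulting 3-grams in base R. The sequence and the names seq1, k, and kgrams are hypothetical, and this is not a description of how sequence_compare() counts patterns internally.

```r
# Hypothetical sequence of cluster labels from a single session half
seq1 <- c("Action", "Evaluative", "Repair", "Action", "Evaluative", "Repair")
k <- 3  # pattern length

# Every start position that leaves room for a full window of length k
starts <- seq_len(length(seq1) - k + 1)

# Collapse each window into a single pattern string
kgrams <- vapply(
  starts,
  function(i) paste(seq1[i:(i + k - 1)], collapse = " -> "),
  character(1)
)

# Frequency of each distinct pattern in this one sequence
print(table(kgrams))
```

A sequence of six labels yields four overlapping windows of length 3, and "Action -> Evaluative -> Repair" appears twice among them.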
For every pattern, the standardized residual is computed from a 2x2 contingency table (this pattern vs. everything else):
$$\text{stdres}_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij} \cdot (1 - r_i/N) \cdot (1 - c_j/N)}}$$
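In the formula, O is the observed count, E = r_i * c_j / N is the expected count under independence, r_i and c_j are the row and column totals, and N is the grand total. The sketch below computes the residuals by hand for one invented 2x2 table and compares them with the stdres component returned by base R's chisq.test(). The counts are hypothetical, and nothing here is taken from Nestimate's internals.

```r
# Hypothetical counts: rows = this pattern vs. all other patterns,
# columns = early vs. late
tab <- matrix(
  c(40, 960, 20, 980),
  nrow = 2,
  dimnames = list(c("pattern", "other"), c("early", "late"))
)

N <- sum(tab)           # grand total
r <- rowSums(tab)       # row totals r_i
cc <- colSums(tab)      # column totals c_j
E <- outer(r, cc) / N   # expected counts under independence

# Standardized residuals, term by term as in the formula
stdres <- (tab - E) / sqrt(E * outer(1 - r / N, 1 - cc / N))
print(round(stdres, 2))

# Cross-check against the stdres component of chisq.test()
print(all.equal(stdres, chisq.test(tab, correct = FALSE)$stdres,
                check.attributes = FALSE))
```

In a 2x2 table all four residuals share the same magnitude, so the sign of the pattern row is what indicates which half the pattern favors.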
early → over-represented in the first half of sessions
late → over-represented in the second half
|z| > 1.96 corresponds to p < 0.05; |z| > 3 is very strong evidence

Back-to-back bars with residual labels inside each segment. Both sides use the same standardized-residual color scale.
plot(res, style = "pyramid", show_residuals = TRUE)
The heatmap shows the same top patterns on the same color scale in an alternative layout. It works for any number of groups, whereas the pyramid requires exactly two.
plot(res, style = "heatmap")
By default patterns are ranked by test statistic. Pass sort = "frequency" to rank by total occurrence count instead — useful for focusing on the most common patterns regardless of their group difference.
plot(res, style = "pyramid", sort = "frequency", show_residuals = TRUE)
This vignette uses test = "chisq" because the split-within-session design makes the two halves from the same session non-independent (same human, same AI, same project). The chi-square answers the k-gram-level question "do the rates differ between halves?" and is the right tool for this design.
test = "permutation" shuffles group labels at the sequence level and assumes exchangeability across sequences — it's the right choice when the groups are independent cohorts (e.g., Project_A vs Project_B), not when each session contributes to both groups.
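For intuition about sequence-level label shuffling, here is a minimal base R sketch of a permutation test on invented data. The per-sequence counts, the cohort labels, and the difference-in-means statistic are all assumptions chosen for illustration; the statistic and procedure used by test = "permutation" in Nestimate may differ.

```r
set.seed(1)

# Hypothetical counts of one pattern in each of 10 sequences,
# five sequences from each of two independent cohorts
counts <- c(3, 5, 4, 6, 5, 1, 0, 2, 1, 2)
group <- rep(c("Project_A", "Project_B"), each = 5)

# Illustrative statistic: difference in mean count between cohorts
diff_means <- function(g) {
  mean(counts[g == "Project_A"]) - mean(counts[g == "Project_B"])
}
obs <- diff_means(group)

# Null distribution: reassign cohort labels to whole sequences at random
perm <- replicate(2000, diff_means(sample(group)))

# Two-sided p-value with the usual +1 correction
p_val <- (sum(abs(perm) >= abs(obs)) + 1) / (length(perm) + 1)
print(c(observed = obs, p_value = p_val))
```

The shuffle treats every sequence as interchangeable between groups. That holds for independent cohorts but not when early and late halves of the same session are paired.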