frequencies: Sequence Data Conversion Functions
In Nestimate: Network Estimation, Bootstrap, and Higher-Order Analysis

Description

Functions for converting sequence data (long or wide format) into transition frequency matrices and other useful representations.

frequencies() converts long- or wide-format sequence data into a transition frequency matrix by counting how many times each transition from state i to state j occurs across all sequences.

Usage

frequencies(
  data,
  action = "Action",
  id = NULL,
  time = "Time",
  cols = NULL,
  format = c("auto", "long", "wide")
)

Arguments

`data`	Data frame containing sequence data in long or wide format.
`action`	Character. Name of the column containing actions/states (for long format). Default: "Action".
`id`	Character vector. Name(s) of the column(s) identifying sequences. For long format, each unique combination of ID values defines one sequence. For wide format, these columns are treated as identifiers and excluded from the state columns. Default: NULL.
`time`	Character. Name of the time column used to order actions within sequences (for long format). Default: "Time".
`cols`	Character vector. Names of columns containing states (for wide format). If NULL, all non-ID columns are used. Default: NULL.
`format`	Character. Format of input data: "auto" (detect automatically), "long", or "wide". Default: "auto".

Details

For long format data, each row is a single action/event. Sequences are defined by the id column(s), and actions are ordered by the time column within each sequence. Consecutive actions within a sequence form transition pairs.
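
The long-format counting described above can be sketched in base R. This is an illustration of the idea only, not the package's implementation; the toy data and the helper names (`seqs_by_id`, `from`, `to`, `long_counts`) are assumptions made for this example.

```r
# Hypothetical toy data; column names mirror the defaults of frequencies()
long <- data.frame(
  Actor  = c(1, 1, 1, 2, 2, 2),
  Time   = c(2, 1, 3, 1, 2, 3),   # deliberately unsorted within Actor 1
  Action = c("B", "A", "C", "B", "A", "C"),
  stringsAsFactors = FALSE
)

# Order events by sequence id, then by time within each sequence
long <- long[order(long$Actor, long$Time), ]

# Sorted unique states label the rows and columns of the matrix
states <- sort(unique(long$Action))

# One character vector of actions per sequence
seqs_by_id <- split(long$Action, long$Actor)

# Pair every action with its successor inside the same sequence
from <- unlist(lapply(seqs_by_id, function(s) head(s, -1)), use.names = FALSE)
to   <- unlist(lapply(seqs_by_id, function(s) tail(s, -1)), use.names = FALSE)

# Cross-tabulate the pairs; fixed factor levels keep the result square
tab <- table(factor(from, levels = states), factor(to, levels = states))
long_counts <- matrix(
  as.integer(tab),
  nrow = length(states),
  dimnames = list(states, states)
)
long_counts
```

Because pairs are formed within each sequence, the last action of one actor is never linked to the first action of the next.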

For wide format data, each row is a sequence and columns represent consecutive time points. Transitions are counted across consecutive columns, skipping any NA values.
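
A comparable base R sketch for wide data follows. It is not the package's code. In particular, it assumes that "skipping NA values" means dropping NA entries from a row before pairing neighbours; the package may handle gaps differently. The names `wide`, `rows` and `wide_counts` are made up for this example.

```r
# Hypothetical toy data: one sequence per row, one time point per column
wide <- data.frame(
  V1 = c("A", "B", "A"),
  V2 = c("B", "A", "C"),
  V3 = c("A", "C", "B"),
  stringsAsFactors = FALSE
)

m <- as.matrix(wide)
states <- sort(unique(m[!is.na(m)]))

# Assumption: NA entries are removed from each row before pairing
rows <- lapply(seq_len(nrow(m)), function(i) {
  s <- unname(m[i, ])
  s[!is.na(s)]
})

# Pair each state with the state in the next column of the same row
from <- unlist(lapply(rows, function(s) head(s, -1)), use.names = FALSE)
to   <- unlist(lapply(rows, function(s) tail(s, -1)), use.names = FALSE)

# Cross-tabulate the pairs; fixed factor levels keep the result square
tab <- table(factor(from, levels = states), factor(to, levels = states))
wide_counts <- matrix(
  as.integer(tab),
  nrow = length(states),
  dimnames = list(states, states)
)
wide_counts
```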

Value

A square integer matrix of transition frequencies, where mat[i, j] is the number of times state i was followed by state j. Row and column names are the sorted unique states. The matrix can be passed directly to tna::tna().
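
To make the layout concrete, the sketch below builds a matrix of the described shape by hand and reads it by state name. The counts are illustrative values chosen for this example, not output produced by frequencies().

```r
# Hand-built matrix with the documented layout: rows are source states,
# columns are target states, names are the sorted unique states
states <- c("A", "B", "C")
mat <- matrix(
  c(0L, 1L, 2L,
    2L, 0L, 0L,
    0L, 1L, 0L),
  nrow = 3, byrow = TRUE,
  dimnames = list(states, states)
)

a_to_c     <- mat["A", "C"]   # times A was followed by C
out_totals <- rowSums(mat)    # transitions leaving each state
in_totals  <- colSums(mat)    # transitions entering each state
```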

Examples

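library(Nestimate)

# Wide format: each row is one sequence, each column one time point
seqs <- data.frame(
  V1 = c("A", "B", "A"),
  V2 = c("B", "A", "C"),
  V3 = c("A", "C", "B")
)
freq <- frequencies(seqs, format = "wide")

# Long format: one row per event; Actor identifies the sequence,
# and the default time = "Time" orders events within it
long <- data.frame(
  Actor = rep(1:2, each = 3),
  Time = rep(1:3, times = 2),
  Action = c("A", "B", "C", "B", "A", "C")
)
freq <- frequencies(long, action = "Action", id = "Actor")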

Nestimate documentation built on April 20, 2026, 5:06 p.m.