frequencies: Sequence Data Conversion Functions

View source: R/frequencies.R

frequencies    R Documentation

Sequence Data Conversion Functions

Description

Functions for converting sequence data (long or wide format) into transition frequency matrices and other useful representations.

frequencies() converts long or wide format sequence data into a transition frequency matrix by counting how many times each transition from state_i to state_j occurs across all sequences.

Usage

frequencies(
  data,
  action = "Action",
  id = NULL,
  time = "Time",
  cols = NULL,
  format = c("auto", "long", "wide")
)

Arguments

data

Data frame containing sequence data in long or wide format.

action

Character. Name of the column containing actions/states (for long format). Default: "Action".

id

Character vector. Name(s) of the column(s) identifying sequences. For long format, each unique combination of ID values defines a sequence. For wide format, the named columns are treated as non-state columns and excluded from transition counting. Default: NULL.

time

Character. Name of the time column used to order actions within sequences (for long format). Default: "Time".

cols

Character vector. Names of columns containing states (for wide format). If NULL, all non-ID columns are used. Default: NULL.

format

Character. Format of input data: "auto" (detect automatically), "long", or "wide". Default: "auto".

Details

For long format data, each row is a single action/event. Sequences are defined by the id column(s), and actions are ordered by the time column within each sequence. Consecutive actions within a sequence form transition pairs.
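The long-format rule above can be sketched in base R. This illustrates the counting logic only and is not the package's implementation; the helper name count_long_transitions and its single-column id argument are hypothetical.

```r
# Hypothetical helper (not part of the package): count transitions in
# long-format data with base R only. Assumes a single id column.
count_long_transitions <- function(data, action = "Action",
                                   id = "Actor", time = "Time") {
  # Sorted unique states become the row and column names.
  states <- sort(unique(as.character(data[[action]])))
  mat <- matrix(0L, nrow = length(states), ncol = length(states),
                dimnames = list(states, states))
  # One group of rows per sequence identifier.
  for (seq_rows in split(data, data[[id]])) {
    # Order this sequence's actions by the time column.
    acts <- as.character(seq_rows[[action]][order(seq_rows[[time]])])
    if (length(acts) < 2) next
    from <- acts[-length(acts)]
    to <- acts[-1]
    # Each consecutive pair (from, to) is one transition.
    for (k in seq_along(from)) {
      mat[from[k], to[k]] <- mat[from[k], to[k]] + 1L
    }
  }
  mat
}

long <- data.frame(
  Actor = rep(1:2, each = 3), Time = rep(1:3, 2),
  Action = c("A", "B", "C", "B", "A", "C")
)
long_counts <- count_long_transitions(long)
print(long_counts)
```

Actor 1 contributes A to B and B to C; actor 2 contributes B to A and A to C, so the sketch records four transitions in total and row C stays empty.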

For wide format data, each row is a sequence and columns represent consecutive time points. Transitions are counted across consecutive columns, skipping any NA values.
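A comparable base R sketch for the wide-format rule follows. It is an illustration, not the package's code: the helper name count_wide_transitions is hypothetical, and it assumes that "skipping" means NA cells are dropped from a row before pairing, so the states on either side of a gap form a transition. The actual function may treat gaps differently.

```r
# Hypothetical helper (not part of the package): count transitions in
# wide-format data with base R only.
count_wide_transitions <- function(data, cols = names(data)) {
  # Rows are sequences; columns are consecutive time points.
  m <- as.matrix(data[, cols, drop = FALSE])
  states <- sort(unique(as.vector(m[!is.na(m)])))
  mat <- matrix(0L, nrow = length(states), ncol = length(states),
                dimnames = list(states, states))
  for (r in seq_len(nrow(m))) {
    # ASSUMPTION: NA cells are removed before consecutive states are paired.
    s <- unname(m[r, ])
    s <- s[!is.na(s)]
    if (length(s) < 2) next
    for (k in seq_len(length(s) - 1)) {
      mat[s[k], s[k + 1]] <- mat[s[k], s[k + 1]] + 1L
    }
  }
  mat
}

seqs <- data.frame(V1 = c("A", "B", "A"),
                   V2 = c("B", "A", "C"),
                   V3 = c("A", "C", "B"))
wide_counts <- count_wide_transitions(seqs)
print(wide_counts)
```

For these three sequences the sketch counts A to B once, A to C twice, B to A twice, and C to B once.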

Value

A square integer matrix of transition frequencies, where mat[i, j] is the number of times state i was followed by state j. Row and column names are the sorted unique states. The matrix can be passed directly to tna::tna().
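To make the layout concrete, the following base R sketch builds a small matrix by hand in the shape described above and reads it. The values are illustrative, not output captured from frequencies().

```r
# Hand-built example of a transition frequency matrix (illustrative values).
# Rows are the "from" state, columns are the "to" state.
states <- c("A", "B", "C")
mat <- matrix(
  c(0L, 1L, 2L,   # from A: to A, to B, to C
    2L, 0L, 0L,   # from B
    0L, 1L, 0L),  # from C
  nrow = 3, byrow = TRUE,
  dimnames = list(states, states)
)

a_to_c <- mat["A", "C"]    # number of times A was followed by C
outgoing <- rowSums(mat)   # transitions leaving each state
incoming <- colSums(mat)   # transitions entering each state
print(mat)
```

Indexing by state name (mat["A", "C"]) is usually safer than indexing by position, because the position of a state depends on the sorted order of the states present in the data.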

See Also

convert_sequence_format for converting to other representations (frequency counts, one-hot, edge lists).

Examples

# Wide format: each row is one sequence; V1..V3 are consecutive time points
seqs <- data.frame(V1 = c("A","B","A"), V2 = c("B","A","C"), V3 = c("A","C","B"))
freq <- frequencies(seqs, format = "wide")

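# Long format: one row per action; sequences are identified by Actor
# and ordered by the default time column "Time"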
long <- data.frame(
  Actor = rep(1:2, each = 3), Time = rep(1:3, 2),
  Action = c("A","B","C","B","A","C")
)
freq <- frequencies(long, action = "Action", id = "Actor")


Nestimate documentation built on April 20, 2026, 5:06 p.m.