convert_sequence_format: Convert Sequence Data to Different Formats

View source: R/frequencies.R

convert_sequence_format    R Documentation

Convert Sequence Data to Different Formats

Description

Converts wide- or long-format sequence data into per-sequence frequency counts, one-hot (presence/absence) encoding, transition edge lists, or a "follows" format.

Usage

convert_sequence_format(
  data,
  seq_cols = NULL,
  id_col = NULL,
  action = NULL,
  time = NULL,
  format = c("frequency", "onehot", "edgelist", "follows")
)

Arguments

data

Data frame containing sequence data.

seq_cols

Character vector. Names of columns containing sequential states (for wide format input). If NULL, all columns except id_col are used. Default: NULL.

id_col

Character vector. Name(s) of the ID column(s). For wide format, defaults to the first column; required for long format. Default: NULL.

action

Character or NULL. Name of the column containing actions/states (for long format input). If provided, data is treated as long format. Default: NULL.

time

Character or NULL. Name of the time column for ordering actions within sequences (for long format). Default: NULL.
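For long-format input, each row holds a single action, and id_col and time decide which sequence the action belongs to and where it falls within it. The base-R sketch below illustrates that grouping on hypothetical data (the column names id, action, and time are invented for illustration). It assumes actions are ordered by time within each ID and is not the package's internal code.

```r
# Hypothetical long-format data: one row per action (illustration only).
long <- data.frame(
  id     = c(1, 1, 1, 2, 2),
  action = c("B", "A", "C", "A", "B"),
  time   = c(2, 1, 3, 2, 1),
  stringsAsFactors = FALSE
)

# Sort by ID, then by time within each ID.
long <- long[order(long$id, long$time), ]

# Collect one character vector of actions per ID, in time order.
sequences <- split(long$action, long$id)
print(sequences)
```

Here ID 1 becomes the sequence A, B, C and ID 2 becomes B, A, even though the rows were not stored in time order.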

format

Character. Output format:

"frequency"

Count of each action per sequence (wide, one column per state).

"onehot"

Binary presence/absence of each action per sequence.

"edgelist"

Consecutive transition pairs (from, to) per sequence.

"follows"

Each action paired with the action that preceded it.
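The difference between "frequency" and "onehot" can be illustrated in base R on the wide data from the Examples section. This is a sketch of what the counts mean, assuming every column is a state column and each row is one sequence; the row-index id used here is an assumption and may not match the ID columns the function actually returns.

```r
# Wide-format example data (same as in the Examples section).
seqs <- data.frame(V1 = c("A", "B", "A"), V2 = c("B", "A", "C"),
                   V3 = c("A", "C", "B"), stringsAsFactors = FALSE)

# unlist() stacks the columns, so repeat the row index once per column.
ids <- rep(seq_len(nrow(seqs)), times = ncol(seqs))
counts <- table(id = ids, state = unlist(seqs, use.names = FALSE))

# "frequency": one count column per state.
freq <- as.data.frame.matrix(counts)

# "onehot": 1 if the state occurs anywhere in the sequence, else 0.
onehot <- freq
onehot[] <- lapply(freq, function(x) as.integer(x > 0))

print(freq)
print(onehot)
```

The first sequence (A, B, A) counts A twice in the frequency table but only scores 1 for A in the one-hot table.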

Value

A data frame in the requested format:

frequency

ID columns + one integer column per state with counts.

onehot

ID columns + one binary column per state (0/1).

edgelist

ID columns + from and to columns.

follows

ID columns + act and follows columns.
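The two transition formats carry the same consecutive pairs with the roles labelled differently. A minimal base-R sketch on the Examples data, assuming equal-length sequences with no missing states; the id column is a hypothetical row index, and the code illustrates the column layout rather than reproducing the package's implementation.

```r
seqs <- data.frame(V1 = c("A", "B", "A"), V2 = c("B", "A", "C"),
                   V3 = c("A", "C", "B"), stringsAsFactors = FALSE)
m <- as.matrix(seqs)
k <- ncol(m)

# "edgelist": pair column j with column j + 1 in every row.
edgelist <- data.frame(
  id   = rep(seq_len(nrow(m)), times = k - 1),
  from = as.vector(m[, -k]),
  to   = as.vector(m[, -1]),
  stringsAsFactors = FALSE
)
edgelist <- edgelist[order(edgelist$id), ]
rownames(edgelist) <- NULL

# "follows": each action (act) next to the action just before it (follows).
follows <- data.frame(id = edgelist$id, act = edgelist$to,
                      follows = edgelist$from, stringsAsFactors = FALSE)

print(edgelist)
print(follows)
```

A sequence of length n yields n - 1 pairs, so the three length-3 sequences produce six rows in each table.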

See Also

frequencies for building transition frequency matrices.

Examples

# Wide format input
seqs <- data.frame(V1 = c("A","B","A"), V2 = c("B","A","C"), V3 = c("A","C","B"))
convert_sequence_format(seqs, format = "frequency")
convert_sequence_format(seqs, format = "edgelist")


Nestimate documentation built on April 20, 2026, 5:06 p.m.