knitr::opts_chunk$set( collapse = TRUE, fig.width = 6, fig.height = 4, out.width = "100%", message = FALSE, fig.show = TRUE, dev = "jpeg", dpi = 100, results = "hide", comment = "#>" ) suppressPackageStartupMessages({ library("tna") library("tibble") library("dplyr") })
The tna
package accepts sequence data (a stslist
object created by the TraMineR
package), a wide data.frame
where each row represents a sequence and each column represents a timepoint, or a transition matrix. When our data does not follow any of those formats, we can use the prepare_data
function from tna
to get it in the right shape.
Let's start by creating an example dataframe containing event logs where each row represents an action performed by a user
at a specific timestamp.
The achievement
column indicates whether the user was associated with high or low achievement.
df <- tribble( ~user, ~timestamp, ~event, ~achievement, ~order, 2, "2025-02-27 18:01:32", "Plan", "High", 1, 2, "2025-02-27 18:03:32", "Goals", "High", 2, 1, "2025-02-27 18:08:32", "Goals", "High", 1, 3, "2025-02-27 18:15:32", "Plan", "High", 1, 5, "2025-02-27 18:16:32", "Help", "Low", 1, 3, "2025-02-27 18:19:32", "Goals", "High", 2, 5, "2025-02-27 18:19:32", "Plan", "Low", 2, 2, "2025-02-27 18:20:32", "Environment", "High", 3, 1, "2025-02-27 18:25:32", "Task", "High", 2, 4, "2025-02-27 18:25:32", "Help", "Low", 1, 5, "2025-02-27 18:26:32", "Task", "Low", 3, 3, "2025-02-27 18:32:32", "Metacognition", "High", 3, 4, "2025-02-27 18:33:32", "Goals", "Low", 2, 1, "2025-02-27 18:36:32", "Environment", "High", 3, 1, "2025-02-27 18:44:32", "Task", "High", 4, 1, "2025-02-27 18:45:32", "Task", "High", 5, 5, "2025-02-27 18:46:32", "Help", "Low", 4, 1, "2025-02-27 19:01:32", "Plan", "High", 6, 2, "2025-02-27 19:06:32", "Environment", "High", 4, 4, "2025-02-27 19:06:32", "Plan", "Low", 3, 1, "2025-02-27 19:13:32", "Metacognition", "High", 7, 3, "2025-02-27 19:13:32", "Metacognition", "High", 4, 4, "2025-02-27 19:15:32", "Goals", "Low", 4, 4, "2025-02-27 19:20:32", "Metacognition", "Low", 5, 4, "2025-02-27 19:20:32", "Environment", "Low", 6, 5, "2025-02-27 19:23:32", "Metacognition", "Low", 5, 3, "2025-02-27 19:25:32", "Help", "High", 5, 2, "2025-02-27 19:27:32", "Metacognition", "High", 5, 2, "2025-02-27 19:33:32", "Environment", "High", 6, 4, "2025-02-27 19:46:32", "Environment", "Low", 7, 5, "2025-02-27 19:49:32", "Plan", "Low", 6, 2, "2025-02-27 19:55:32", "Goals", "High", 7 )
When we are interested in the whole dataset as a single sequence, and the data is already ordered chronologically, we just need to specify the action
column, which in this case is called event
. We can pass the result of calling prepare_data
directly to the tna
function to create a tna
model.
by_classroom <- prepare_data(df, action = "event") tna_by_classroom <- tna(by_classroom) plot(tna_by_classroom)
When we want to create a sequence per user, in addition to action
, we need to specify the actor
column, which in this case is user
. If the data is already ordered, we do not need to provide any additional arguments. If it is not ordered, we can provide a column to order the data by, in this case order
.
by_user <- prepare_data(df, actor = "user", action = "event", order = "order") tna_by_user <- tna(by_user) plot(tna_by_user)
If rather than the order we only have the timestamps, we can provide it as time
column. By default, events happening less than 15 minutes apart will be grouped in the same sequence, while events that happen after a longer time, will mark the start of a new sequence (i.e., session). If both time
and order
are provided, the data will be first ordered by time
, and in case of a tie, by order
.
by_session <- prepare_data(df, actor = "user", time = "timestamp", action = "event") tna_by_session <- tna(by_session) plot(tna_by_session)
If we want to customize the time gap that marks the start of a new sequence, we can do so by customizing the time_threshold
argument (in minutes).
by_session_custom <- prepare_data( df, actor = "user", time = "timestamp", action = "event", time_threshold = 10 * 60 # 10 minutes ) tna_by_session_custom <- tna(by_session_custom) plot(tna_by_session_custom)
Another advantage of using prepare_data
prior to constructing the tna
model is that we get to keep other variables of the data and use them in our analysis. For instance, we can use group_tna
to create a tna
model by achievement group just by passing the result of prepare_data
as a first argument, and indicating the name of the column in the data that we want to group by.
layout(t(1:2)) gtna <- group_tna(by_user, group = "achievement") plot(gtna)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.