stacked_sequence_plot: Stacked Sequence Plots of Multichannel Sequences and/or Most...

View source: R/stacked_sequence_plot.R

stacked_sequence_plot    R Documentation

Stacked Sequence Plots of Multichannel Sequences and/or Most Probable Paths from Hidden Markov Models

Description

Function stacked_sequence_plot draws stacked sequence plots of sequence objects created with the TraMineR::seqdef() function, or of the observations and/or most probable paths of seqHMM model objects (e.g., hmm and mhmm).

Usage

stacked_sequence_plot(
  x,
  plots = "obs",
  type = "distribution",
  ids,
  sort_by = "none",
  sort_channel,
  dist_method = "OM",
  group = NULL,
  legend_position = "right",
  ...
)

Arguments

x

Either a hidden Markov model object of class hmm, mhmm, nhmm, or mnhmm, a sequence object of class stslist (created with the TraMineR::seqdef() function) or a list of stslist objects.

plots

What to plot. One of "obs" for observations (the default), "hidden_paths" for most probable paths of hidden states, or "both" for observations and hidden paths together. The latter two options are only available for model objects.

type

The type of the plot. Available types are "index" for sequence index plots and "distribution" for state distribution plots (the default). See ggseqplot::ggseqiplot() and ggseqplot::ggseqdplot() for details.

ids

Indices of the subjects to be plotted (the default is all). For example, ids = c(1:10, 15) plots the first ten subjects and subject 15 in the data.

sort_by

A sorting variable or a sort method (one of "none", "start", "end", or "mds") for type = "index". Option "mds" arranges the sequences according to the scores of multidimensional scaling (using stats::cmdscale()). The default is "none", i.e., no sorting. Numeric vectors are passed to the sortv argument of ggseqplot::ggseqiplot().

sort_channel

Name of the channel to be used for sorting. Alternatively, the value "Hidden states" uses the hidden state sequences for sorting. The default is to sort by the first channel in the data. If sort_by = "mds", all channels are used for defining the sorting.

dist_method

The metric used for computing the distances between sequences when multidimensional scaling is used for sorting. One of "OM" (optimal matching, the default), "LCP" (longest common prefix), "RLCP" (reversed LCP, i.e., longest common suffix), "LCS" (longest common subsequence), "HAM" (Hamming distance), and "DHD" (dynamic Hamming distance). Transition rates are used for defining substitution costs if needed. See TraMineR::seqdist() for more information on the metrics.

group

Variable used for grouping the sequences in each channel, which is passed to ggseqplot::ggseqiplot() and ggseqplot::ggseqdplot(). By default, no grouping is done, except for mixture models, where the grouping is based on the most probable clusters (defined by the most probable hidden paths). Grouping by clusters can be overridden by supplying a variable for group or by setting group = NA.

legend_position

Position of the legend for each channel, passed to the legend.position argument of ggplot2::theme(). Either a vector of length 1 or a vector whose length matches the number of channels to be plotted.

...

Other arguments to ggseqplot::ggseqiplot() or ggseqplot::ggseqdplot().

Examples

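library("seqHMM")
library("ggplot2")

# State distribution plots of the observations and most probable hidden paths
# of the example model mhmm_biofam, with one legend position per plotted channel
p <- stacked_sequence_plot(
  mhmm_biofam,
  plots = "both",
  type = "distribution",
  legend_position = c("right", "right", "right", "none")
)
# Apply the same plot margins to every panel of the combined plot
p & theme(plot.margin = unit(c(1, 1, 0, 2), "mm"))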
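
The idea behind sort_by = "mds" can be illustrated without seqHMM. The following is a minimal base R sketch, not the package's implementation: the toy sequences, the hand-written Hamming distance, and the choice of a single scaling dimension (k = 1) are assumptions made for illustration only.

```r
# Four toy sequences of equal length (rows = subjects, columns = time points)
seqs <- rbind(
  c("a", "a", "a", "a"),
  c("b", "b", "b", "b"),
  c("a", "a", "b", "b"),
  c("a", "b", "b", "b")
)

# Pairwise Hamming distances: number of positions where two sequences differ
n <- nrow(seqs)
d <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- sum(seqs[i, ] != seqs[j, ])
  }
}

# Classical multidimensional scaling to one dimension (k = 1 is an assumption)
scores <- stats::cmdscale(as.dist(d), k = 1)

# Sort the subjects by their scores and show the reordered sequences
ord <- order(scores[, 1])
seqs[ord, ]
```

Sequences that differ in only a few positions end up next to each other. The direction of the ordering is arbitrary, because the sign of the scaling scores is not identified.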


seqHMM documentation built on June 8, 2025, 10:16 a.m.